Community Discovery in Complex (and Dynamic) Networks

In our everyday life, we are used to picture social relations - as well as countless other interactions - as mutable, dynamic and often short-term bonds. Research studies have extensively shown that networks inferred from social ties can be used to observe, characterize and forecast different aspects of human activities (such as predict new connections [1] or the usage of online communication services [2]) and that, in order to correctly describe the social phenomena we study, it is often mandatory to consider them using different granularity of temporal abstraction. Indeed, thanks to the explosion of human generated data, the temporal dimension has started playing a fundamental role for both the modeling and analysis of phenomena that can be described using complex networks.
In order to cope with the need for more accurate analysis of evolving social phenomena, during the last years, at the CNR-ISTI Knowledge Discovery and Data Mining Laboratory (KDD Lab) we focused our attention on the study of dynamic networks.

Among all the network related tasks Community Discovery is universally considered as one of the most challenging. Indeed, countless algorithms have been proposed to identify communities on static networks (such as our DEMON [3]), i.e. networks that do not change in time: however, real world phenomena are often dynamic (consider for instance the interactions described by call graphs or online social networks). When analyzing such scenarios, static community discovery algorithms fail to identify partitions semantically consistent with the temporal information expressed by the data.
To cope with this limitation we defined a novel approach TILES [4] that, working on a fully dynamic representation of social graphs - modeled as edge streams in which nodes and edges among them can appear and vanish as time goes by - enables the identification of both evolving communities and the events that describe their lifecycle (birth/merges/splits/death). Our algorithm, differently from the ones proposed so far (for which we provide a classification and review in [5]), does not discretize the dynamic problem into an ordered sequence of static ones, conversely TILES is designed to work on atomic local perturbations of the dynamic network (i.e. node/edge appearance/vanishing) using them to continuously update the identified communities. Such design choice solves two major issues introduced by previous approaches: (i) it avoids the need to identify meaningful temporal annotated network snapshots (which is a context dependent problem), and (ii) it allows to track community dynamics across time without adopting matching strategies.
We showed that our approach is able to outperform state of art competitors in providing interpretable dynamics of overlapping communities on real world case studies built upon communication networks having different semantics (call graphs, Facebook comments, Sina Weibo direct messages).

So far one of the most adopted strategy to evaluate community discovery algorithms has involved the comparison of the topologies they are able to identify with ground truth ones. Given the lack of large real world datasets having annotated ground truth partitions, in the last decade several network generators where defined so to guarantee controlled environments for evaluation purposes.
Since Dynamic Community Discovery is a relatively novel, growing, task in the complex network analysis field, we also designed an ad-hoc benchmark to assess the quality of dynamic partitions. In [6,7] we proposed our own generator, RDyn, able to generate dynamic synthetic graphs mimicking both real world characteristics (power law distribution for both degree and community sizes, low average shortest path, high local clustering coefficient) as well as community dynamics. This latter peculiarity makes RDyn, to the best of our knowledge, one of the first generative models explicitly designed to evaluate dynamic community discovery approaches.

[1] Rossetti, Giulio; Guidotti, Riccardo; Miliou, Ioanna; Pedreschi, Dino; Giannotti, Fosca
"A Supervised Approach for Intra-/Inter-Community Interaction Prediction in Dynamic Social Networks", Social Network Analysis and Mining, 2016. (Open Access)

[2] Rossetti, Giulio; Pappalardo, Luca; Kikas, Riivo; Pedreschi, Dino; Giannotti, Fosca; Dumas, Marlon "Homophilic network decomposition: a community-centric analysis of online social services", Social Network Analysis and Mining, 2016. (Open Access)

[3] Coscia, Michele; Rossetti, Giulio; Giannotti, Fosca; Pedreschi, Dino "Uncovering Hierarchical and Overlapping Communities with a Local-First Approach" ACM Transactions on Knowledge Discovery from Data (TKDD), 9 (1), 2014.

[4] Rossetti, Giulio; Pappalardo, Luca; Pedreschi, Dino; Giannotti, Fosca "Tiles: an online algorithm for community discovery in dynamic social networks" Machine Learning Journal, 2016. (Open Access)

[5] Rossetti, Giulio; Cazabet, Rémy "Community Discovery in Dynamic Networks: A Survey" ACM Computing Surveys (submitted)

[6] Rossetti, Giulio; Pappalardo, Luca; Rinzivillo, Salvatore "A novel approach to evaluate community detection algorithms on ground truth" 7th Workshop on Complex Networks, Springer-Verlag, 2016.

[7] Rossetti, Giulio "RDyn: Graph Benchmark handling Community Dynamics" Journal of Complex Networks (submitted)