Spectral Algorithms for Semi-Supervised Community Detection
Session Number
Project ID: CMPS 13
Advisor(s)
Dr. Julia Gaudio; Northwestern University, Department of Industrial Engineering and Management Sciences
Discipline
Computer Science
Start Date
19-4-2023 9:20 AM
End Date
19-4-2023 9:35 AM
Abstract
Community detection is the problem of identifying groups, or clusters, of nodes in a network that are more densely connected to each other than to nodes in other clusters. It is an important problem in network analysis, with applications in a wide range of fields, including social networks, biology, and computer science. In 1983, the Stochastic Block Model (SBM) was introduced as a probabilistic model for generating clustered networks. Since then, the SBM has served as a testbed for community detection algorithms and has enhanced our understanding of the fundamental limits of community detection. In many community detection applications, such as tracking protein connections in biology, connecting similar hyperlinks on search engines, or keeping track of friendships on social networks, we may already have prior knowledge of community labels. Such scenarios are modeled by the Semi-Supervised SBM, which has received limited attention despite its practical importance. In this work, we investigate efficient algorithms for community detection in the Semi-Supervised SBM under two information models: the erasure model, which removes edges in a network with a certain probability and then uses the resulting subgraph to infer the community structure, and the flip model, which flips edges with a certain probability rather than removing them. Specifically, we focus on spectral algorithms, which operate on the eigenvalues and eigenvectors of a matrix derived from the data, because of their efficiency. The spectral algorithms first compute the adjacency matrix of the given network and find its leading eigenvectors. Nodes are then embedded into a low-dimensional space using these eigenvectors and clustered based on proximity. We investigate the empirical performance of spectral algorithms in the erasure and flip models.
The results validate the theoretical properties of the algorithms, demonstrating their effectiveness even for moderately sized networks. Our results suggest that spectral algorithms correctly leverage prior knowledge of community memberships and are promising for use in practical applications.
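The spectral pipeline described in the abstract (adjacency matrix, leading eigenvectors, embedding, clustering) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain two-community SBM with no side information, and the function names (`sample_sbm`, `spectral_labels`, `agreement`) and the parameters `n = 200`, `p = 0.5`, `q = 0.1` are hypothetical choices made only for illustration. Clustering here is done by the sign of the second leading eigenvector, a common simplification when there are two equal-sized communities.

```python
import numpy as np


def sample_sbm(n, p, q, rng):
    """Sample a two-community SBM (assumed setup: n even, equal-sized communities).

    Nodes in the same community connect with probability p, others with q.
    """
    labels = np.repeat([1, -1], n // 2)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    # Draw each undirected edge once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def spectral_labels(adj):
    """Embed nodes with the second leading eigenvector and cluster by sign."""
    _, eigvecs = np.linalg.eigh(adj)  # eigenvalues returned in ascending order
    second = eigvecs[:, -2]           # eigenvector of the second-largest eigenvalue
    return np.where(second >= 0, 1, -1)


def agreement(est, truth):
    """Fraction of correctly labeled nodes, up to a global swap of the two labels."""
    acc = np.mean(est == truth)
    return max(acc, 1.0 - acc)


rng = np.random.default_rng(0)  # hypothetical seed
adj, truth = sample_sbm(200, 0.5, 0.1, rng)
est = spectral_labels(adj)
acc = agreement(est, truth)
print(f"fraction of nodes correctly clustered: {acc:.3f}")
```

In a semi-supervised variant, the partially known labels would be used in addition to the eigenvector embedding; how that side information is combined with the spectrum is specific to the work and is not reproduced here.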