A Generalization of the Community Detection Problem via Side Information
Metwaly Saad, Hussein
MetadataShow full item record
The standard community detection consists of observing a graph and detecting its node labels. However, many practical problems offer further information about the individual node labels, which we denote side information. For example, social networks such as Facebook and Twitter have access to information other than the graph edges, such as age, gender, etc. This dissertation aims to understand when and by how much can side information change the fundamental limits of the community detection problem, devise efficient algorithms for it, and study the asymptotic performance of these algorithms. In the context of the binary symmetric and single-community stochastic block models for the graph, we introduce a model for side information that conveniently captures the variation of its quantity and quality as the size of the graph grows. For the binary symmetric stochastic block model under side information, we characterize tight necessary and sufficient conditions for exact recovery. An efficient, asymptotically optimal two-stage algorithm is introduced for recovery. Furthermore, the analysis of phase transition is extended to continuous-valued side information. For the single community stochastic block model, we characterize tight necessary and sufficient conditions for weak and exact recovery. An efficient belief propagation algorithm under side information is proposed. A weak recovery phase transition for belief propagation is characterized when both quality and quantity are fixed. When quality of side information varies with graph size, sufficient conditions for weak recovery are established. Sufficient conditions for belief propagation to achieve exact recovery are derived. The extrinsic information transfer (EXIT) method is applied to the analysis of belief propagation in community detection with side information, providing insights such as the asymptotic threshold for weak recovery, the performance of belief propagation near the optimal threshold, and the performance of belief propagation through the first few iterations.