Phase Transition of Community Detection Under Efficient Algorithms, Expressive Generative Models, and Confidentiality Constraints

dc.contributor.advisor: Nosratinia, Aria
dc.contributor.advisor: Walker, Amy V.
dc.contributor.committeeMember: Faragó, András
dc.contributor.committeeMember: Busso-Recabarren, Carlos A.
dc.contributor.committeeMember: Kehtarnavaz, Nasser
dc.creator: Esmaeili, Mohammad 1991-
dc.date.accessioned: 2024-03-14T19:18:33Z
dc.date.available: 2024-03-14T19:18:33Z
dc.date.created: 2022-12
dc.date.issued: December 2022
dc.date.submitted: December 2022
dc.date.updated: 2024-03-14T19:18:34Z
dc.description.abstract: We formulate a semidefinite relaxation for the maximum likelihood estimation of node labels, subject to observing both graph and non-graph data. This formulation is distinct from the semidefinite programming solution of standard community detection, but maintains its desirable properties. We calculate the exact recovery threshold for three types of non-graph information, which are called side information: partially revealed labels, noisy labels, and multiple observations (features) per node with arbitrary but finite cardinality. We find that semidefinite programming has the same exact recovery threshold in the presence of side information as maximum likelihood with side information.
Empirical observations suggest that in practice, community membership does not completely explain the dependency between the edges of an observation graph. The residual dependence of the graph edges is modeled in this dissertation, to first order, by auxiliary node latent variables that affect the statistics of the graph edges but carry no information about the communities of interest. We then study community detection in graphs obeying the stochastic block model and censored block model with auxiliary latent variables. We analyze the conditions for exact recovery when these auxiliary latent variables are unknown, representing unknown nuisance parameters or model mismatch. We also analyze exact recovery when these secondary latent variables have been either fully or partially revealed. Finally, we propose a semidefinite programming algorithm for recovering the desired labels when the secondary labels are either known or unknown. We show that exact recovery is possible by semidefinite programming down to the respective maximum likelihood exact recovery threshold.
Releasing graph structures containing nodes with multiple latent variables may cause privacy issues and leak users' confidential information. This dissertation investigates confidentiality in community detection in networks with multiple latent variables. Focusing on the stochastic block model and the censored block model with multiple latent variables, we address the leakage of confidential information by changing the connectivity of nodes. To this end, we first propose a new metric for evaluating confidentiality based on the Chernoff-Hellinger divergence. An optimization is introduced to minimize the required changes to the edges of the graph realization.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10735.1/10059
dc.language.iso: English
dc.subject: Engineering, Electronics and Electrical
dc.title: Phase Transition of Community Detection Under Efficient Algorithms, Expressive Generative Models, and Confidentiality Constraints
dc.type: Thesis
dc.type.material: text
local.embargo.lift: 2023-12-01
local.embargo.terms: 2023-12-01
thesis.degree.college: School of Engineering and Computer Science
thesis.degree.department: Electrical Engineering
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.name: PHD

Files

Original bundle

Name:
ESMAEILI-PRIMARY-2022.pdf
Size:
16.21 MB
Format:
Adobe Portable Document Format

License bundle

Name:
proquest_license.txt
Size:
6.38 KB
Format:
Plain Text
Name:
license.txt
Size:
1.99 KB
Format:
Plain Text