Scalable Learning Approaches for Sum-Product-Cutset Networks
Tractable models are a subclass of probabilistic graphical models (PGMs) in which exact inference can be performed efficiently – a very desirable property missing from arbitrary PGMs such as Bayesian and Markov networks, where exact inference is #P-complete in general. Thin junction trees, mixtures of trees, tractable Markov networks, and sum-product networks are among the well-studied and popular tractable PGMs deployed in many practical application domains. In spite of their success in performing fast exact inference, these models often suffer from low expressivity and very slow learning times. Thus, inventing new and improved tractable PGMs, as well as developing efficient algorithms for learning them from data, has become one of the core research topics in unsupervised density estimation using probabilistic graphical models. In this dissertation, we propose cutset networks (CNs) – a new class of tractable probabilistic models for representing multidimensional joint distributions. CNs are rooted OR search trees in which each internal node (OR node) represents conditioning on a variable in the model, with tree distributions (tree-structured Bayesian or Markov networks) at the leaves – a simple tractable PGM that can be learned in polynomial time using the Chow-Liu algorithm. Leveraging the vast body of research on decision tree induction, we present efficient algorithms for learning CNs and their ensembles from data. Sum-product networks (SPNs) have recently emerged as deep tractable probabilistic architectures that combine latent variable mixtures and model decomposition in an alternating fashion, with univariate distributions at the leaves, and have shown impressive performance in several important application domains, including computer vision and natural language processing. Inspired by the expressive power of SPNs and the efficient learnability of CNs, we combine the two and propose sum-product-cutset networks (SPCNs).
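The Chow-Liu algorithm mentioned above learns the leaf tree distributions by computing pairwise empirical mutual information and extracting a maximum-weight spanning tree. The following is a minimal sketch of that idea for binary data, not the dissertation's implementation; the function names and the use of Prim's algorithm are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def mutual_information(data, i, j):
    # Empirical mutual information between binary columns i and j.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((data[:, i] == a) & (data[:, j] == b))
            p_a = np.mean(data[:, i] == a)
            p_b = np.mean(data[:, j] == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    """Return the edges of a maximum-weight spanning tree over the
    variables, weighted by pairwise mutual information (Prim's algorithm)."""
    n_vars = data.shape[1]
    weights = {(i, j): mutual_information(data, i, j)
               for i, j in combinations(range(n_vars), 2)}
    in_tree = {0}
    edges = []
    while len(in_tree) < n_vars:
        best = max(
            ((i, j) for i in in_tree
             for j in range(n_vars) if j not in in_tree),
            key=lambda e: weights[(min(e), max(e))],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

In a full Chow-Liu learner, the tree would then be oriented from an arbitrary root and the conditional probability tables estimated from counts; the sketch stops at the structure, which is the polynomial-time core of the algorithm.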
Our experimental evaluation on a large variety of high-dimensional real-world benchmark datasets shows that SPCNs are not only faster to learn than SPNs but also more accurate than both CNs and SPNs in modeling the underlying distribution. We also propose several novel and efficient post-processing optimization approaches, such as pruning and merging, which not only reduce the variance of SPCNs that overfit the training data but also increase prediction speed by producing a more compact model. Finally, we propose hybrid SPCNs (HSPCNs) – SPCNs that can model distributions over continuous domains. HSPCNs model the sum-product space using discrete variables (latent or observed) and use tree-structured conditional linear Gaussians to model the leaf distributions over continuous variables. Most real-world application domains contain both discrete and continuous variables, and HSPCNs enable an application designer to develop rich density estimators in such domains.
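To make the conditional linear Gaussian (CLG) leaves concrete: a CLG models a continuous variable as Gaussian with a mean that is a linear function of its parents. A minimal sketch of evaluating one such leaf density follows; the function name and parameterization are assumptions for illustration, not the HSPCN implementation.

```python
import math

def clg_density(y, parents, weights, bias, sigma):
    """Conditional linear Gaussian density:
    p(y | parents) = N(y; w . parents + bias, sigma^2)."""
    mean = sum(w * p for w, p in zip(weights, parents)) + bias
    return (math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2))
            / (sigma * math.sqrt(2.0 * math.pi)))
```

In a tree-structured CLG leaf, each continuous variable has at most one continuous parent, so the leaf's joint density is a product of such one-dimensional conditional terms along the tree.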