Variational Inference Methods for Continuous Probabilistic Graphical Models
Graphical models provide a general framework for representing and reasoning about data. Once these models are fit to data, they can be used to answer statistical queries about the observed data. Unfortunately, answering these queries, that is, performing inference, is NP-hard in general. To tackle this problem, many approximate inference methods have been proposed. Belief propagation, a widely used inference method for probabilistic graphical models, is exact on tree-structured models but guarantees neither convergence nor correctness on general graphs. Alternative approaches based on variational inference have also been proposed. Variational inference methods approximate the intractable model with a more tractable surrogate model, chosen by minimizing the KL divergence between the surrogate and the original model. The marginals of the surrogate model are then treated as approximations to the intractable marginals of the original model. However, the performance of variational inference methods depends heavily on the choice of surrogate model and can be poor when the surrogate cannot approximate the original model well. Another drawback is that extra approximations are sometimes necessary to make the approach computationally feasible. It is also very hard to prove theoretical guarantees for these methods. In this work, we propose a new variational inference method that adopts a mixture of independent distributions as the surrogate model. Instead of minimizing the KL divergence between the surrogate and the original model, we propose to maximize the Bethe free energy with respect to the surrogate marginals using standard optimization strategies. Our method has many advantages compared to existing methods and is provably correct under certain conditions.
We demonstrate the superior performance of our method on a variety of large-scale real-world problems, showing that it not only achieves better results than existing state-of-the-art methods on these tasks but can also be implemented efficiently at scale.
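
To make the dependence on the surrogate model concrete, the following minimal sketch fits a fully factorized (mean-field) surrogate to a correlated two-variable Gaussian by minimizing the KL divergence from the surrogate to the target. It illustrates the standard KL-based approach discussed above, not the mixture-based method proposed in this work. The two-variable Gaussian target, the function names, and the iteration count are assumptions chosen for illustration only.

```python
# Illustrative sketch only: classical mean-field variational inference on a
# 2-D Gaussian target. This is NOT the method proposed in this work; all names
# here (invert_2x2, mean_field_gaussian) are hypothetical helpers.

def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mean_field_gaussian(mu, cov, iters=200):
    """Fit q(x1) q(x2) to N(mu, cov) by coordinate updates that minimize KL(q || p).

    For a Gaussian target with precision matrix P, each optimal factor q_i is
    Gaussian with variance 1 / P[i][i]; its mean depends on the current mean of
    the other factor, so the means are updated in turn until they settle.
    """
    prec = invert_2x2(cov)
    m = [0.0, 0.0]  # arbitrary starting means for the surrogate factors
    for _ in range(iters):
        m[0] = mu[0] - prec[0][1] / prec[0][0] * (m[1] - mu[1])
        m[1] = mu[1] - prec[1][0] / prec[1][1] * (m[0] - mu[0])
    variances = [1.0 / prec[0][0], 1.0 / prec[1][1]]
    return m, variances


if __name__ == "__main__":
    mu = [1.0, -1.0]
    cov = [[1.0, 0.9], [0.9, 1.0]]  # strongly correlated target
    means, variances = mean_field_gaussian(mu, cov)
    print("surrogate means:    ", means)
    print("surrogate variances:", variances)
    print("true marginal variances:", [cov[0][0], cov[1][1]])
```

In this example the surrogate recovers the target means, but its marginal variances are smaller than the true ones, because independent factors cannot represent the correlation between the two variables. This is one simple instance of a surrogate that cannot approximate the original model well.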