Study in Big Data Harnessing and Related Problems
Date
Authors
ORCID
Journal Title
Journal ISSN
Volume Title
Publisher
item.page.doi
Abstract
Social networks, such as Facebook and Twitter, have provided incredible opportunities for social communication between web users around the world. Social network analysis is an important problem in data harnessing. The analysis of social networks helps summarizing the interests and opinions of users (nodes), discovering patterns from the interactions (edges) between users, and mining the events that take place in online platforms. The information obtained by analyzing social networks could be especially valuable for many applications. Some typical examples include online advertisement targeting, viral marketing, personalized recommendation, health social media, social influence analysis, and citation network analysis. In this dissertation, we study two types of applications emerging from modern online social platforms in the view of social influence. One is influence maximization(IM) problem from a discount-based online viral marketing scenario, which aims at maximizing influence in the adoption of target products, and the other is online rumor source detection problem, in which the spread of misinformation is supposed to be minimized and the source is expected to be detected. We formulate them as set function optimization problems and design solutions with performance guarantees. In study of set function optimization, there is a challenge coming from the submodularity of objective function. That is, some of the practical problems are not submodular or supermodular, the existing greedy strategy cannot be directly applied to problems to get a guaranteed approximate solution. To solve those non-submodular and non-supermodular problems, one method called DS decomposition has been considered, in which given a set function, we decompose it to be representable as a difference between submodular functions. Based on this method, we further study a problem about how to find a DS decomposition efficiently and effectively. Then we propose a generalized framework that is made up of our novel algorithms under deterministic version and random version respectively to solve maximization of DS decomposition and show their performances under various combinatorial settings. In addition, we discuss our findings on the role of black-box, that has been an important component in study of computational complexity theory as well as has been used for establishing the hardness of problems, about its implied power and limitations in study of data-driven computation for proving solutions to some computational problems.