Sarkar, Sumit


Dr. Sumit Sarkar is Professor of Information Systems and holds the Charles and Nancy Davidson Chair. "Sarkar's expertise lies in modeling uncertainty in databases and knowledge bases, and devising efficient ways to automate reasoning under uncertain conditions. His fascination with uncertainty modeling has led to publications in a wide variety of applications. For example, his research on personalization demonstrates how a website can quickly infer profiles of customers based on their browsing history, which can in turn be used to improve recommendations in e-commerce environments."


Recent Submissions

  • Privacy and Big Data: Scalable Approaches to Sanitize Large Transactional Databases for Sharing
    (Management Information Systems Research Center, University of Minnesota) Menon, Syam; Sarkar, Sumit
    Scalability and privacy are two critical dimensions that will ultimately determine the success of big data analytics. We present scalable approaches to address privacy concerns when sharing transactional databases. Although the benefits of sharing are well documented and the number of firms sharing transactional data has increased over the years, this number has grown more slowly than it could have. Concerns about revealing proprietary information have prevented some retailers from sharing, despite the obvious advantages in an increasingly networked economy. In the context of sharing transactional data, sensitive information is typically based on relationships derived from frequently occurring itemsets, the results of surprisingly successful promotions by the retailer, or unexpected relationships identified by the retailer while mining the data. Prior work in this area includes optimal approaches based on integer programming that maximize the accuracy of shared databases while hiding all sensitive itemsets. Although these approaches were shown to solve problems involving up to 10 million transactions, many transactional databases in the big data context are considerably larger, and the existing integer-programming-based procedures do not scale well enough to solve these larger problems. Consequently, the extant literature offers no effective solution procedure for such databases. In this paper, we first present an optimal procedure that leverages intuition from linear-programming-based column generation. Next, we identify a common structure in these problems and show how it can be exploited through an approach based on sorting and column generation to make the process more efficient. We then illustrate how this structure can be incorporated into the column-generation-based procedure to develop an effective, scalable heuristic.
    Computational experiments are conducted on databases with 50 million and 100 million transactions, involving problems that could not be solved using existing optimal procedures. These experiments show that the optimal column-generation-based procedure can solve problem instances significantly larger than those tackled previously, and that the scalable heuristic quickly identifies near-optimal solutions in all instances where the optimal solution is known. We also investigate the impact of hiding sensitive itemsets on the quality of a rule-based recommender system derived from the shared data. As expected, recommendation quality decreases as the number of sensitive itemsets increases; however, even with 1,000 sensitive itemsets to hide, recommendation accuracy stays above 80% of the accuracy obtained with the unmodified data. Using the heuristic instead of the optimal approach had very little effect on recommendation accuracy: accuracies with the heuristic were over 97% of the corresponding accuracies with the optimal approach in every experiment, and over 99% in the vast majority.
  • Selling vs. Profiling: Optimizing the Offer Set in Web-Based Personalization
    (Institute for Operations Research and the Management Sciences) Johar, M.; Mookerjee, Vijay S.; Sarkar, Sumit
    We study the problem of optimally choosing the composition of the offer set for firms engaging in web-based personalization. A firm can offer items or links targeted at immediate sales, based on what is already known about a customer's profile. Alternatively, the firm can offer items directed at learning the customer's preferences, which in turn can help it make better recommendations for the remainder of its engagement period with the customer. An important decision problem faced by a profit-maximizing firm is what proportion of the offer set should target immediate sales and what proportion should target learning the customer's profile. We formulate the problem as an optimal control model and characterize its solution. Our findings can help firms decide how to vary the size and composition of the offer set over the course of a customer's engagement period with the firm. The benefits of the proposed approach are illustrated for different patterns of engagement, including the length of the engagement period, uncertainty in that length, and the frequency of the customer's visits to the firm. We also study the scenario in which the firm optimizes the size of the offer set during the planning horizon. One of the most important insights of this study is that frequent visits to the firm's website are extremely important for an e-tailing firm, even though the customer may not always buy products during these visits.

Works in Treasures @ UT Dallas are made available exclusively for educational purposes such as research or instruction. Literary rights, including copyright for published works held by the creator(s) or their heirs, or other third parties may apply. All rights are reserved unless otherwise indicated by the copyright owner(s).