Towards Algorithmic Accountability in Data Mining






Machine learning and data mining techniques are increasingly used to make crucial decisions in our daily lives, ranging from credit card approvals to employment decisions. Despite existing efforts to explain their underlying reasoning, data mining models and their outputs remain opaque black boxes, accessible only to experts with years of training and development experience. As data mining techniques gain popularity in the real world, accountability becomes increasingly critical, especially when these techniques are applied in areas such as criminal justice and law enforcement.

In this dissertation, we focus on four desired aspects of accountability in data mining: reliability, discrimination-awareness, transparency, and privacy. First, we investigate the reliability of using data mining techniques to predict various individual traits: we measure the impact of various adversarial attacks on the prediction accuracy of data mining models and propose countermeasures that reduce the effectiveness of these attacks. Second, we develop two techniques to measure the discrimination exhibited by a black-box model (i.e., without knowing the details of the data mining model) as a result of data bias or algorithmic weakness. We investigate data bias further by introducing artificial bias into the dataset under consideration. Third, we develop transparency models that help unmask the otherwise incomprehensible reasoning of data mining and machine learning models. We consider transparency for both white-box models (i.e., when the model details are known) and black-box models. For white-box transparency, we propose an Instance-based Transparency (IT) model that provides simple explanations by using a novel rule selection technique. For black-box transparency, we introduce the Reverse Engineering Approximate Learning (REAL) model, which outputs a decision tree to interpret any black-box classifier. Finally, we extensively study the trade-off between privacy and transparency, and we introduce a novel privacy model that can prevent the inference of individuals' sensitive information when a transparency model is disclosed.
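
The general idea of interpreting a black-box classifier with a decision tree can be illustrated with a minimal sketch. This is not the REAL model itself, whose details are not given in this abstract; it is a generic surrogate approach under stated assumptions. The names black_box, fit_stump, and surrogate_predict are hypothetical, the black box is a toy linear rule, and the "tree" is limited to depth one (a decision stump). The sketch queries the black box on random inputs, fits the stump to the returned labels, and reports fidelity, meaning the fraction of queries on which the surrogate agrees with the black box.

```python
import random


def black_box(x):
    """Hypothetical opaque classifier: callers only see the returned label."""
    return 1 if 0.7 * x[0] + 0.3 * x[1] > 0.5 else 0


def majority(labels):
    """Most common binary label (ties and empty lists resolve to 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(points, labels):
    """Exhaustively search for the depth-1 tree that best matches the labels."""
    best = None
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            left = [y for p, y in zip(points, labels) if p[feature] <= threshold]
            right = [y for p, y in zip(points, labels) if p[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            # Count the queries on which this candidate stump matches the black box.
            agree = sum(y == left_label for y in left)
            agree += sum(y == right_label for y in right)
            fidelity = agree / len(points)
            if best is None or fidelity > best["fidelity"]:
                best = {
                    "feature": feature,
                    "threshold": threshold,
                    "left_label": left_label,
                    "right_label": right_label,
                    "fidelity": fidelity,
                }
    return best


def surrogate_predict(stump, x):
    """Apply the fitted stump to a single input."""
    if x[stump["feature"]] <= stump["threshold"]:
        return stump["left_label"]
    return stump["right_label"]


# Query the black box on random inputs; only its outputs are used for fitting.
rng = random.Random(0)
queries = [(rng.random(), rng.random()) for _ in range(200)]
answers = [black_box(q) for q in queries]

stump = fit_stump(queries, answers)
print(
    f"if x[{stump['feature']}] <= {stump['threshold']:.3f}: "
    f"predict {stump['left_label']} else predict {stump['right_label']} "
    f"(fidelity {stump['fidelity']:.2f})"
)
```

The printed rule is the explanation a reader would inspect. A deeper tree would usually track the black box more closely at the cost of a longer explanation, which is the kind of trade-off a surrogate approach has to manage.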



Liability (Law), Transparency, Privacy, Data mining, Machine learning


©2018 The Author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.