Towards Algorithmic Accountability in Data Mining
Alufaisan, Yasmeen Mansour
Increasingly, machine learning and data mining techniques are being used to make crucial decisions in our daily lives, ranging from credit card approvals to employment decisions. Despite existing efforts to explain their underlying reasoning, data mining models and their outputs still appear as opaque black boxes, accessible only to experts with years of training and development experience. As data mining techniques gain popularity in real-world applications, accountability becomes increasingly critical, especially when these techniques are applied in areas such as criminal justice and law enforcement. In this dissertation, we focus on four desired aspects of accountability in data mining: reliability, discrimination-awareness, transparency, and privacy. First, we investigate the reliability of using data mining techniques to predict various individual traits. We measure the impact of various adversarial attacks on the prediction accuracy of data mining models, and we propose countermeasures that can reduce the effectiveness of these attacks. Second, we develop two techniques to measure the discrimination of a black-box model (i.e., without knowing the details of the data mining model) that results from data bias or algorithmic weakness. Data bias is investigated further by introducing artificial bias into the dataset under consideration. Third, we develop transparency models to help unmask the incomprehensible reasoning of any data mining or machine learning model. We look into the transparency of both white-box (i.e., when we know the model details) and black-box machine learning models. For white-box transparency, we propose an Instance-based Transparency (IT) model that provides simple explanations by using a novel rule selection technique. For black-box transparency, we introduce the Reverse Engineering Approximate Learning (REAL) model, which outputs a decision tree to interpret any black-box classifier.
Finally, we extensively study the trade-off between privacy and transparency. We introduce a novel privacy model that can prevent the inference of individuals’ sensitive information when disclosing a transparency model.