Kantarcioglu, Murat
Permanent URI for this collection: https://hdl.handle.net/10735.1/2796
Dr. Murat Kantarcioglu is an Associate Professor in Computer Science. His research focuses on "creating technologies that can efficiently extract useful information from any data without sacrificing privacy or security." His interests are:
- Data Security
- Data Privacy
- Privacy-preserving data mining
- Databases
- Cloud computing
ORCID page
News
Elected 2020 fellow of the American Association for the Advancement of Science (AAAS) for his "distinguished contributions to the field of secure and privacy-preserving data storage, querying and mining, and adversarial machine learning."
Recent Submissions
Hybrid Private Record Linkage: Separating Differentially Private Synopses from Matching Records (Association for Computing Machinery, 2019-04-26)
Rao, F.-Y.; Cao, J.; Bertino, E.; Kantarcioglu, Murat; 0000-0001-6423-4533 (Kantarcioglu, M)
Private record linkage protocols allow multiple parties to exchange matching records, which refer to the same entities or have similar values, while keeping the non-matching ones secret. Conventional protocols are based on computationally expensive cryptographic primitives and therefore do not scale. To address these scalability issues, hybrid protocols have been proposed that combine differential privacy techniques with secure multiparty computation techniques. However, a drawback of such protocols is that they disclose to the parties both the matching records and the differentially private synopses of the datasets involved in the linkage. Consequently, differential privacy is no longer always satisfied. To address this issue, we propose a novel framework that separates the private synopses from the matching records. The two parties do not access the synopses directly, but still use them to efficiently link records. We theoretically prove the security of our framework under the state-of-the-art privacy notion of differential privacy for record linkage (DPRL). In addition, we develop a simple but effective strategy for releasing private synopses. Extensive experimental results show that our framework is superior to the existing methods in terms of efficiency. © 2019 Association for Computing Machinery.

Determining the Impact of Missing Values on Blocking in Record Linkage (Springer Verlag, 2019-03-20)
Anindya, Imrul Chowdhury; Kantarcioglu, Murat; Malin, B.; 0000-0001-6423-4533 (Kantarcioglu, M)
Record linkage is the process of integrating information from the same underlying entity across disparate data sets.
This process, which is increasingly utilized to build accurate representations of individuals and organizations for a variety of applications, ranging from creditworthiness assessments to continuity of medical care, can be computationally intensive because it requires comparing large quantities of records over a range of attributes. To reduce the amount of computation in record linkage in big data settings, blocking methods, which are designed to limit the number of record pair comparisons that need to be performed, are critical for scaling up the record linkage process. These methods group together potential matches into blocks, often using a subset of attributes, before a final comparator function predicts which record pairs within the blocks correspond to matches. Yet data corruption and missing values adversely influence the performance of blocking methods (e.g., they may cause some matching records not to be placed in the same block). While there has been some investigation into the impact of missing values on general record linkage techniques (e.g., the comparator function), no study has addressed the impact of missing values on blocking methods. To address this issue, in this work, we systematically perform a detailed empirical analysis of the individual and joint impact of missing values and data corruption on different blocking methods using realistic data sets. Our results show that blocking approaches that do not depend on one type of blocking attribute are more robust against missing values. In addition, our results indicate that blocking parameters must be chosen carefully for different blocking techniques.
© Springer Nature Switzerland AG 2019.

Towards a Privacy-Aware Quantified Self Data Management Framework (Association for Computing Machinery)
Thuraisingham, Bhavani M.; Kantarcioglu, Murat; Bertino, E.; Bakdash, Jonathan Z.; Fernandez, M.
Massive amounts of data are being collected, stored, and analyzed for various business and marketing purposes. While such data analysis is critical for many applications, it could also violate the privacy of individuals. This paper describes the issues involved in designing a privacy-aware data management framework for collecting, storing, and analyzing the data. We also discuss behavioral aspects of data sharing as well as aspects of a formal framework based on rewriting rules that encompasses the privacy-aware data management framework. © 2018 Association for Computing Machinery.

SmartProvenance: A Distributed, Blockchain Based Data Provenance System (Association for Computing Machinery, Inc)
Ramachandran, Aravind; Kantarcioglu, Murat; 0000-0001-6423-4533 (Kantarcioglu, M); 305367293 (Kantarcioglu, M)
Blockchain technology has evolved from being an immutable ledger of transactions for cryptocurrencies to a programmable interactive environment for building distributed reliable applications. Although blockchain technology has been used to address various challenges, to our knowledge no previous work has focused on using Blockchain to develop a secure and immutable scientific data provenance management framework that automatically verifies the provenance records. In this work, we leverage Blockchain as a platform to facilitate trustworthy data provenance collection, verification, and management. The developed system utilizes smart contracts and the Open Provenance Model (OPM) to record immutable data trails.
We show that our proposed framework can securely capture and validate provenance data and prevent any malicious modification of the captured data as long as the majority of the participants are honest. © 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.

Integrating Cyber Security and Data Science for Social Media: A Position Paper (Institute of Electrical and Electronics Engineers Inc.)
Thuraisingham, Bhavani M.; Kantarcioglu, Murat; Khan, Latifur; 0000-0001-6423-4533 (Kantarcioglu, M); 51867299 (Thuraisingham, BM); 305367293 (Kantarcioglu, M); 51656251 (Khan, L)
Cyber security and data science are two of the fastest growing fields in Computer Science, and more recently they are being integrated for various applications. This position paper will review the developments in applying data science for cyber security and cyber security for data science and then discuss the applications in social media.

Forecasting Bitcoin Price with Graph Chainlets (Springer Verlag)
Akcora, Cuneyt G.; Dey, Asim Kumer; Gel, Yulia R.; Kantarcioglu, Murat
Over the last couple of years, Bitcoin cryptocurrency and the Blockchain technology that forms the basis of Bitcoin have witnessed a flood of attention. In contrast to fiat currencies used worldwide, the Bitcoin distributed ledger is publicly available by design. This facilitates observing all financial interactions on the network, and analyzing how the network evolves in time. We introduce a novel concept of chainlets, or Bitcoin subgraphs, which allows us to evaluate the local topological structure of the Bitcoin graph over time. Furthermore, we assess the role of chainlets in Bitcoin price formation and dynamics.
We investigate the predictive Granger causality of chainlets and identify certain types of chainlets that exhibit the highest predictive influence on Bitcoin price and investment risk.

Data Mining with Algorithmic Transparency (Springer Verlag)
Zhou, Yan; Alufaisan, Yasmeen; Kantarcioglu, Murat; 0000-0001-6423-4533 (Kantarcioglu, M); 305367293 (Kantarcioglu, M)
In this paper, we investigate whether decision trees can be used to interpret a black-box classifier without knowing the learning algorithm and the training data. Decision trees are known for their transparency and high expressivity. However, they are also notorious for their instability and tendency to grow excessively large. We present a classifier reverse engineering model that outputs a decision tree to interpret the black-box classifier. There are two major challenges. One is to build such a decision tree with controlled stability and size, and the other is that probing the black-box classifier is limited for security and economic reasons. Our model addresses the two issues by simultaneously minimizing sampling cost and classifier complexity. We present our empirical results on four real datasets, and demonstrate that our reverse engineering learning model can effectively approximate and simplify the black-box classifier.

Adversarial Anomaly Detection Using Centroid-Based Clustering (Institute of Electrical and Electronics Engineers Inc.)
Anindya, I. C.; Kantarcioglu, Murat
As cyber attacks are growing at an unprecedented rate in recent years, organizations are seeking an efficient and scalable solution towards a holistic protection system. As the adversaries are becoming more skilled and organized, traditional rule-based detection systems have proved quite ineffective against the continuously evolving cyber attacks.
Consequently, security researchers are focusing on applying machine learning techniques and big data analytics to defend against cyber attacks. In recent years, several anomaly detection systems have been claimed to be quite successful against sophisticated cyber attacks, including previously unseen zero-day attacks. But often, these systems do not consider the adversary's adaptive attacking behavior for bypassing the detection procedure. As a result, deploying these systems in active real-world scenarios fails to provide significant benefits in the presence of intelligent adversaries that are carefully manipulating the attack vectors. In this work, we analyze the adversarial impact on anomaly detection models that are built upon centroid-based clustering from a game-theoretic perspective and propose an adversarial anomaly detection technique for these models. The experimental results show that our game-theoretic anomaly detection models can withstand attacks more effectively compared to the traditional models.

Controlling the Signal: Practical Privacy Protection of Genomic Data Sharing through Beacon Services (2018-09-24)
Wan, Zhiyu; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley; 0000-0001-6423-4533 (Kantarcioglu, M); 305367293 (Kantarcioglu, M)
Background: Genomic data is increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections available more widely. However, over the past decade, a series of investigations have shown that attacks, rooted in statistical inference methods, can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying about the presence (or absence) of a specific allele, was vulnerable.
The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center modeled a track in their third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track. Methods: This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond that which was introduced in the iDASH Challenge to be more representative of real-world scenarios and to allow for a more comprehensive evaluation. We then conduct a sensitivity analysis of our method with respect to several state-of-the-art methods using a dataset of 400,000 positions in Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy and efficiency. Results: Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool and utility weights) change from the original parameters of the problem. We further illustrate that it is possible for our method to exhibit subpar performance under special cases of allele query sequences. However, we show our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that they may plan and stage their responses accordingly. Conclusions: This research shows that it is possible to thwart the attack on Beacon services, without substantially altering the utility of the system, using computational methods.
The method we initially developed is limited by the design of the scenario and evaluation protocol for the iDASH Challenge; however, it can be improved by allowing the data custodian to act in a staged manner.

A Game Theoretic Framework for Analyzing Re-Identification Risk (Public Library of Science)
Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi; Clayton, Ellen Wright; Kantarcioglu, Murat; Ganta, Ranjit; Heatherly, Raymond; Malin, Bradley A.; 0000 0001 2710 6938 (Kantarcioglu, M); nb201302379 (Kantarcioglu, M); 0000-0001-6423-4533 (Kantarcioglu, M); 305367293 (Kantarcioglu, M)
Given the potential wealth of insights that big databases of personal data can provide, many organizations aim to share data while protecting privacy by sharing de-identified data, but are concerned because various demonstrations show such data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not the likelihood they will be realized. This paper introduces a game theoretic framework that enables a publisher to balance re-identification risk with the value of sharing data, leveraging a natural assumption that a recipient only attempts re-identification if its potential gains outweigh the costs. We apply the framework to a real case study, where the value of the data to the publisher is the actual grant funding dollar amounts from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S.
Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates these findings are robust to order-of-magnitude changes in player losses and gains. In combination, these findings provide support that such a framework can enable pragmatic policy decisions about de-identified data sharing.

A protocol for the secure linking of registries for HPV surveillance (2012-07-02)
El Emam, Khaled; Samet, Saeed; Hu, Jun; Peyton, Liam; Earle, Craig; Jayaraman, Gayatri C.; Wong, Tom; Kantarcioglu, Murat; Dankar, Fida; Essex, Aleksander
Introduction: In order to monitor the effectiveness of HPV vaccination in Canada, the linkage of multiple data registries may be required. These registries may not always be managed by the same organization and, furthermore, privacy legislation or practices may restrict any data linkages of records that can actually be done among registries. The objective of this study was to develop a secure protocol for linking data from different registries and to allow on-going monitoring of HPV vaccine effectiveness. Methods: A secure linking protocol, using commutative hash functions and secure multi-party computation techniques, was developed. This protocol allows for the exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements to ensure patient confidentiality and privacy. The statistics considered were: odds ratio and its confidence interval, chi-square test, and relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles presented. The computation time performance of this protocol was evaluated. Results: The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table.
The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for basic statistics, and the best case is under 3 hours. Discussion: A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of HPV vaccine initiative impact assessment. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs. © 2012 El Emam et al.
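
For readers unfamiliar with the statistics named in the abstract above (odds ratio, relative risk, and the chi-square test on a contingency table), the following is a minimal plaintext sketch in Python for a 2x2 table. It is not the secure protocol from the paper and involves no hashing or multi-party computation; the function name `two_by_two_stats`, the Wald-style intervals, and the example counts are assumptions chosen for illustration only.

```python
import math


def two_by_two_stats(a, b, c, d, z=1.96):
    """Association measures for a 2x2 table with no zero cells.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    z:    normal quantile for the intervals (1.96 gives roughly 95%)
    """
    # Odds ratio with a Wald interval computed on the log scale.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (odds_ratio * math.exp(-z * se_log_or),
             odds_ratio * math.exp(z * se_log_or))

    # Relative risk with the analogous log-scale interval.
    relative_risk = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    rr_ci = (relative_risk * math.exp(-z * se_log_rr),
             relative_risk * math.exp(z * se_log_rr))

    # Pearson chi-square statistic (no continuity correction, 1 degree of freedom).
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    return {
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": or_ci,
        "relative_risk": relative_risk,
        "relative_risk_ci": rr_ci,
        "chi_square": chi_square,
    }


# Hypothetical counts, not taken from the paper.
stats = two_by_two_stats(10, 90, 20, 80)
for name, value in stats.items():
    print(name, value)
```

In the setting the abstract describes, the cell counts would come from the securely linked registries rather than from plaintext records; this sketch only shows what is computed once the counts are available.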