Using Machine Learning Techniques for Prediction and Data Generation with Applications to Data Privacy
Abay, Nazmiye Ceren
Machine learning (ML) techniques have increasingly become an integral part of many real-world applications. In particular, they are heavily used in research and industry to support effective decision-making. Despite their apparent recent success, some domain-specific challenges require in-depth investigation with respect to predictive accuracy, privacy protection, and cybersecurity. In this dissertation, we start by examining the usability of ML techniques in the cryptocurrency transaction domain (e.g., Bitcoin), where there is no privacy concern (i.e., all Bitcoin transaction information is public), and show how to use ML techniques to make better predictions in real time. In application domains that involve sensitive data, collecting, sharing, and refining these data may raise serious privacy concerns. To address these concerns, we propose a privacy-preserving synthetic data generation technique that leverages deep learning. The proposed technique allows participants to share synthetic datasets freely without worrying about individual privacy. Furthermore, we compare the proposed technique with existing synthetic data generation algorithms and investigate the utility of these algorithms under different use cases. Finally, we explore the use of the generated synthetic data to improve the cybersecurity posture of organizations. Specifically, we show that the generated synthetic data not only protect individual privacy but can also be used to deceive potential cyberattackers, because the synthetic data are indistinguishable from the real data. This, in turn, could reduce sensitive data leakage under successful cyberattacks, since an attacker could be deceived into targeting synthetic data instead of the real, sensitive data.