Using Machine Learning Techniques for Prediction and Data Generation with Applications to Data Privacy

dc.contributor.ORCID: 0000-0002-7930-3455 (Abay, NC)
dc.contributor.advisor: Thuraisingham, Bhavani
dc.contributor.advisor: Kantarcioglu, Murat
dc.creator: Abay, Nazmiye Ceren
dc.date.accessioned: 2020-12-10T19:55:09Z
dc.date.available: 2020-12-10T19:55:09Z
dc.date.created: 2019-12
dc.date.issued: 2019-12
dc.date.submitted: December 2019
dc.date.updated: 2020-12-10T19:55:10Z
dc.description.abstract: Machine learning (ML) is increasingly an integral part of real-world applications, and ML techniques are heavily used in research and industry to support effective decision-making. Despite their recent success, some domain-specific challenges require in-depth investigation with respect to predictive accuracy, privacy protection, and cybersecurity. In this dissertation, we first study the usability of ML techniques in the cryptocurrency transaction domain (e.g., Bitcoin), where there is no privacy concern (i.e., all Bitcoin transaction information is public), and show how to use ML techniques to make better predictions in real time. In application domains that involve sensitive data, collecting, sharing, and refining the data may raise serious privacy concerns. To address these concerns, we propose a privacy-preserving synthetic data generation technique that leverages deep learning. The proposed technique allows participants to share synthetic datasets freely without worrying about individual privacy. Furthermore, we compare the proposed technique with existing synthetic data generation algorithms and investigate the utility of these algorithms under different use cases. Finally, we explore the use of the generated synthetic data to improve the cybersecurity posture of organizations. We show that the generated synthetic data not only protects individual privacy but can also be used to deceive potential cyberattackers, because the synthetic data is indistinguishable from the real data. This, in turn, could reduce sensitive data leakage under successful cyberattacks, since an attacker could be deceived into targeting synthetic data instead of the real, sensitive data.
dc.description.sponsorship: NIH award 1R01HG006844, NSF awards CICI-1547324 and IIS-1633331
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10735.1/9091
dc.language.iso: en
dc.rights: ©2019 Nazmiye Ceren Abay. All rights reserved.
dc.subject: Cryptocurrencies
dc.subject: Computer security
dc.subject: Machine learning
dc.subject: Artificial intelligence
dc.title: Using Machine Learning Techniques for Prediction and Data Generation with Applications to Data Privacy
dc.type: Dissertation
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.level: Doctoral
thesis.degree.name: PHD

Files

Original bundle

Name: ETD-5608-011D-262384.21.pdf
Size: 993.38 KB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text