Design and Development of Real-Time Big Data Analytics Frameworks

dc.contributor.advisor: Khan, Latifur
dc.creator: Solaimani, M.
dc.creator.orcid: 0000-0001-7049-581X
dc.date.accessioned: 2018-04-02T13:42:44Z
dc.date.available: 2018-04-02T13:42:44Z
dc.date.created: 2017-12
dc.date.issued: 2017-12
dc.date.submitted: December 2017
dc.date.updated: 2018-04-02T13:42:44Z
dc.description.abstract: Today, most sophisticated technologies, such as the Internet of Things (IoT), autonomous driving, the cloud, and data center consolidation, demand smarter IT infrastructure and real-time operations. They continuously generate large volumes of data, called “Big Data”, to report their operational activities. In response, we need advanced analytics frameworks to capture, filter, and analyze data and make quick decisions in real time. The high volume, velocity, and variety of this data make the task overwhelming for humans to perform in real time. Current state-of-the-art techniques such as advanced analytics, machine learning (ML), and natural language processing (NLP) can be used to handle heterogeneous Big Data. However, most of these algorithms suffer from scalability issues and cannot meet real-time constraints. In this dissertation, we focus on two areas: anomaly detection on structured VMware performance data (e.g., CPU and memory usage metrics) and text mining for politics in unstructured text data. We have developed real-time distributed frameworks with ML and NLP techniques. With regard to anomaly detection, we have implemented an adaptive clustering technique to identify individual anomalies and a Chi-square-based statistical technique to detect group anomalies in real time. With regard to text mining, we have developed a real-time framework, SPEC, that captures online news articles in different languages from the web and annotates them using CoreNLP, PETRARCH, and the CAMEO dictionary to generate structured political events in a ‘who-did-what-to-whom’ format. We then extend this framework to code atrocity events: machine-coded structured data containing perpetrators, actions, victims, etc. Finally, we have developed a novel, distributed, window-based political actor recommendation framework to discover and recommend new political actors with their possible roles.
We have implemented scalable distributed streaming frameworks using the Kafka message broker, unsupervised and supervised machine learning techniques, and Spark.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10735.1/5684
dc.language.iso: en
dc.rights: Copyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Big data
dc.subject: SPARK (Computer program language)
dc.subject: Anomaly detection (Computer security)
dc.subject: Machine learning
dc.subject: Text processing (Computer science)
dc.subject: Data mining
dc.title: Design and Development of Real-Time Big Data Analytics Frameworks
dc.type: Dissertation
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.level: Doctoral
thesis.degree.name: PHD

Files

Original bundle

Name: ETD-5608-7474.73.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format
Description:

License bundle

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text
Description:
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Description: