Design and Development of Real-Time Big Data Analytics Frameworks
MetadataShow full item record
Today most sophisticated technologies such as Internet of Things (IoT), autonomous driving, Cloud, data center consolidation, etc., demand smarter IT infrastructure and real-time operations. They continuously generate lots of data called “Big Data” to report their operational activities. In response to this, we need advanced analytics frameworks to capture, ﬁlter, and analyze data and make quick decisions in real-time. The high volumes, velocities, and varieties of data make it an impossible (overwhelming) task for humans in real-time. Current state-of-the-arts like advanced analytics, Machine learning (ML), Natural Language Processing (NLP) can be utilized to handle heterogeneous Big Data. However, most of these algorithms suffer scalability issues and cannot manage real-time constraints. In this dissertation, we have focused on two areas: anomaly detection on structured VMware performance data (e.g., CPU/Memory usage metric, etc.) and text mining for politics in unstructured text data. We have developed real-time distributed frameworks with ML and NLP techniques. With regard to anomaly detection, we have implemented an adaptive clustering technique to identify individual anomalies and a Chi-square-based statistical technique to detect group anomalies in real-time. With regards to text mining, we have developed a real-time framework SPEC to capture online news articles of different languages from the web and annotated them using CoreNLP, PETRARCH, and CAMEO dictionary to generate structured political events like ‘who-did-what-to-whom’ format. Later, we extend this framework to code atrocity events – a machine coded structured data containing perpetrators, action, victims, etc. Finally, we have developed a novel, distributed, window-based political actor recommendation framework to discover and recommend new political actors with their possible roles. We have implemented scalable distributed streaming frameworks with a message broker – Kafka, unsupervised and supervised machine learning techniques and Spark.