Design and Development of Scalable Analytics Frameworks with Applications in Blockchain Smart Contract Security and Political News Mining







Large volumes of data are now generated continuously and at an unprecedented rate in domains such as e-commerce, education, health, security, and social networks. This growth is driven by technological advances including the Internet of Things (IoT), autonomous driving, the proliferation of cloud computing, data center consolidation, and the spread of smart devices. The term "big data" was coined to describe this trend. The high volume, velocity, and variety of such data make it very challenging for the data mining community to extract useful knowledge. Meeting this challenge requires scalable analytics frameworks that can acquire, filter, and analyze data quickly. State-of-the-art techniques such as advanced analytics, Machine Learning (ML), and Natural Language Processing (NLP) can be used to handle heterogeneous big data, yet most existing systems suffer from scalability issues. In this dissertation, we focus on the social science and blockchain areas. More specifically, we address location extraction from unstructured political text, vulnerability detection in blockchain smart contracts, and fault diagnosis in wind turbine vibration data. With regard to focus location extraction, although various tools exist to identify geolocation, they fail to identify locations at a granular level, they mostly rely on external knowledge, and they do not support most languages. We propose PROFILE, a novel scalable framework that extracts the primary focus location from political news articles in different languages. With regard to blockchain, existing solutions for detecting smart contract vulnerabilities rely mainly on human experts to define features or rules, which often leads to many missed vulnerabilities, and they are inefficient at detecting new vulnerabilities. We develop a novel scalable framework to detect vulnerabilities in smart contracts.
With regard to fault diagnosis in wind turbines, real-time fault diagnosis on streaming vibration data from turbine gearboxes remains an outstanding challenge. Moreover, monitoring the gearboxes of a wind farm with thousands of turbines requires massive computational power. We address these challenges by developing SAIL, a scalable real-time framework that captures wind turbine vibration data using a novel feature extraction technique and predicts faults in the gearbox. We show empirically that the proposed techniques outperform state-of-the-art techniques in all three areas.



Data mining, Press and politics, Big data, Text data mining, Information visualization


©2020 Maryam Bahojb Imani. All rights reserved.