Khan, Latifur


Latifur Khan joined the UTD faculty in 2000 and is currently Professor of Computer Science. Dr. Khan's research interests include:

  • Big data management and analytics
  • Data mining
  • Database systems
  • Semantic web
  • Complex data management (multimedia, geo-spatial)
  • Multimedia information management
  • Intrusion detection


Recent Submissions

  • Item
    Towards Self-Adaptive Metric Learning on the Fly
    (Association for Computing Machinery, Inc., 2019-05) Gao, Yang; Li, Yi-Fan; Chandra, Swarup; Khan, Latifur; Thuraisingham, Bhavani M.
    Good-quality similarity metrics can significantly improve the performance of many large-scale, real-world applications. Existing studies have proposed various solutions for learning a Mahalanobis or bilinear metric in an online fashion, either by restricting distances between similar (dissimilar) pairs to be smaller (larger) than a given lower (upper) bound, or by requiring similar instances to be separated from dissimilar instances by a given margin. However, linear metrics learned with fixed bounds or margins may not perform well in real-world applications, especially when data distributions are complex. We aim to address the open challenge of “Online Adaptive Metric Learning” (OAML): learning adaptive metric functions on the fly. Unlike traditional online metric learning, OAML is significantly more challenging because the learned metric may be non-linear and the model must be self-adaptive as more instances are observed. In this paper, we present a new online metric learning framework that tackles this challenge by learning an ANN-based metric with adaptive model complexity from a stream of constraints. In particular, we propose a novel Adaptive-Bound Triplet Loss (ABTL) to effectively utilize the input constraints, and present a novel Adaptive Hedge Update (AHU) method for online updating of the model parameters. We empirically validate the effectiveness and efficacy of our framework on applications such as real-world image classification, facial verification, and image retrieval. © 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
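The abstract above contrasts fixed-margin online metric learning with the paper's adaptive scheme. As a point of reference, here is a minimal sketch of the generic fixed-margin baseline it builds on: an online learner that updates a linear embedding from streaming triplet constraints with a hinge-style triplet loss. This is an illustration only, not the paper's ABTL/AHU algorithm; the class name, learning rate, and toy data are all invented for the example.

```python
import numpy as np

class OnlineTripletMetric:
    """Online linear metric learner (illustrative baseline, not ABTL/AHU)."""

    def __init__(self, dim, margin=1.0, lr=0.05):
        self.L = np.eye(dim)      # embedding matrix; d(x, y) = ||L(x - y)||^2
        self.margin = margin      # fixed margin, unlike the paper's adaptive bound
        self.lr = lr

    def dist(self, x, y):
        d = self.L @ (x - y)
        return float(d @ d)

    def update(self, anchor, similar, dissimilar):
        # Hinge triplet loss: push d(a, s) below d(a, n) by at least `margin`.
        loss = self.dist(anchor, similar) - self.dist(anchor, dissimilar) + self.margin
        if loss <= 0:
            return 0.0            # constraint already satisfied; no update
        ds, dd = anchor - similar, anchor - dissimilar
        # Gradient of the loss with respect to L (for symmetric outer products).
        grad = 2 * self.L @ (np.outer(ds, ds) - np.outer(dd, dd))
        self.L -= self.lr * grad
        return loss

metric = OnlineTripletMetric(dim=2)
a = np.array([0.0, 0.0])
s = np.array([0.2, 0.1])          # similar instance
n = np.array([0.3, 0.0])          # dissimilar instance
for _ in range(500):              # stream the same constraint until satisfied
    metric.update(a, s, n)
assert metric.dist(a, s) < metric.dist(a, n)
```

The fixed `margin` here is exactly the limitation the abstract points out: the paper replaces it with adaptive bounds and a non-linear, growing model.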
  • Item
    Classified Enhancement Model for Big Data Storage Reliability Based on Boolean Satisfiability Problem
    (Springer New York LLC, 2019-05-11) Huang, H.; Khan, Latifur; Zhou, S.
    Disk reliability is a serious problem in big data infrastructure. Although the reliability of disk drives has greatly improved over the past few years, they are still the most vulnerable core components in the server. When they fail, the result can be catastrophic: recovering data can take days, and sometimes data is lost forever. This is unacceptable for important data. XOR parity is a typical method for generating a reliability syndrome and thus improving data reliability; in practice, however, we find that data is still likely to be lost. In most storage systems, reliability improvements are achieved by allocating additional disks in Redundant Arrays of Independent Disks (RAID), which increases hardware costs and is therefore impractical in cost-constrained environments. How to improve data integrity without raising hardware cost has consequently attracted much interest from big data researchers. The challenge is that when creating non-traditional RAID geometries, care must be taken to respect data-dependence relationships so that the new RAID strategy actually improves reliability, which is an NP-hard problem. In this paper, we present an approach that characterizes these challenges using high-dimensional variants of the n-queens problem, which enables practical solutions via the SAT solver MiniSAT, and we use a greedy algorithm to analyze the queens' attack domains as a basis for reliability-syndrome generation. Extensive experiments show that the proposed approach is feasible in software-defined data centers and that the algorithm's performance can meet the current requirements of the big data environment. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
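The abstract maps RAID parity placement to high-dimensional n-queens solved with MiniSAT. As a small self-contained illustration of the base case of that formulation, the sketch below solves classic 2-D n-queens by exhaustive search (standing in for a SAT solver, which this example does not use) and computes a queen's "attack domain", the notion the paper's greedy analysis works over. Function names and the brute-force strategy are the example's own.

```python
from itertools import permutations

def n_queens(n):
    """Return one n-queens solution as a tuple: cols[i] is the column of row i.

    Brute force over column permutations (stand-in for the SAT encoding):
    distinct rows and columns hold by construction, so only diagonals
    need checking.
    """
    for cols in permutations(range(n)):
        if all(abs(cols[i] - cols[j]) != j - i      # diagonal attack check
               for i in range(n) for j in range(i + 1, n)):
            return cols
    return None                                     # no solution (e.g. n = 2, 3)

def attack_domain(n, row, col):
    """Cells attacked by a queen at (row, col): its row, column, and diagonals."""
    return {(r, c) for r in range(n) for c in range(n)
            if r == row or c == col or abs(r - row) == abs(c - col)}

sol = n_queens(6)
assert sol is not None
# Verify mutual independence: no queen sits in another queen's attack domain.
for i, c in enumerate(sol):
    dom = attack_domain(6, i, c) - {(i, c)}
    assert all((j, cj) not in dom for j, cj in enumerate(sol) if j != i)
```

In the paper's setting, each "queen" placement corresponds to a parity-block position, and the non-attacking property encodes the data-dependence constraints that make the placement safe.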
  • Item
    SAIL: A Scalable Wind Turbine Fault Diagnosis Platform: A Case Study on Gearbox Fault Diagnosis
    (Springer Verlag) Bahojb Imani, Maryam; Heydarzadeh, Mehrdad; Chandra, Swarup; Khan, Latifur; Nourani, Mehrdad
    Failure of a wind turbine is largely attributable to faults in its gearbox. Maintenance of this machinery is very expensive, mainly due to long downtimes and high repair costs. While much attention has been given to detecting faults in these mechanical devices, real-time fault diagnosis for streaming vibration data from turbine gearboxes remains an outstanding challenge. Moreover, monitoring the gearboxes of a wind farm with thousands of turbines requires massive computational power. In this paper, we propose a three-layer monitoring system comprising Sensor, Fog, and Cloud layers, each providing a specific functionality and running part of the proposed data processing pipeline. In the Sensor layer, vibration data is collected using accelerometers. Industrial single-chip computers are the best candidates for node computation; since the majority of wind turbines are installed in harsh environments and sensor-node computers must be embedded within the turbines, a robust computation platform is necessary. In this layer, we propose a novel feature extraction method applied over short windows of vibration data: under a time-series model assumption, it estimates vibration power at high resolution and low cost. The Fog layer provides Internet connectivity: a fog server collects data from the sensor nodes and forwards it to the cloud. Since many wind farms are located in remote areas, providing network connectivity is challenging and expensive; sometimes a wind farm is offshore and a satellite connection is the only option. We therefore deploy a compressive sensing algorithm on the fog servers to conserve communication bandwidth. The Cloud layer performs most of the computation. In the online mode, after decompression, fault diagnosis is performed using a trained classifier while reports and logs are generated. In the offline mode, the system performs model training for the classifier, parameter learning for the sensor-layer feature extraction, and dictionary learning for the fog-server compression and its decompression. The proposed architecture monitors turbine health in a scalable framework by leveraging distributed computation techniques. Our empirical evaluation on vibration datasets from real wind turbines demonstrates high scalability and accuracy (greater than 99%) in diagnosing gearbox failures, for application in large wind farms. ©2019, Springer Nature Switzerland AG.
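The sensor layer above computes a cheap vibration-power feature over short windows before data leaves the turbine. The sketch below shows the plainest version of that idea, mean squared amplitude per overlapping window, on a synthetic signal; it is not the paper's time-series-model estimator, and the window sizes and signal are invented for the example.

```python
import numpy as np

def window_power(signal, window=256, hop=128):
    """Mean squared amplitude per overlapping window of a 1-D signal."""
    feats = []
    for start in range(0, len(signal) - window + 1, hop):
        w = signal[start:start + window]
        feats.append(float(np.mean(w ** 2)))
    return np.array(feats)

# Synthetic stream: quiet 50 Hz baseline with a strong 400 Hz burst in the
# middle, mimicking a transient vibration event a gearbox might produce.
t = np.arange(4096) / 4096
sig = 0.01 * np.sin(2 * np.pi * 50 * t)
sig[1500:2500] += 0.5 * np.sin(2 * np.pi * 400 * t[1500:2500])

p = window_power(sig)
# Windows inside the burst carry far more power than the quiet ones,
# so a compact per-window feature already localizes the event.
assert p.max() > 10 * p[:4].mean()
```

A per-window scalar like this is what makes the pipeline bandwidth-friendly: the fog layer forwards compact features (or compressed windows) rather than the raw accelerometer stream.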
  • Item
    Integrating Cyber Security and Data Science for Social Media: A Position Paper
    (Institute of Electrical and Electronics Engineers Inc.) Thuraisingham, Bhavani M.; Kantarcioglu, Murat; Khan, Latifur
    Cyber security and data science are two of the fastest-growing fields in computer science, and they are increasingly being integrated for various applications. This position paper reviews developments in applying data science to cyber security and cyber security to data science, and then discusses applications in social media.
  • Item
    A Complex Task Scheduling Scheme for Big Data Platforms Based on Boolean Satisfiability Problem
    (Institute of Electrical and Electronics Engineers Inc.) Hong, H.; Khan, Latifur; Ayoade, Gbadebo G.; Shaohua, Z.; Yong, W.
    In big data processing systems, data volumes keep growing while the real-time requirements on processing and analysis become increasingly stringent, so these systems must deliver ever better performance. Job scheduling plays an important role in improving overall system performance in big data processing frameworks. However, job scheduling is NP-hard, and many factors must be considered: for example, jobs have dependencies among stages, so resources should not be allocated to tasks that are not yet ready, and sometimes there are constraints between jobs. These factors challenge the scheduling performance of big data processing and analysis systems. In this paper, we address the problem by translating it into the Boolean Satisfiability Problem (SAT), an exact method. SAT-based scheduling is not a new approach, but in the past it was mainly used to solve static scheduling problems. A dynamic scheduling system requires all problems to be solved within a limited time, which is a challenge for SAT encoding. We build on a previous SAT solution to the Job Shop Scheduling Problem and adjust the algorithm to meet the requirements of a big data processing system. We also optimize the encoding to reduce the number of clauses, improving solving efficiency to meet the performance requirements. Experimental results show that the number of clauses is reduced by more than 30%, and the time the SAT solver takes to reach a solution is reduced by more than 50%. To demonstrate its effectiveness, we have also implemented the new job scheduler in Apache Hadoop YARN and validated it there.
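To make the SAT-encoding idea above concrete, here is a toy sketch: Boolean variables x[(job, slot)] mean "job runs in slot", and the scheduling constraints (one slot per job, unit slot capacity, stage dependencies) play the role of clauses. For self-containment the example brute-forces the assignment space instead of calling a real SAT solver, and the jobs, slots, and dependencies are invented; the paper's actual encoding is far more compact and solver-driven.

```python
from itertools import product

jobs, slots = ["a", "b", "c"], [0, 1, 2]
deps = [("a", "b"), ("a", "c")]          # "a" must run before "b" and "c"

def satisfies(assign):
    """Check a 0/1 assignment against the scheduling constraints."""
    # Exactly one slot per job.
    if any(sum(assign[(j, t)] for t in slots) != 1 for j in jobs):
        return False
    # At most one job per slot (unit capacity).
    if any(sum(assign[(j, t)] for j in jobs) > 1 for t in slots):
        return False
    # Dependencies: a predecessor's slot must precede its successor's slot.
    slot_of = {j: next(t for t in slots if assign[(j, t)]) for j in jobs}
    return all(slot_of[u] < slot_of[v] for u, v in deps)

variables = list(product(jobs, slots))
# Brute force stands in for the SAT solver on this 9-variable instance.
solution = next(
    (dict(zip(variables, bits))
     for bits in product([0, 1], repeat=len(variables))
     if satisfies(dict(zip(variables, bits)))),
    None,
)
assert solution is not None
assert solution[("a", 0)] == 1           # "a" is forced first by its dependencies
```

The clause-reduction work the abstract reports matters precisely because encodings like this blow up quickly: every extra job, slot, or constraint multiplies the clause count a real solver must handle within the dynamic scheduler's time budget.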
  • Item
    Decentralized IoT Data Management Using BlockChain and Trusted Execution Environment
    (Institute of Electrical and Electronics Engineers Inc.) Ayoade, Gbadebo; Karande, Vishal; Khan, Latifur; Hamlen, Kevin W.
    Due to the centralization of authority in the management of data generated by IoT devices, there is a lack of transparency in how user data is shared among third-party entities. With the growing adoption of blockchain technology, which provides decentralized management of assets such as the currency in Bitcoin, we propose a decentralized data management system for IoT devices in which all data access permissions are enforced using smart contracts and the audit trail of data access is stored in the blockchain. With smart contracts, multiple parties can specify rules to govern their interactions, and these rules are independently enforced in the blockchain without the need for a centralized system. We provide a framework that stores the hash of the data in the blockchain and keeps the raw data in a secure storage platform using a trusted execution environment (TEE). In particular, we use Intel SGX as the TEE to ensure data security and privacy for the sensitive parts of the application (code and data).
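The core pattern the abstract describes, raw data off-chain, only its digest on-chain, can be sketched in a few lines. In this simplified illustration a dict stands in for the TEE-protected store and a list stands in for blockchain transactions; the function names and payloads are invented, and real smart-contract permission checks are omitted.

```python
import hashlib

off_chain_store = {}   # stand-in for the TEE-backed secure storage
ledger = []            # append-only stand-in for blockchain transactions

def record(device_id, payload: bytes):
    """Store the raw payload off-chain and anchor its SHA-256 digest on-chain."""
    digest = hashlib.sha256(payload).hexdigest()
    off_chain_store[digest] = payload
    ledger.append({"device": device_id, "hash": digest})
    return digest

def verify(digest):
    """Check that the off-chain payload still matches its on-chain digest."""
    payload = off_chain_store.get(digest)
    anchored = any(tx["hash"] == digest for tx in ledger)
    return (anchored and payload is not None
            and hashlib.sha256(payload).hexdigest() == digest)

h = record("sensor-42", b'{"temp": 21.5}')
assert verify(h)
off_chain_store[h] = b'{"temp": 99.9}'   # tamper with the off-chain copy...
assert not verify(h)                     # ...and the on-chain digest exposes it
```

Keeping only digests on-chain is what makes the design practical for IoT volumes: the blockchain records an immutable audit trail without having to store (or expose) the raw sensor data itself.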

Works in Treasures @ UT Dallas are made available exclusively for educational purposes such as research or instruction. Literary rights, including copyright for published works held by the creator(s) or their heirs, or other third parties may apply. All rights are reserved unless otherwise indicated by the copyright owner(s).