Browsing by Author "Gao, Yang"
Enhancing Classification and Retrieval Performance by Mining Semantic Similarity Relation from Data (2021-02-12)
Gao, Yang; Khan, Latifur

When describing unstructured data, e.g., images and texts, humans often resort to similarity, characterizing such data in relative rather than absolute terms. A human can easily point out the subtle differences between two such instances, whereas completely describing a single instance is a challenging task. For example, in an image retrieval task, to determine whether two images depict the same object, humans may simply ignore differences in illumination, scale, background, occlusion, and viewpoint, and pay attention only to the object itself. Describing an image with all of its information, on the other hand, is hard and unnecessary. Cognitive evidence also suggests that we interpret objects by relating them to prototypical examples stored in our brains. Similarity is therefore a fundamental property, and it is of great importance in classification and retrieval tasks alike. Metric learning is the process of determining a non-negative, symmetric, and subadditive distance function d(a, b) that captures the similarity or dissimilarity between objects: it reduces the distance between similar objects and increases the distance between dissimilar ones. From a human perspective, metric learning can be viewed as finding the function that best matches the user's interpretation of the similarity and dissimilarity relations between data items. In this dissertation, we explore ways to enhance classification and retrieval performance by mining semantic similarity relations in data via metric learning. Unfortunately, existing metric learning solutions have several drawbacks. First, most metric learning models have a fixed model capacity that cannot be changed to adapt to the input data.
Second, existing online metric learning models learn a linear metric function, which limits the model's expressiveness. Third, they usually require a user-specified margin that is sensitive to the input data, and they ignore many failure cases during learning. To address these drawbacks, we propose OAHU, a novel online metric learning framework that automatically adjusts model capacity based on the input data, and we introduce an Adaptive Bound Triplet Loss (ABTL) to avoid failure cases during learning. In addition, imbalanced classification, an important subarea of classification, is critical to the success of many real-world applications, yet few existing solutions have considered using data similarity to assist imbalanced learning. Based on this observation, we introduce a novel framework named SetConv, which customizes the feature extraction process for each input sample according to its semantic similarity relation to the minority class, alleviating the model's bias towards the majority classes. We also incorporate metric/similarity learning into SIM, a novel open-world stream classifier that handles classification on open-ended data distributions. Our research demonstrates that mining semantic similarity relations in data is critical to improving the performance of real-world classification and retrieval tasks.

Multistream Classification for Cyber Threat Data with Heterogeneous Feature Space (Association for Computing Machinery, Inc, 2019-05)
Li, Yifan; Tao, Hemeng; Gao, Yang; Khan, Latifur; Ayoade, Gbadebo; Thuraisingham, B.

Multistream classification is a newly introduced setting that involves two data streams, referred to as the source and target streams. The source stream continuously generates labeled data instances from one domain, while the target stream generates unlabeled instances from another domain.
Existing approaches assume that the domains of both data streams are identical, which rarely holds in real-world scenarios, since data streams from different sources may contain distinct features. Furthermore, obtaining labels for every instance in a data stream is often expensive and time-consuming. It has therefore become important to explore whether labeled instances from other related streams can help predict the unlabeled instances in a given stream. Note that the source and target domains may have distinct feature spaces and data distributions. Our objective is to predict the class labels of data instances in the target stream using classifiers trained on the source stream. We propose a multistream classification framework that uses data projected into a common latent feature space, embedded from both the source and target domains. This framework is also crucial for enterprise system defenders to detect cross-platform attacks, such as Advanced Persistent Threats (APTs). Empirical evaluation and analysis on both real-world and synthetic datasets validate the effectiveness of our proposed algorithm in comparison with state-of-the-art techniques. Experimental results show that our approach significantly outperforms other existing approaches. © 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.

Towards Self-Adaptive Metric Learning on the Fly (Association for Computing Machinery, Inc, 2019-05)
Gao, Yang; Li, Yi-Fan; Chandra, Swarup; Khan, Latifur; Thuraisingham, Bhavani

Good-quality similarity metrics can significantly improve the performance of many large-scale, real-world applications.
Existing studies have proposed various solutions for learning a Mahalanobis or bilinear metric in an online fashion, either by restricting the distances between similar (dissimilar) pairs to be smaller (larger) than a given lower (upper) bound, or by requiring similar instances to be separated from dissimilar instances by a given margin. However, these linear metrics, learned with fixed bounds or margins, may not perform well in real-world applications, especially when data distributions are complex. We aim to address the open challenge of “Online Adaptive Metric Learning” (OAML): learning adaptive metric functions on the fly. OAML is significantly more challenging than traditional online metric learning, since the learned metric may be non-linear and the model must adapt itself as more instances are observed. In this paper, we present a new online metric learning framework that tackles this challenge by learning an ANN-based metric with adaptive model complexity from a stream of constraints. In particular, we propose a novel Adaptive-Bound Triplet Loss (ABTL) to effectively utilize the input constraints, and we present a novel Adaptive Hedge Update (AHU) method for updating the model parameters online. We empirically validate the effectiveness and efficacy of our framework on various applications, such as real-world image classification, facial verification, and image retrieval. © 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
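For reference, the fixed-margin baseline that this abstract contrasts with ABTL can be sketched in a few lines. The Python below is a minimal illustration, assuming a Mahalanobis distance d_M(a, b) = sqrt((a - b)^T M (a - b)) with a positive semi-definite matrix M, which makes d_M non-negative, symmetric, and subadditive. It pairs that distance with the standard triplet hinge loss max(0, d(a, p) - d(a, n) + margin). It is not the paper's ANN-based metric or its ABTL. The function names, the identity matrix, and the default margin of 1.0 are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mahalanobis(a: Sequence[float], b: Sequence[float], m: Matrix) -> float:
    """Distance sqrt((a - b)^T M (a - b)); M is assumed positive semi-definite."""
    diff = [x - y for x, y in zip(a, b)]
    # M @ diff, then the quadratic form diff^T (M @ diff).
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in m]
    quad = sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative rounding errors


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], m: Matrix, margin: float = 1.0) -> float:
    """Standard fixed-margin triplet hinge loss (hypothetical baseline, not ABTL).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive; otherwise it grows linearly with the violation.
    """
    d_pos = mahalanobis(anchor, positive, m)
    d_neg = mahalanobis(anchor, negative, m)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # with M = I this reduces to Euclidean distance
    a, p, n = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
    print(mahalanobis(a, p, identity))      # 5.0
    print(triplet_loss(a, p, n, identity))  # max(0, 5 - 10 + 1) = 0.0
    print(triplet_loss(a, n, p, identity))  # roles swapped: 10 - 5 + 1 = 6.0
```

Because the margin is a fixed constant, a poorly chosen value either ignores hard triplets or penalizes easy ones. This is the sensitivity to a user-specified margin that the abstracts above cite as a motivation for an adaptive bound.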
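The abstract does not spell out the Adaptive Hedge Update, but its name points to the Hedge algorithm from online learning. As background only, here is a minimal sketch of the classic Hedge (multiplicative-weights) rule, which reweights several "experts" after each observed loss: w_i <- w_i * beta^loss_i, followed by renormalization. Treating the experts as candidate sub-models of different capacity, the discount factor beta = 0.5, and the example losses are all hypothetical assumptions. This is not the paper's AHU.

```python
from typing import List, Sequence


def hedge_update(weights: Sequence[float], losses: Sequence[float],
                 beta: float = 0.5) -> List[float]:
    """One step of classic Hedge: w_i <- w_i * beta**loss_i, then renormalize.

    Losses are assumed to lie in [0, 1]; beta in (0, 1) sets how sharply
    experts with larger loss are down-weighted.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be in (0, 1)")
    scaled = [w * beta ** loss for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def combine(weights: Sequence[float], predictions: Sequence[float]) -> float:
    """Weighted combination of the experts' predictions (hypothetical usage)."""
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    # Hypothetical: three experts starting from uniform weights.
    w = [1 / 3, 1 / 3, 1 / 3]
    # Expert 0 always incurs loss 0, expert 2 always incurs loss 1.
    for _ in range(3):
        w = hedge_update(w, [0.0, 0.5, 1.0])
    print([round(x, 3) for x in w])  # expert 0 now carries the largest weight
```

Multiplicative weights shift influence smoothly toward the experts that have performed best so far, without discarding any of them. That property is what makes a Hedge-style rule a natural fit for adjusting effective model capacity as new instances arrive.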