dc.contributor.author Li, Yifan
dc.contributor.author Tao, Hemeng
dc.contributor.author Gao, Yang
dc.contributor.author Khan, Latifur
dc.contributor.author Ayoade, Gbadebo
dc.contributor.author Thuraisingham, B.
dc.date.accessioned 2020-03-23T19:40:59Z
dc.date.available 2020-03-23T19:40:59Z
dc.date.issued 2019-05
dc.identifier.isbn 9781450366748
dc.identifier.uri http://dx.doi.org/10.1145/3308558.3313572
dc.identifier.uri https://hdl.handle.net/10735.1/7437
dc.description.abstract Under a newly introduced setting of multistream classification, two data streams are involved, referred to as the source and target streams. The source stream continuously generates labeled data instances from a certain domain, while the target stream does the same for another domain but without labels. Existing approaches assume that the domains of both data streams are identical, which often does not hold in real-world scenarios, since data streams from different sources may contain distinct features. Furthermore, obtaining labels for every instance in a data stream is often expensive and time-consuming. It has therefore become an important question whether labeled instances from related streams can help predict the unlabeled instances in a given stream. Note that the source and target domains may have distinct feature spaces and data distributions. Our objective is to predict the class labels of data instances in the target stream by using classifiers trained on the source stream. We propose a multistream classification framework that uses data projected into a common latent feature space, which is embedded from both the source and target domains. This framework is also crucial for enterprise system defenders to detect cross-platform attacks, such as Advanced Persistent Threats (APTs). Empirical evaluation and analysis on both real-world and synthetic datasets are performed to validate the effectiveness of the proposed algorithm compared with state-of-the-art techniques. Experimental results show that our approach significantly outperforms other existing approaches. © 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
dc.description.sponsorship NSF award numbers: DMS-1737978, DGE 17236021, OAC-1828467; ARO award number: W911-NF-18-1-0249; NSA awards; ONR award N00014-17-1-2995
dc.language.iso en
dc.publisher Association for Computing Machinery, Inc
dc.relation.isPartOf Proceedings of the World Wide Web Conference (WWW '19)
dc.rights CC BY 4.0 (Attribution)
dc.rights ©2019 IW3C2 (International World Wide Web Conference Committee)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject Data mining
dc.subject World Wide Web
dc.subject Real world
dc.title Multistream Classification for Cyber Threat Data with Heterogeneous Feature Space
dc.type.genre article
dc.description.department Erik Jonsson School of Engineering and Computer Science
dc.identifier.bibliographicCitation Li, Y.-F., H. Tao, Y. Gao, L. Khan, et al. 2019. "Multistream classification for cyber threat data with heterogeneous feature space." Proceedings of the World Wide Web Conference (WWW '19): 2992-2998, doi: 10.1145/3308558.3313572
dc.contributor.utdAuthor Li, Yifan
dc.contributor.utdAuthor Tao, Hemeng
dc.contributor.utdAuthor Gao, Yang
dc.contributor.utdAuthor Khan, Latifur
dc.contributor.utdAuthor Ayoade, Gbadebo
dc.contributor.utdAuthor Thuraisingham, B.
dc.contributor.VIAF 51656251 (Khan, L)


Except where otherwise noted, this item's license is described as CC BY 4.0 (Attribution)