Practical Network Anomaly Detection: From Data Generation to Classification

Throughout the Internet age, computer network-based threats have been commonplace, with distributed denial-of-service (DDoS) attacks as a centerpiece. These attacks can knock network servers, and even entire networks, offline, potentially resulting in lost customers and revenue. Once deployed as a standalone means to an end, DDoS attacks have more recently been used in coordination with other attacks. This threat ensemble, which we call DDoS-as-a-smokescreen (DaaSS), leverages a DDoS attack to provide a distraction, or smokescreen, for one or more other attacks. In addition to the damage a DDoS can inflict, entities targeted by a DaaSS attack may suffer theft of financial data or customer records. As DDoS attacks become cheaper to launch, morphing into a service for hire, we expect DaaSS attacks only to increase in prevalence. Despite this growing prevalence, DaaSS attacks have been largely ignored by industry. One potential reason is the nature of DaaSS attacks themselves: the attack for which the DDoS provides a smokescreen is often a zero-day, that is, something previously unknown. Detecting zero-day threats requires anomaly-based intrusion detection systems (IDS), which introduce a potentially unacceptable false positive rate. DaaSS attacks compound this problem, since a false positive could divert IT support staff who are mitigating the DDoS toward a non-existent threat. Instead, signature-based IDS, which can detect threats with no false positives, are typically used in practice, at the cost of being blind to zero-day attacks. An anomaly-based network IDS capable of detecting DaaSS attacks must detect both the DDoS and the other attack or attacks for which it provides a smokescreen. Our goal is to devise methodologies and techniques for anomaly-based DaaSS detection that minimize the false positive rate, preferably to zero, and maintain this performance over time as network usage patterns shift. 
The detection systems should also be straightforward to train, requiring neither labeled training data nor hyper-parameter guesswork. We refer to this entire process as practical network anomaly detection, to emphasize that our methodologies and techniques can produce an IDS capable of practical deployment and performance. The process encompasses network data generation, data structuring, and the design of detection systems. Network intrusion detection is inherently data-driven, but to our knowledge no existing network datasets capture DaaSS attacks. We must therefore generate our own, within the budget, time, and space constraints of an academic lab. To this end, we develop a computer network traffic generation and monitoring framework that generates realistic benign network traffic, along with DDoS and other attacks, as packet captures. These packet captures are then compiled into propositional network flow datasets, using attributes and augmentation methods suitable for training and testing DaaSS detection models. For DaaSS detection, we generalize cluster-based models to represent multiple known classes plus an unknown anomalous class. The baseline approach extends k-Prototypes hard clustering with bounded densities to support the classification of anomalies. Using network flow augmentation to structure the data, we demonstrate that this detection model can detect DaaSS attacks, often with zero false positives, and maintains consistent performance as network traffic patterns shift. Using flow segmentation and concurrency to structure the data, we significantly decrease DDoS detection times, potentially leading to faster detection of the other threat or threats that make up a DaaSS attack. 
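The abstract does not spell out how the baseline's bounded densities work, so the following is only a simplified illustration of the general idea of a cluster-based model with known classes plus an unknown anomalous class. It assumes Huang's standard k-Prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches), clusters each known class separately, and, as a hypothetical stand-in for bounded densities, gives each cluster a bound equal to the largest distance seen among its training members. A flow that falls outside the bound of its nearest prototype is labeled anomalous. The feature layout (packet count and kilobytes per packet as numeric attributes, protocol and destination port as categorical ones) and all function names are hypothetical.

```python
import random
from collections import Counter


def distance(flow, proto, gamma=1.0):
    """Huang-style k-Prototypes dissimilarity: squared Euclidean distance on
    numeric attributes plus gamma times the count of categorical mismatches."""
    (num, cat), (pnum, pcat) = flow, proto
    d_num = sum((a - b) ** 2 for a, b in zip(num, pnum))
    d_cat = sum(a != b for a, b in zip(cat, pcat))
    return d_num + gamma * d_cat


def update_prototype(members):
    """New prototype: mean of each numeric attribute, mode of each categorical one."""
    nums = [m[0] for m in members]
    cats = [m[1] for m in members]
    mean = tuple(sum(col) / len(col) for col in zip(*nums))
    mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cats))
    return (mean, mode)


def fit(flows, k, gamma=1.0, iters=20, seed=0):
    """Hard-cluster one class of flows; return prototypes and per-cluster bounds."""
    rng = random.Random(seed)
    protos = rng.sample(flows, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in flows:
            j = min(range(k), key=lambda i: distance(f, protos[i], gamma))
            clusters[j].append(f)
        # Keep the old prototype if a cluster ends up empty.
        protos = [update_prototype(c) if c else protos[i]
                  for i, c in enumerate(clusters)]
    # Hypothetical bound: largest training distance observed in each cluster.
    bounds = [0.0] * k
    for f in flows:
        j = min(range(k), key=lambda i: distance(f, protos[i], gamma))
        bounds[j] = max(bounds[j], distance(f, protos[j], gamma))
    return protos, bounds


def fit_classes(labeled, k, gamma=1.0):
    """Cluster every known class separately; tag each prototype with its label."""
    model = []
    for label, flows in labeled.items():
        protos, bounds = fit(flows, k, gamma)
        model += [(label, p, b) for p, b in zip(protos, bounds)]
    return model


def classify(flow, model, gamma=1.0):
    """Nearest prototype's label if within its bound, otherwise 'anomaly'."""
    label, proto, bound = min(model, key=lambda m: distance(flow, m[1], gamma))
    return label if distance(flow, proto, gamma) <= bound else "anomaly"


if __name__ == "__main__":
    # Toy flows: ((packets, kB_per_packet), (protocol, dst_port)).
    training = {
        "benign": [((10.0, 5.0), ("tcp", "443")),
                   ((12.0, 5.2), ("tcp", "443")),
                   ((8.0, 4.8), ("tcp", "80"))],
        "ddos": [((1.0, 0.06), ("udp", "53")),
                 ((1.0, 0.07), ("udp", "53")),
                 ((2.0, 0.06), ("udp", "53"))],
    }
    model = fit_classes(training, k=1)
    print(classify(((11.0, 5.1), ("tcp", "443")), model))  # benign
    print(classify(((1.0, 0.065), ("udp", "53")), model))  # ddos
    print(classify(((50.0, 0.5), ("icmp", "0")), model))   # anomaly
```

A flow unlike any known class is labeled `anomaly` rather than forced into the nearest class, which is what allows a single model to flag a previously unseen attack alongside the DDoS. Note that `k` and `gamma` are hyper-parameters in this sketch, and the max-distance bound is a crude choice that a single outlying training flow can inflate.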
We then develop a detection model based on soft clustering that encodes the structure of the DaaSS threat ensemble itself, enabling model training from a small subset of benign traffic and significantly simplifying the data generation and collection process. This detection model outperforms the baseline and, furthermore, has no hyper-parameters, eliminating the trial and error involved in hyper-parameter tuning. Overall, this work demonstrates how anomaly-based network intrusion detection can be made practical for real-world use, from data generation through model training to performance once deployed.



Computer Science