Real-time botnet detection on large network bandwidths using machine learning

Computer Science

J. Velasco-Mata, V. González-Castro, et al.

This research presents a machine learning approach for real-time botnet detection on high-bandwidth networks that achieves a weighted F1-score of 0.926. In this study, Javier Velasco-Mata, Víctor González-Castro, Eduardo Fidalgo, and Enrique Alegre show that detection performance holds under simulated network saturation with 10% packet loss.

Introduction
The study addresses the challenge of detecting botnets in real time over large network bandwidths where manual inspection is infeasible. Traditional payload-based detection can be evaded, and while machine learning methods have achieved strong detection rates, few are adapted for high-demand, real-time environments such as Industry 4.0. The purpose is to design and validate a flow-based detection approach that operates on one-second time windows to minimize detection latency while maintaining acceptable accuracy. The authors contribute: (1) a high-capacity classifier that analyzes TCP/UDP traffic using only four quickly computed features on one-second windows, with assessment on saturated networks; (2) empirical evidence that using raw IP addresses as features is counterproductive; and (3) a hardware sizing study estimating the number of 2.4 GHz CPU cores required for real-time detection at 100 Mbps, 1 Gbps, and 10 Gbps. A Decision Tree classifier was selected for efficiency based on prior work, and the proposed approach is compared against three state-of-the-art models.
Literature Review
The State of the Art section surveys botnet detection across multiple strategies. DNS-based detection focuses on identifying DGA-generated domains used by botmasters; examples include ConnSpoiler for IoT networks using TRW with low CPU usage and low false positive rates, and deep learning approaches (e.g., CNN+LSTM) that flagged potential botnets undetected by commercial tools. Frameworks like BotDet combine DGA detection with blacklist modules. For inter-bot communication analysis, distributed and SDN-based systems (e.g., BotGuard) aggregate switch telemetry and achieve sub-100 ms delays with >90% accuracy on synthetic SDN traffic. Time-series anomaly methods using ordinal pattern transformations have achieved 98.5–100% accuracy on N-BaIoT but require at least 1 minute of data to construct time series and 24–40 ms per detection thereafter. Big data pipelines (Hadoop/Hive/Mahout) with Random Forest can process large pcaps quasi-real-time with 5–30 s delays at high accuracy (≈99.7%). Feature selection and flow-based learning improve efficiency; e.g., Gini-based selection with RF yields 90% accuracy on CTU-13. Time-windowing has been studied extensively: several works report optimal performance around 180 s windows using Naive Bayes, Decision Trees, or ensembles, with TPR >90% and low FPR, while too-short windows increase flow counts and too-long windows reduce accuracy. A multilayer framework with one-second windows achieved 92% F1 on CTU-13 but relied on clustering, preventing true real-time detection. The reviewed literature indicates a trade-off between detection performance and latency, with few methods tuned for one-second, real-time operation on high bandwidths.
Methodology
The approach processes network traffic in one-second time windows and performs flow-based classification using four features designed for rapid computation: (1) source port number, (2) destination port number, (3) number of packets within the window, and (4) total bytes transmitted in that second. Flows are distinguished by source/destination IP and ports; features avoid expensive statistics (means/standard deviations) and deep packet inspection to support large bandwidths and low latency suitable for IDS alerting.
Classifier: A Decision Tree (Scikit-Learn default Gini impurity) is used for its fast inference and quick retraining capabilities. The authors acknowledge that ports could be spoofed to evade detection, but several evaluated botnets (e.g., Bunitu, Miuref) already use common ports (80/443) and are still detected.
Robustness to packet loss: The model is trained on ideal (no-drop) traffic, then evaluated under (i) ideal conditions and (ii) simulated saturation with 10% random packet drops, reflecting a commonly cited operability threshold. Dropped packets are excluded from feature computation.
Performance evaluation: The proposed model is compared with three Decision Tree-based flow classifiers adapted to one-second windows: Zhao et al., Gahelot and Dayal (original and a modified version excluding IP address features), and the authors’ previous HAIS model. Evaluation includes weighted F1 and per-sample processing time.
Hardware sizing: Single-threaded timings for feature extraction and classification per one-second traffic slice are measured under simulated bandwidths of 100 Mbps, 1 Gbps, and 10 Gbps by aggregating real flows to match per-second byte volumes. Tests run in Python 3 on an Intel Xeon E5-2630 v3 (2.4 GHz, 3.5 GHz Turbo). Parallelization assumptions are then used to estimate required CPU cores for real-time throughput.
Dataset and preprocessing: PCAP traces combine CTU-13 botnets (Donbot, Murlo, NSIS, Neris, Rbot, Sogou, Virut) and Stratosphere IPS captures for normal traffic and botnets Bunitu, Miuref, and NotPetya. PyShark extracts packet timestamp, length, protocol, source/dest IPs and ports, and SYN flag. Time windows of 1 s and 180 s are considered; flow counts per class vary by window size due to connection duration differences.
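The four per-window features and the 10% packet-drop simulation described in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Packet` record, the function names `drop_packets` and `window_features`, the unidirectional flow key, the seeded random generator, and the window index computed as the floor of the timestamp are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record holding only the fields the features need."""
    timestamp: float  # seconds since the start of the capture
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    length: int       # packet size in bytes


def drop_packets(packets, drop_rate=0.10, seed=0):
    """Simulate saturation: discard each packet independently with probability drop_rate."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() >= drop_rate]


def window_features(packets, window=1.0):
    """Map (window_index, flow_key) to [src_port, dst_port, n_packets, n_bytes]."""
    stats = defaultdict(lambda: [0, 0])  # running [packet count, byte count]
    for p in packets:
        # Assumption: a flow is keyed by its directed (IP, IP, port, port) tuple.
        flow = (p.src_ip, p.dst_ip, p.src_port, p.dst_port)
        key = (int(p.timestamp // window), flow)
        stats[key][0] += 1
        stats[key][1] += p.length
    return {
        key: [key[1][2], key[1][3], count, total]
        for key, (count, total) in stats.items()
    }


# Illustrative packets (addresses and sizes are made up for the example).
packets = [
    Packet(0.10, "10.0.0.5", "93.184.216.34", 49152, 80, 60),
    Packet(0.40, "10.0.0.5", "93.184.216.34", 49152, 80, 1500),
    Packet(1.20, "10.0.0.5", "93.184.216.34", 49152, 80, 40),
    Packet(0.70, "10.0.0.7", "10.0.0.9", 50000, 25, 200),
]
features = window_features(packets)
saturated_features = window_features(drop_packets(packets))
```

Each feature vector could then be passed to a Decision Tree classifier. Note that the IP addresses are used only to separate flows and never appear in the feature vector itself.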
Key Findings
- One-second vs 180-second windows: Across classes, F1 scores with one-second windows were similar to or better than those with 180 s windows. Notably, Sogou achieved F1 ≈ 0.59 (1 s) vs <0.01 (180 s) due to sample scarcity in longer windows. Donbot F1 remained <0.1 in both cases because its traffic overlaps with that of Neris (≈96% of Donbot and 30% of Neris traffic goes to port 25), causing Donbot to be misclassified as Neris. NotPetya, though underrepresented, achieved a high F1 because it uses distinctive ports.
- Performance under saturation: With a 10% packet drop rate, F1 scores per class were similar to those under ideal conditions, indicating robustness despite short (1 s) windows and few packets per decision.
- Comparative performance and speed (one-second windows):
  • Proposed model: F1 = 0.926; mean time per sample = 0.007 ms (fastest), only ≈2 percentage points below the best-accuracy model.
  • HAIS (authors’ prior work, 13 features): F1 = 0.947; 0.503 ms/sample (slowest).
  • Zhao et al.: F1 = 0.882; 0.095 ms/sample.
  • Gahelot & Dayal (original, uses IPs): F1 = 0.095; 0.048 ms/sample.
  • Gahelot & Dayal (modified, no IPs): F1 = 0.911; 0.046 ms/sample.
- Evidence against using raw IP addresses as features: When test-set IPs are excluded from training, the original Gahelot & Dayal model collapses (F1 = 0.095), whereas a version without IP features improves markedly (F1 = 0.911). Using raw IPs leads to overfitting and poor generalization.
- Processing time and hardware estimates (single-threaded on a 2.4 GHz core):
  • Feature extraction is faster than classification.
  • 100 Mbps and 1 Gbps: maximum total processing time per 1 s of traffic ≈ 3.8 s; with parallelization, at least 4 similar CPU cores are required for real-time throughput.
  • 10 Gbps: maximum total processing time per 1 s of traffic ≈ 15.9 s (worst case per stage: features 8.0 s; classification 10.3 s); at least 19 similar CPU cores are needed for real-time processing. The 19-core figure is consistent with summing the two worst-case stage times (8.0 s + 10.3 s = 18.3 s) rather than with the 15.9 s total.
- Overall, the approach reaches a weighted F1 of 0.926 with only four features, and its per-sample processing is roughly 6.6 to 72 times faster than the compared one-second-window models (0.007 ms vs 0.046 to 0.503 ms).
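The core counts above can be reproduced with simple arithmetic. The sketch below assumes ideal linear scaling across identical cores and rounds up to a whole core. The function name `cores_needed` is hypothetical, and the summary does not state the authors' exact formula, so this is one plausible reading rather than their method.

```python
import math


def cores_needed(processing_seconds, window_seconds=1.0):
    """Smallest whole number of identical cores that keeps pace with the traffic.

    Assumes the work for one slice of traffic splits perfectly across cores.
    """
    return math.ceil(processing_seconds / window_seconds)


# 100 Mbps and 1 Gbps: about 3.8 s of single-threaded work per 1 s of traffic.
cores_low = cores_needed(3.8)

# 10 Gbps: sum of the reported worst-case stage times (features + classification).
cores_10g = cores_needed(8.0 + 10.3)
```

Under these assumptions `cores_low` is 4 and `cores_10g` is 19, matching the reported estimates. Real deployments would likely need headroom for scheduling and I/O overhead that this idealized calculation ignores.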
Discussion
The findings demonstrate that accurate botnet detection can be achieved with one-second time windows and a minimal set of rapidly computed flow features, directly addressing the need for real-time operation on high-bandwidth networks. Maintaining F1 performance comparable to or better than 180-second windows shows that extended observation periods are not strictly necessary for effective discrimination, and short windows confer faster alerting with minimal latency. Robustness under 10% packet loss indicates resilience in realistic, saturated network conditions. Comparative evaluations substantiate that the proposed Decision Tree model balances accuracy and computational efficiency better than alternatives: it approaches the top F1 score while being dramatically faster per sample, enabling scaling through parallelization. The analysis also clarifies methodological pitfalls: incorporating raw IP addresses as features causes spurious learning tied to specific hosts or networks, leading to poor generalization. The discussion considers potential evasion (e.g., using common ports or mimicking normal traffic patterns); empirical results show that even botnets using ports 80/443 can be separated from normal traffic, and more elaborate mimicry would increase malware complexity and communication volume, potentially reducing stealthiness. These results are relevant for deploying practical IDS/IPS in enterprise and industrial settings, where timely detection and predictable hardware sizing are crucial. The core-speed and core-count estimates provide actionable guidance for provisioning at 100 Mbps, 1 Gbps, and 10 Gbps.
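The pitfall with raw IP features can be made concrete with a host-disjoint evaluation split, in which samples involving designated test hosts never reach the training set. The sketch below is illustrative only: the function name `split_by_host`, the dictionary-based sample layout, and the rule for choosing test hosts are assumptions, since the summary does not describe how the authors built their split.

```python
def split_by_host(samples, test_hosts):
    """Send every sample that touches a designated test host to the test set.

    Guarantees only that the designated test hosts are absent from training;
    other endpoints (e.g., a shared server) may still appear in both sets.
    """
    train, test = [], []
    for sample in samples:
        endpoints = {sample["src_ip"], sample["dst_ip"]}
        if endpoints & test_hosts:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Illustrative samples (addresses and labels are made up for the example).
samples = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "label": "botnet"},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.9", "label": "normal"},
    {"src_ip": "10.0.0.8", "dst_ip": "10.0.0.5", "label": "botnet"},
]
train, test = split_by_host(samples, test_hosts={"10.0.0.5"})
```

A model that memorizes addresses has nothing to match on for the held-out hosts in such a split, so the evaluation reflects what the remaining features generalize to.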
Conclusion
The paper presents a real-time botnet detection approach using a Decision Tree and four simple flow features computed on one-second windows. Experiments show that shortening windows from 180 s to 1 s does not significantly reduce F1 and enables near-instantaneous detection (results within 1–2 s). The method maintains performance under 10% packet loss, achieves weighted F1 = 0.926 with 0.007 ms per-sample processing, and outperforms comparable methods in speed. The work also demonstrates that using raw IP addresses as features undermines generalization. Hardware sizing indicates that with 2.4 GHz cores, achieving real-time throughput requires approximately 4 cores for 100 Mbps and 1 Gbps, and 19 cores for 10 Gbps. Future work includes evaluating additional botnets and building a knowledge base that distinguishes cases amenable to this lightweight model from those requiring more advanced techniques.
Limitations
- Feature simplicity and port reliance: The four-feature set (source/destination ports, packet count, byte count) may be susceptible to advanced evasion, such as sophisticated mimicry of normal traffic patterns or deliberate use of common ports. Although several evaluated botnets already use ports 80/443 and were detected, more adaptive adversaries could reduce separability.
- Class-specific challenges: Donbot exhibited very low F1 due to overlap with Neris on port 25, indicating potential confusion where port distributions heavily overlap across families.
- Dataset scope and representativeness: The evaluation covers specific botnets (CTU-13 and Stratosphere IPS captures) and normal traffic traces available at the time; generalizability to other botnets or environments may vary.
- Performance measurement context: Runtime assessments were single-threaded in Python on a specific CPU model and used simulated per-second bandwidth via aggregation; real-world deployments, alternative implementations, or different hardware may yield different timings.
- Comparison to full-feature IDS: The approach is not directly comparable to heavyweight IDS like Suricata that perform deeper analyses; accuracy/throughput trade-offs differ.