Leveraging User and Entity Behavioral Analysis and Machine Learning for Log-Based Anomaly Detection
Abstract
As enterprises increasingly migrate to the cloud for scalability and simpler management, the demand for robust network observability through log analysis has surged. Security Information and Event Management (SIEM) systems are pivotal in this landscape, supporting smooth operations by analyzing log files. Integrating User and Entity Behavior Analytics (UEBA) with SIEM systems improves reliability, identifies abnormal activity more quickly, and limits potential damage, which supports critical business objectives. Despite the promise of UEBA, significant challenges persist in cloud-based environments, including the analysis of massive log volumes, high false alarm rates, and resource constraints.
The study's objectives are to preprocess structured logs for exploratory data analysis, assign risk scores using UEBA, and identify anomalies through machine learning techniques. The aim is a simple yet effective UEBA- and ML-based log anomaly detection system that is affordable for small and medium enterprises (SMEs). The research addresses notable gaps in behavioral analysis for log anomaly detection, emphasizing the need for automated log preprocessing methods that can efficiently handle large, complex log volumes.
This research reviews the existing literature on UEBA-based log anomaly detection within the cloud ecosystem and evaluates computational cost and detection performance using the BGL Dataset. Several machine learning models (XGBoost, Random Forest, Neural Networks, and Isolation Forest) are compared to identify the one that best balances false alarm rate against computational time. The results indicate that while UEBA techniques generally increase training times, they significantly reduce prediction times, thereby enhancing real-time performance. Among the models studied, XGBoost emerges as the optimal choice due to its high performance and computational efficiency.
In summary, this thesis presents a critical review of UEBA-supported log anomaly detection, aiming to identify performance improvements and the machine learning models best suited to anomaly detection. It highlights the need for a benchmark UEBA-based log anomaly dataset, which would support further work by the research community in this domain. It also shows the potential of UEBA-enhanced machine learning models to provide reliable, efficient, and cost-effective anomaly detection, particularly for SMEs with limited resources. The findings can help businesses design robust cloud log-based SIEM systems that lower costs, reduce false positives, improve threat response, and proactively mitigate evolving cybersecurity threats, ultimately protecting assets while maintaining stakeholder trust.
KEYWORDS: UEBA, Log Anomaly Detection, Machine Learning, BGL Dataset, Entity Score, SIEM, Cloud Migration
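
The abstract compares models by false alarm rate and prediction time. As a rough illustration of how those two quantities might be measured, the following minimal Python sketch may help. It is not the thesis's implementation: the function names, the toy risk scores, the labels, and the threshold "model" are all hypothetical stand-ins, and the label convention (1 = anomaly, 0 = normal) is an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple


def false_alarm_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return FP / (FP + TN), assuming 1 = anomaly and 0 = normal."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # With no normal samples the rate is undefined; report 0.0 in that case.
    return fp / (fp + tn) if (fp + tn) else 0.0


def timed_predict(
    predict: Callable[[Sequence[float]], Sequence[int]],
    rows: Sequence[float],
) -> Tuple[List[int], float]:
    """Run a prediction function and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    preds = list(predict(rows))
    return preds, time.perf_counter() - start


def threshold_model(rows: Sequence[float]) -> List[int]:
    """Hypothetical stand-in for a trained model: flag risk scores above 0.5."""
    return [1 if score > 0.5 else 0 for score in rows]


# Toy data (invented for illustration): one risk score and one label per entity.
scores = [0.1, 0.9, 0.6, 0.2, 0.8, 0.3]
labels = [0, 1, 0, 0, 1, 0]

preds, elapsed = timed_predict(threshold_model, scores)
far = false_alarm_rate(labels, preds)
print(f"false alarm rate = {far:.2f}, prediction time = {elapsed:.6f} s")
```

In a real comparison, `threshold_model` would be replaced by each trained model's prediction function, and the same two measurements would be collected per model on a held-out portion of the log data.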