TY - JOUR A1 - Sapegin, Andrey A1 - Jaeger, David A1 - Cheng, Feng A1 - Meinel, Christoph T1 - Towards a system for complex analysis of security events in large-scale networks JF - Computers & security : the international journal devoted to the study of the technical and managerial aspects of computer security N2 - After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, high number of false positive alerts and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from the previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system to concentrate on the several top-ranked anomalies, instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in a combination with signatures and queries, applied on the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured with misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have proved our concepts and algorithms on a dataset of 160 million events from a network segment of a big multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems. KW - Intrusion detection KW - SAP HANA KW - In-memory KW - Security KW - Machine learning KW - Anomaly detection KW - Outlier detection Y1 - 2017 U6 - https://doi.org/10.1016/j.cose.2017.02.001 SN - 0167-4048 SN - 1872-6208 VL - 67 SP - 16 EP - 34 PB - Elsevier Science CY - Oxford ER - TY - JOUR A1 - Jaeger, David A1 - Graupner, Hendrik A1 - Pelchen, Chris A1 - Cheng, Feng A1 - Meinel, Christoph T1 - Fast Automated Processing and Evaluation of Identity Leaks JF - International journal of parallel programming N2 - The relevance of identity data leaks on the Internet is more present than ever. Almost every week we read about leakage of databases with more than a million users in the news. Smaller but not less dangerous leaks happen even multiple times a day. The public availability of such leaked data is a major threat to the victims, but also creates the opportunity to learn not only about security of service providers but also the behavior of users when choosing passwords. Our goal is to analyze this data and generate knowledge that can be used to increase security awareness and security, respectively. This paper presents a novel approach to the processing and analysis of a vast majority of bigger and smaller leaks. We evolved from a semi-manual to a fully automated process that requires a minimum of human interaction. Our contribution is the concept and a prototype implementation of a leak processing workflow that includes the extraction of digital identities from structured and unstructured leak-files, the identification of hash routines and a quality control to ensure leak authenticity. By making use of parallel and distributed programming, we are able to make leaks almost immediately available for analysis and notification after they have been published. Based on the data collected, this paper reveals how easy it is for criminals to collect lots of passwords, which are plain text or only weakly hashed. We publish those results and hope to increase not only security awareness of Internet users but also security on a technical level on the service provider side. KW - Identity leak KW - Data breach KW - Automated parsing KW - Parallel processing Y1 - 2018 U6 - https://doi.org/10.1007/s10766-016-0478-6 SN - 0885-7458 SN - 1573-7640 VL - 46 IS - 2 SP - 441 EP - 470 PB - Springer CY - New York ER -