TY  - JOUR
A1  - Sapegin, Andrey
A1  - Jaeger, David
A1  - Cheng, Feng
A1  - Meinel, Christoph
T1  - Towards a system for complex analysis of security events in large-scale networks
JF  - Computers & security : the international journal devoted to the study of the technical and managerial aspects of computer security
N2  - After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with the normalisation of heterogeneous data sources, a high number of false positive alerts, and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of a SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and technologies from our previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches to security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to our previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system concentrate on the few top-ranked anomalies instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in combination with signatures and queries, applied to the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured by misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have validated our concepts and algorithms on a dataset of 160 million events from a network segment of a large multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems.
KW  - Intrusion detection
KW  - SAP HANA
KW  - In-memory
KW  - Security
KW  - Machine learning
KW  - Anomaly detection
KW  - Outlier detection
Y1  - 2017
U6  - https://doi.org/10.1016/j.cose.2017.02.001
SN  - 0167-4048
SN  - 1872-6208
VL  - 67
SP  - 16
EP  - 34
PB  - Elsevier Science
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Jaeger, David
A1  - Graupner, Hendrik
A1  - Pelchen, Chris
A1  - Cheng, Feng
A1  - Meinel, Christoph
T1  - Fast Automated Processing and Evaluation of Identity Leaks
JF  - International journal of parallel programming
N2  - Identity data leaks on the Internet are more relevant than ever. Almost every week we read in the news about leaked databases with more than a million users. Smaller but no less dangerous leaks happen multiple times a day. The public availability of such leaked data is a major threat to the victims, but it also creates the opportunity to learn not only about the security of service providers but also about the behavior of users when choosing passwords. Our goal is to analyze this data and generate knowledge that can be used to increase both security awareness and security itself. This paper presents a novel approach to the processing and analysis of the vast majority of large and small leaks. We have moved from a semi-manual to a fully automated process that requires a minimum of human interaction. Our contribution is the concept and a prototype implementation of a leak processing workflow that includes the extraction of digital identities from structured and unstructured leak files, the identification of hash routines, and a quality control step to ensure leak authenticity. By making use of parallel and distributed programming, we are able to make leaks available for analysis and notification almost immediately after they have been published. Based on the data collected, this paper reveals how easy it is for criminals to collect large numbers of passwords that are stored in plain text or only weakly hashed. We publish these results and hope to increase not only the security awareness of Internet users but also security on a technical level on the service provider side.
KW  - Identity leak
KW  - Data breach
KW  - Automated parsing
KW  - Parallel processing
Y1  - 2018
U6  - https://doi.org/10.1007/s10766-016-0478-6
SN  - 0885-7458
SN  - 1573-7640
VL  - 46
IS  - 2
SP  - 441
EP  - 470
PB  - Springer
CY  - New York
ER  -