TY - GEN A1 - Perscheid, Cindy A1 - Uflacker, Matthias T1 - Integrating Biological Context into the Analysis of Gene Expression Data T2 - Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference N2 - High-throughput RNA sequencing produces large gene expression datasets whose analysis leads to a better understanding of diseases like cancer. The nature of RNA-Seq data poses challenges to its analysis in terms of its high dimensionality, noise, and complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e. g. hierarchical clustering, to analyze this data. Until it comes to validation of the results, the analysis is based on the provided data only and completely misses the biological context. However, gene expression data follows particular patterns - the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms to consider the biological context in their computations and deliver meaningful results for researchers. KW - Gene expression KW - Machine learning KW - Feature selection KW - Association rule mining KW - Biclustering KW - Knowledge bases Y1 - 2019 SN - 978-3-319-99608-0 SN - 978-3-319-99607-3 U6 - https://doi.org/10.1007/978-3-319-99608-0_41 SN - 2194-5357 SN - 2194-5365 VL - 801 SP - 339 EP - 343 PB - Springer CY - Cham ER - TY - JOUR A1 - Sapegin, Andrey A1 - Jaeger, David A1 - Cheng, Feng A1 - Meinel, Christoph T1 - Towards a system for complex analysis of security events in large-scale networks JF - Computers & security : the international journal devoted to the study of the technical and managerial aspects of computer security N2 - After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, high number of false positive alerts and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from the previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system to concentrate on the several top-ranked anomalies, instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in a combination with signatures and queries, applied on the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured with misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have proved our concepts and algorithms on a dataset of 160 million events from a network segment of a big multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems. KW - Intrusion detection KW - SAP HANA KW - In-memory KW - Security KW - Machine learning KW - Anomaly detection KW - Outlier detection Y1 - 2017 U6 - https://doi.org/10.1016/j.cose.2017.02.001 SN - 0167-4048 SN - 1872-6208 VL - 67 SP - 16 EP - 34 PB - Elsevier Science CY - Oxford ER -