TY - CHAP
A1 - Rojahn, Marcel
A1 - Ambros, Maximilian
A1 - Biru, Tibebu
A1 - Krallmann, Hermann
A1 - Gronau, Norbert
A1 - Grum, Marcus
ED - Rutkowski, Leszek
ED - Scherer, Rafał
ED - Korytkowski, Marcin
ED - Pedrycz, Witold
ED - Tadeusiewicz, Ryszard
ED - Zurada, Jacek M.
T1 - Adequate basis for the data-driven and machine-learning-based identification
T2 - Artificial intelligence and soft computing
N2 - Process mining (PM) has established itself in recent years as a main method for visualizing and analyzing processes. However, the identification of knowledge has not been addressed adequately because PM aims solely at data-driven discovering, monitoring, and improving real-world processes from event logs available in various information systems. The following paper, therefore, outlines a novel systematic analysis view on tools for data-driven and machine learning (ML)-based identification of knowledge-intensive target processes. To support the effectiveness of the identification process, the main contributions of this study are (1) to design a procedure for a systematic review and analysis for the selection of relevant dimensions, (2) to identify different categories of dimensions as evaluation metrics to select source systems, algorithms, and tools for PM and ML as well as include them in a multi-dimensional grid box model, (3) to select and assess the most relevant dimensions of the model, (4) to identify and assess source systems, algorithms, and tools in order to find evidence for the selected dimensions, and (5) to assess the relevance and applicability of the conceptualization and design procedure for tool selection in data-driven and ML-based process mining research.
KW - data mining
KW - knowledge engineering
KW - various applications
Y1 - 2023
SN - 978-3-031-42504-2
SN - 978-3-031-42505-9
U6 - https://doi.org/10.1007/978-3-031-42505-9_48
SP - 570
EP - 588
PB - Springer
CY - Cham
ER -
TY - CHAP
A1 - Abramova, Olga
A1 - Batzel, Katharina
A1 - Modesti, Daniela
T1 - Coping and regulatory responses on social media during health crisis
BT - a large-scale analysis
T2 - Proceedings of the 55th Hawaii International Conference on System Sciences
N2 - During a crisis event, social media enables two-way communication and many-to-many information broadcasting, browsing others’ posts, publishing own content, and public commenting. These records can deliver valuable insights to approach problematic situations effectively. Our study explores how social media communication can be analyzed to understand the responses to health crises better. Results based on nearly 800 K tweets indicate that the coping and regulation foci framework holds good explanatory power, with four clusters salient in public reactions: 1) “Understanding” (problem-promotion); 2) “Action planning” (problem-prevention); 3) “Hope” (emotion-promotion) and 4) “Reassurance” (emotion-prevention). Second, the inter-temporal analysis shows high volatility of topic proportions and a shift from self-centered to community-centered topics during the course of the event. The insights are beneficial for research on crisis management and practitioners who are interested in large-scale monitoring of their audience for well-informed decision-making.
KW - Digital-Enabled Human-Information Interaction
KW - big data
KW - data mining
KW - health crisis
KW - social media
Y1 - 2022
SN - 978-0-9981331-5-7
PB - HICSS Conference Office University of Hawaii at Manoa
CY - Honolulu
ER -
TY - THES
A1 - Najafi, Pejman
T1 - Leveraging data science & engineering for advanced security operations
T1 - Der Einsatz von Data Science & Engineering für fortschrittliche Security Operations
N2 - The Security Operations Center (SOC) represents a specialized unit responsible for managing security within enterprises. To aid in its responsibilities, the SOC relies heavily on a Security Information and Event Management (SIEM) system that functions as a centralized repository for all security-related data, providing a comprehensive view of the organization's security posture. Due to the ability to offer such insights, SIEMs are considered indispensable tools facilitating SOC functions, such as monitoring, threat detection, and incident response. Despite advancements in big data architectures and analytics, most SIEMs fall short of keeping pace. Architecturally, they function merely as log search engines, lacking the support for distributed large-scale analytics. Analytically, they rely on rule-based correlation, neglecting the adoption of more advanced data science and machine learning techniques. This thesis first proposes a blueprint for next-generation SIEM systems that emphasize distributed processing and multi-layered storage to enable data mining at a big data scale. Next, with the architectural support, it introduces two data mining approaches for advanced threat detection as part of SOC operations. First, a novel graph mining technique that formulates threat detection within the SIEM system as a large-scale graph mining and inference problem, built on the principles of guilt-by-association and exempt-by-reputation.
The approach entails the construction of a Heterogeneous Information Network (HIN) that models shared characteristics and associations among entities extracted from SIEM-related events/logs. Thereon, a novel graph-based inference algorithm is used to infer a node's maliciousness score based on its associations with other entities in the HIN. Second, an innovative outlier detection technique that imitates a SOC analyst's reasoning process to find anomalies/outliers. The approach emphasizes explainability and simplicity, achieved by combining the output of simple context-aware univariate submodels that calculate an outlier score for each entry. Both approaches were tested in academic and real-world settings, demonstrating high performance when compared to other algorithms as well as practicality alongside a large enterprise's SIEM system. This thesis establishes the foundation for next-generation SIEM systems that can enhance today's SOCs and facilitate the transition from human-centric to data-driven security operations.
N2 - A Security Operations Center (SOC) brings together all of an organization's security-relevant processes, data, and personnel. At the heart of the SOC is a Security Information and Event Management (SIEM) system, which serves as the central store for all security-relevant data and can provide an overview of the organization's security posture. SIEM systems are indispensable tools for many SOC functions such as monitoring, threat detection, and incident response. Despite advances in big data architectures and analytics, most SIEMs cannot keep pace. They function only as log search engines and do not support distributed data mining and machine learning. This thesis first presents a blueprint for the next generation of SIEM systems, which distribute and process data and store it in multiple layers, thereby enabling data mining at large scale. In addition, two data mining approaches are proposed that can detect even sophisticated threats. The first approach is a new graph mining technique in which SIEM data is structured as a graph and reputation inference is implemented using the principles of guilt-by-association and exempt-by-reputation. The approach uses a heterogeneous information network (HIN) that links shared properties and associations between entities from event logs. Furthermore, a new inference algorithm makes it possible to determine the maliciousness of an account based on its connections to other entities in the HIN. The second approach is an innovative outlier detection method that imitates the decision process of a SOC analyst. This method is particularly simple and interpretable because it combines individual univariate submodels, each of which relates to one contextualized property of an entity. Both approaches were tested in academic settings as well as in practice and, compared with other methods, demonstrated high quality, including in large enterprises. This thesis forms the foundation for the next generation of SIEM systems, which enable the transition from a personnel-centric to a data-centric perspective on SOCs.
KW - cybersecurity
KW - endpoint security
KW - threat detection
KW - intrusion detection
KW - apt
KW - advanced threats
KW - advanced persistent threat
KW - zero-day
KW - security analytics
KW - data-driven
KW - data mining
KW - data science
KW - anomaly detection
KW - outlier detection
KW - graph mining
KW - graph inference
KW - machine learning
KW - Advanced Persistent Threats
KW - fortschrittliche Angriffe
KW - Anomalieerkennung
KW - APT
KW - Cyber-Sicherheit
KW - Data-Mining
KW - Data-Science
KW - datengetrieben
KW - Endpunktsicherheit
KW - Graphableitung
KW - Graph-Mining
KW - Einbruchserkennung
KW - Machine-Learning
KW - Ausreißererkennung
KW - Sicherheitsanalyse
KW - Bedrohungserkennung
KW - 0-day
Y1 - 2023
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-612257
ER -
TY - JOUR
A1 - Hagemann, Linus
A1 - Abramova, Olga
T1 - Sentiment, we-talk and engagement on social media
BT - insights from Twitter data mining on the US presidential elections 2020
JF - Internet research
N2 - Purpose Given inconsistent results in prior studies, this paper applies the dual process theory to investigate what social media messages yield audience engagement during a political event. It tests how affective cues (emotional valence, intensity and collective self-representation) and cognitive cues (insight, causation, certainty and discrepancy) contribute to public engagement. Design/methodology/approach The authors created a dataset of more than three million tweets during the 2020 United States (US) presidential elections. Affective and cognitive cues were assessed via sentiment analysis. The hypotheses were tested in negative binomial regressions. The authors also scrutinized a subsample of far-famed Twitter users. The final dataset, scraping code, preprocessing and analysis are available in an open repository. Findings The authors found the prominence of both affective and cognitive cues.
For the overall sample, negativity bias was registered, and the tweet’s emotionality was negatively related to engagement. In contrast, in the sub-sample of tweets from famous users, emotionally charged content produced higher engagement. The role of sentiment decreases when the number of followers grows and ultimately becomes insignificant for Twitter participants with many followers. Collective self-representation (“we-talk”) is consistently associated with more likes, comments and retweets in the overall sample and subsamples. Originality/value The authors expand the dominating one-sided perspective to social media message processing focused on the peripheral route and hence affective cues. Leaning on the dual process theory, the authors shed light on the effectiveness of both affective (peripheral route) and cognitive (central route) cues on information appeal and dissemination on Twitter during a political event. The popularity of the tweet’s author moderates these relationships.
KW - social media
KW - engagement
KW - data mining
KW - big data
Y1 - 2023
U6 - https://doi.org/10.1108/INTR-12-2021-0885
SN - 1066-2243
VL - 33
IS - 6
SP - 2058
EP - 2085
PB - Emerald
CY - Bingley
ER -