TY - THES
A1 - Rohloff, Tobias
T1 - Learning analytics at scale
BT - supporting learning and teaching in MOOCs with data-driven insights
N2 - Digital technologies are paving the way for innovative educational approaches. The learning format of Massive Open Online Courses (MOOCs) provides a highly accessible path to lifelong learning while being more affordable and flexible than face-to-face courses. In MOOCs, thousands of learners can enroll in courses, mostly without admission restrictions, but this also raises challenges. Individual supervision by teachers is barely feasible, and learning persistence and success depend on students' self-regulatory skills. Here, technology provides the means for support. The use of data for decision-making is already transforming many fields, whereas in education it is still a young research discipline. Learning Analytics (LA) is defined as the measurement, collection, analysis, and reporting of data about learners and their learning contexts for the purpose of understanding and improving learning and learning environments. The vast amount of data that MOOCs produce on the learning behavior and success of thousands of students provides the opportunity to study human learning and to develop approaches that address the demands of learners and teachers. The overall purpose of this dissertation is to investigate the implementation of LA at the scale of MOOCs and to explore how data-driven technology can support learning and teaching in this context. To this end, several research prototypes were iteratively developed for the HPI MOOC Platform. As a result, they were tested and evaluated in an authentic, real-world learning environment. Most of the results can also be applied to other MOOC platforms on a conceptual level. The research contribution of this thesis thus provides practical insights beyond what is theoretically possible.
In total, four system components were developed and extended: (1) The Learning Analytics Architecture: A technical infrastructure to collect, process, and analyze event-driven learning data based on schema-agnostic pipelining in a service-oriented MOOC platform. (2) The Learning Analytics Dashboard for Learners: A tool for data-driven support of self-regulated learning, in particular to enable learners to evaluate and plan their learning activities, progress, and success by themselves. (3) Personalized Learning Objectives: A set of features that better connect learners' success to their personal intentions based on selected learning objectives, in order to offer guidance and to align the provided data-driven insights about their learning progress. (4) The Learning Analytics Dashboard for Teachers: A tool that supports teachers with data-driven insights, enabling them to monitor their courses with thousands of learners, identify potential issues, and take informed action. For all aspects examined in this dissertation, related research is presented, development processes and implementation concepts are explained, and evaluations are conducted in case studies. Among other findings, the use of the learner dashboard in combination with personalized learning objectives demonstrated improved certification rates of 11.62% to 12.63%. Furthermore, it was observed that the teacher dashboard is a key tool and an integral part of teaching in MOOCs. In addition to the results and contributions, general limitations of the work are discussed, which altogether provide a solid foundation for practical implications and future research.
KW - Learning Analytics
KW - MOOCs
KW - Self-Regulated Learning
KW - E-Learning
KW - Service-Oriented Architecture
KW - Online Learning Environments
Y1 - 2021
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-526235
ER - 

TY - JOUR
A1 - Quinzan, Francesco
A1 - Göbel, Andreas
A1 - Wagner, Markus
A1 - Friedrich, Tobias
T1 - Evolutionary algorithms and submodular functions
BT - benefits of heavy-tailed mutations
JF - Natural computing : an innovative journal bridging biosciences and computer sciences ; an international journal
N2 - A core operator of evolutionary algorithms (EAs) is mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this line of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator is competitive with pre-existing ones when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator achieves a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems, and the (1 + 1) EA outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges, and (b) the symmetric mutual information problem using a four-month air pollution data set.
In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
KW - Evolutionary algorithms
KW - Mutation operators
KW - Submodular functions
KW - Matroids
Y1 - 2021
U6 - https://doi.org/10.1007/s11047-021-09841-7
SN - 1572-9796
VL - 20
IS - 3
SP - 561
EP - 575
PB - Springer Science + Business Media B.V.
CY - Dordrecht
ER - 

TY - JOUR
A1 - Pfitzner, Bjarne
A1 - Steckhan, Nico
A1 - Arnrich, Bert
T1 - Federated learning in a medical context
BT - a systematic literature review
JF - ACM transactions on internet technology : TOIT / Association for Computing
N2 - Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by existing privacy regulations to preserve patients' anonymity. However, data is required for research and for training machine learning models that could help gain insight into complex correlations or personalised treatments that might otherwise remain undiscovered. These models generally scale with the amount of available data, but the current situation often prohibits building large databases across sites. It would therefore be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on sharing machine learning models instead of the raw data itself. This means that private data never leaves the site or device on which it was collected. Federated learning is an emerging research area, and many domains have been identified for the application of these methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability to confidential healthcare datasets.
KW - Federated learning
Y1 - 2021
U6 - https://doi.org/10.1145/3412357
SN - 1533-5399
SN - 1557-6051
VL - 21
IS - 2
SP - 1
EP - 31
PB - Association for Computing Machinery
CY - New York
ER - 

TY - JOUR
A1 - Perugia, Giulia
A1 - Paetzel-Prüsmann, Maike
A1 - Alanenpää, Madelene
A1 - Castellano, Ginevra
T1 - I can see it in your eyes
BT - Gaze as an implicit cue of uncanniness and task performance in repeated interactions with robots
JF - Frontiers in robotics and AI
N2 - Over the past years, extensive research has been dedicated to developing robust platforms and data-driven dialog models to support long-term human-robot interactions. However, little is known about how people's perception of robots and engagement with them develop over time and how these can be accurately assessed through implicit and continuous measurement techniques. In this paper, we explore this by involving participants in three interaction sessions separated by multiple days without exposure to the robot. Each session consists of a joint task with a robot as well as two short social chats with it before and after the task. We measure participants' gaze patterns with a wearable eye-tracker and gauge their perception of the robot and their engagement with it and with the joint task using questionnaires. The results reveal that gaze aversion in a social chat is an indicator of a robot's uncanniness and that the more people gaze at the robot in a joint task, the worse they perform. In contrast with most HRI literature, our results show that gaze toward an object of shared attention, rather than gaze toward a robotic partner, is the most meaningful predictor of engagement in a joint task. Furthermore, the analyses of gaze patterns in repeated interactions reveal that people's mutual gaze in a social chat develops congruently with their perceptions of the robot over time.
These are key findings for the HRI community, as they imply that gaze behavior can be used as an implicit measure of people's perception of robots in a social chat and of their engagement and task performance in a joint task.
KW - perception of robots
KW - long-term interaction
KW - mutual gaze
KW - engagement
KW - uncanny valley
Y1 - 2021
U6 - https://doi.org/10.3389/frobt.2021.645956
SN - 2296-9144
VL - 8
PB - Frontiers Media
CY - Lausanne
ER - 

TY - JOUR
A1 - Perscheid, Cindy
T1 - Integrative biomarker detection on high-dimensional gene expression data sets
BT - a survey on prior knowledge approaches
JF - Briefings in bioinformatics
N2 - Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is treated as a gene selection task: the vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy, and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g., gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. As knowledge bases provide an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview of the status quo of biomarker detection on gene expression data with prior biological knowledge.
We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches, and discuss open challenges for this kind of gene selection.
KW - gene selection
KW - external knowledge bases
KW - biomarker detection
KW - gene expression
KW - prior knowledge
Y1 - 2021
U6 - https://doi.org/10.1093/bib/bbaa151
SN - 1467-5463
SN - 1477-4054
VL - 22
IS - 3
PB - Oxford Univ. Press
CY - Oxford
ER - 

TY - JOUR
A1 - Perscheid, Cindy
T1 - Comprior
BT - Facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets
JF - BMC Bioinformatics
N2 - Background: Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied to gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and offers a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e., uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated against each other, leaving open questions regarding their effectiveness. Results: We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches.
Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance. Conclusion: Comprior allows reproducible benchmarking, especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness.
KW - Feature selection
KW - Prior knowledge
KW - Gene expression
KW - Reproducible benchmarking
Y1 - 2021
U6 - https://doi.org/10.1186/s12859-021-04308-z
SN - 1471-2105
VL - 22
SP - 1
EP - 15
PB - Springer Nature
CY - London
ER - 

TY - JOUR
A1 - Pawassar, Christian Matthias
A1 - Tiberius, Victor
T1 - Virtual reality in health care
BT - Bibliometric analysis
JF - JMIR Serious Games
N2 - Background: Research into the application of virtual reality technology in the health care sector has rapidly increased, resulting in a large body of research that is difficult to keep up with. Objective: We provide an overview of the annual publication numbers in this field and the most productive and influential countries, journals, and authors, as well as the most used, most co-occurring, and most recent keywords. Methods: Based on a data set of 356 publications and 20,363 citations derived from Web of Science, we conducted a bibliometric analysis using BibExcel, HistCite, and VOSviewer. Results: Publication numbers grew most strongly in 2020, a year that accounts for 29.49% of all publications so far. The most productive countries are the United States, the United Kingdom, and Spain; the most influential countries are the United States, Canada, and the United Kingdom.
The most productive journals are the Journal of Medical Internet Research (JMIR), JMIR Serious Games, and the Games for Health Journal; the most influential journals are Patient Education and Counselling, Medical Education, and Quality of Life Research. The most productive authors are Riva, del Piccolo, and Schwebel; the most influential authors are Finset, del Piccolo, and Eide. The most frequently occurring keywords other than “virtual” and “reality” are “training,” “trial,” and “patients.” The most relevant research themes are communication, education, and novel treatments; the most recent research trends are fitness and exergames. Conclusions: The analysis shows that the field has moved beyond its infancy and is becoming increasingly specialized, with a clear focus on patient usability.
KW - virtual reality
KW - healthcare
KW - bibliometric analysis
KW - literature review
KW - citation analysis
KW - VR
KW - usability
KW - review
KW - health care
Y1 - 2021
U6 - https://doi.org/10.2196/32721
SN - 2291-9279
VL - 9
SP - 1
EP - 19
PB - JMIR Publications
CY - Toronto, Canada
ET - 4
ER - 

TY - THES
A1 - Pape, Tobias
T1 - Efficient compound values in virtual machines
N2 - Compound values are not universally supported in virtual machine (VM)-based programming systems and languages. However, providing data structures with value characteristics can be beneficial. On the one hand, programming systems and languages can adequately represent physical quantities with compound values and avoid inconsistencies, for example, in the representation of large numbers. On the other hand, just-in-time (JIT) compilers, which are often found in VMs, can rely on the fact that compound values are immutable, which is an important property in optimizing programs. Considering this, compound values have an optimization potential that can be put to use by implementing them in VMs in a way that is efficient in memory usage and execution time.
Yet, optimized compound values in VMs face certain challenges: to maintain consistency, it should not be observable by the program whether a VM represents compound values in an optimized way; an optimization should take into account that the usage of compound values can exhibit certain patterns at run time; and value-incompatible properties that are necessary only because of implementation restrictions should be reduced. We propose a technique to detect and compress common patterns of compound value usage at run time to improve memory usage and execution speed. Our approach identifies patterns of frequent compound value references and introduces abbreviated forms for them. Thus, it is possible to store multiple inter-referenced compound values in an inlined memory representation, reducing the overhead of metadata and object references. We extend our approach with a notion of limited mutability, using cells that act as barriers for our approach and provide a location for shared, mutable access with the possibility of type specialization. We devise an extension to our approach that allows us to express automatic unboxing of boxed primitive data types in terms of our initial technique. We show that our approach is versatile enough to express another optimization technique that relies on values, such as Booleans, that are unique throughout a programming system. Furthermore, we demonstrate how to reuse learned usage patterns and optimizations across program runs, thus reducing the performance impact of pattern recognition. We show in a best-case prototype that the implementation of our approach is feasible and can also be applied to general-purpose programming systems, namely implementations of the Racket language and Squeak/Smalltalk. In several micro-benchmarks, we found that our approach can effectively reduce memory consumption and improve execution speed.
KW - Compound Values
KW - Objects
KW - Data Structure Optimization
KW - Virtual Machines
KW - Smalltalk
Y1 - 2021
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-499134
ER - 

TY - JOUR
A1 - Paetzel-Prüsmann, Maike
A1 - Perugia, Giulia
A1 - Castellano, Ginevra
T1 - The influence of robot personality on the development of uncanny feelings
JF - Computers in human behavior
N2 - Empirical investigations on the uncanny valley have almost exclusively focused on the analysis of people's noninteractive perception of a robot at first sight. Recent studies suggest, however, that these uncanny first impressions may be significantly altered over the course of an interaction.
What remains to be discovered is whether certain interaction patterns can lead to a faster decline in uncanny feelings. In this paper, we present a study in which participants with limited expertise in Computer Science played a collaborative geography game with a Furhat robot. During the game, Furhat displayed one of two personalities, which corresponded to two different interaction strategies: the robot was either optimistic and encouraging, or impatient and provocative. We performed the study in a science museum and recruited participants among the visitors. Our findings suggest that a robot that is rated high on agreeableness, emotional stability, and conscientiousness can indeed weaken uncanny feelings. This study has important implications for human-robot interaction design, as it further highlights that a first impression based merely on a robot's appearance is not indicative of the affinity people might develop towards it over the course of an interaction. We thus argue that future work should emphasize investigations of the exact interaction patterns that can help to overcome uncanny feelings.
KW - Human-robot interaction
KW - Uncanny valley
KW - Robot personality
KW - Human perception of robots
KW - Crowd-sourcing
KW - Multimodal behavior
Y1 - 2021
U6 - https://doi.org/10.1016/j.chb.2021.106756
SN - 0747-5632
SN - 1873-7692
VL - 120
PB - Elsevier
CY - Amsterdam
ER - 

TY - JOUR
A1 - Oosthoek, Kris
A1 - Dörr, Christian
T1 - Cyber security threats to bitcoin exchanges
BT - adversary exploitation and laundering techniques
JF - IEEE transactions on network and service management : a publication of the IEEE
N2 - Bitcoin is gaining traction as an alternative store of value. Its market capitalization exceeds that of all other cryptocurrencies on the market. However, its high monetary value also makes it an attractive target for cybercriminals. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them.
Each exchange breach is a threat not only to its direct victims but also to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms, using an industry standard for reporting intelligence on cyber security breaches. Based on this, we provide an overview of the most common attack vectors, showing that all but three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as is the number of exchanges that cease operations after being breached. Furthermore, we show that the overall security posture has improved but still has major flaws. To discover adversarial methods post-breach, we analyzed two cases of BTC laundering. Through this analysis, we provide insight into how exchange platforms with lax cyber security further increase the intermediary risk they introduce into the Bitcoin ecosystem.
KW - Bitcoin
KW - Computer crime
KW - Cryptography
KW - Ecosystems
KW - Currencies
KW - Industries
KW - Vocabulary
KW - cryptocurrency exchanges
KW - cyber security
KW - cyber threat intelligence
KW - attacks
KW - vulnerabilities
KW - forensics
Y1 - 2021
U6 - https://doi.org/10.1109/TNSM.2020.3046145
SN - 1932-4537
VL - 18
IS - 2
SP - 1616
EP - 1628
PB - IEEE
CY - New York
ER - 

TY - JOUR
A1 - Omranian, Sara
A1 - Angeleska, Angela
A1 - Nikoloski, Zoran
T1 - PC2P
BT - parameter-free network-based prediction of protein complexes
JF - Bioinformatics
N2 - Motivation: Prediction of protein complexes from protein-protein interaction (PPI) networks is an important problem in systems biology, as protein complexes control various cellular functions.
Existing solutions employ algorithms for network community detection that identify dense subgraphs in PPI networks. However, gold standards in yeast and human indicate that protein complexes can also induce sparse subgraphs, introducing further challenges in protein complex prediction. Results: To address this issue, we formalize protein complexes as biclique spanned subgraphs, which include both sparse and dense subgraphs. We then cast the problem of protein complex prediction as partitioning a network into biclique spanned subgraphs by removing the minimum number of edges, yielding what we call a coherent partition. Since finding a coherent partition is computationally intractable, we devise a parameter-free greedy approximation algorithm, termed Protein Complexes from Coherent Partition (PC2P), based on key properties of biclique spanned subgraphs. Through comparison with nine contenders, we demonstrate that PC2P: (i) successfully identifies modular structure in networks, as a prerequisite for protein complex prediction, (ii) outperforms the existing solutions with respect to a composite score of five performance measures on 75% and 100% of the analyzed PPI networks and gold standards in yeast and human, respectively, and (iii, iv) does not compromise the GO semantic similarity and enrichment score of the predicted protein complexes. Therefore, our study demonstrates that clustering of networks in terms of biclique spanned subgraphs is a promising framework for detecting complexes in PPI networks.
Y1 - 2021
U6 - https://doi.org/10.1093/bioinformatics/btaa1089
SN - 1367-4811
VL - 37
IS - 1
SP - 73
EP - 81
PB - Oxford Univ. Press
CY - Oxford
ER - 

TY - JOUR
A1 - Navarro, Marisa
A1 - Orejas, Fernando
A1 - Pino, Elvira
A1 - Lambers, Leen
T1 - A navigational logic for reasoning about graph properties
JF - Journal of logical and algebraic methods in programming
N2 - Graphs play an important role in many areas of Computer Science.
In particular, our work is motivated by model-driven software development and by graph databases. For this reason, it is very important to have the means to express and to reason about the properties that a given graph may satisfy. With this aim, in this paper we present a visual logic that allows us to describe graph properties, including navigational properties, i.e., properties about the paths in a graph. The logic is equipped with a deductive tableau method that we have proved to be sound and complete. KW - Graph logic KW - Algebraic methods KW - Formal modelling KW - Specification Y1 - 2021 U6 - https://doi.org/10.1016/j.jlamp.2020.100616 SN - 2352-2208 SN - 2352-2216 VL - 118 PB - Elsevier Science CY - Amsterdam [u.a.] ER - TY - BOOK A1 - Meinel, Christoph A1 - Döllner, Jürgen Roland Friedrich A1 - Weske, Mathias A1 - Polze, Andreas A1 - Hirschfeld, Robert A1 - Naumann, Felix A1 - Giese, Holger A1 - Baudisch, Patrick A1 - Friedrich, Tobias A1 - Böttinger, Erwin A1 - Lippert, Christoph A1 - Dörr, Christian A1 - Lehmann, Anja A1 - Renard, Bernhard A1 - Rabl, Tilmann A1 - Uebernickel, Falk A1 - Arnrich, Bert A1 - Hölzle, Katharina T1 - Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat N2 - Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application. Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns. The annual Ph.D. 
Retreat of the Research School gives each member the opportunity to present the current state of their research and to outline a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include, but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment. N2 - The design and realization of service-based architectures raises a multitude of research questions from the fields of software engineering, system modeling and analysis, and the adaptability and integration of applications. Component orientation and web services are two approaches for the efficient design and realization of complex web-based systems. They make it possible to react to changing requirements as well as to integrate large, complex software systems. "Service-Oriented Systems Engineering" represents the symbiosis of proven practices from object orientation, component programming, distributed computing, and business processes, and also takes into account the integration of business concerns and information technology. The retreat of the research school "Service-oriented Systems Engineering" takes place once a year and gives all members the opportunity to present the current state of their research.
Owing to the interdisciplinary structure of the research school, this report covers a broad spectrum of current research topics. These include, among others, Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 138 KW - Hasso Plattner Institute KW - research school KW - Ph.D. retreat KW - service-oriented systems engineering KW - Hasso-Plattner-Institut KW - Forschungskolleg KW - Klausurtagung KW - Service-oriented Systems Engineering Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-504132 SN - 978-3-86956-513-2 SN - 1613-5652 SN - 2191-1665 IS - 138 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - BOOK A1 - Maximova, Maria A1 - Schneider, Sven A1 - Giese, Holger T1 - Interval probabilistic timed graph transformation systems N2 - Formal modeling and analysis are of crucial importance for software development processes that follow the model-based approach. We present the formalism of Interval Probabilistic Timed Graph Transformation Systems (IPTGTSs) as a high-level modeling language. This language supports structure dynamics (based on graph transformation), timed behavior (based on clocks, guards, resets, and invariants as in Timed Automata (TA)), and interval probabilistic behavior (based on Discrete Interval Probability Distributions).
That is, for the probabilistic behavior, the modeler using IPTGTSs does not need to provide precise probabilities, which are often impossible to obtain, but instead provides a probability range from which a precise probability is chosen nondeterministically. This way of capturing probabilistic behavior distinguishes IPTGTSs from the previously presented Probabilistic Timed Graph Transformation Systems (PTGTSs). Following earlier work on Interval Probabilistic Timed Automata (IPTA) and PTGTSs, we also provide an analysis tool chain for IPTGTSs based on inter-formalism transformations. In particular, our tool AutoGraph provides a translation of IPTGTSs to IPTA, and we rely on a mapping of IPTA to Probabilistic Timed Automata (PTA) to allow for the usage of the Prism model checker. Prism can then be used to analyze the resulting PTA with respect to probabilistic real-time queries asking for worst-case and best-case probabilities of reaching a certain set of target states within a given amount of time. N2 - Formal modeling and analysis are of decisive importance for software development processes that follow the model-based approach. We present the formalism of Interval Probabilistic Timed Graph Transformation Systems (IPTGTS) as a modeling language at a high level of abstraction. This language supports structure dynamics (based on graph transformation), timed behavior (based on clocks, guards, resets, and invariants as in Timed Automata (TA)), and interval probabilistic behavior (based on discrete interval probability distributions). That is, for probabilistic behavior, a modeler using IPTGTS does not have to provide precise probabilities, which often cannot be determined, but instead provides a probability range from which a precise probability is selected nondeterministically.
This way of capturing probabilistic behavior distinguishes IPTGTS from the previously presented PTGTS (Probabilistic Timed Graph Transformation Systems). Following earlier work on Interval Probabilistic Timed Automata (IPTA) and PTGTS, we also provide an analysis tool chain for IPTGTS based on inter-formalism transformations. In particular, our tool AutoGraph provides a translation of IPTGTSs into IPTA, and we rely on a mapping of IPTA to probabilistic timed automata (PTA) to enable the use of the Prism model checker. Prism can then be used to analyze the resulting PTA with respect to probabilistic real-time queries (which ask for worst-case and best-case probabilities of reaching a given set of target states within a given period of time). T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 134 KW - cyber-physical systems KW - graph transformation systems KW - interval timed automata KW - timed automata KW - qualitative analysis KW - quantitative analysis KW - probabilistic timed systems KW - interval probabilistic timed systems KW - model checking KW - cyber-physikalische Systeme KW - Graphentransformationssysteme KW - Interval Timed Automata KW - Timed Automata KW - qualitative Analyse KW - quantitative Analyse KW - probabilistische zeitgesteuerte Systeme KW - interval probabilistische zeitgesteuerte Systeme KW - Modellprüfung Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-512895 SN - 978-3-86956-502-6 SN - 1613-5652 SN - 2191-1665 IS - 134 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - THES A1 - Makowski, Silvia T1 - Discriminative Models for Biometric Identification using Micro- and Macro-Movements of the Eyes N2 - Human visual perception is an active process.
Eye movements either alternate between fixations and saccades or follow a smooth pursuit movement in the case of moving targets. Besides these macroscopic gaze patterns, the eyes perform involuntary micro-movements during fixations, which are commonly categorized into micro-saccades, drift, and tremor. Eye movements are frequently studied in cognitive psychology because they reflect a complex interplay of perception, attention, and oculomotor control. A common insight of psychological research is that macro-movements are highly individual. Inspired by this finding, a considerable amount of prior research has addressed oculomotoric biometric identification. However, the accuracy of known approaches is too low and the time needed for identification is too long for any practical application. This thesis explores discriminative models for the task of biometric identification. Discriminative models optimize a quality measure of the predictions and are usually superior to generative approaches in discriminative tasks. However, using discriminative models requires selecting a suitable representation for sequential eye gaze data, e.g., by engineering features or constructing a sequence kernel, and the performance of the classification model strongly depends on this representation. We study two fundamentally different ways of representing eye gaze within a discriminative framework. In the first part of this thesis, we explore the integration of data and psychological background knowledge in the form of generative models to construct representations. To this end, we first develop generative statistical models of gaze behavior during reading and scene viewing that account for viewer-specific distributional properties of gaze patterns. In a second step, we develop a discriminative identification model by deriving Fisher kernel functions from these and several baseline models.
We find that an SVM with a Fisher kernel is able to reliably identify users based on their eye gaze during reading and scene viewing. However, since the generative models are constrained to use low-frequency macro-movements, they discard a significant amount of information contained in the raw eye tracking signal, and this comes at a high cost: identification requires about one minute of input recording, which makes the approach impractical for real-world biometric systems. In the second part of this thesis, we study a purely data-driven modeling approach. Here, we aim to automatically discover the individual patterns hidden in the raw eye tracking signal. To this end, we develop a deep convolutional neural network, DeepEyedentification, that processes yaw and pitch gaze velocities and learns a representation end-to-end. Compared to prior work, this model increases the identification accuracy by one order of magnitude and reduces the time to identification to only seconds. The DeepEyedentificationLive model further improves upon the identification performance by processing binocular input, and it also detects presentation attacks. We find that by learning a representation, the performance of oculomotoric identification and presentation-attack detection can be driven close to practical relevance for biometric applications. Eye tracking devices with high sampling frequency and precision are expensive, and the applicability of eye movements as a biometric feature depends heavily on the cost of recording devices. In the last part of this thesis, we therefore study the requirements on data quality by evaluating the performance of the DeepEyedentificationLive network under reduced spatial and temporal resolution. We find that the method still attains a high identification accuracy at a temporal resolution of only 250 Hz and a precision of 0.03 degrees. Reducing both does not have an additive deteriorating effect.
KW - Machine Learning Y1 - 2021 ER - TY - JOUR A1 - Magkos, Sotirios A1 - Kupsch, Andreas A1 - Bruno, Giovanni T1 - Suppression of cone-beam artefacts with Direct Iterative Reconstruction Computed Tomography Trajectories (DIRECTT) JF - Journal of imaging : open access journal N2 - The reconstruction of cone-beam computed tomography data using filtered back-projection algorithms unavoidably results in severe artefacts. We describe how the Direct Iterative Reconstruction of Computed Tomography Trajectories (DIRECTT) algorithm can be combined with a model of the artefacts for the reconstruction of such data. The implementation of DIRECTT results in reconstructed volumes of superior quality compared to conventional algorithms. KW - iteration method KW - signal processing KW - X-ray imaging KW - computed tomography Y1 - 2021 U6 - https://doi.org/10.3390/jimaging7080147 SN - 2313-433X VL - 7 IS - 8 PB - MDPI CY - Basel ER - TY - JOUR A1 - Loster, Michael A1 - Koumarelas, Ioannis A1 - Naumann, Felix T1 - Knowledge transfer for entity resolution with siamese neural networks JF - ACM journal of data and information quality N2 - The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is time-consuming and requires extensive domain expertise.
We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. By exploiting the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible but, in our experiments, led to an improvement in F-measure of up to +4.7 percent. KW - Entity resolution KW - duplicate detection KW - transfer learning KW - neural networks KW - metric learning KW - similarity learning KW - data quality Y1 - 2021 U6 - https://doi.org/10.1145/3410157 SN - 1936-1955 SN - 1936-1963 VL - 13 IS - 1 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Lambers, Leen A1 - Orejas, Fernando T1 - Transformation rules with nested application conditions BT - critical pairs, initial conflicts & minimality JF - Theoretical computer science N2 - Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), which restrict the applicability of a rule and generalize the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet.
In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They again form a proper subset of the critical pairs for rules with ACs and represent each conflict uniquely in a minimal context. Moreover, they are symbolic, since we can show that, in general, no finite and complete set of conflicts exists for rules with ACs. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts. KW - Graph transformation KW - Critical pairs KW - Initial conflicts KW - Application conditions Y1 - 2021 U6 - https://doi.org/10.1016/j.tcs.2021.07.023 SN - 0304-3975 SN - 1879-2294 VL - 884 SP - 44 EP - 67 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Ladleif, Jan A1 - Weske, Mathias T1 - Which event happened first? BT - Deferred choice on blockchain using oracles JF - Frontiers in blockchain N2 - First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper, we provide an in-depth examination of the semantics of deferred choice and transfer them to environments such as the blockchain.
We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology. KW - business processes KW - business process management KW - deferred choice KW - workflow patterns KW - blockchain KW - smart contracts KW - oracles KW - formal semantics Y1 - 2021 U6 - https://doi.org/10.3389/fbloc.2021.758169 SN - 2624-7852 VL - 4 SP - 1 EP - 16 PB - Frontiers in Blockchain CY - Lausanne, Switzerland ER - TY - GEN A1 - Ladleif, Jan A1 - Weske, Mathias T1 - Which Event Happened First? Deferred Choice on Blockchain Using Oracles T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper, we provide an in-depth examination of the semantics of deferred choice and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 11 KW - business processes KW - business process management KW - deferred choice KW - workflow patterns KW - blockchain KW - smart contracts KW - oracles KW - formal semantics Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-550681 VL - 4 SP - 1 EP - 16 PB - Universitätsverlag Potsdam CY - Potsdam ER -