TY  - JOUR
A1  - Ullrich, André
A1  - Teichmann, Malte
A1  - Gronau, Norbert
T1  - Fast trainable capabilities in software engineering-skill development in learning factories
JF  - Ji suan ji jiao yu = Computer Education / Qing hua da xue
N2  - The increasing demand for software engineers cannot completely be fulfilled by university education and conventional training approaches due to limited capacities. Accordingly, an alternative approach is necessary where potential software engineers are being educated in software engineering skills using new methods. We suggest micro tasks combined with theoretical lessons to overcome existing skill deficits and acquire fast trainable capabilities. This paper addresses the gap between demand and supply of software engineers by introducing an action-oriented and scenario-based didactical approach, which enables non-computer scientists to code. Therein, the learning content is provided in small tasks and embedded in learning factory scenarios. Therefore, different requirements for software engineers from the market side and from an academic viewpoint are analyzed and synthesized into an integrated, yet condensed skills catalogue. This enables the development of training and education units that focus on the most important skills demanded on the market. To achieve this objective, individual learning scenarios are developed. Of course, proper basic skills in coding cannot be learned overnight, but software programming is also no sorcery.
KW  - learning factory
KW  - programming skills
KW  - software engineering
KW  - training
Y1  - 2021
U6  - https://doi.org/10.16512/j.cnki.jsjjy.2020.12.002
SN  - 1672-5913
IS  - 12
SP  - 2
EP  - 10
PB  - [publisher not identified]
CY  - Bei jing shi
ER  - 
TY  - JOUR
A1  - Marx, Susanne
A1  - Freundlich, Heidi
A1  - Klotz, Michael
A1  - Kylänen, Mika
A1  - Niedoszytko, Grazyna
A1  - Swacha, Jakub
A1  - Vollerthum, Anne
T1  - Towards an Online Learning Community on Digitalization in Tourism
JF  - EMOOCs 2021
N2  - Information technology and digital solutions as enablers in the tourism sector require continuous development of skills, as digital transformation is characterized by fast change, complexity and uncertainty. This research investigates how a cMOOC concept could support the tourism industry. A consortium of three universities, a tourism association, and a tourist attraction investigates online learning needs and habits of tourism industry stakeholders in the field of digitalization in a cross-border study in the Baltic Sea region. The multi-national survey (n = 244) reveals a high interest in participating in an online learning community, with two-thirds of respondents seeing opportunities to contribute to such a community apart from consuming knowledge. The paper demonstrates preferred ways of learning, motivational and hampering aspects as well as types of possible contributions.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-515986
SN  - 978-3-86956-512-5
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - CHAP
A1  - Abramova, Olga
A1  - Gundlach, Jana
A1  - Bilda, Juliane
T1  - Understanding the role of newsfeed clutter in stereotype activation
BT  - the case of Facebook
T2  - PACIS 2021 proceedings
N2  - Despite the phenomenal growth of Big Data Analytics in the last few years, little research is done to explicate the relationship between Big Data Analytics Capability (BDAC) and indirect strategic value derived from such digital capabilities. We attempt to address this gap by proposing a conceptual model of the BDAC - Innovation relationship using dynamic capability theory. The work expands on BDAC business value research and extends the nominal research done on BDAC – innovation. We focus on BDAC's relationship with different innovation objects, namely product, business process, and business model innovation, impacting all value chain activities. The insights gained will stimulate academic and practitioner interest in explicating strategic value generated from BDAC and serve as a framework for future research on the subject.
Y1  - 2021
UR  - https://aisel.aisnet.org/pacis2021/79
SN  - 978-1-7336325-7-7
IS  - 473
PB  - AIS Electronic Library (AISeL)
CY  - [place of publication not identified]
ER  - 
TY  - JOUR
A1  - Cseh, Ágnes
A1  - Juhos, Attila
T1  - Pairwise preferences in the stable marriage problem
JF  - ACM Transactions on Economics and Computation / Association for Computing Machinery
N2  - We study the classical, two-sided stable marriage problem under pairwise preferences. In the most general setting, agents are allowed to express their preferences as comparisons of any two of their edges, and they also have the right to declare a draw or even withdraw from such a comparison. This freedom is then gradually restricted as we specify six stages of orderedness in the preferences, ending with the classical case of strictly ordered lists. We study all cases occurring when combining the three known notions of stability (weak, strong, and super-stability) under the assumption that each side of the bipartite market obtains one of the six degrees of orderedness. By designing three polynomial algorithms and two NP-completeness proofs, we determine the complexity of all cases not yet known and thus give an exact boundary in terms of preference structure between tractable and intractable cases.
KW  - Stable marriage
KW  - intransitivity
KW  - acyclic preferences
KW  - poset
KW  - weakly stable matching
KW  - strongly stable matching
KW  - super stable matching
Y1  - 2021
U6  - https://doi.org/10.1145/3434427
SN  - 2167-8375
SN  - 2167-8383
VL  - 9
IS  - 1
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Cseh, Ágnes
A1  - Kavitha, Telikepalli
T1  - Popular matchings in complete graphs
JF  - Algorithmica : an international journal in computer science
N2  - Our input is a complete graph G on n vertices where each vertex has a strict ranking of all other vertices in G. The goal is to construct a matching in G that is popular. A matching M is popular if M does not lose a head-to-head election against any matching M': here each vertex casts a vote for the matching in {M, M'} in which it gets a better assignment. Popular matchings need not exist in the given instance G and the popular matching problem is to decide whether one exists or not. The popular matching problem in G is easy to solve for odd n. Surprisingly, the problem becomes NP-complete for even n, as we show here. This is one of the few graph theoretic problems efficiently solvable when n has one parity and NP-complete when n has the other parity.
KW  - Popular matching
KW  - Complexity
KW  - Stable matching
Y1  - 2021
U6  - https://doi.org/10.1007/s00453-020-00791-7
SN  - 0178-4617
SN  - 1432-0541
VL  - 83
IS  - 5
SP  - 1493
EP  - 1523
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Brede, Nuria
A1  - Botta, Nicola
T1  - On the correctness of monadic backward induction
JF  - Journal of functional programming
N2  - In control theory, to solve a finite-horizon sequential decision problem (SDP) commonly means to find a list of decision rules that result in an optimal expected total reward (or cost) when taking a given number of decision steps. SDPs are routinely solved using Bellman's backward induction. Textbook authors (e.g. Bertsekas or Puterman) typically give more or less formal proofs to show that the backward induction algorithm is correct as solution method for deterministic and stochastic SDPs. Botta, Jansson and Ionescu propose a generic framework for finite horizon, monadic SDPs together with a monadic version of backward induction for solving such SDPs. In monadic SDPs, the monad captures a generic notion of uncertainty, while a generic measure function aggregates rewards. In the present paper, we define a notion of correctness for monadic SDPs and identify three conditions that allow us to prove a correctness result for monadic backward induction that is comparable to textbook correctness proofs for ordinary backward induction. The conditions that we impose are fairly general and can be cast in category-theoretical terms using the notion of Eilenberg-Moore algebra. They hold in familiar settings like those of deterministic or stochastic SDPs, but we also give examples in which they fail. Our results show that backward induction can safely be employed for a broader class of SDPs than usually treated in textbooks. However, they also rule out certain instances that were considered admissible in the context of Botta et al.'s generic framework. Our development is formalised in Idris as an extension of the Botta et al. framework and the sources are available as supplementary material.
Y1  - 2021
U6  - https://doi.org/10.1017/S0956796821000228
SN  - 1469-7653
SN  - 0956-7968
VL  - 31
PB  - Cambridge University Press
CY  - Cambridge
ER  - 
TY  - JOUR
A1  - Benson, Lawrence
A1  - Makait, Hendrik
A1  - Rabl, Tilmann
T1  - Viper
BT  - An Efficient Hybrid PMem-DRAM Key-Value Store
JF  - Proceedings of the VLDB Endowment
N2  - Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
KW  - memory
Y1  - 2021
U6  - https://doi.org/10.14778/3461535.3461543
SN  - 2150-8097
VL  - 14
IS  - 9
SP  - 1544
EP  - 1556
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - CHAP
A1  - Krause, Hannes-Vincent
A1  - Baumann, Annika
T1  - The devil in disguise
BT  - malicious envy’s impact on harmful interactions between social networking site users
T2  - ICIS 2021: user behaviors, engagement, and consequences
N2  - Envy constitutes a serious issue on Social Networking Sites (SNSs), as this painful emotion can severely diminish individuals' well-being. With prior research mainly focusing on the affective consequences of envy in the SNS context, its behavioral consequences remain puzzling. While negative interactions among SNS users are an alarming issue, it remains unclear to which extent the harmful emotion of malicious envy contributes to these toxic dynamics. This study constitutes a first step in understanding malicious envy’s causal impact on negative interactions within the SNS sphere. Within an online experiment, we experimentally induce malicious envy and measure its immediate impact on users’ negative behavior towards other users. Our findings show that malicious envy seems to be an essential factor fueling negativity among SNS users and further illustrate that this effect is especially pronounced when users are provided an objective factor to mask their envy and justify their norm-violating negative behavior.
Y1  - 2021
UR  - https://aisel.aisnet.org/icis2021/user_behaivors/user_behaivors/21
PB  - AIS Electronic Library (AISeL)
CY  - [place of publication not identified]
ER  - 
TY  - JOUR
A1  - Xu, Rudan
A1  - Razaghi-Moghadam, Zahra
A1  - Nikoloski, Zoran
T1  - Maximization of non-idle enzymes improves the coverage of the estimated maximal in vivo enzyme catalytic rates in Escherichia coli
JF  - Bioinformatics
N2  - Motivation: 
Constraint-based modeling approaches allow the estimation of maximal in vivo enzyme catalytic rates that can serve as proxies for enzyme turnover numbers. Yet, genome-scale flux profiling remains a challenge in deploying these approaches to catalogue proxies for enzyme catalytic rates across organisms.

Results:
Here, we formulate a constraint-based approach, termed NIDLE-flux, to estimate fluxes at a genome-scale level by using the principle of efficient usage of expressed enzymes. Using proteomics data from Escherichia coli, we show that the fluxes estimated by NIDLE-flux and the existing approaches are in excellent qualitative agreement (Pearson correlation > 0.9). We also find that the maximal in vivo catalytic rates estimated by NIDLE-flux exhibit a Pearson correlation of 0.74 with in vitro enzyme turnover numbers. However, NIDLE-flux results in a 1.4-fold increase in the size of the estimated maximal in vivo catalytic rates in comparison to the contenders. Integration of the maximum in vivo catalytic rates with publicly available proteomics and metabolomics data provides a better match to fluxes estimated by NIDLE-flux. Therefore, NIDLE-flux facilitates more effective usage of proteomics data to estimate proxies for kcatomes.
Y1  - 2021
U6  - https://doi.org/10.1093/bioinformatics/btab575
SN  - 1367-4803
SN  - 1460-2059
VL  - 37
IS  - 21
SP  - 3848
EP  - 3855
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Angeleska, Angela
A1  - Omranian, Sara
A1  - Nikoloski, Zoran
T1  - Coherent network partitions
BT  - Characterizations with cographs and prime graphs
JF  - Theoretical computer science : the journal of the EATCS
N2  - We continue to study coherent partitions of graphs whereby the vertex set is partitioned into subsets that induce biclique spanned subgraphs. The problem of identifying the minimum number of edges to obtain biclique spanned connected components (CNP), called the coherence number, is NP-hard even on bipartite graphs. Here, we propose a graph transformation geared towards obtaining an O(log n)-approximation algorithm for the CNP on a bipartite graph with n vertices. The transformation is inspired by a new characterization of biclique spanned subgraphs. In addition, we study coherent partitions on prime graphs, and show that finding coherent partitions reduces to the problem of finding coherent partitions in a prime graph. Therefore, these results provide future directions for approximation algorithms for the coherence number of a given graph.
KW  - Graph partitions
KW  - Network clustering
KW  - Cographs
KW  - Coherent partition
KW  - Prime graphs
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2021.10.002
SN  - 0304-3975
VL  - 894
SP  - 3
EP  - 11
PB  - Elsevier
CY  - Amsterdam [and others]
ER  - 
TY  - JOUR
A1  - Steinrötter, Björn
T1  - Das Konzept einer datenaltruistischen Organisation
JF  - Datenschutz und Datensicherheit
N2  - It now seems a truism that technologies such as machine learning applications or big data and smart data methods depend on data of sufficient quantity and quality. Against this background, the EU legislator in particular has recently discovered a new field of activity, attempting in various ways to create incentives for data sharing in order to foster innovation. This includes a constellation labelled, rather euphoniously, "data altruism". The article presents the relevant regulatory considerations at the supranational level and offers an initial analysis.
KW  - coding and information theory
KW  - computer science
KW  - general
KW  - cryptology
KW  - data structures and information theory
Y1  - 2021
U6  - https://doi.org/10.1007/s11623-021-1539-6
SN  - 1862-2607
SN  - 1614-0702
VL  - 45
IS  - 12
SP  - 794
EP  - 798
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Schindler, Daniel
A1  - Moldenhawer, Ted
A1  - Stange, Maike
A1  - Lepro, Valentino
A1  - Beta, Carsten
A1  - Holschneider, Matthias
A1  - Huisinga, Wilhelm
T1  - Analysis of protrusion dynamics in amoeboid cell motility by means of regularized contour flows
JF  - PLoS Computational Biology : a new community journal
N2  - Amoeboid cell motility is essential for a wide range of biological processes including wound healing, embryonic morphogenesis, and cancer metastasis. It relies on complex dynamical patterns of cell shape changes that pose long-standing challenges to mathematical modeling and raise a need for automated and reproducible approaches to extract quantitative morphological features from image sequences. Here, we introduce a theoretical framework and a computational method for obtaining smooth representations of the spatiotemporal contour dynamics from stacks of segmented microscopy images. Based on a Gaussian process regression we propose a one-parameter family of regularized contour flows that allows us to continuously track reference points (virtual markers) between successive cell contours. We use this approach to define a coordinate system on the moving cell boundary and to represent different local geometric quantities in this frame of reference. In particular, we introduce the local marker dispersion as a measure to identify localized membrane expansions and provide a fully automated way to extract the properties of such expansions, including their area and growth time. The methods are available as an open-source software package called AmoePy, a Python-based toolbox for analyzing amoeboid cell motility (based on time-lapse microscopy data), including a graphical user interface and detailed documentation. Due to the mathematical rigor of our framework, we envision it to be of use for the development of novel cell motility models. We mainly use experimental data of the social amoeba Dictyostelium discoideum to illustrate and validate our approach. Author summary: Amoeboid motion is a crawling-like cell migration that plays a key role in multiple biological processes such as wound healing and cancer metastasis. This type of cell motility results from expanding and simultaneously contracting parts of the cell membrane.
From fluorescence images, we obtain a sequence of points, representing the cell membrane, for each time step. By using regression analysis on these sequences, we derive smooth representations, so-called contours, of the membrane. Since the number of measurements is discrete and often limited, the question is raised of how to link consecutive contours with each other. In this work, we present a novel mathematical framework in which these links are described by regularized flows allowing a certain degree of concentration or stretching of neighboring reference points on the same contour. This stretching rate, the so-called local dispersion, is used to identify expansions and contractions of the cell membrane providing a fully automated way of extracting properties of these cell shape changes. We applied our methods to time-lapse microscopy data of the social amoeba Dictyostelium discoideum.
Y1  - 2021
U6  - https://doi.org/10.1371/journal.pcbi.1009268
SN  - 1553-734X
SN  - 1553-7358
VL  - 17
IS  - 8
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Tavakoli, Hamad
A1  - Alirezazadeh, Pendar
A1  - Hedayatipour, Ava
A1  - Nasib, A. H. Banijamali
A1  - Landwehr, Niels
T1  - Leaf image-based classification of some common bean cultivars using discriminative convolutional neural networks
JF  - Computers and electronics in agriculture : COMPAG online ; an international journal
N2  - In recent years, many efforts have been made to apply image processing techniques for plant leaf identification. However, categorizing leaf images at the cultivar/variety level, because of the very low inter-class variability, is still a challenging task. In this research, we propose an automatic discriminative method based on convolutional neural networks (CNNs) for classifying 12 different cultivars of common beans that belong to three different species. We show that employing advanced loss functions, such as Additive Angular Margin Loss and Large Margin Cosine Loss, instead of the standard softmax loss function for the classification can yield better discrimination between classes and thereby mitigate the problem of low inter-class variability. The method was evaluated by classifying species (level I), cultivars from the same species (level II), and cultivars from different species (level III), based on images from the leaf foreside and backside. The results indicate that the performance of the classification algorithm on the leaf backside image dataset is superior. The maximum mean classification accuracies of 95.86, 91.37 and 86.87% were obtained at the levels I, II and III, respectively. The proposed method outperforms the previous relevant works and provides a reliable approach for plant cultivar identification.
KW  - Bean
KW  - Plant identification
KW  - Digital image analysis
KW  - VGG16
KW  - Loss functions
Y1  - 2021
U6  - https://doi.org/10.1016/j.compag.2020.105935
SN  - 0168-1699
SN  - 1872-7107
VL  - 181
PB  - Elsevier
CY  - Amsterdam [and others]
ER  - 
TY  - JOUR
A1  - Pfitzner, Bjarne
A1  - Steckhan, Nico
A1  - Arnrich, Bert
T1  - Federated learning in a medical context
BT  - a systematic literature review
JF  - ACM transactions on internet technology : TOIT / Association for Computing
N2  - Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
KW  - Federated learning
Y1  - 2021
U6  - https://doi.org/10.1145/3412357
SN  - 1533-5399
SN  - 1557-6051
VL  - 21
IS  - 2
SP  - 1
EP  - 31
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Bonnet, Philippe
A1  - Dong, Xin Luna
A1  - Naumann, Felix
A1  - Tözün, Pınar
T1  - VLDB 2021
BT  - Designing a hybrid conference
JF  - SIGMOD record
N2  - The 47th International Conference on Very Large Databases (VLDB'21) was held on August 16-20, 2021 as a hybrid conference. It attracted 180 in-person attendees in Copenhagen and 840 remote attendees. In this paper, we describe our key decisions as general chairs and program committee chairs and share the lessons we learned.
Y1  - 2021
U6  - https://doi.org/10.1145/3516431.3516447
SN  - 0163-5808
SN  - 1943-5835
VL  - 50
IS  - 4
SP  - 50
EP  - 53
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Cabalar, Pedro
A1  - Fandiño, Jorge
A1  - Fariñas del Cerro, Luis
T1  - Splitting epistemic logic programs
JF  - Theory and practice of logic programming / publ. for the Association for Logic Programming
N2  - Epistemic logic programs constitute an extension of the stable model semantics to deal with new constructs called subjective literals. Informally speaking, a subjective literal allows checking whether some objective literal is true in all or some stable models. As it can be imagined, the associated semantics has proved to be non-trivial, since the truth of subjective literals may interfere with the set of stable models it is supposed to query. As a consequence, no clear agreement has been reached and different semantic proposals have been made in the literature. Unfortunately, comparison among these proposals has been limited to a study of their effect on individual examples, rather than identifying general properties to be checked. In this paper, we propose an extension of the well-known splitting property for logic programs to the epistemic case. We formally define when an arbitrary semantics satisfies the epistemic splitting property and examine some of the consequences that can be derived from that, including its relation to conformant planning and to epistemic constraints. Interestingly, we prove (through counterexamples) that most of the existing approaches fail to fulfill the epistemic splitting property, except the original semantics proposed by Gelfond 1991 and a recent proposal by the authors, called Founded Autoepistemic Equilibrium Logic.
KW  - knowledge representation and nonmonotonic reasoning
KW  - logic programming methodology and applications
KW  - theory
Y1  - 2021
U6  - https://doi.org/10.1017/S1471068420000058
SN  - 1471-0684
SN  - 1475-3081
VL  - 21
IS  - 3
SP  - 296
EP  - 316
PB  - Cambridge Univ. Press
CY  - Cambridge [and others]
ER  - 
TY  - JOUR
A1  - Göbel, Andreas
A1  - Lagodzinski, Julius Albert Gregor
A1  - Seidel, Karen
T1  - Counting homomorphisms to trees modulo a prime
JF  - ACM transactions on computation theory : TOCT / Association for Computing Machinery
N2  - Many important graph-theoretic notions can be encoded as counting graph homomorphism problems, such as partition functions in statistical physics, in particular independent sets and colourings. In this article, we study the complexity of #pHOMSTOH, the problem of counting graph homomorphisms from an input graph to a graph H modulo a prime number p. Dyer and Greenhill proved a dichotomy stating that the tractability of non-modular counting graph homomorphisms depends on the structure of the target graph. Many intractable cases in non-modular counting become tractable in modular counting due to the common phenomenon of cancellation. In subsequent studies on counting modulo 2, however, the influence of the structure of H on the tractability was shown to persist, which yields similar dichotomies. Our main result states that for every tree H and every prime p the problem #pHOMSTOH is either polynomial time computable or #P-p-complete. This relates to the conjecture of Faben and Jerrum stating that this dichotomy holds for every graph H when counting modulo 2. In contrast to previous results on modular counting, the tractable cases of #pHOMSTOH are essentially the same for all values of the modulus when H is a tree. To prove this result, we study the structural properties of a homomorphism. As an important interim result, our study yields a dichotomy for the problem of counting weighted independent sets in a bipartite graph modulo some prime p. These results are the first suggesting that such dichotomies hold not only for the modulo 2 case but also for the modular counting functions of all primes p.
KW  - Graph homomorphisms
KW  - modular counting
KW  - complexity dichotomy
Y1  - 2021
U6  - https://doi.org/10.1145/3460958
SN  - 1942-3454
SN  - 1942-3462
VL  - 13
IS  - 3
SP  - 1
EP  - 33
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Nguyen, Dong Hai Phuong
A1  - Georgie, Yasmin Kim
A1  - Kayhan, Ezgi
A1  - Eppe, Manfred
A1  - Hafner, Verena Vanessa
A1  - Wermter, Stefan
T1  - Sensorimotor representation learning for an "active self" in robots
BT  - a model survey
JF  - Künstliche Intelligenz : KI ; Forschung, Entwicklung, Erfahrungen ; Organ des Fachbereichs 1 Künstliche Intelligenz der Gesellschaft für Informatik e.V., GI / Fachbereich 1 der Gesellschaft für Informatik e.V
N2  - Safe human-robot interactions require robots to be able to learn how to behave appropriately in spaces populated by people and thus to cope with the challenges posed by our dynamic and unstructured environment, rather than being provided a rigid set of rules for operations. In humans, these capabilities are thought to be related to our ability to perceive our body in space, sensing the location of our limbs during movement, being aware of other objects and agents, and controlling our body parts to interact with them intentionally. Toward the next generation of robots with bio-inspired capacities, in this paper, we first review the developmental processes of underlying mechanisms of these abilities: The sensory representations of body schema, peripersonal space, and the active self in humans. Second, we provide a survey of robotics models of these sensory representations and robotics models of the self; and we compare these models with the human counterparts. Finally, we analyze what is missing from these robotics models and propose a theoretical computational framework, which aims to allow the emergence of the sense of self in artificial agents by developing sensory representations through self-exploration.
KW  - Developmental robotics
KW  - Body schema
KW  - Peripersonal space
KW  - Agency
KW  - Robot learning
Y1  - 2021
U6  - https://doi.org/10.1007/s13218-021-00703-z
SN  - 0933-1875
SN  - 1610-1987
VL  - 35
IS  - 1
SP  - 9
EP  - 35
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Omranian, Sara
A1  - Angeleska, Angela
A1  - Nikoloski, Zoran
T1  - PC2P
BT  - parameter-free network-based prediction of protein complexes
JF  - Bioinformatics
N2  - Motivation: 
Prediction of protein complexes from protein-protein interaction (PPI) networks is an important problem in systems biology, as they control different cellular functions. The existing solutions employ algorithms for network community detection that identify dense subgraphs in PPI networks. However, gold standards in yeast and human indicate that protein complexes can also induce sparse subgraphs, introducing further challenges in protein complex prediction. 

Results: 
To address this issue, we formalize protein complexes as biclique spanned subgraphs, which include both sparse and dense subgraphs. We then cast the problem of protein complex prediction as a network partitioning into biclique spanned subgraphs with removal of a minimum number of edges, called a coherent partition. Since finding a coherent partition is a computationally intractable problem, we devise a parameter-free greedy approximation algorithm, termed Protein Complexes from Coherent Partition (PC2P), based on key properties of biclique spanned subgraphs. Through comparison with nine contenders, we demonstrate that PC2P: (i) successfully identifies modular structure in networks, as a prerequisite for protein complex prediction, (ii) outperforms the existing solutions with respect to a composite score of five performance measures on 75% and 100% of the analyzed PPI networks and gold standards in yeast and human, respectively, and (iii, iv) does not compromise GO semantic similarity and enrichment score of the predicted protein complexes. Therefore, our study demonstrates that clustering of networks in terms of biclique spanned subgraphs is a promising framework for detection of complexes in PPI networks.
Y1  - 2021
U6  - https://doi.org/10.1093/bioinformatics/btaa1089
SN  - 1367-4811
VL  - 37
IS  - 1
SP  - 73
EP  - 81
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Trautmann, Justin
A1  - Zhou, Lin
A1  - Brahms, Clemens Markus
A1  - Tunca, Can
A1  - Ersoy, Cem
A1  - Granacher, Urs
A1  - Arnrich, Bert
T1  - TRIPOD
BT  - A treadmill walking dataset with IMU, pressure-distribution and photoelectric data for gait analysis
JF  - Data : open access ʻData in scienceʼ journal
N2  - Inertial measurement units (IMUs) enable easy to operate and low-cost data recording for gait analysis. When combined with treadmill walking, a large number of steps can be collected in a controlled environment without the need of a dedicated gait analysis laboratory. In order to evaluate existing and novel IMU-based gait analysis algorithms for treadmill walking, a reference dataset that includes IMU data as well as reliable ground truth measurements for multiple participants and walking speeds is needed. This article provides a reference dataset consisting of 15 healthy young adults who walked on a treadmill at three different speeds. Data were acquired using seven IMUs placed on the lower body, two different reference systems (Zebris FDMT-HQ and OptoGait), and two RGB cameras. Additionally, in order to validate an existing IMU-based gait analysis algorithm using the dataset, an adaptable modular data analysis pipeline was built. Our results show agreement between the pressure-sensitive Zebris and the photoelectric OptoGait system (r = 0.99), demonstrating the quality of our reference data. As a use case, the performance of an algorithm originally designed for overground walking was tested on treadmill data using the data pipeline. The accuracy of stride length and stride time estimations was comparable to that reported in other studies with overground data, indicating that the algorithm is equally applicable to treadmill data. The Python source code of the data pipeline is publicly available, and the dataset will be provided by the authors upon request, enabling future evaluations of IMU gait analysis algorithms without the need of recording new data.
KW  - inertial measurement unit
KW  - gait analysis algorithm
KW  - OptoGait
KW  - Zebris
KW  - data pipeline
KW  - public dataset
Y1  - 2021
U6  - https://doi.org/10.3390/data6090095
SN  - 2306-5729
VL  - 6
IS  - 9
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - De Freitas, Jessica K.
A1  - Johnson, Kipp W.
A1  - Golden, Eddye
A1  - Nadkarni, Girish N.
A1  - Dudley, Joel T.
A1  - Böttinger, Erwin
A1  - Glicksberg, Benjamin S.
A1  - Miotto, Riccardo
T1  - Phe2vec
BT  - Automated disease phenotyping based on unsupervised embeddings from electronic health records
JF  - Patterns
N2  - Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Differently from other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
Y1  - 2021
U6  - https://doi.org/10.1016/j.patter.2021.100337
SN  - 2666-3899
VL  - 2
IS  - 9
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Freitas da Cruz, Harry
A1  - Pfahringer, Boris
A1  - Martensen, Tom
A1  - Schneider, Frederic
A1  - Meyer, Alexander
A1  - Böttinger, Erwin
A1  - Schapranow, Matthieu-Patrick
T1  - Using interpretability approaches to update "black-box" clinical prediction models
BT  - an external validation study in nephrology
JF  - Artificial intelligence in medicine : AIM
N2  - Despite advances in machine learning-based clinical prediction models, only few of such models are actually deployed in clinical contexts. Among other reasons, this is due to a lack of validation studies. In this paper, we present and discuss the validation results of a machine learning model for the prediction of acute kidney injury in cardiac surgery patients initially developed on the MIMIC-III dataset when applied to an external cohort of an American research hospital. To help account for the performance differences observed, we utilized interpretability methods based on feature importance, which allowed experts to scrutinize model behavior both at the global and local level, making it possible to gain further insights into why it did not behave as expected on the validation cohort. The knowledge gleaned upon derivation can be potentially useful to assist model update during validation for more generalizable and simpler models. We argue that interpretability methods should be considered by practitioners as a further tool to help explain performance differences and inform model update in validation studies.
KW  - Clinical predictive modeling
KW  - Nephrology
KW  - Validation
KW  - Interpretability methods
Y1  - 2021
U6  - https://doi.org/10.1016/j.artmed.2020.101982
SN  - 0933-3657
SN  - 1873-2860
VL  - 111
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Borchert, Florian
A1  - Mock, Andreas
A1  - Tomczak, Aurelie
A1  - Hügel, Jonas
A1  - Alkarkoukly, Samer
A1  - Knurr, Alexander
A1  - Volckmar, Anna-Lena
A1  - Stenzinger, Albrecht
A1  - Schirmacher, Peter
A1  - Debus, Jürgen
A1  - Jäger, Dirk
A1  - Longerich, Thomas
A1  - Fröhling, Stefan
A1  - Eils, Roland
A1  - Bougatf, Nina
A1  - Sax, Ulrich
A1  - Schapranow, Matthieu-Patrick
T1  - Knowledge bases and software support for variant interpretation in precision oncology
JF  - Briefings in bioinformatics
N2  - Precision oncology is a rapidly evolving interdisciplinary medical specialty. Comprehensive cancer panels are becoming increasingly available at pathology departments worldwide, creating the urgent need for scalable cancer variant annotation and molecularly informed treatment recommendations. A wealth of mainly academia-driven knowledge bases calls for software tools supporting the multi-step diagnostic process. We derive a comprehensive list of knowledge bases relevant for variant interpretation by a review of existing literature followed by a survey among medical experts from university hospitals in Germany. In addition, we review cancer variant interpretation tools, which integrate multiple knowledge bases. We categorize the knowledge bases along the diagnostic process in precision oncology and analyze programmatic access options as well as the integration of knowledge bases into software tools. The most commonly used knowledge bases provide good programmatic access options and have been integrated into a range of software tools. For the wider set of knowledge bases, access options vary across different parts of the diagnostic process. Programmatic access is limited for information regarding clinical classifications of variants and for therapy recommendations. The main issue for databases used for biological classification of pathogenic variants and pathway context information is the lack of standardized interfaces. There is no single cancer variant interpretation tool that integrates all identified knowledge bases. Specialized tools are available and need to be further developed for different steps in the diagnostic process.
KW  - HiGHmed
KW  - personalized medicine
KW  - molecular tumor board
KW  - data integration
KW  - cancer therapy
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbab134
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 6
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - CHAP
A1  - Gundlach, Jana
A1  - Abramova, Olga
T1  - Newsfeed clutter as an inhibitor of sensemaking
T2  - AMCIS Proceedings 2021
N2  - As a central functionality of SNSs, the newsfeed determines how content is presented. This paper investigates the implications of current content presentation on Facebook, which has been a subject of users’ criticism. Drawing on communication theory, we conceptualize clutter on a newsfeed as noise that hinders the receiver’s adequate message decoding (i.e., sensemaking). We further operationalize newsfeed clutter via perceived disorder, information overload, and system feature overload. Our participants browsed their Facebook newsfeed for at least 5 minutes. The follow-up survey results provide partial support for our hypotheses, with only perceived disorder significantly associated with lower sensemaking. These findings shed new light on user experience and underpin the importance of SNSs as communication systems, adding to the existing literature on the dark sides of social media.
Y1  - 2021
UR  - https://aisel.aisnet.org/amcis2021/virtual_communities/virtual_communities/3/
SN  - 978-1-7336325-8-4
PB  - AIS
CY  - Atlanta
ER  - 
TY  - CHAP
A1  - Brinkmann, Maik
T1  - Relevance of public administrations
BT  - visualization of shifting power relations in blockchain-based public service delivery
T2  - Proceedings of the 54th Hawaii International Conference on System Sciences 2021
N2  - Power relations within the area of blockchain governance are complex by definition, and a comprehensive analysis that links technological and institutional elements is missing to date. The research presented in this article focuses on the visualization of the shifting power relations with the introduction of blockchain. For this purpose, the analysis leverages an adjusted version of the multi-stakeholder influence mapping tool. The analysis considers the various stakeholders within the multi-layered blockchain technology stack and compares three fundamental blockchain scenarios, including public and private blockchain settings. The findings show that public administrations indeed hold less power with the introduction of blockchain, while new stakeholders come into play who wield influence in a largely uncontrolled manner. Nonetheless, public administrations are not powerless overall and remain influential stakeholders. This paper concludes that blockchain governance is not as democratic as blockchain enthusiasts tend to argue and derives corresponding opportunities for further research.
KW  - Emerging Topics in Digital Government
KW  - blockchain
KW  - influence mapping
KW  - power relations
KW  - stakeholder analysis
KW  - visualization
Y1  - 2021
SN  - 978-0-9981331-4-0
U6  - https://doi.org/10.24251/HICSS.2021.285
PB  - University of Hawaiʻi at Mānoa
CY  - Honolulu, HI
ER  - 
TY  - THES
A1  - Hecher, Markus
T1  - Advanced tools and methods for treewidth-based problem solving
N2  - In recent decades, there has been notable progress in solving the well-known Boolean satisfiability (Sat) problem, as witnessed by powerful Sat solvers. One reason these solvers are so fast is that their internals exploit structural properties of instances. This thesis deals with the well-studied structural property treewidth, which measures how close an instance is to being a tree. In fact, many problems are solvable in polynomial time in the instance size when parameterized by treewidth.
In this work, we study advanced treewidth-based methods and tools for problems in knowledge representation and reasoning (KR). Thereby, we provide means to establish precise runtime results (upper bounds) for canonical problems relevant to KR. Then, we present a new type of problem reduction, which we call decomposition-guided (DG), that allows us to precisely monitor the treewidth when reducing from one problem to another problem. This new reduction type will be the basis for a long-open lower bound result for quantified Boolean formulas and allows us to design a new methodology for establishing runtime lower bounds for problems parameterized by treewidth.
Finally, despite these lower bounds, we provide an efficient implementation of algorithms that adhere to treewidth. Our approach finds suitable abstractions of instances, which are subsequently refined in a recursive fashion, and it uses Sat solvers for solving subproblems. It turns out that our resulting solver is quite competitive for two canonical counting problems related to Sat.
N2  - In recent decades, considerable progress has been made in the field of propositional logic. This is evident from the fact that, for the most important problem in this area, called "Sat", which asks whether a given propositional formula is satisfiable, overwhelmingly fast computer programs ("solvers") have been developed. Interestingly, these solvers deliver impressive performance, often solving even problem instances with several million variables with ease. On the other hand, the Exponential Time Hypothesis (ETH) is widely believed in the scientific community; it states that, in the worst case, solving an instance in this area requires runtime exponential in the number of variables. This apparent contradiction has still not been fully resolved, since there are probably many interlocking reasons for the speed of current Sat solvers. One of these reasons concerns structural properties of problem instances, which these solvers presumably exploit indirectly and internally.

This dissertation deals with one such structural property, namely so-called treewidth. Treewidth is very well studied and attempts to measure how far problem instances are from being trees (tree-likeness). However, this parameter is very generic and by no means restricted to problems of propositional logic. In fact, there are many further problems that can be solved in polynomial time when parameterized by treewidth. Interestingly, there are also many problems in knowledge representation (KR) that are assumed to be harder than Sat but can be solved in polynomial time for bounded treewidth. A prominent example is the problem QSat, which concerns the validity of a given quantified Boolean formula (QBF), that is, a propositional formula in which certain variables may be existentially or universally quantified. Remarkably, in the context of treewidth, similar to methods of classical complexity theory, the actual complexity (hardness) of such problems is also quantified by describing the exact runtime dependence on treewidth (the level of exponentiality) when solving the problem.

This thesis deals with advanced treewidth-based methods and tools for problems in knowledge representation and artificial intelligence (AI). We present methods for obtaining precise runtime results (upper bounds) for prominent fragments of answer set programming (ASP), a canonical paradigm for solving knowledge representation problems. Our results are based on dynamic programming, which is guided by a so-called tree decomposition and works similarly to the "divide and conquer" principle. Such a tree decomposition is a concrete structural decomposition of a problem instance that is closely tied to treewidth.

Furthermore, we present a new type of problem reduction, which we call "decomposition-guided (DG)". This reduction type makes it possible to precisely examine and control increases and decreases in treewidth during a reduction from one problem to another. In addition, this new reduction type is the basis for showing a long-open result concerning quantified Boolean formulas. Indeed, it enables us to show precise lower bounds, under the Exponential Time Hypothesis, for the problem QSat with bounded treewidth. More precisely, using DG reductions we can show that the problem QSat, restricted to quantifier rank ℓ and parameterized by treewidth k, can in general not be solved better than in a runtime that is ℓ-fold exponential in the treewidth and polynomial in the instance size. This result lifts, in a non-incremental way, a known result for quantifier rank 2 to arbitrary quantifier ranks, and it also has many further consequences.

The lower bound result for QSat makes it possible to establish a new methodology for showing lower bounds for a wide range of problems in knowledge representation and artificial intelligence. As a further consequence, we can show that the upper bounds and the DG reductions of this thesis are "tight" under ETH, i.e., they probably cannot be significantly improved. The results on lower bounds for QSat and the associated methodology constitute, in a sense, a hierarchy of runtime classes parameterized by treewidth. These runtime classes can be used to quantify the hardness of problems with respect to exploiting treewidth and to categorize them according to their runtime dependence on treewidth.

Finally, despite the lower bound results mentioned above, we are able to provide an efficient implementation of algorithms based on dynamic programming guided along a tree decomposition. Our approach tries to find suitable abstractions of instances, which are then successively refined and improved in a recursive manner. Inspired by the enormous efficiency and effectiveness of Sat solvers, our implementation is a hybrid approach, as it makes heavy use of Sat solvers for solving various subproblems that arise during dynamic programming. It turns out that the resulting solver can easily keep up with existing solvers in terms of efficiency when solving two canonical counting problems related to Sat. In fact, we are able to solve instances for which the upper bounds on treewidth exceed 260. This surprising observation therefore shows that treewidth could be an important parameter that should be taken into account in modern solver designs.
KW  - Treewidth
KW  - Dynamic Programming
KW  - Knowledge Representation and Reasoning
KW  - Artificial Intelligence
KW  - Computational Complexity
KW  - Parameterized Complexity
KW  - Answer Set Programming
KW  - Exponential Time Hypothesis
KW  - Lower Bounds
KW  - Algorithms
KW  - Algorithmen
KW  - Antwortmengenprogrammierung
KW  - Künstliche Intelligenz
KW  - Komplexitätstheorie
KW  - Dynamische Programmierung
KW  - Exponentialzeit Hypothese
KW  - Wissensrepräsentation und Schlussfolgerung
KW  - Untere Schranken
KW  - Parametrisierte Komplexität
KW  - Baumweite
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-512519
ER  - 
TY  - JOUR
A1  - Schneider, Johannes
A1  - Wenig, Phillip
A1  - Papenbrock, Thorsten
T1  - Distributed detection of sequential anomalies in univariate time series
JF  - The VLDB journal : the international journal on very large data bases
N2  - The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies as fast as possible on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that is robust against re-occurring anomalies, can discover sequential anomalies of arbitrary length, and works without training data. Yet, Series2Graph is not scalable due to its single-threaded approach; it cannot, in particular, process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, DADS for short, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than Series2Graph, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.
KW  - Distributed programming
KW  - Sequential anomaly
KW  - Actor model
KW  - Data mining
KW  - Time series
Y1  - 2021
U6  - https://doi.org/10.1007/s00778-021-00657-6
SN  - 1066-8888
SN  - 0949-877X
VL  - 30
IS  - 4
SP  - 579
EP  - 602
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Fandiño, Jorge
A1  - Laferriere, Francois
A1  - Romero, Javier
A1  - Schaub, Torsten
A1  - Son, Tran Cao
T1  - Planning with incomplete information in quantified answer set programming
JF  - Theory and practice of logic programming
N2  - We present a general approach to planning with incomplete information in Answer Set Programming (ASP). More precisely, we consider the problems of conformant and conditional planning with sensing actions and assumptions. We represent planning problems using a simple formalism where logic programs describe the transition function between states, the initial states and the goal states. For solving planning problems, we use Quantified Answer Set Programming (QASP), an extension of ASP with existential and universal quantifiers over atoms that is analogous to Quantified Boolean Formulas (QBFs). We define the language of quantified logic programs and use it to represent the solutions to different variants of conformant and conditional planning. On the practical side, we present a translation-based QASP solver that converts quantified logic programs into QBFs and then executes a QBF solver, and we experimentally evaluate the approach on conformant and conditional planning benchmarks.
KW  - answer set programming
KW  - planning
KW  - quantified logics
Y1  - 2021
U6  - https://doi.org/10.1017/S1471068421000259
SN  - 1471-0684
SN  - 1475-3081
VL  - 21
IS  - 5
SP  - 663
EP  - 679
PB  - Cambridge University Press
CY  - Cambridge
ER  - 
TY  - CHAP
A1  - Dehnert, Maik
A1  - Gleiß, Alexander
A1  - Reiss, Frederick
T1  - What makes a data-driven business model?
BT  - a consolidated taxonomy
T2  - ECIS Proceedings 2021
N2  - The usage of data to improve or create business models has become vital for companies in the 21st century. However, to extract value from data it is important to understand the business model. Taxonomies for data-driven business models (DDBM) aim to provide guidance for the development and ideation of new business models relying on data. In IS research, however, different taxonomies have emerged in recent years, partly redundant, partly contradictory. Thus, there is a need to synthesize the common ground of these taxonomies within IS research. Based on 26 IS-related taxonomies and 30 cases, we derive and define 14 generic building blocks of DDBM to develop a consolidated taxonomy that represents the current state-of-the-art. Thus, we integrate existing research on DDBM and provide avenues for further exploration of data-induced potentials for business models as well as for the development and analysis of general or industry-specific DDBM.
Y1  - 2021
UR  - https://aisel.aisnet.org/ecis2021_rp/139
SN  - 978-1-7336325-6-0
PB  - AIS
CY  - Atlanta
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Döllner, Jürgen Roland Friedrich
A1  - Weske, Mathias
A1  - Polze, Andreas
A1  - Hirschfeld, Robert
A1  - Naumann, Felix
A1  - Giese, Holger
A1  - Baudisch, Patrick
A1  - Friedrich, Tobias
A1  - Böttinger, Erwin
A1  - Lippert, Christoph
A1  - Dörr, Christian
A1  - Lehmann, Anja
A1  - Renard, Bernhard
A1  - Rabl, Tilmann
A1  - Uebernickel, Falk
A1  - Arnrich, Bert
A1  - Hölzle, Katharina
T1  - Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
N2  - The design and implementation of service-oriented architectures pose a large number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for the design and realization of complex web-based systems. Both approaches allow for dynamic application adaptation as well as integration of enterprise applications.

Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.

The annual Ph.D. Retreat of the Research School provides each member the opportunity to present the current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
N2  - The design and implementation of service-based architectures raise a multitude of research questions from the fields of software engineering, system modeling and analysis, as well as the adaptability and integration of applications. Component orientation and web services are two approaches for the efficient design and implementation of complex web-based systems. They make it possible to react to changing requirements as well as to integrate large, complex software systems.

"Service-Oriented Systems Engineering" represents the symbiosis of proven practices from the fields of object orientation, component programming, distributed computing, and business processes, and also takes into account the integration of business concerns and information technology.

The retreat of the Research School "Service-oriented Systems Engineering" takes place once a year and offers all members the opportunity to present the state of their current research. Owing to the cross-disciplinary structure of the research school, this report covers a wide spectrum of current research topics. These include, among others, Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 138 
KW  - Hasso Plattner Institute
KW  - research school
KW  - Ph.D. retreat
KW  - service-oriented systems engineering
KW  - Hasso-Plattner-Institut
KW  - Forschungskolleg
KW  - Klausurtagung
KW  - Service-oriented Systems Engineering
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-504132
SN  - 978-3-86956-513-2
SN  - 1613-5652
SN  - 2191-1665
IS  - 138
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Kaya, Adem
A1  - Freitag, Melina A.
T1  - Conditioning analysis for discrete Helmholtz problems
JF  - Computers and mathematics with applications : an international journal
N2  - In this paper, we examine conditioning of the discretization of the Helmholtz problem. Although the discrete Helmholtz problem has been studied from different perspectives, to the best of our knowledge, there is no conditioning analysis for it. We aim to fill this gap in the literature. We propose a novel method in 1D to observe the near-zero eigenvalues of a symmetric indefinite matrix. Standard classification of ill-conditioning based on the matrix condition number is not true for the discrete Helmholtz problem. We relate the ill-conditioning of the discretization of the Helmholtz problem with the condition number of the matrix. We carry out analytical conditioning analysis in 1D and extend our observations to 2D with numerical observations. We examine several discretizations. We find different regions in which the condition number of the problem shows different characteristics. We also explain the general behavior of the solutions in these regions.
KW  - Helmholtz problem
KW  - Condition number
KW  - Ill-conditioning
KW  - Indefinite matrices
Y1  - 2022
U6  - https://doi.org/10.1016/j.camwa.2022.05.016
SN  - 0898-1221
SN  - 1873-7668
VL  - 118
SP  - 171
EP  - 182
PB  - Elsevier Science
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Mattis, Toni
A1  - Beckmann, Tom
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - First-class concepts
BT  - Reified architectural knowledge beyond dominant decompositions
JF  - Journal of object technology : JOT / ETH Zürich, Department of Computer Science
N2  - Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge. Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment. In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward facilitating multiple secondary perspectives on a system to co-exist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs.
KW  - software engineering
KW  - modularity
KW  - exploratory programming
KW  - program comprehension
KW  - remodularization
KW  - architecture recovery
Y1  - 2022
U6  - https://doi.org/10.5381/jot.2022.21.2.a6
SN  - 1660-1769
VL  - 21
IS  - 2
SP  - 1
EP  - 15
PB  - ETH Zürich, Department of Computer Science
CY  - Zürich
ER  - 
TY  - JOUR
A1  - Schmidl, Sebastian
A1  - Papenbrock, Thorsten
T1  - Efficient distributed discovery of bidirectional order dependencies
JF  - The VLDB journal
N2  - Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
KW  - Bidirectional order dependencies
KW  - Distributed computing
KW  - Actor programming
KW  - Parallelization
KW  - Data profiling
KW  - Dependency discovery
Y1  - 2021
U6  - https://doi.org/10.1007/s00778-021-00683-4
SN  - 1066-8888
SN  - 0949-877X
VL  - 31
IS  - 1
SP  - 49
EP  - 74
PB  - Springer
CY  - Berlin ; Heidelberg ; New York
ER  - 
TY  - THES
A1  - Dreseler, Markus
T1  - Automatic tiering for in-memory database systems
N2  - A decade ago, it became feasible to store multi-terabyte databases in main memory. These in-memory databases (IMDBs) profit from DRAM's low latency and high throughput as well as from the removal of costly abstractions used in disk-based systems, such as the buffer cache. However, as the DRAM technology approaches physical limits, scaling these databases becomes difficult. Non-volatile memory (NVM) addresses this challenge. This new type of memory is persistent, has more capacity than DRAM (4x), and does not suffer from its density-inhibiting limitations. Yet, as NVM has a higher latency (5-15x) and a lower throughput (0.35x), it cannot fully replace DRAM.

IMDBs thus need to navigate the trade-off between the two memory tiers. We present a solution to this optimization problem. Leveraging information about access frequencies and patterns, our solution utilizes NVM's additional capacity while minimizing the associated access costs. Unlike buffer cache-based implementations, our tiering abstraction does not add any costs when reading data from DRAM. As such, it can act as a drop-in replacement for existing IMDBs. Our contributions are as follows:

(1) As the foundation for our research, we present Hyrise, an open-source, columnar IMDB that we re-engineered and re-wrote from scratch. Hyrise enables realistic end-to-end benchmarks of SQL workloads and offers query performance which is competitive with other research and commercial systems. At the same time, Hyrise is easy to understand and modify as repeatedly demonstrated by its uses in research and teaching.

(2) We present a novel memory management framework for different memory and storage tiers. By encapsulating the allocation and access methods of these tiers, we enable existing data structures to be stored on different tiers with no modifications to their implementation. Besides DRAM and NVM, we also support and evaluate SSDs and have made provisions for upcoming technologies such as disaggregated memory.

(3) To identify the parts of the data that can be moved to (s)lower tiers with little performance impact, we present a tracking method that identifies access skew both in the row and column dimensions and that detects patterns within consecutive accesses. Unlike existing methods that have substantial associated costs, our access counters exhibit no identifiable overhead in standard benchmarks despite their increased accuracy.

(4) Finally, we introduce a tiering algorithm that optimizes the data placement for a given memory budget. In the TPC-H benchmark, this allows us to move 90% of the data to NVM while the throughput is reduced by only 10.8% and the query latency is increased by 11.6%. With this, we outperform approaches that ignore the workload's access skew and access patterns and increase the query latency by 20% or more.

Individually, our contributions provide novel approaches to current challenges in systems engineering and database research. Combining them allows IMDBs to scale past the limits of DRAM while continuing to profit from the benefits of in-memory computing.
N2  - Seit etwa einem Jahrzehnt können Datenbanken mit einer Größe von mehreren Terabytes im Hauptspeicher abgelegt werden. Diese Hauptspeicherdatenbanken (In-Memory Databases) profitieren einerseits von der niedrigen Latenz und dem hohen Durchsatz von DRAM und andererseits vom Fehlen teurer Abstraktionsschichten, wie dem Buffer Cache, welcher in Festplatten-basierten Datenbanksystemen von Nöten war. Dadurch, dass die Entwicklung der DRAM-Technologie mehr und mehr auf physikalische Grenzen stößt, wird es jedoch zunehmend schwierig, Hauptspeicherdatenbanken zu skalieren. Non-volatile Memory (NVM) adressiert diese Herausforderung. Dieser neue Speichertyp ist persistent, hat eine um einen Faktor 4 höhere Kapazität als DRAM und leidet nicht unter den Einschränkungen, welche die Erhöhung der Speicherdichte von DRAM limitieren. Da NVM jedoch eine höhere Latenz (5-15x) und einen niedrigeren Durchsatz (0.35x) aufweist als DRAM, kann es DRAM noch nicht vollständig ersetzen.

Bei der Entwicklung von Hauptspeicherdatenbanken muss daher der Zielkonflikt zwischen den beiden Speichertypen ausbalanciert werden. Die vorliegende Arbeit präsentiert eine Lösung für dieses Optimierungsproblem. Indem wir Informationen zu Zugriffshäufigkeiten und -mustern auswerten, können wir die zusätzliche Kapazität von NVM ausnutzen und gleichzeitig die mit NVM verbundene Erhöhung von Zugriffskosten minimieren. Anders als bei bestehenden Ansätzen, welche auf einen Buffer Cache aufsetzen, bleiben bei unserem Ansatz die Kosten von Zugriffen auf DRAM unverändert. Dadurch kann unsere Lösung als unmittelbarer Ersatz für existierende Hauptspeicherdatenbanken genutzt werden. Unsere Arbeit leistet hierfür die folgenden Beiträge:

(1) Als Grundlage für unsere Forschung präsentieren wir Hyrise, eine quelloffene, spaltenorientierte Hauptspeicherdatenbank, welche wir von Grund auf neu entwickelt haben. Hyrise ermöglicht realistische End-to-End Benchmarks von SQL Workloads und weist dabei eine Performance auf, welche mit anderen Datenbanksystemen aus Industrie und Forschung vergleichbar ist. Hierbei ist Hyrise leicht zu verstehen und modifizieren. Dies wurde durch den wiederholten Einsatz in Forschung und Lehre demonstriert.

(2) Wir präsentieren ein neuartiges Speicherverwaltungs-Framework, welches verschiedene Speicherebenen (Tiers) unterstützt. Indem wir die Allokations- und Zugriffsmethoden dieser Speicherebenen kapseln, ermöglichen wir es, bestehende Datenstrukturen auf diese Ebenen aufzuteilen ohne ihre Implementierung anpassen zu müssen. Neben DRAM und NVM unterstützt unser Ansatz SSDs und ist auf zukünftige Technologien wie Disaggregated Memory vorbereitet.

(3) Um jene Teile der Daten zu identifizieren, welche auf langsamere Ebenen verschoben werden können, ohne dass die Performance des Systems als Ganzes negativ beeinträchtigt wird, stellen wir mit unseren Access Countern eine Tracking-Methode vor, welche ungleich verteilte Zugriffshäufigkeiten sowohl in der Zeilen- als auch in der Spaltendimension erkennt. Ebenfalls erkennt die Tracking-Methode Zugriffsmuster in aufeinanderfolgenden Zugriffsoperationen. Trotz ihrer hohen Genauigkeit weisen unsere Access Counter keine messbaren Mehrkosten auf. Dies unterscheidet sie von bestehenden Ansätzen, welche ungleichverteilte Zugriffsmuster weniger gut erkennen, gleichzeitig aber Mehrkosten von 20% verursachen.

(4) Abschließend stellen wir einen Tiering-Algorithmus vor, welcher die Verteilung von Daten auf die verschiedenen Speicherebenen optimiert. Am Beispiel des TPC-H-Benchmarks zeigen wir, wie 90% der Daten auf NVM verschoben werden können, wobei der Durchsatz nur um 10.8% reduziert und die durchschnittliche Antwortzeit um 11.6% erhöht wird. Damit übertreffen wir Ansätze, welche Ungleichverteilungen in den Zugriffshäufigkeiten und -mustern ignorieren.

Einzeln betrachtet stellen unsere Beiträge neue Herangehensweisen für aktuelle Herausforderungen in der systemnahen Entwicklung und der Datenbankforschung dar. In ihrem Zusammenspiel ermöglichen sie es, Hauptspeicherdatenbanken über die Grenzen von DRAM hinaus zu skalieren und dabei weiterhin von den Vorteilen des In-Memory Computings zu profitieren.
T2  - Automatisches Tiering für Hauptspeicherdatenbanken
KW  - dbms
KW  - imdb
KW  - tiering
KW  - nvm
KW  - hyrise
KW  - scm
KW  - mmdb
KW  - Datenbanken
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-558253
ER  - 
TY  - THES
A1  - Böken, Björn
T1  - Improving prediction accuracy using dynamic information
N2  - Accurately solving classification problems nowadays is likely to be the most relevant machine learning task. Binary classification separating two classes only is algorithmically simpler but has fewer potential applications as many real-world problems are multi-class. Conversely, separating only a subset of classes simplifies the classification task. Even though existing multi-class machine learning algorithms are very flexible regarding the number of classes, they assume that the target set Y is fixed and cannot be restricted once the training is finished. On the other hand, existing state-of-the-art production environments are becoming increasingly interconnected with the advance of Industry 4.0 and related technologies such that additional information can simplify the respective classification problems. In light of this, the main aim of this thesis is to introduce dynamic classification that generalizes multi-class classification such that the target class set can be restricted arbitrarily to a non-empty class subset M of Y at any time between two consecutive predictions.

This task is solved by a combination of two algorithmic approaches. First, classifier calibration transforms predictions into posterior probability estimates that are intended to be well calibrated. The analysis provided focuses on monotonic calibration and in particular corrects wrong statements that appeared in the literature. It also reveals that bin-based evaluation metrics, which became popular in recent years, are unjustified and should not be used at all. Next, the validity of Platt scaling, which is the most relevant parametric calibration approach, is analyzed in depth. In particular, its optimality for classifier predictions distributed according to four different families of probability distributions as well as its equivalence with Beta calibration up to a sigmoidal preprocessing are proven. For non-monotonic calibration, extended variants of kernel density estimation and the ensemble method EKDE are introduced. Finally, the calibration techniques are evaluated using a simulation study with complete information as well as on a selection of 46 real-world data sets.

Building on this, classifier calibration is applied as part of decomposition-based classification that aims to reduce multi-class problems to simpler (usually binary) prediction tasks. For the involved fusing step performed at prediction time, a new approach based on evidence theory is presented that uses classifier calibration to model mass functions. This allows the analysis of decomposition-based classification against a strictly formal background and to prove closed-form equations for the overall combinations. Furthermore, the same formalism leads to a consistent integration of dynamic class information, yielding a theoretically justified and computationally tractable dynamic classification model. The insights gained from this modeling are combined with pairwise coupling, which is one of the most relevant reduction-based classification approaches, such that all individual predictions are combined with a weight. This not only generalizes existing works on pairwise coupling but also enables the integration of dynamic class information.

Lastly, a thorough empirical study is performed that compares all newly introduced approaches to existing state-of-the-art techniques. For this, evaluation metrics for dynamic classification are introduced that depend on corresponding sampling strategies. Thereafter, these are applied during a three-part evaluation. First, support vector machines and random forests are applied on 26 data sets from the UCI Machine Learning Repository. Second, two state-of-the-art deep neural networks are evaluated on five benchmark data sets from a relatively recent reference work. Here, computationally feasible strategies to apply the presented algorithms in combination with large-scale models are particularly relevant because a naive application is computationally intractable. Finally, reference data from a real-world process allowing the inclusion of dynamic class information are collected and evaluated. The results show that in combination with support vector machines and random forests, pairwise coupling approaches yield the best results, while in combination with deep neural networks, differences between the different approaches are mostly small to negligible. Most importantly, all results empirically confirm that dynamic classification succeeds in improving the respective prediction accuracies. Therefore, it is crucial to pass dynamic class information in respective applications, which requires an appropriate digital infrastructure.
N2  - Klassifikationsprobleme akkurat zu lösen ist heutzutage wahrscheinlich die relevanteste Machine-Learning-Aufgabe. Binäre Klassifikation zur Unterscheidung von nur zwei Klassen ist algorithmisch einfacher, hat aber weniger potenzielle Anwendungen, da in der Praxis oft Mehrklassenprobleme auftreten. Demgegenüber vereinfacht die Unterscheidung nur innerhalb einer Untermenge von Klassen die Problemstellung. Obwohl viele existierende Machine-Learning-Algorithmen sehr flexibel mit Blick auf die Anzahl der Klassen sind, setzen sie voraus, dass die Zielmenge Y fest ist und nicht mehr eingeschränkt werden kann, sobald das Training abgeschlossen ist. Allerdings sind moderne Produktionsumgebungen mit dem Voranschreiten von Industrie 4.0 und entsprechenden Technologien zunehmend digital verbunden, sodass zusätzliche Informationen die entsprechenden Klassifikationsprobleme vereinfachen können. Vor diesem Hintergrund ist das Hauptziel dieser Arbeit, dynamische Klassifikation als Verallgemeinerung von Mehrklassen-Klassifikation einzuführen, bei der die Zielmenge jederzeit zwischen zwei aufeinanderfolgenden Vorhersagen zu einer beliebigen, nicht leeren Teilmenge eingeschränkt werden kann.

 Diese Aufgabe wird durch die Kombination von zwei algorithmischen Ansätzen gelöst. Zunächst wird Klassifikator-Kalibrierung eingesetzt, mittels der Vorhersagen in Schätzungen der A-Posteriori-Wahrscheinlichkeiten transformiert werden, die gut kalibriert sein sollen. Die durchgeführte Analyse zielt auf monotone Kalibrierung ab und korrigiert insbesondere Falschaussagen, die in Referenzarbeiten veröffentlicht wurden. Außerdem zeigt sie, dass Bin-basierte Fehlermaße, die in den letzten Jahren populär geworden sind, ungerechtfertigt sind und nicht verwendet werden sollten. Weiterhin wird die Validität von Platt Scaling, dem relevantesten, parametrischen Kalibrierungsverfahren, genau analysiert. Insbesondere wird seine Optimalität für Klassifikatorvorhersagen, die gemäß vier Familien von Verteilungsfunktionen verteilt sind, sowie die Äquivalenz zu Beta-Kalibrierung bis auf eine sigmoidale Vorverarbeitung gezeigt. Für nicht monotone Kalibrierung werden erweiterte Varianten der Kerndichteschätzung und die Ensemblemethode EKDE eingeführt. Schließlich werden die Kalibrierungsverfahren im Rahmen einer Simulationsstudie mit vollständiger Information sowie auf 46 Referenzdatensätzen ausgewertet.

 Hierauf aufbauend wird Klassifikator-Kalibrierung als Teil von reduktionsbasierter Klassifikation eingesetzt, die zum Ziel hat, Mehrklassenprobleme auf einfachere (üblicherweise binäre) Entscheidungsprobleme zu reduzieren. Für den zugehörigen, während der Vorhersage notwendigen Fusionsschritt wird ein neuer, auf Evidenztheorie basierender Ansatz eingeführt, der Klassifikator-Kalibrierung zur Modellierung von Massefunktionen nutzt. Dies ermöglicht die Analyse von reduktionsbasierter Klassifikation in einem formalen Kontext sowie geschlossene Ausdrücke für die entsprechenden Gesamtkombinationen zu beweisen. Zusätzlich führt derselbe Formalismus zu einer konsistenten Integration von dynamischen Klasseninformationen, sodass sich ein theoretisch fundiertes und effizient zu berechnendes, dynamisches Klassifikationsmodell ergibt. Die hierbei gewonnenen Einsichten werden mit Pairwise Coupling, einem der relevantesten Verfahren für reduktionsbasierte Klassifikation, verbunden, wobei alle individuellen Vorhersagen mit einer Gewichtung kombiniert werden. Dies verallgemeinert nicht nur existierende Ansätze für Pairwise Coupling, sondern führt darüber hinaus auch zu einer Integration von dynamischen Klasseninformationen.

Abschließend wird eine umfangreiche empirische Studie durchgeführt, die alle neu eingeführten Verfahren mit denen aus dem Stand der Forschung vergleicht. Hierfür werden Bewertungsfunktionen für dynamische Klassifikation eingeführt, die auf Sampling-Strategien basieren. Anschließend werden diese im Rahmen einer dreiteiligen Studie angewendet. Zunächst werden Support Vector Machines und Random Forests auf 26 Referenzdatensätzen aus dem UCI Machine Learning Repository angewendet. Im zweiten Teil werden zwei moderne, tiefe neuronale Netze auf fünf Referenzdatensätzen aus einer relativ aktuellen Referenzarbeit ausgewertet. Hierbei sind insbesondere Strategien relevant, die die Anwendung der eingeführten Verfahren in Verbindung mit großen Modellen ermöglichen, da eine naive Vorgehensweise nicht durchführbar ist. Schließlich wird ein Referenzdatensatz aus einem Produktionsprozess gewonnen, der die Integration von dynamischen Klasseninformationen ermöglicht, und ausgewertet. Die Ergebnisse zeigen, dass Pairwise-Coupling-Verfahren in Verbindung mit Support Vector Machines und Random Forests die besten Ergebnisse liefern, während in Verbindung mit tiefen neuronalen Netzen die Unterschiede zwischen den Verfahren oft klein bis vernachlässigbar sind. Am wichtigsten ist, dass alle Ergebnisse zeigen, dass dynamische Klassifikation die entsprechenden Erkennungsgenauigkeiten verbessert. Daher ist es entscheidend, dynamische Klasseninformationen in den entsprechenden Anwendungen zur Verfügung zu stellen, was eine entsprechende digitale Infrastruktur erfordert.
KW  - dynamic classification
KW  - multi-class classification
KW  - classifier calibration
KW  - evidence theory
KW  - Dempster–Shafer theory
KW  - Deep Learning
KW  - Dempster-Shafer-Theorie
KW  - Klassifikator-Kalibrierung
KW  - dynamische Klassifikation
KW  - Evidenztheorie
KW  - Mehrklassen-Klassifikation
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-585125
ER  - 
TY  - JOUR
A1  - Krause, Hannes-Vincent
A1  - Große Deters, Fenne
A1  - Baumann, Annika
A1  - Krasnova, Hanna
T1  - Active social media use and its impact on well-being
BT  - an experimental study on the effects of posting pictures on Instagram
JF  - Journal of computer-mediated communication : a journal of the International Communication Association
N2  - Active use of social networking sites (SNSs) has long been assumed to benefit users' well-being. However, this established hypothesis is increasingly being challenged, with scholars criticizing its lack of empirical support and the imprecise conceptualization of active use. Nevertheless, with considerable heterogeneity among existing studies on the hypothesis and causal evidence still limited, a final verdict on its robustness is still pending. To contribute to this ongoing debate, we conducted a week-long randomized control trial with N = 381 adult Instagram users recruited via Prolific. Specifically, we tested how active SNS use, operationalized as picture postings on Instagram, affects different dimensions of well-being. The results depicted a positive effect on users' positive affect but null findings for other well-being outcomes. The findings broadly align with the recent criticism against the active use hypothesis and support the call for a more nuanced view on the impact of SNSs.
Lay Summary: Active use of social networking sites (SNSs) has long been assumed to benefit users' well-being. However, this established assumption is increasingly being challenged, with scholars criticizing its lack of empirical support and the imprecise conceptualization of active use. Nevertheless, with great diversity among conducted studies on the hypothesis and a lack of causal evidence, a final verdict on its viability is still pending. To contribute to this ongoing debate, we conducted a week-long experimental investigation with 381 adult Instagram users. Specifically, we tested how posting pictures on Instagram affects different aspects of well-being. The results of this study depicted a positive effect of posting Instagram pictures on users' experienced positive emotions but no effects on other aspects of well-being. The findings broadly align with the recent criticism against the active use hypothesis and support the call for a more nuanced view on the impact of SNSs on users.
KW  - social networking sites
KW  - social media
KW  - Instagram
KW  - well-being
KW  - experiment
KW  - randomized control trial
Y1  - 2022
U6  - https://doi.org/10.1093/jcmc/zmac037
SN  - 1083-6101
VL  - 28
IS  - 1
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - THES
A1  - Plauth, Max Frederik
T1  - Improving the Accessibility of Heterogeneous System Resources for Application Developers using Programming Abstractions
T1  - Verbesserung der Zugänglichkeit heterogener Systemressourcen für Anwendungsentwickler durch Programmierabstraktionen
N2  - The heterogeneity of today's state-of-the-art computer architectures is confronting application developers with an immense degree of complexity which results from two major challenges. First, developers need to acquire profound knowledge about the programming models or the interaction models associated with each type of heterogeneous system resource to make efficient use thereof. Second, developers must take into account that heterogeneous system resources always need to exchange data with each other in order to work on a problem together. However, this data exchange is always associated with a certain amount of overhead, which is why the amounts of data exchanged should be kept as low as possible.

This thesis proposes three programming abstractions to lessen the burdens imposed by these major challenges with the goal of making heterogeneous system resources accessible to a wider range of application developers. The lib842 compression library provides the first method for accessing the compression and decompression facilities of the NX-842 on-chip compression accelerator available in IBM Power CPUs from user space applications running on Linux. Addressing application development of scale-out GPU workloads, the CloudCL framework makes the resources of GPU clusters more accessible by hiding many aspects of distributed computing while enabling application developers to focus on the aspects of the data parallel programming model associated with GPUs. Furthermore, CloudCL is augmented with transparent data compression facilities based on the lib842 library in order to improve the efficiency of data transfers among cluster nodes. The improved data transfer efficiency provided by the integration of transparent data compression yields performance improvements ranging between 1.11x and 2.07x across four data-intensive scale-out GPU workloads. To investigate the impact of programming abstractions for data placement in NUMA systems, a comprehensive evaluation of the PGASUS framework for NUMA-aware C++ application development is conducted. On a wide range of test systems, the evaluation demonstrates that PGASUS does not only improve the developer experience across all workloads, but that it is also capable of outperforming NUMA-agnostic implementations with average performance improvements of 1.56x.

Based on these programming abstractions, this thesis demonstrates that by providing a sufficient degree of abstraction, the accessibility of heterogeneous system resources can be improved for application developers without occluding performance-critical properties of the underlying hardware.
N2  - Die Heterogenität heutiger Rechnerarchitekturen konfrontiert Anwendungsentwickler mit einem immensen Maß an Komplexität, welches sich aus zwei großen Herausforderungen ergibt. Erstens müssen Entwickler fundierte Kenntnisse über die Programmiermodelle oder Interaktionsmodelle verfügen, welche eine Voraussetzung sind um die jeweiligen heterogenen Systemressourcen effizient nutzen zu können. Zweitens müssen Entwickler berücksichtigen, dass heterogene Systemressourcen immer auch Daten untereinander austauschen müssen, um ein Problem gemeinsam zu bearbeiten. Dieser Datenaustausch ist aber auch immer mit einem gewissen Mehraufwand verbunden, weshalb die ausgetauschten Datenmengen so gering wie möglich gehalten werden sollten.

Diese Dissertation schlägt drei Programmierabstraktionen vor und ermöglicht es so, Anwendungsentwickler bei der Bewältigung dieser Herausforderungen zu entlasten, so dass heterogene Systemressourcen für eine größere Anzahl von Anwendungsentwicklern zugänglich werden. Die lib842-Kompressionsbibliothek bietet Anwendungen erstmals die Möglichkeit, die Kompressions- und Dekompressionsfunktionen des in IBM Power Prozessoren integrierten NX-842 Kompressionsbeschleunigers unter Linux zu verwenden. Das CloudCL-Framework richtet sich an die Entwicklung von GPU-beschleunigten, verteilten Anwendungen und macht die Ressourcen von GPU-Clustern vereinfacht nutzbar, indem es viele Aspekte des verteilten Rechnens ausblendet und es so Anwendungsentwicklern ermöglicht, sich auf die Aspekte des auf GPUs üblichen, datenparallelen Programmiermodells zu konzentrieren. CloudCL wurde weitergehend über transparente Datenkompressionsfunktionalität auf Basis der lib842 Programmbibliothek erweitert, um die Datenübertragungseffizienz zwischen Clusterknoten zu verbessern. Die verbesserte Datentransfereffizienz führt zu Leistungsverbesserungen zwischen 1,11-fach und 2,07-fach bei der Verwendung von vier datenintensiven, verteilten und GPU-beschleunigten Arbeitslasten.

Um die Auswirkungen von Programmierabstraktionen auf die Datenplatzierung in NUMA-Systemen zu untersuchen, wird eine umfassende Evaluierung des PGASUS-Frameworks für NUMA-gewahre C++-Anwendungsentwicklung durchgeführt. Unter Verwendung einer breiten Palette von Testsystemen zeigt die Evaluierung, dass PGASUS nicht nur die Entwicklung von NUMA-gewahren Anwendungen erleichtert, sondern auch in der Lage ist, die Leistung von NUMA-agnostischen Implementierungen im Mittel um 1,56× zu übertreffen.
Auf der Grundlage dieser Programmierabstraktionen zeigt diese Dissertation, dass heterogene Systemressourcen durch die Bereitstellung angemessener Abstraktionsmechanismen einfacher von Anwendungsentwicklern erschlossen werden können, ohne dass leistungsrelevante Eigenschaften der zugrunde liegenden Hardware verdeckt werden.
KW  - heterogeneous computing
KW  - programming abstraction
KW  - heterogenes Rechnen
KW  - Programmierabstraktionen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-558118
ER  - 
TY  - GEN
A1  - Ullrich, André
A1  - Vladova, Gergana
A1  - Eigelshoven, Felix
A1  - Renz, André
T1  - Data mining of scientific research on artificial intelligence in teaching and administration in higher education institutions
BT  - a bibliometrics analysis and recommendation for future research
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2  - Teaching and learning as well as administrative processes are still experiencing intensive changes with the rise of artificial intelligence (AI) technologies and their diverse application opportunities in the context of higher education. As a result, scientific interest in the topic in general, but also in specific focal points, has risen as well. However, there is no structured overview of AI in teaching and administration processes in higher education institutions that makes it possible to identify major research topics and trends, concretize peculiarities, and develop recommendations for further action. To overcome this gap, this study seeks to systematize the current scientific discourse on AI in teaching and administration in higher education institutions. This study identified (1) an imbalance in research on AI in educational and administrative contexts, (2) an imbalance in disciplines and a lack of interdisciplinary research, (3) inequalities in cross-national research activities, as well as (4) neglected research topics and paths. In this way, the study contributes a comparative analysis of AI usage in administration and in teaching and learning processes, a systematization of the state of research, an identification of research gaps, and further research paths on AI in higher education institutions.
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 160 
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589077
SN  - 1867-5808
IS  - 160
ER  - 
TY  - THES
A1  - Draisbach, Uwe
T1  - Efficient duplicate detection and the impact of transitivity
T1  - Effiziente Dublettenerkennung und der Einfluss von Transitivität
N2  - Duplicate detection describes the process of finding multiple representations of the same real-world entity in the absence of a unique identifier, and has many application areas, such as customer relationship management, genealogy and social sciences, or online shopping. Due to the increasing amount of data in recent years, the problem has become even more challenging on the one hand, but has led to a renaissance in duplicate detection research on the other hand.
This thesis examines the effects and opportunities of transitive relationships on the duplicate detection process. Transitivity implies that if record pairs ⟨ri,rj⟩ and ⟨rj,rk⟩ are classified as duplicates, then record pair ⟨ri,rk⟩ also has to be a duplicate. However, this reasoning might contradict the pairwise classification, which is usually based on the similarity of objects. An essential property of similarity, in contrast to equivalence, is that similarity is not necessarily transitive.
First, we experimentally evaluate the effect of an increasing data volume on the threshold selection to classify whether a record pair is a duplicate or non-duplicate. Our experiments show that independently of the pair selection algorithm and the used similarity measure, selecting a suitable threshold becomes more difficult with an increasing number of records due to an increased probability of adding a false duplicate to an existing cluster. Thus, the best threshold changes with the dataset size, and a good threshold for a small (possibly sampled) dataset is not necessarily a good threshold for a larger (possibly complete) dataset. As data grows over time, earlier selected thresholds are no longer a suitable choice, and the problem becomes worse for datasets with larger clusters.
Second, we present the Duplicate Count Strategy (DCS) and its enhancement DCS++ as two alternatives to the standard Sorted Neighborhood Method (SNM) for the selection of candidate record pairs. DCS adapts SNM's window size based on the number of detected duplicates, and DCS++ uses transitive dependencies to save complex comparisons for finding duplicates in larger clusters. We prove that with a proper (domain- and data-independent!) threshold, DCS++ is more efficient than SNM without loss of effectiveness.
Third, we tackle the problem of contradicting pairwise classifications. Usually, the transitive closure is used for pairwise classifications to obtain a transitively closed result set. However, the transitive closure disregards negative classifications. We present three new and several existing clustering algorithms and experimentally evaluate them on various datasets and under various algorithm configurations. The results show that the commonly used transitive closure is inferior to most other clustering algorithms, especially for the precision of results. In scenarios with larger clusters, our proposed EMCC algorithm is, together with Markov Clustering, the best performing clustering approach for duplicate detection, although its runtime is longer than Markov Clustering due to the subexponential time complexity. EMCC especially outperforms Markov Clustering regarding the precision of the results and additionally has the advantage that it can also be used in scenarios where edge weights are not available.
N2  - Dubletten sind mehrere Repräsentationen derselben Entität in einem Datenbestand. Diese zu identifizieren ist das Ziel der Dublettenerkennung, wobei in der Regel Paare von Datensätzen anhand von Ähnlichkeitsmaßen miteinander verglichen und unter Verwendung eines Schwellwerts als Dublette oder Nicht-Dublette klassifiziert werden. Für Dublettenerkennung existieren verschiedene Anwendungsbereiche, beispielsweise im Kundenbeziehungsmanagement, beim Onlineshopping, der Genealogie und in den Sozialwissenschaften. Der in den letzten Jahren zu beobachtende Anstieg des gespeicherten Datenvolumens erschwert die Dublettenerkennung, da die Anzahl der benötigten Vergleiche quadratisch mit der Anzahl der Datensätze wächst. Durch Verwendung eines geeigneten Paarauswahl-Algorithmus kann die Anzahl der zu vergleichenden Paare jedoch reduziert und somit die Effizienz gesteigert werden.
Die Dissertation untersucht die Auswirkungen und Möglichkeiten transitiver Beziehungen auf den Dublettenerkennungsprozess. Durch Transitivität lässt sich beispielsweise ableiten, dass aufgrund einer Klassifikation der Datensatzpaare ⟨ri,rj⟩ und ⟨rj,rk⟩ als Dublette auch die Datensätze ⟨ri,rk⟩ eine Dublette sind. Dies kann jedoch im Widerspruch zu einer paarweisen Klassifizierung stehen, denn im Unterschied zur Äquivalenz ist die Ähnlichkeit von Objekten nicht notwendigerweise transitiv.
Im ersten Teil der Dissertation wird die Auswirkung einer steigenden Datenmenge auf die Wahl des Schwellwerts zur Klassifikation von Datensatzpaaren als Dublette oder Nicht-Dublette untersucht. Die Experimente zeigen, dass unabhängig von dem gewählten Paarauswahl-Algorithmus und dem gewählten Ähnlichkeitsmaß die Wahl eines geeigneten Schwellwerts mit steigender Datensatzanzahl schwieriger wird, da die Gefahr fehlerhafter Cluster-Zuordnungen steigt. Der optimale Schwellwert eines Datensatzes variiert mit dessen Größe. So ist ein guter Schwellwert für einen kleinen Datensatz (oder eine Stichprobe) nicht notwendigerweise ein guter Schwellwert für einen größeren (ggf. vollständigen) Datensatz. Steigt die Datensatzgröße im Lauf der Zeit an, so muss ein einmal gewählter Schwellwert ggf. nachjustiert werden. Aufgrund der Transitivität ist dies insbesondere bei Datensätzen mit größeren Clustern relevant.
Der zweite Teil der Dissertation beschäftigt sich mit Algorithmen zur Auswahl geeigneter Datensatz-Paare für die Klassifikation. Basierend auf der Sorted Neighborhood Method (SNM) werden mit der Duplicate Count Strategy (DCS) und ihrer Erweiterung DCS++ zwei neue Algorithmen vorgestellt. DCS adaptiert die Fenstergröße in Abhängigkeit der Anzahl gefundener Dubletten und DCS++ verwendet zudem die transitive Abhängigkeit, um kostspielige Vergleiche einzusparen und trotzdem größere Cluster von Dubletten zu identifizieren. Weiterhin wird bewiesen, dass mit einem geeigneten Schwellwert DCS++ ohne Einbußen bei der Effektivität effizienter als die Sorted Neighborhood Method ist.
Der dritte und letzte Teil der Arbeit beschäftigt sich mit dem Problem widersprüchlicher paarweiser Klassifikationen. In vielen Anwendungsfällen wird die Transitive Hülle zur Erzeugung konsistenter Cluster verwendet, wobei hierbei paarweise Klassifikationen als Nicht-Dublette missachtet werden. Es werden drei neue und mehrere existierende Cluster-Algorithmen vorgestellt und experimentell mit verschiedenen Datensätzen und Konfigurationen evaluiert. Die Ergebnisse zeigen, dass die Transitive Hülle den meisten anderen Clustering-Algorithmen insbesondere bei der Precision, definiert als Anteil echter Dubletten an der Gesamtzahl klassifizierter Dubletten, unterlegen ist. In Anwendungsfällen mit größeren Clustern ist der vorgeschlagene EMCC-Algorithmus trotz seiner subexponentiellen Laufzeit zusammen mit dem Markov-Clustering der beste Clustering-Ansatz für die Dublettenerkennung. EMCC übertrifft Markov Clustering insbesondere hinsichtlich der Precision der Ergebnisse und hat zusätzlich den Vorteil, dass dieser auch ohne Ähnlichkeitswerte eingesetzt werden kann.
KW  - Datenqualität
KW  - Datenintegration
KW  - Dubletten
KW  - Duplikaterkennung
KW  - data quality
KW  - data integration
KW  - duplicate detection
KW  - deduplication
KW  - entity resolution
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-572140
ER  - 
TY  - GEN
A1  - Al Laban, Firas
A1  - Reger, Martin
A1  - Lucke, Ulrike
T1  - Closing the Policy Gap in the Academic Bridge
T2  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - The highly structured nature of the educational sector demands effective policy mechanisms close to the needs of the field. That is why evidence-based policy making, endorsed by the European Commission under Erasmus+ Key Action 3, aims to make an alignment between the domains of policy and practice. Against this background, this article addresses two issues: First, that there is a vertical gap in the translation of higher-level policies to local strategies and regulations. Second, that there is a horizontal gap between educational domains regarding the policy awareness of individual players. This was analyzed in quantitative and qualitative studies with domain experts from the fields of virtual mobility and teacher training. From our findings, we argue that the combination of both gaps puts the academic bridge from secondary to tertiary education at risk, including the associated knowledge proficiency levels. We discuss the role of digitalization in the academic bridge by asking the question: which value does the involved stakeholders expect from educational policies? As a theoretical basis, we rely on the model of value co-creation for and by stakeholders. We describe the used instruments along with the obtained results and proposed benefits. Moreover, we reflect on the methodology applied, and we finally derive recommendations for future academic bridge policies.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1310 
KW  - policy evaluation
KW  - higher education
KW  - virtual mobility
KW  - teacher training
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-583572
SN  - 1866-8372
IS  - 1310
ER  - 
TY  - THES
A1  - Melnichenko, Anna
T1  - Selfish Creation of Realistic Networks
N2  - Complex networks like the Internet or social networks are fundamental parts of our everyday lives. It is essential to understand their structural properties and how these networks are formed. A game-theoretic approach to network design problems has attracted considerable interest in recent decades. The reason is that many real-world networks are the outcomes of decentralized strategic behavior of independent agents without central coordination. Fabrikant, Luthra, Maneva, Papadimitriou, and Schenker proposed a game-theoretic model aiming to explain the formation of Internet-like networks. In this model, called the Network Creation Game, agents are associated with nodes of a network. Each agent seeks to maximize her centrality by establishing costly connections to other agents. The model is relatively simple but shows a high potential in modeling complex real-world networks. In this thesis, we contribute to the line of research on variants of the Network Creation Game. Inspired by real-world networks, we propose and analyze several novel network creation models. We aim to understand the impact of certain realistic modeling assumptions on the structure of the created networks and the involved agents’ behavior.
The first natural additional objective that we consider is the network’s robustness. We consider a game where the agents seek to maximize their centrality and, at the same time, the stability of the created network against random edge failure.
Our second point of interest is a model that incorporates an underlying geometry. We consider a network creation model where the agents correspond to points in some underlying space and where edge lengths are equal to the distances between the endpoints in that space. The geometric setting captures many physical real-world networks like transport networks and fiber-optic communication networks.
We also focus on the formation of social networks and consider two models that incorporate particular realistic behavior observed in real-world networks. In the first model, we embed anti-preferential attachment link formation. Namely, we assume that the cost of a connection is proportional to the popularity of the targeted agent. Our second model is based on the observation that the probability that two persons connect is inversely proportional to the length of their shortest chain of mutual acquaintances.
For each of the four models above, we provide a complete game-theoretical analysis. In particular, we focus on distinctive structural properties of the equilibria, the hardness of computing a best response, and the quality of equilibria in comparison to the centrally designed socially optimal networks. We also analyze the game dynamics, i.e., the process of sequential strategic improvements by the agents, and study the convergence to an equilibrium state and its properties.
N2  - Komplexe Netzwerke, wie das Internet oder soziale Netzwerke, sind fundamentale Bestandteile unseres Alltags. Deshalb ist es wichtig, ihre strukturellen Eigenschaften zu verstehen und zu wissen, wie sie gebildet werden. Um dies zu erreichen, wurden in den letzten Jahrzehnten spieltheoretische Ansätze für Netzwerkdesignprobleme populär. Der Grund dafür ist, dass viele reale Netzwerke das Ergebnis von dezentralem strategischem Verhalten unabhängiger Agenten ohne zentrale Koordination sind. Fabrikant, Luthra, Maneva, Papadimitriou und Schenker haben ein solches spieltheoretisches Modell vorgeschlagen, um die Entstehung von internetähnlichen Netzwerken zu erklären.
In diesem Modell, dem sogenannten Network Creation Game, repräsentieren die Agenten die Knoten eines Netzwerks. Jeder Agent versucht, durch den Kauf von Verbindungen zu anderen Agenten seine Zentralität im erzeugten Netzwerk zu maximieren. Dieses Modell ist relativ einfach, aber es hat ein großes Potenzial, reale Netzwerke modellieren zu können. In der vorliegenden Arbeit tragen wir zur aktuellen Forschungsrichtung, die sich der Untersuchung von Varianten der Network Creation Games widmet, bei. Inspiriert von realen Netzwerken, schlagen wir verschiedene neuartige Netzwerkbildungsmodelle vor und analysieren diese. Wir wollen hierbei die Auswirkungen bestimmter realistischer Modellierungsannahmen auf die Struktur der erstellten Netzwerke und das Verhalten der beteiligten Agenten verstehen.
Die erste natürliche zusätzliche Modellierungsannahme, die wir betrachten, ist ein Fokus auf die Robustheit des erzeugten Netzwerks. In diesem Modell haben die Agenten das Ziel, ihre Zentralität zu maximieren und gleichzeitig das erstellte Netzwerk robust gegenüber zufälligen Verbindungsausfällen zu machen.
Das zweite neue Modell, das wir hier betrachten, bezieht eine zu Grunde liegende Geometrie mit ein. Hierbei entspricht jeder Agent einem Punkt in einem gegebenen Raum und die Länge einer Netzwerkverbindung entspricht der Distanz zwischen den jeweiligen Endpunkten in diesem Raum. Diese geometrische Variante erlaubt die Modellierung vieler realer physischer Netzwerke, wie z.B. Transportnetzwerke und Glasfaserkommunikationsnetzwerke.
Des Weiteren fokussieren wir uns auf die Bildung von sozialen Netzwerken und betrachten zwei Modelle, die ein bestimmtes realistisches Verhalten einbeziehen, das in realen sozialen Netzwerken beobachtet werden kann. Das erste Modell basiert auf einer anti-präferentiellen Kantenerzeugung. Dabei nehmen wir an, dass die Kosten einer Verbindung proportional zur Popularität des Agenten am anderen Endpunkt sind. Das zweite betrachtete Modell basiert auf der Beobachtung, dass die Wahrscheinlichkeit, dass zwei Personen verbunden sind, umgekehrt proportional zur Länge ihrer kürzesten Kette von gegenseitigen Bekanntschaften ist.
Für jedes der vier oben genannten Modelle liefern wir eine komplette spieltheoretische Analyse. Insbesondere fokussieren wir uns auf charakteristische strukturelle Eigenschaften der spieltheoretischen Gleichgewichte, die Komplexität der Berechnung einer optimalen Strategie und die Qualität der Gleichgewichte im Vergleich zu den zentral entworfenen sozial optimalen Netzwerken. Außerdem analysieren wir die Spieldynamik, d.h. den Prozess von sequentiellen verbessernden Strategieänderungen der Agenten. Dabei untersuchen wir die Konvergenz zu einem Gleichgewichtszustand und die Eigenschaften solcher Konvergenzprozesse.
T2  - Spieltheoretische Erzeugung von realistischen Netzwerken
KW  - Algorithmic Game Theory
KW  - Network Creation Game
KW  - Price of Anarchy
KW  - Nash Equilibrium
KW  - Game Dynamics
KW  - Computational Hardness
KW  - Algorithmische Spieltheorie
KW  - Network Creation Game
KW  - Preis der Anarchie
KW  - Spieldynamik
KW  - Komplexität der Berechnung
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-548141
ER  - 
TY  - THES
A1  - Haarmann, Stephan
T1  - WICKR: A Joint Semantics for Flexible Processes and Data
N2  - Knowledge-intensive business processes are flexible and data-driven. Therefore, traditional process modeling languages do not meet their requirements: These languages focus on highly structured processes in which data plays a minor role. As a result, process-oriented information systems fail to assist knowledge workers in executing their processes. We propose a novel case management approach that combines flexible activity-centric processes with data models, and we provide a joint semantics using colored Petri nets. The approach is suited to model, verify, and enact knowledge-intensive processes and can aid the development of information systems that support knowledge work.

Knowledge-intensive processes are human-centered, multi-variant, and data-driven. Typical domains include healthcare, insurance, and law. The processes cannot be fully modeled, since the underlying knowledge is too vast and changes too quickly. Thus, models for knowledge-intensive processes are necessarily underspecified. In fact, a case emerges gradually as knowledge workers make informed decisions. Knowledge work imposes special requirements on modeling and managing the respective processes. They include flexibility during design and execution, ad-hoc adaptation to unforeseen situations, and the integration of behavior and data. However, the predominantly used process modeling languages (e.g., BPMN) are unsuited for this task.

Therefore, novel modeling languages have been proposed. Many of them focus on activities' data requirements and declarative constraints rather than imperative control flow. Fragment-Based Case Management, for example, combines activity-centric imperative process fragments with declarative data requirements. At runtime, fragments can be combined dynamically, and new ones can be added. Yet, no integrated semantics for flexible activity-centric process models and data models exists.

In this thesis, Wickr, a novel case modeling approach extending Fragment-Based Case Management, is presented. It supports batch processing of data, sharing data among cases, and a full-fledged data model with associations and multiplicity constraints. We develop a translational semantics for Wickr targeting (colored) Petri nets. The semantics assert that a case adheres to the constraints in both the process fragments and the data models. Among other things, multiplicity constraints must not be violated. Furthermore, the semantics are extended to multiple cases that operate on shared data. Wickr shows that the data structure may reflect process behavior and vice versa. Based on its semantics, prototypes for executing and verifying case models showcase the feasibility of Wickr. Its applicability to knowledge-intensive and to data-centric processes is evaluated using well-known requirements from related work.
N2  - Traditionelle Prozessmodellierungssprachen sind auf hoch strukturierte Prozesse ausgelegt, in denen Daten nur eine Nebenrolle spielen. Sie eignen sich daher nicht für wissensintensive Prozesse, die flexibel und datengetrieben sind. Deshalb können prozessorientierte Informationssysteme Fachexperten nicht gänzlich unterstützen. Diese Arbeit beinhaltet eine neue Modellierungssprache, die flexible Prozessmodelle mit Datenmodellen kombiniert. Die Semantik dieser Sprache ist mittels gefärbten Petri-Netzen formal definiert. Wissensintensive Prozesse können so modelliert, verifiziert und ausgeführt werden.

Wissensintensive Prozesse sind variantenreich und involvieren Fachexperten, die mit ihren Entscheidungen die Prozessausführung prägen. Typische Anwendungsbereiche sind das Gesundheitswesen, Rechtswesen und Versicherungen. Diese Prozesse können i.d.R. nicht vollständig spezifiziert werden, da das zugrundeliegende Wissen zu umfangreich ist und sich außerdem zu schnell verändert. Die genaue Reihenfolge der Aktivitäten wird erst durch die Fachexperten zur Laufzeit festgelegt. Deshalb erfordern diese Prozesse Flexibilität sowohl zur Entwurfszeit wie zur Laufzeit; Daten und Verhalten müssen in enger Beziehung betrachtet werden. Zudem muss es möglich sein, den Prozess anzupassen, falls eine unvorhergesehene Situation eintritt. Etablierte Prozessmodellierungssprachen, wie z.B. BPMN, sind daher ungeeignet.

Deshalb werden neue Sprachen entwickelt, in denen sich generell zwei Tendenzen beobachten lassen: ein Wechsel von imperativer zu deklarativer Modellierung und eine zunehmende Integration von Daten. Im Fragment-Basierten-Case-Management können imperative Prozessfragmente zur Laufzeit flexibel kombiniert werden, solange die spezifizierten Datenanforderungen erfüllt sind.

In dieser Arbeit wird Wickr vorgestellt. Dabei handelt es sich um eine Modellierungssprache, die das Fragment-Basierte-Case-Management erweitert. Wickr kombiniert Prozessfragmente mit einem Datenmodell inklusive Assoziationen und zwei Arten an Multiplizitätseinschränkungen: Die erste Art muss immer gelten, wohingegen die zweite nur am Ende eines Falls gelten muss. Zusätzlich unterstützt Wickr Stapelverarbeitung und Datenaustausch zwischen Fällen.
Des Weiteren entwickeln wir eine translationale Semantik, die Wickr in gefärbte Petri-Netze übersetzt. Die Semantik berücksichtigt sowohl die Vorgaben des Prozessmodells wie auch die des Datenmodells. Die Semantik eignet sich nicht nur für die Beschreibung eines einzelnen Falls, sondern kann auch mehrere untereinander in Beziehung stehende Fälle abdecken. Durch Prototypen wird die Umsetzbarkeit von Wickr demonstriert und mittels bekannter Anforderungslisten die Einsatzmöglichkeit für wissensintensive und datengetriebene Prozesse evaluiert.
T2  - Wickr: Eine gemeinsame Semantik für flexible Prozesse und Daten
KW  - Case Management
KW  - Business Process Management
KW  - Process Modeling
KW  - Data Modeling
KW  - Execution Semantics
KW  - Geschäftsprozessmanagement
KW  - Fallmanagement
KW  - Datenmodellierung
KW  - Ausführungssemantiken
KW  - Prozessmodellierung
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-546137
ER  - 
TY  - JOUR
A1  - Ullrich, André
A1  - Vladova, Gergana
A1  - Eigelshoven, Felix
A1  - Renz, André
T1  - Data mining of scientific research on artificial intelligence in teaching and administration in higher education institutions
BT  - a bibliometrics analysis and recommendation for future research
JF  - Discover artificial intelligence
N2  - Teaching and learning as well as administrative processes are still experiencing intensive changes with the rise of artificial intelligence (AI) technologies and their diverse application opportunities in the context of higher education. Consequently, scientific interest in the topic in general, and in specific focal points, has risen as well. However, there is no structured overview of AI in teaching and administration processes in higher education institutions that allows researchers to identify major research topics and trends, to concretize peculiarities, and to develop recommendations for further action. To close this gap, this study seeks to systematize the current scientific discourse on AI in teaching and administration in higher education institutions. This study identified (1) an imbalance in research on AI in educational and administrative contexts, (2) an imbalance in disciplines and a lack of interdisciplinary research, (3) inequalities in cross-national research activities, as well as (4) neglected research topics and paths. In this way, the study contributes a comparative analysis of AI usage in administration and in teaching and learning processes, a systematization of the state of research, an identification of research gaps, and further research paths on AI in higher education institutions.
Y1  - 2022
U6  - https://doi.org/10.1007/s44163-022-00031-7
SN  - 2731-0809
VL  - 2
PB  - Springer
CY  - Cham
ER  - 
TY  - JOUR
A1  - Bender, Benedict
A1  - Körppen, Tim
T1  - Integriert statt isoliert
BT  - Technologien für die erfolgreiche Umsetzung von datengetriebenem Management
JF  - Digital business : cloud
N2  - Dass Daten und Analysen Innovationstreiber sind und nicht mehr nur einen Hygienefaktor darstellen, haben viele Unternehmen erkannt. Um Potenziale zu heben, müssen Daten zielführend integriert werden. Komplexe Systemlandschaften und isolierte Datenbestände erschweren dies. Technologien für die erfolgreiche Umsetzung von datengetriebenem Management müssen richtig eingesetzt werden.
N2  - The fact that data and analyses are innovation drivers and no longer just represent a hygiene factor is nowadays understood by many companies. An important step for the development of this hidden potential is the target-oriented utilization of the existing data stocks in one's own company. In doing so, many companies face the hurdle of complex system landscapes and isolated data stocks. This article provides an overview of solutions for analysis-oriented data integration and helps decision-makers to select a suitable technology for their own company.
KW  - data analytics
KW  - data requirements
KW  - software selection
Y1  - 2022
UR  - https://www.wiso-net.de/document/DBC__584ddfcbfbc5ff400cb2ffb0f31eba6e6903fb3d
SN  - 2510-344X
VL  - 26
IS  - 1
SP  - 26
EP  - 27
PB  - WIN-Verlag GmbH & Co. KG
CY  - Vaterstetten
ER  - 
TY  - JOUR
A1  - Steinert, Fritjof
A1  - Stabernack, Benno
T1  - Architecture of a low latency H.264/AVC video codec for robust ML based image classification: how region of interests can minimize the impact of coding artifacts
JF  - Journal of Signal Processing Systems for Signal, Image, and Video Technology
N2  - The use of neural networks is considered the state of the art in the field of image classification. A large number of different networks are available for this purpose, which, appropriately trained, permit a high level of classification accuracy. Typically, these networks are applied to uncompressed image data, since the corresponding training was also carried out using image data of similarly high quality. However, if image data contains image errors, the classification accuracy deteriorates drastically. This applies in particular to coding artifacts which occur due to image and video compression. Typical application scenarios for video compression are narrowband transmission channels for which video coding is required but a subsequent classification is to be carried out on the receiver side. In this paper we present a special H.264/Advanced Video Codec (AVC) based video codec that allows certain regions of a picture to be coded with near constant picture quality in order to allow a reliable classification using neural networks, whereas the remaining image will be coded using constant bit rate. We have combined this feature with the ability to run with lowest latency properties, which is usually also required in remote control application scenarios. The codec has been implemented as a fully hardwired High Definition video capable hardware architecture which is suitable for Field Programmable Gate Arrays.
KW  - H.264
KW  - Advanced Video Codec (AVC)
KW  - Low Latency
KW  - Region of Interest
KW  - Machine Learning
KW  - Inference
KW  - FPGA
KW  - Hardware accelerator
Y1  - 2022
U6  - https://doi.org/10.1007/s11265-021-01727-2
SN  - 1939-8018
SN  - 1939-8115
VL  - 94
IS  - 7
SP  - 693
EP  - 708
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Bonifati, Angela
A1  - Mior, Michael J.
A1  - Naumann, Felix
A1  - Noack, Nele Sina
T1  - How inclusive are we?
BT  - an analysis of gender diversity in database venues
JF  - SIGMOD record / Association for Computing Machinery, Special Interest Group on Management of Data
N2  - ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors. We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues over the years. We also cross-compare the results obtained in data management with other relevant areas in Computer Science.
Y1  - 2022
U6  - https://doi.org/10.1145/3516431.3516438
SN  - 0163-5808
SN  - 1943-5835
VL  - 50
IS  - 4
SP  - 30
EP  - 35
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Alnoor, Alhamzah
A1  - Tiberius, Victor
A1  - Atiyah, Abbas Gatea
A1  - Khaw, Khai Wah
A1  - Yin, Teh Sin
A1  - Chew, XinYing
A1  - Abbas, Sammar
T1  - How positive and negative electronic word of mouth (eWOM) affects customers’ intention to use social commerce?
BT  - a dual-stage multi group-SEM and ANN analysis
JF  - International journal of human computer interaction
N2  - Advances in Web 2.0 technologies have led to the widespread assimilation of electronic commerce platforms as an innovative shopping method and an alternative to traditional shopping. However, due to pro-technology bias, scholars focus more on adopting technology, and slightly less attention has been given to the impact of electronic word of mouth (eWOM) on customers’ intention to use social commerce. This study addresses the gap by examining the intention through exploring the effect of eWOM on males’ and females’ intentions and identifying the mediation of perceived crowding. To this end, we adopted a dual-stage multi-group structural equation modeling and artificial neural network (SEM-ANN) approach. We successfully extended the eWOM concept by integrating negative and positive factors and perceived crowding. The results reveal the causal and non-compensatory relationships between the constructs. The variables supported by the SEM analysis are adopted as the ANN model’s input neurons. According to the natural significance obtained from the ANN approach, males’ intentions to accept social commerce are related mainly to helping the company, followed by core functionalities. In contrast, females are highly influenced by technical aspects and mishandling. The ANN model predicts customers’ intentions to use social commerce with an accuracy of 97%. We discuss the theoretical and practical implications of increasing customers’ intention toward social commerce channels among consumers based on our findings.
Y1  - 2022
U6  - https://doi.org/10.1080/10447318.2022.2125610
SN  - 1044-7318
SN  - 1532-7590
SP  - 1
EP  - 30
PB  - Taylor & Francis
CY  - New York
ER  - 
TY  - BOOK
A1  - Eichenroth, Friedrich
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - Fast packrat parsing in a live programming environment
BT  - improving left-recursion in parsing expression grammars
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam
N2  - Language developers who design domain-specific languages or new language features need a way to make fast changes to language definitions. Those fast changes require immediate feedback. Also, it should be possible to parse the developed languages quickly to handle extensive sets of code.

Parsing expression grammars provide an easy-to-understand method for language definitions. Packrat parsing is a method to parse grammars of this kind, but this method is unable to handle left-recursion properly. Existing solutions either partially rewrite left-recursive rules and partly forbid them, or use complex extensions to packrat parsing that are hard to understand and cost-intensive. We investigated methods to make parsing as fast as possible, using easy-to-follow algorithms while not losing the ability to make fast changes to grammars.

We focused our efforts on two approaches.

One is to start from an existing technique for limited left-recursion rewriting and enhance it to work for general left-recursive grammars. The second approach is to design a grammar compilation process to find left-recursion before parsing, and in this way, reduce computational costs wherever possible and generate ready to use parser classes.

Rewriting parsing expression grammars is a task that, if done in a general way, unveils a large number of cases such that any rewriting algorithm surpasses the complexity of other left-recursive parsing algorithms. Lookahead operators introduce this complexity. However, most languages have only small portions that are left-recursive and, in virtually all cases, have no indirect or hidden left-recursion. This means that distinguishing the left-recursive parts of grammars from the non-left-recursive components holds great improvement potential for existing parsers.

In this report, we list all the required steps for grammar rewriting to handle left-recursion, including grammar analysis, grammar rewriting itself, and syntax tree restructuring. We also describe the implementation of a parsing expression grammar framework in Squeak/Smalltalk and the possible interactions with the already existing parser Ohm/S. We quantitatively benchmarked this framework, focusing on parsing time and the ability to use it in a live programming context. Compared with Ohm, we achieved massive parsing time improvements while preserving the ability to use our parser as a live programming tool.

The work is essential because, for one, we outlined the difficulties and complexity that come with grammar rewriting. In addition, we removed the existing limitations that came with left-recursion by eliminating left-recursion before parsing.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 135 
KW  - packrat parsing
KW  - parsing expression grammars
KW  - left recursion
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-491242
SN  - 978-3-86956-503-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 135
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Freund, Rieke
A1  - Rätsch, Jan Philip
A1  - Hradilak, Franziska
A1  - Vidic, Benedikt
A1  - Heß, Oliver
A1  - Lißner, Nils
A1  - Wölert, Hendrik
A1  - Lincke, Jens
A1  - Beckmann, Tom
A1  - Hirschfeld, Robert
T1  - Implementing a crowd-sourced picture archive for Bad Harzburg
N2  - Pictures are a medium that helps make the past tangible and preserve memories. Without context, they are not able to do so. Pictures are brought to life by their associated stories. However, the older pictures become, the fewer contemporary witnesses can tell these stories.
Especially for large, analog picture archives, knowledge and memories are spread over many people. This creates several challenges: First, the pictures must be digitized to save them from decaying and make them available to the public. Since a simple listing of all the pictures is confusing, the pictures should be structured accessibly. Second, known information that makes the stories vivid needs to be added to the pictures. Users should get the opportunity to contribute their knowledge and memories. To make this usable for all interested parties, even for older, less tech-savvy generations, the interface should be intuitive and error-tolerant.
The resulting requirements are not covered in their entirety by any existing software solution without losing the intuitive interface or the scalability of the system.
Therefore, we have developed our digital picture archive within the scope of a bachelor project in cooperation with the Bad Harzburg-Stiftung. For the implementation of this web application, we use the UI framework React in the frontend, which communicates via a GraphQL interface with the Content Management System Strapi in the backend. The use of this system enables our project partner to create an efficient process from scanning analog pictures to presenting them to visitors in an organized and annotated way. To customize the solution for both picture delivery and information contribution for our target group, we designed prototypes and evaluated them with people from Bad Harzburg. This helped us gain valuable insights into our system’s usability and future challenges as well as requirements.
Our web application is already being used daily by our project partner. Nevertheless, during the project we came up with numerous ideas for additional features to further support the exchange of knowledge.
N2  - Bilder können dabei helfen, die Vergangenheit greifbar zu machen und Erinnerungen zu bewahren, doch alleinstehende Bilder ohne Kontext erreichen das nur schwer. Der große Wert besteht in den Geschichten, die mit den Bildern verbunden sind. Je älter die Bilder jedoch werden, desto weniger Zeitzeugen können von diesen Geschichten berichten.
Besonders für große analoge Bildarchive, bei denen sich das Wissen und die Erinnerungen auf viele Personen verteilen, entstehen dadurch verschiedene Herausforderungen: Zunächst müssen die Bilder digitalisiert werden, um sie vor dem Zerfall zu schützen und um sie der Öffentlichkeit zugänglich machen zu können. Da eine einfache Aufreihung aller Bilder unübersichtlich ist, sollten die Bilder in eine zugängliche Struktur gebracht werden. Des Weiteren müssen zu den Bildern bekannte Informationen, aus denen ihre Geschichten erfahrbar werden, hinzugefügt werden. Nutzende sollen die Möglichkeit haben, eigenes Wissen und Erinnerungen beizutragen. Um dies für alle Interessierten, auch für ältere, evtl. wenig technikaffine Personen, nutzbar zu machen, sollte die Oberfläche eine intuitive und fehlertolerante Nutzung ermöglichen.
Die sich daraus ergebenden Anforderungen werden von keiner existierenden Softwarelösung im Gesamten abgedeckt, ohne die intuitive Oberfläche oder die Skalierbarkeit des Systems zu verlieren.

Daher haben wir im Rahmen eines Bachelorprojekts in Zusammenarbeit mit der Bad Harzburg-Stiftung ein eigenes digitales Bildarchiv entwickelt. Für die Umsetzung dieser Webapplikation nutzen wir das UI-Framework React im Frontend, welches über eine GraphQL-Schnittstelle mit dem Content Management System Strapi im Backend kommuniziert. Die Nutzung dieses Systems ermöglicht unserem Projektpartner einen effizienten Prozess vom Scannen der analogen Bilder bis zum geordneten und annotierten Darstellen für Besuchende. Um die Lösung sowohl für das Bereitstellen der Bilder als auch für das Beitragen von Informationen auf unsere Zielgruppe zuzuschneiden, haben wir Prototypen entworfen und mit Menschen aus Bad Harzburg getestet, um ihre Eindrücke auszuwerten. Mit diesen konnten wir wertvolle Erkenntnisse über die Nutzbarkeit und noch offene Herausforderungen und Anforderungen gewinnen.
Unsere Webanwendung ist bei unserem Projektpartner bereits im täglichen Einsatz. Trotzdem haben wir während des Projekts noch zahlreiche Ideen für zusätzliche Funktionen erarbeitet, um den Wissensaustausch weiter zu fördern.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 149 
KW  - digital picture archive
KW  - analog-to-digital conversion
KW  - user-generated content
KW  - intuitive interfaces
KW  - digitales Bildarchiv
KW  - Analog-zu-Digital-Konvertierung
KW  - benutzergenerierte Inhalte
KW  - intuitive Benutzeroberflächen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-560291
SN  - 978-3-86956-545-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 149
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Invariant Analysis for Multi-Agent Graph Transformation Systems using k-Induction
N2  - The analysis of behavioral models such as Graph Transformation Systems (GTSs) is of central importance in model-driven engineering. However, GTSs often result in intractably large or even infinite state spaces and may be equipped with multiple or even infinitely many start graphs. To mitigate these problems, static analysis techniques based on finite symbolic representations of sets of states or paths thereof have been devised. We focus on the technique of k-induction for establishing invariants specified using graph conditions. To this end, k-induction generates symbolic paths backwards from a symbolic state representing a violation of a candidate invariant to gather information on how that violation could have been reached, possibly obtaining contradictions to assumed invariants. However, GTSs where multiple agents regularly perform actions independently from each other cannot currently be analyzed using this technique, because the independence among backward steps may prevent the gathering of relevant knowledge altogether.

In this paper, we extend k-induction to GTSs with multiple agents, thereby supporting a wide range of additional GTSs. As a running example, we consider an unbounded number of shuttles driving on a large-scale track topology, which adjust their velocity to speed limits to avoid derailing. As our central contribution, we develop pruning techniques based on causality and independence among backward steps and verify that k-induction remains sound under this adaptation and terminates in cases where it did not terminate before.
N2  - Die Analyse von Verhaltensmodellen wie Graphtransformationssystemen (GTSs) ist von zentraler Bedeutung im Model Driven Engineering. GTSs führen jedoch häufig zu unhandhabbar großen oder sogar unendlichen Zustandsräumen und können mit mehreren oder sogar unendlich vielen Startgraphen ausgestattet sein. Um diese Probleme abzumildern, wurden statische Analysetechniken entwickelt, die auf endlichen symbolischen Darstellungen von Mengen von Zuständen oder Pfaden basieren. Wir konzentrieren uns auf die Technik der k-Induktion zur Ermittlung von Invarianten, die unter Verwendung von Graphbedingungen spezifiziert sind. Zum Zweck der Analyse erzeugt die k-Induktion symbolische Rückwärtspfade von einem symbolischen Zustand, der eine Verletzung einer Kandidateninvariante darstellt, um Informationen darüber zu sammeln, wie diese Verletzung erreicht werden konnte, wodurch möglicherweise Widersprüche zu angenommenen Invarianten gefunden werden. GTSs, bei denen mehrere Agenten regelmäßig unabhängig voneinander Aktionen ausführen, können derzeit jedoch nicht mit dieser Technik analysiert werden, da die Unabhängigkeit zwischen Rückwärtsschritten das Sammeln von relevantem Wissen möglicherweise verhindert.

In diesem Artikel erweitern wir die k-Induktion auf GTSs mit mehreren Agenten und unterstützen dadurch eine breite Palette zusätzlicher GTSs. Als laufendes Beispiel betrachten wir eine unbegrenzte Anzahl von Shuttles, die auf einer großen Tracktopologie fahren und die ihre Geschwindigkeit an Geschwindigkeitsbegrenzungen anpassen, um ein Entgleisen zu vermeiden. Als zentralen Beitrag entwickeln wir Beschneidungstechniken basierend auf Kausalität und Unabhängigkeit zwischen Rückwärtsschritten und verifizieren, dass die k-Induktion unter dieser Anpassung korrekt bleibt und in Fällen terminiert, in denen sie zuvor nicht terminierte.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 143 
KW  - k-inductive invariant checking
KW  - causality
KW  - parallel and sequential independence
KW  - symbolic analysis
KW  - bounded backward model checking
KW  - k-induktive Invariantenprüfung
KW  - Kausalität
KW  - parallele und sequentielle Unabhängigkeit
KW  - symbolische Analyse
KW  - Bounded Backward Model Checking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-545851
SN  - 978-3-86956-531-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 143
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Probabilistic metric temporal graph logic
N2  - Cyber-physical systems often encompass complex concurrent behavior with timing constraints and probabilistic failures on demand. Analyzing whether such systems with probabilistic timed behavior adhere to a given specification is essential. When the states of the system can be represented by graphs, the rule-based formalism of Probabilistic Timed Graph Transformation Systems (PTGTSs) can be used to suitably capture structure dynamics as well as probabilistic and timed behavior of the system. Model checking support for PTGTSs w.r.t. properties specified using Probabilistic Timed Computation Tree Logic (PTCTL) has already been presented. Moreover, for timed graph-based runtime monitoring, Metric Temporal Graph Logic (MTGL) has been developed for stating metric temporal properties on identified subgraphs and their structural changes over time.

In this paper, we (a) extend MTGL to the Probabilistic Metric Temporal Graph Logic (PMTGL) by allowing for the specification of probabilistic properties, (b) adapt our MTGL satisfaction checking approach to PTGTSs, and (c) combine the approaches for PTCTL model checking and MTGL satisfaction checking to obtain a Bounded Model Checking (BMC) approach for PMTGL. In our evaluation, we apply an implementation of our BMC approach in AutoGraph to a running example.
N2  - Cyber-physische Systeme umfassen häufig ein komplexes nebenläufiges Verhalten mit Zeitbeschränkungen und probabilistischen Fehlern auf Anforderung. Die Analyse, ob solche Systeme mit probabilistischem gezeitetem Verhalten einer vorgegebenen Spezifikation entsprechen, ist essentiell. Wenn die Zustände des Systems durch Graphen dargestellt werden können, kann der regelbasierte Formalismus von probabilistischen gezeiteten Graphtransformationssystemen (PTGTSs) verwendet werden, um die Strukturdynamik sowie das probabilistische und gezeitete Verhalten des Systems geeignet zu erfassen. Die Modellprüfungsunterstützung für PTGTSs bzgl. Eigenschaften, die unter Verwendung von Probabilistic Timed Computation Tree Logic (PTCTL) spezifiziert wurden, wurde bereits entwickelt. Darüber hinaus wurde das gezeitete graphenbasierte Laufzeitmonitoring mittels metrischer temporaler Graphlogik (MTGL) entwickelt, um metrische temporale Eigenschaften auf identifizierten Untergraphen und ihre strukturellen Änderungen über die Zeit zu erfassen.

In diesem Artikel (a) erweitern wir MTGL auf die probabilistische metrische temporale Graphlogik (PMTGL), indem wir die Spezifikation probabilistischer Eigenschaften zulassen, (b) passen unseren MTGL-Prüfungsansatz auf PTGTSs an und (c) kombinieren die Ansätze für PTCTL-Modellprüfung und MTGL-Prüfung, um einen beschränkten Modellprüfungsansatz (BMC-Ansatz) für PMTGL zu erhalten. In unserer Auswertung wenden wir eine Implementierung unseres BMC-Ansatzes in AutoGraph auf ein Beispiel an.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 146 
KW  - cyber-physical systems
KW  - probabilistic timed systems
KW  - qualitative analysis
KW  - quantitative analysis
KW  - bounded model checking
KW  - cyber-physische Systeme
KW  - probabilistische gezeitete Systeme
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - Bounded Model Checking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-545867
SN  - 978-3-86956-532-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 146
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Klinke, Paula
A1  - Verhoeven, Silvan
A1  - Roth, Felix
A1  - Hagemann, Linus
A1  - Alnawa, Tarik
A1  - Lincke, Jens
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - Tool support for collaborative creation of interactive storytelling media
N2  - Scrollytellings are an innovative form of web content. Combining the benefits of books, images, movies, and video games, they are a tool to tell compelling stories and provide excellent learning opportunities. Due to their multi-modality, creating high-quality scrollytellings is not an easy task. Different professions, such as content designers, graphic designers, and developers, need to collaborate to get the best out of the possibilities the scrollytelling format provides. Collaboration unlocks great potential. However, content designers cannot create scrollytellings directly and always need to consult with developers to implement their vision. This can result in misunderstandings. Often, the resulting scrollytelling will not match the designer’s vision sufficiently, causing unnecessary iterations. Our project partner Typeshift specializes in the creation of individualized scrollytellings for their clients. The existing solutions we examined for authoring interactive content are not optimally suited for creating highly customized scrollytellings while still being able to manipulate all their elements programmatically. Based on Typeshift’s experience and expertise, we developed an editor to author scrollytellings in the lively.next live-programming environment. In this environment, a graphical user interface for content design is combined with powerful possibilities for programming behavior with the morphic system. The editor allows content designers to take on large parts of the creation process of scrollytellings on their own, such as creating the visible elements, animating content, and fine-tuning the scrollytelling. Hence, developers can focus on interactive elements such as simulations and games. Together with Typeshift, we evaluated the tool by recreating an existing scrollytelling and identified possible future enhancements. Our editor streamlines the creation process of scrollytellings.
Content designers and developers can now both work on the same scrollytelling. Because the editor is integrated into the lively.next environment, both can work with a set of tools familiar to them. Thus, we mitigate unnecessary iterations and misunderstandings by enabling content designers to realize large parts of their vision of a scrollytelling on their own. Developers can add advanced and individual behavior. As a result, developers and content designers benefit from a clearer distribution of tasks while keeping the benefits of collaboration.
N2  - Scrollytellings sind innovative Webinhalte. Indem sie die Vorteile von Büchern, Bildern, Filmen und Videospielen vereinen, sind sie ein Werkzeug um Geschichten fesselnd zu erzählen und Lehrinhalte besonders effektiv zu vermitteln. Die Erstellung von Scrollytellings ist aufgrund ihrer Multimodalität keine einfache Aufgabe. Verschiedene Berufszweige wie Content-Designer:innen, Grafikdesigner:innen und Entwickler:innen müssen zusammenarbeiten, um das volle Potential des Scrollytellingformats auszuschöpfen. Jedoch können Content-Designer:innen Scrollytellings nicht direkt selbst erstellen, sondern müssen ihre Vision stets gemeinsam mit Entwickler:innen umsetzen. Dabei können unnötige Iterationen über das Scrollytelling auftreten, wenn dieses den Visionen der Content-Designer:innen noch nicht entspricht. Außerdem können Missverständnisse entstehen. Unser Projektpartner Typeshift hat sich auf die Erstellung von, für seine Kund:innen individualisierten, Scrollytellings spezialisiert. Aufbauend auf Typeshifts Erfahrungen und Expertise haben wir einen Editor entwickelt, um Scrollytellings in der Live-Programmierumgebung lively.next zu erstellen. In lively.next wird eine graphische Oberfläche für die Erstellung von Inhalten mit weitreichenden Möglichkeiten zur Programmierung von Verhalten durch das Morphic-System kombiniert. Der Editor erlaubt es Content-Designer:innen eigenständig große Teile des Erstellungsprozesses von Scrollytellings durchzuführen, zum Beispiel das Erzeugen visueller Elemente, deren Animation sowie die Feinjustierung des gesamten Scrollytellings. So können Entwickler:innen sich auf die Erstellung von komplexen interaktiven Elementen, wie Simulationen oder Spiele, konzentrieren. Zusammen mit Typeshift haben wir die Nutzbarkeit unseres Editors durch die Nachbildung eines bereits existierenden Scrollytellings evaluiert und mögliche Verbesserungen identifiziert. Unser Editor vereinfacht den Erstellungsprozess von Scrollytellings.
Content-Designer:innen und Entwickler:innen können jetzt beide an demselben Scrollytelling arbeiten. Durch den Editor, der in lively.next integriert ist, können beide Parteien mit den ihnen bekannten und vertrauten Werkzeugen arbeiten. Durch den Editor verringern wir unnötige Iterationen und Missverständnisse und erlauben Content-Designer:innen große Teile ihrer Vision eines Scrollytellings eigenständig umzusetzen. Entwickler:innen können zusätzliches, individuelles Verhalten hinzufügen. So profitieren Entwickler:innen und Content-Designer:innen von einer besseren Aufgabenteilung, während die Vorteile von Zusammenarbeit bestehen bleiben.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 141 
KW  - scrollytelling
KW  - interactive media
KW  - web-based development
KW  - Lively Kernel
KW  - Scrollytelling
KW  - interaktive Medien
KW  - webbasierte Entwicklung
KW  - Lively Kernel
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-518570
SN  - 978-3-86956-521-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 141
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Dürsch, Falco
A1  - Rein, Patrick
A1  - Mattis, Toni
A1  - Hirschfeld, Robert
T1  - Learning from failure
BT  - a history-based, lightweight test prioritization technique connecting software changes to test failures
N2  - Regression testing is a widespread practice in today's software industry to ensure software product quality. Developers derive a set of test cases, and execute them frequently to ensure that their change did not adversely affect existing functionality. As the software product and its test suite grow, the time to feedback during regression test sessions increases, and impedes programmer productivity: developers wait longer for tests to complete, and delays in fault detection render fault removal increasingly difficult.

Test case prioritization addresses the problem of long feedback loops by reordering test cases, such that test cases of high failure probability run first, and test case failures become actionable early in the testing process. We ask, given test execution schedules reconstructed from publicly available data, to what extent their fault detection efficiency can be improved, and which technique yields the most efficient test schedules with respect to the average percentage of faults detected (APFD).

To this end, we recover 6200 regression test sessions from the build log files of Travis CI, a popular continuous integration service, and gather 62000 accompanying changelists. We evaluate the efficiency of current test schedules, and examine the prioritization results of state-of-the-art lightweight, history-based heuristics. We propose and evaluate a novel set of prioritization algorithms, which connect software changes and test failures in a matrix-like data structure.

Our studies indicate that the optimization potential is substantial, because the existing test plans score only 30% APFD. The predictive power of past test failures proves to be outstanding: simple heuristics, such as repeating tests with failures in recent sessions, result in efficiency scores of 95% APFD. The best-performing matrix-based heuristic achieves a similar score of 92.5% APFD. In contrast to prior approaches, we argue that matrix-based techniques are useful beyond the scope of effective prioritization, and enable a number of use cases involving software maintenance.

We validate our findings from continuous integration processes by extending a continuous testing tool within development environments with means of test prioritization, and pose further research questions. We think that our findings are suited to propel adoption of (continuous) testing practices, and that programmers' toolboxes should contain test prioritization as an essential productivity tool.
N2  - Regressionstests sind in der heutigen Softwareindustrie weit verbreitete Praxis um die Qualität eines Softwareprodukts abzusichern. Dabei leiten Entwickler von den gestellten Anforderungen Testfälle ab und führen diese wiederholt aus, um sicherzustellen, dass ihre Änderungen die bereits existierende Funktionalität nicht negativ beeinträchtigen. Steigt die Größe und Komplexität der Software und ihrer Testsuite, so wird die Feedbackschleife der Testausführungen länger, und mindert die Produktivität der Entwickler: Sie warten länger auf das Testergebnis, und die Fehlerbehebung gestaltet sich umso schwieriger, je länger die Ursache zurückliegt.

Um die Feedbackschleife zu verkürzen, ändern Testpriorisierungs-Algorithmen die Reihenfolge der Testfälle, sodass Testfälle, die mit hoher Wahrscheinlichkeit fehlschlagen, zuerst ausgeführt werden. Der vorliegende Bericht beschäftigt sich mit der Frage nach der Effizienz von Testplänen, welche aus öffentlich einsehbaren Daten rekonstruierbar sind, und welche anwendbaren Priorisierungs-Techniken die effizienteste Testreihenfolge in Bezug auf APFD hervorbringen.

Zu diesem Zweck werden 6200 Testsitzungen aus den Logdateien von Travis CI, einem oft verwendeten Dienst für Continuous Integration, und über 62000 Änderungslisten rekonstruiert. Auf dieser Grundlage wird die Effizienz der derzeitigen Testpläne bewertet, als auch solcher, die aus der Neupriorisierung durch leichtgewichtige, verlaufsbasierte Algorithmen hervorgehen. Zudem schlägt der vorliegende Bericht eine neue Gruppe von Ansätzen vor, die Testfehlschläge und Softwareänderungen mit Hilfe einer Matrix in Bezug setzt.

Da die beobachteten Testreihenfolgen nur 30% APFD erzielen, liegt wesentliches Potential für Optimierung vor. Dabei besticht die Vorhersagekraft der unmittelbar vorangegangenen Testfehlschläge: einfache Heuristiken, wie das Wiederholen von Tests, welche kürzlich fehlgeschlagen sind, führen zu Testplänen mit einer Effizienz von 95% APFD. Matrix-basierte Ansätze erreichen eine Fehlererkennungsrate von bis zu 92.5% APFD. Im Gegensatz zu den bisher bekannten Ansätzen sind die matrix-basierten Techniken auch über den Zweck der Testpriorisierung hinaus nützlich, und sind in der Softwarewartung anwendbar.

Zusätzlich werden die Ergebnisse der vorliegenden Studie für Continuous Integration Systeme im Kontext integrierter Entwicklungsumgebungen validiert, indem ein Tool für Continuous Testing um Testpriorisierung erweitert wird. Dies führt zu neuen Forschungsfragen. Die Untersuchungsergebnisse sind geeignet die Einführung von Continuous Testing zu befördern, und untermauern, dass Werkzeuge der Testpriorisierung für produktive Softwareentwicklung essenziell sind.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 145 
KW  - test case prioritization
KW  - continuous integration
KW  - regression testing
KW  - version control
KW  - live programming
KW  - heuristics
KW  - data set
KW  - test results
KW  - GitHub
KW  - Java
KW  - Testpriorisierung
KW  - kontinuierliche Integration
KW  - Regressionstests
KW  - Versionsverwaltung
KW  - Live-Programmierung
KW  - Heuristiken
KW  - Datensatz
KW  - Testergebnisse
KW  - GitHub
KW  - Java
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-537554
SN  - 978-3-86956-528-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 145
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Bläsius, Thomas
A1  - Friedrich, Tobias
A1  - Lischeid, Julius
A1  - Meeks, Kitty
A1  - Schirneck, Friedrich Martin
T1  - Efficiently enumerating hitting sets of hypergraphs arising in data profiling
JF  - Journal of computer and system sciences : JCSS
N2  - The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m^{|X|+1} n) and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
KW  - Data profiling
KW  - Enumeration algorithm
KW  - Minimal hitting set
KW  - Transversal hypergraph
KW  - Unique column combination
KW  - W[3]-Completeness
Y1  - 2022
U6  - https://doi.org/10.1016/j.jcss.2021.10.002
SN  - 0022-0000
SN  - 1090-2724
VL  - 124
SP  - 192
EP  - 213
PB  - Elsevier
CY  - San Diego
ER  - 
TY  - JOUR
A1  - von Steinau-Steinrück, Robert
A1  - Kurth, Paula Sophie
T1  - Das reformierte Statusfeststellungsverfahren in der Praxis
JF  - NJW spezial
N2  - The status determination procedure (Statusfeststellungsverfahren) makes it possible, on application to the Deutsche Rentenversicherung Bund, which has sole jurisdiction, to obtain a binding assessment of the often complicated and consequential distinction between self-employment and dependent employment. The procedure was comprehensively reformed with effect from 1 April 2022. In practice, the amendments introduced have so far proven their worth to varying degrees.
Y1  - 2022
UR  - https://beck-online.beck.de/Bcid/Y-300-Z-NJW-SPEZIAL-B-2022-S-754-N-1
SN  - 1613-4621
VL  - 19
IS  - 24
SP  - 754
EP  - 755
PB  - C.H. Beck
CY  - München
ER  - 
TY  - JOUR
A1  - von Steinau-Steinrück, Robert
A1  - Miller, Denis
T1  - Rückzahlungsklauseln für Fortbildungen
BT  - typische Fehler
JF  - Neue juristische Wochenschrift : NJW Spezial
N2  - In its judgment of 1 March 2022 (NZA 2022, 780), the Federal Labour Court (BAG) once again ruled on the validity of a repayment clause in a training agreement. The decision joins a body of rulings on this subject that is not easy to survey. We take it as an occasion to give an overview of the case law.
Y1  - 2022
UR  - https://beck-online.beck.de/Bcid/Y-300-Z-NJW-SPEZIAL-B-2022-S-370-N-1
SN  - 1613-4621
VL  - 19
IS  - 12
SP  - 370
EP  - 371
PB  - C.H. Beck
CY  - München
ER  - 
TY  - JOUR
A1  - von Steinau-Steinrück, Robert
A1  - Höltge, Clara
T1  - Krieg in Europa
BT  - Beschäftigung ukrainischer Geflüchteter in Deutschland
JF  - NJW spezial
N2  - The Russian war of aggression in Ukraine began on 24 February 2022. Since then, numerous Ukrainian citizens have been fleeing to the European Union every day, many of them to Germany. The immediate priority is to secure basic needs such as food, accommodation, and medical care. In addition, employers are asking how they can employ Ukrainian citizens as quickly as possible. We give an overview of the options for integrating Ukrainian refugees into the German labour market as quickly as possible.
Y1  - 2022
UR  - https://beck-online.beck.de/Bcid/Y-300-Z-NJW-SPEZIAL-B-2022-S-242-N-1
SN  - 1613-4621
VL  - 19
IS  - 8
SP  - 242
EP  - 243
PB  - C.H. Beck
CY  - München
ER  - 
TY  - THES
A1  - Elsaid, Mohamed Esameldin Mohamed
T1  - Virtual machines live migration cost modeling and prediction
T1  - Modellierung und Vorhersage der Live-Migrationskosten für Virtuelle Maschinen
N2  - Dynamic resource management is an essential requirement for private and public cloud computing environments. With dynamic resource management, the physical resources assignment to the cloud virtual resources depends on the actual need of the applications or the running services, which enhances the cloud physical resources utilization and reduces the offered services cost. In addition, the virtual resources can be moved across different physical resources in the cloud environment without an obvious impact on the running applications or services production. This means that the availability of the running services and applications in the cloud is independent on the hardware resources including the servers, switches and storage failures. This increases the reliability of using cloud services compared to the classical data-centers environments.
In this thesis we briefly discuss the dynamic resource management topic and then deeply focus on live migration as the definition of the compute resource dynamic management. Live migration is a commonly used and an essential feature in cloud and virtual data-centers environments. Cloud computing load balance, power saving and fault tolerance features are all dependent on live migration to optimize the virtual and physical resources usage. As we will discuss in this thesis, live migration shows many benefits to cloud and virtual data-centers environments, however the cost of live migration can not be ignored. Live migration cost includes the migration time, downtime, network overhead, power consumption increases and CPU overhead.
IT admins run live migrations of virtual machines without knowing the migration cost. As a result, resource bottlenecks, higher migration costs, and migration failures might occur. The first problem that we discuss in this thesis is how to model the cost of virtual machine live migration. Secondly, we investigate how to make use of machine learning techniques to help cloud admins obtain an estimate of this cost before initiating the migration of one or multiple virtual machines. Also, we discuss the optimal timing for a specific virtual machine before live migration to another server. Finally, we propose practical solutions that cloud admins can integrate with cloud administration portals to answer the research questions raised above.
Our research methodology to achieve the project objectives is to propose empirical models based on VMware test-beds with different benchmark tools. Then we make use of machine learning techniques to propose a prediction approach for virtual machine live migration cost. Timing optimization for live migration is also proposed in this thesis, based on the cost prediction and on data-center network utilization prediction. Live migration with persistent memory clusters is also discussed at the end of the thesis. The cost prediction and timing optimization techniques proposed in this thesis could be practically integrated with the VMware vSphere cluster portal such that IT admins can use the cost prediction feature and the timing optimization option before proceeding with a virtual machine live migration.
Testing results show that our proposed approach for VM live migration cost prediction achieves acceptable results with less than 20% prediction error and can be easily implemented and integrated with VMware vSphere as an example of a commonly used resource management portal for virtual data-centers and private cloud environments. The results also show that our proposed VM migration timing optimization technique could save up to 51% of the migration time for memory-intensive workloads and up to 27% of the migration time for network-intensive workloads. This timing optimization technique can be useful for network admins to save migration time by utilizing a higher network rate and achieving a higher probability of success.
At the end of this thesis, we discuss persistent memory technology as a new trend in server memory technology. Persistent memory modes of operation and configurations are discussed in detail to explain how live migration works between servers with different memory configurations. Then, we build a VMware cluster with servers that contain persistent memory and with DRAM-only servers to show the difference in live migration cost between VMs with DRAM only and VMs with persistent memory.
N2  - Die dynamische Ressourcenverwaltung ist eine wesentliche Voraussetzung für private und öffentliche Cloud-Computing-Umgebungen. Bei der dynamischen Ressourcenverwaltung hängt die Zuweisung der physischen Ressourcen zu den virtuellen Cloud-Ressourcen vom tatsächlichen Bedarf der Anwendungen oder der laufenden Dienste ab, was die Auslastung der physischen Cloud-Ressourcen verbessert und die Kosten für die angebotenen Dienste reduziert. Darüber hinaus können die virtuellen Ressourcen über verschiedene physische Ressourcen in der Cloud-Umgebung verschoben werden, ohne dass dies einen offensichtlichen Einfluss auf die laufenden Anwendungen oder die Produktion der Dienste hat. Das bedeutet, dass die Verfügbarkeit der laufenden Dienste und Anwendungen in der Cloud unabhängig von den Hardwareressourcen einschließlich der Server, Netzwerke und Speicherausfälle ist. Dies erhöht die Zuverlässigkeit bei der Nutzung von Cloud-Diensten im Vergleich zu klassischen Rechenzentrumsumgebungen.
In dieser Arbeit wird das Thema der dynamischen Ressourcenverwaltung kurz erörtert, um sich dann eingehend mit der Live-Migration als Definition der dynamischen Verwaltung von Compute-Ressourcen zu beschäftigen. Live-Migration ist eine häufig verwendete und wesentliche Funktion in Cloud- und virtuellen Rechenzentrumsumgebungen. Cloud-Computing-Lastausgleich, Energiespar- und Fehlertoleranzfunktionen sind alle von der Live-Migration abhängig, um die Nutzung der virtuellen und physischen Ressourcen zu optimieren. Wie wir in dieser Arbeit erörtern werden, zeigt die Live-Migration viele Vorteile für Cloud- und virtuelle Rechenzentrumsumgebungen, jedoch können die Kosten der Live-Migration nicht ignoriert werden. Zu den Kosten der Live-Migration gehören die Migrationszeit, die Ausfallzeit, der Netzwerk-Overhead, der Anstieg des Stromverbrauchs und der CPU-Overhead.
IT-Administratoren führen Live-Migrationen von virtuellen Maschinen durch, ohne eine Vorstellung von den Migrationskosten zu haben. So kann es zu Ressourcenengpässen, höheren Migrationskosten und Migrationsfehlern kommen. Das erste Problem, das wir in dieser Arbeit diskutieren, ist, wie man die Kosten der Live-Migration virtueller Maschinen modellieren kann. Zweitens untersuchen wir, wie maschinelle Lerntechniken eingesetzt werden können, um den Cloud-Administratoren zu helfen, eine Schätzung dieser Kosten zu erhalten, bevor die Migration für eine oder mehrere virtuelle Maschinen eingeleitet wird. Außerdem diskutieren wir das optimale Timing für eine bestimmte virtuelle Maschine vor der Live-Migration auf einen anderen Server. Schließlich schlagen wir praktische Lösungen vor, die von den Cloud-Admins verwendet werden können, um in die Cloud-Administrationsportale integriert zu werden, um die oben aufgeworfenen Forschungsfragen zu beantworten.
Unsere Forschungsmethodik zur Erreichung der Projektziele besteht darin, empirische Modelle vorzuschlagen, die auf der Verwendung von VMware-Testbeds mit verschiedenen Benchmark-Tools basieren. Dann nutzen wir die Techniken des maschinellen Lernens, um einen Vorhersageansatz für die Kosten der Live-Migration virtueller Maschinen vorzuschlagen. Die Timing-Optimierung für die Live-Migration wird ebenfalls in dieser Arbeit vorgeschlagen, basierend auf der Kostenvorhersage und der Vorhersage der Netzwerkauslastung des Rechenzentrums. Die Live-Migration mit Clustern mit persistentem Speicher wird ebenfalls am Ende der Arbeit diskutiert.
Die in dieser Arbeit vorgeschlagenen Techniken zur Kostenvorhersage und Timing-Optimierung könnten praktisch in das VMware vSphere-Cluster-Portal integriert werden, so dass die IT-Administratoren nun die Funktion zur Kostenvorhersage und die Option zur Timing-Optimierung nutzen können, bevor sie mit einer Live-Migration der virtuellen Maschine fortfahren.
Die Testergebnisse zeigen, dass unser vorgeschlagener Ansatz für die VMs-Live-Migrationskostenvorhersage akzeptable Ergebnisse mit weniger als 20% Fehler in der Vorhersagegenauigkeit zeigt und leicht implementiert und in VMware vSphere als Beispiel für ein häufig verwendetes Ressourcenmanagement-Portal für virtuelle Rechenzentren und private Cloud-Umgebungen integriert werden kann. Die Ergebnisse zeigen, dass mit der von uns vorgeschlagenen Technik zur Timing-Optimierung der VMs-Migration auch bis zu 51% der Migrationszeit für speicherintensive Workloads und bis zu 27% der Migrationszeit für netzwerkintensive Workloads eingespart werden können. Diese Timing-Optimierungstechnik kann für Netzwerkadministratoren nützlich sein, um Migrationszeit zu sparen und dabei eine höhere Netzwerkrate und eine höhere Erfolgswahrscheinlichkeit zu nutzen.
Am Ende dieser Arbeit wird die persistente Speichertechnologie als neuer Trend in der Server-Speichertechnologie diskutiert. Die Betriebsarten und Konfigurationen des persistenten Speichers werden im Detail besprochen, um zu erklären, wie die Live-Migration zwischen Servern mit unterschiedlichen Speicherkonfigurationen funktioniert. Dann bauen wir einen VMware-Cluster mit persistentem Speicher im Server und auch mit Servern nur mit DRAM auf, um den Kostenunterschied bei der Live-Migration zwischen den VMs mit nur DRAM und den VMs mit persistentem Speicher im Server zu zeigen.
KW  - virtual
KW  - cloud
KW  - computing
KW  - machines
KW  - live migration
KW  - machine learning
KW  - prediction
KW  - Wolke
KW  - Computing
KW  - Live-Migration
KW  - maschinelles Lernen
KW  - Maschinen
KW  - Vorhersage
KW  - virtuell
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-540013
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - John, Catrina
A1  - Wollowski, Tobias
T1  - Die HPI Schul-Cloud – Von der Vision zur digitalen Infrastruktur für deutsche Schulen
N2  - Digitale Medien sind aus unserem Alltag kaum noch wegzudenken. Einer der zentralsten Bereiche für unsere Gesellschaft, die schulische Bildung, darf hier nicht hintanstehen. Wann immer der Einsatz digital unterstützter Tools pädagogisch sinnvoll ist, muss dieser in einem sicheren Rahmen ermöglicht werden können. Die HPI Schul-Cloud ist dieser Vision gefolgt, die vom Nationalen IT-Gipfel 2016 angestoßen wurde und dem Bericht vorangestellt ist. Sie hat sich in den vergangenen fünf Jahren vom Pilotprojekt zur unverzichtbaren IT-Infrastruktur für zahlreiche Schulen entwickelt. Während der Corona-Pandemie hat sie für viele Tausend Schulen wichtige Unterstützung bei der Umsetzung ihres Bildungsauftrags geboten. Das Ziel, eine zukunftssichere und datenschutzkonforme Infrastruktur zur digitalen Unterstützung des Unterrichts zur Verfügung zu stellen, hat sie damit mehr als erreicht. Aktuell greifen rund 1,4 Millionen Lehrkräfte und Schülerinnen und Schüler bundesweit und an den deutschen Auslandsschulen auf die HPI Schul-Cloud zu.
N2  - It is hard to imagine our everyday lives without digital media. One of the most central areas for our society, school education, must not be left behind. Whenever the use of digitally supported tools makes pedagogical sense, it must be possible to enable it within a secure framework. The HPI School Cloud has followed this vision, which was initiated by the 2016 National IT Summit and precedes the report. Over the past five years, it has evolved from a pilot project to an indispensable IT infrastructure for numerous schools. During the Corona pandemic, it provided important support for many thousands of schools in implementing their educational mission. It has thus more than achieved its goal of providing a future-proof and data-protection-compliant infrastructure for digital support of teaching. Currently, around 1.4 million teachers and students nationwide and at German schools abroad access the HPI School Cloud.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 144 
KW  - digitale Infrastruktur für den Schulunterricht
KW  - digital unterstützter Unterricht
KW  - Datenschutz-sicherer Einsatz in der Schule
KW  - Unterricht mit digitalen Medien
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-535860
SN  - 978-3-86956-526-2
SN  - 1613-5652
SN  - 2191-1665
IS  - 144
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Bartz, Christian
T1  - Reducing the annotation burden: deep learning for optical character recognition using less manual annotations
N2  - Text is a ubiquitous entity in our world and daily life. We encounter it nearly everywhere in shops, on the street, or in our flats. Nowadays, more and more text is contained in digital images. These images are either taken using cameras, e.g., smartphone cameras, or taken using scanning devices such as document scanners. The sheer amount of available data, e.g., millions of images taken by Google Streetview, prohibits manual analysis and metadata extraction. Although much progress was made in the area of optical character recognition (OCR) for printed text in documents, broad areas of OCR are still not fully explored and hold many research challenges. With the mainstream usage of machine learning and especially deep learning, one of the most pressing problems is the availability and acquisition of annotated ground truth for the training of machine learning models because obtaining annotated training data using manual annotation mechanisms is time-consuming and costly. In this thesis, we address how we can reduce the costs of acquiring ground truth annotations for the application of state-of-the-art machine learning methods to optical character recognition pipelines. To this end, we investigate how we can reduce the annotation cost by using only a fraction of the typically required ground truth annotations, e.g., for scene text recognition systems. We also investigate how we can use synthetic data to reduce the need for manual annotation work, e.g., in the area of document analysis for archival material. In the area of scene text recognition, we have developed a novel end-to-end scene text recognition system that can be trained using inexact supervision and shows competitive/state-of-the-art performance on standard benchmark datasets for scene text recognition. Our method consists of two independent neural networks, combined using spatial transformer networks. Both networks learn together to perform text localization and text recognition at the same time while only using annotations for the recognition task. We apply our model to end-to-end scene text recognition (meaning localization and recognition of words) and pure scene text recognition without any changes in the network architecture.

In the second part of this thesis, we introduce novel approaches for using and generating synthetic data to analyze handwriting in archival data. First, we propose a novel preprocessing method to determine whether a given document page contains any handwriting. We propose a novel data synthesis strategy to train a classification model and show that our data synthesis strategy is viable by evaluating the trained model on real images from an archive. Second, we introduce the new analysis task of handwriting classification. Handwriting classification entails classifying a given handwritten word image into classes such as date, word, or number. Such an analysis step allows us to select the best fitting recognition model for subsequent text recognition; it also allows us to reason about the semantic content of a given document page without the need for fine-grained text recognition and further analysis steps, such as Named Entity Recognition. We show that our proposed approaches work well when trained on synthetic data. Further, we propose a flexible metric learning approach to allow zero-shot classification of classes unseen during the network’s training. Last, we propose a novel data synthesis algorithm to train off-the-shelf pixel-wise semantic segmentation networks for documents. Our data synthesis pipeline is based on the famous StyleGAN architecture and can synthesize realistic document images with their corresponding segmentation annotation without the need for any annotated data!
N2  - Text umgibt uns überall. Wir finden Text in allen Lebenslagen, z.B. in einem Geschäft, an Gebäuden, oder in unserer Wohnung. Viele dieser Textentitäten können heutzutage auch in digitalen Bildern gefunden werden, welche auf verschiedene Art und Weise erstellt werden können, z.B. mittels einer Kamera in einem Smartphone oder durch einen Dokumentenscanner. Die Anzahl verfügbarer digitaler Bilder, z.B. Millionen – wenn nicht Milliarden von Bildern – in Google Streetview, macht eine manuelle Analyse der Bilddaten unmöglich. Obwohl es im Gebiet der Optical Character Recognition (OCR) in den letzten Jahren viel Fortschritt gab, gibt es doch noch viele Bereiche, die noch nicht vollständig erforscht worden sind. Der immer zunehmende Einsatz von Methoden des maschinellen Lernens, insbesondere der Einsatz von Deep Learning Technologien, im Bereich der OCR, führt zu dem großen Problem der Verfügbarkeit von annotierten Trainingsdaten. Die Beschaffung annotierter Daten mittels manueller Annotation ist zeitintensiv und sehr teuer. In dieser Arbeit zeigen wir neue Wege und Verfahren auf, wie das Problem der Beschaffung annotierter Daten für die Anwendung von modernsten Deep Learning Verfahren im Bereich der OCR gelöst werden könnte. Hierbei zeigen wir neue Verfahren in zwei Unterbereichen der OCR. Einerseits untersuchen wir, wie wir die Annotationskosten reduzieren könnten, indem wir inexakte Annotationen benutzen um z.B. die Kosten der Annotation von echten Daten im Bereich der Texterkennung aus natürlichen Bildern zu reduzieren. Dieses System wird mittels weak supervision trainiert und erreicht Ergebnisse, die auf dem Stand der Technik bzw. darüber liegen. Unsere Methode basiert auf zwei unabhängigen neuronalen Netzwerken, die mittels eines Spatial Transformers verbunden werden. Beide Netzwerke werden zusammen trainiert und lernen zusammen, wie Text gefunden und gelesen werden kann. 
Dabei nutzen wir aber nur Annotationen und Supervision für das Lesen (recognition) des Textes, nicht für die Textfindung. Wir zeigen weiterhin, dass unser System für eine Mehrzahl von Aufgaben im Bereich der Texterkennung aus natürlichen Bildern genutzt werden kann, ohne Veränderungen im Netzwerk vornehmen zu müssen. Andererseits untersuchen wir, wie wir Verfahren zur Erstellung von synthetischen Daten benutzen können, um die Kosten und den Aufwand der manuellen Annotation zu verringern und zeigen Ergebnisse aus dem Bereich der Analyse von Handschrift in historischen Archivdokumenten. Zuerst präsentieren wir ein System zur Erkennung, ob ein Bild überhaupt Handschrift enthält. Hier schlagen wir eine neue Datengenerierungsmethode vor. Die generierten Daten werden zum Training eines Klassifizierungsmodells genutzt. Unsere experimentellen Ergebnisse belegen, dass unsere Idee auch auf echten Daten aus einem Archiv eingesetzt werden kann.

Als Zweites führen wir einen neuen Schritt in einer Dokumentenanalyseplattform ein: Handschriftklassifizierung. Hier ordnen wir Bilder einzelner handgeschriebener Wörter anhand ihrer visuellen Struktur in Klassen, wie Zahlen, Datumsangaben oder Wörter ein. Die Einführung dieses Analyseschrittes erlaubt es uns den besten Algorithmus für den nächsten Schritt, die eigentliche Handschrifterkennung, zu finden. Der Analyseschritt erlaubt es uns auch, bereits Aussagen über den semantischen Inhalt eines Dokumentes zu treffen, ohne weitere Analyseschritte, wie Named Entity Recognition, durchführen zu müssen. Wir zeigen, dass unser Ansatz sehr gut funktioniert, wenn er auf synthetischen Daten trainiert wird; wir zeigen weiterhin, dass unser Ansatz auch für zero-shot Klassifikation eingesetzt werden kann. Zum Schluss präsentieren wir ein neues Verfahren zur Generierung von Trainingsdaten für die pixelgenaue semantische Segmentierung in Bildern von Dokumenten. Unser Verfahren basiert auf der bekannten StyleGAN Architektur und ist in der Lage Bilder mit entsprechender Annotation automatisch zu generieren. Hierbei werden keine echten annotierten Daten benötigt und das Verfahren kann auf jeder Form von Dokumenten eingesetzt werden.
KW  - computer vision
KW  - optical character recognition
KW  - archive analysis
KW  - data synthesis
KW  - weak supervision
KW  - Archivanalyse
KW  - maschinelles Sehen
KW  - Datensynthese
KW  - Texterkennung
KW  - schwach überwachtes maschinelles Lernen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-555407
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Willems, Christian
A1  - Staubitz, Thomas
A1  - Sauer, Dominic
A1  - Hagedorn, Christiane
T1  - openHPI
T1  - openHPI
BT  - 10 Years of MOOCs at the Hasso Plattner Institute
BT  - 10 Jahre MOOCs am Hasso-Plattner-Institut
N2  - On the occasion of the 10th openHPI anniversary, this technical report provides information about the HPI MOOC platform, including its core features, technology, and architecture.

In an introduction, the platform family with all partner platforms is presented; these now amount to nine platforms, including openHPI. This section introduces openHPI as an advisor and research partner in various projects. 

In the second chapter, the functionalities and common course formats of the platform are presented. The functionalities are divided into learner and admin features. The learner features section provides detailed information about performance records, courses, and the learning materials of which a course is composed: videos, texts, and quizzes. In addition, the learning materials can be enriched by adding external exercise tools that communicate with the HPI MOOC platform via the Learning Tools Interoperability (LTI) standard. Furthermore, the concept of peer assessments completes the possible learning materials.
The section then proceeds with further information on the discussion forum, a fundamental concept of MOOCs compared to traditional e-learning offers. The section is concluded with a description of the quiz recap, learning objectives, mobile applications, gameful learning, and the help desk.

The next part of this chapter deals with the admin features. The description is restricted to news and announcements, dashboards and statistics, reporting capabilities, research options with A/B testing, the course feed, and the TransPipe tool to support the process of creating automated or manual subtitles. The platform supports a large variety of additional features, but a detailed description of these features goes beyond the scope of this report.
The chapter then elaborates on common course formats and openHPI teaching activities at the HPI. The chapter concludes with some best practices for course design and delivery.

The third chapter provides insights into the technology and architecture behind openHPI. A special characteristic of the openHPI project is the conscious decision to operate the complete application from bare metal to platform development. Hence, the chapter starts with a section about the openHPI Cloud, including detailed information about the data center and devices, the cloud software used (OpenStack and Ceph), as well as the openHPI Cloud Service provided for the HPI.

Afterward, a section on the application technology stack and development tooling describes the application infrastructure components, the used automation, the deployment pipeline, and the tools used for monitoring and alerting. The chapter is concluded with detailed information about the technology stack and concrete platform implementation details. The section describes the service-oriented Ruby on Rails application, inter-service communication, and public APIs. It also provides more information on the design system and components used in the application. The section concludes with a discussion of the original microservice architecture, where we share our insights and reasoning for migrating back to a monolithic application.

The last chapter provides a summary and an outlook on the future of digital education.
N2  - Anlässlich des 10-jährigen Jubiläums von openHPI informiert dieser technische Bericht über die HPI-MOOC-Plattform einschließlich ihrer Kernfunktionen, Technologie und Architektur.
In einer Einleitung wird die Plattformfamilie mit allen Partnerplattformen vorgestellt; diese belaufen sich inklusive openHPI aktuell auf neun Plattformen. In diesem Abschnitt wird außerdem gezeigt, wie openHPI als Berater und Forschungspartner in verschiedenen Projekten fungiert. 

Im zweiten Kapitel werden die Funktionalitäten und gängigen Kursformate der Plattform präsentiert. Die Funktionalitäten sind in Lerner- und Admin-Funktionen unterteilt. Der Bereich Lernerfunktionen bietet detaillierte Informationen zu Leistungsnachweisen, Kursen und den Lernmaterialien, aus denen sich ein Kurs zusammensetzt: Videos, Texte und Quiz. Darüber hinaus können die Lernmaterialien durch externe Übungstools angereichert werden, die über den Standard Learning Tools Interoperability (LTI) mit der HPI MOOC-Plattform kommunizieren. Das Konzept der Peer-Assessments rundet die möglichen Lernmaterialien ab.
Der Abschnitt geht dann weiter auf das Diskussionsforum ein, das einen grundlegenden Unterschied von MOOCs im Vergleich zu traditionellen E-Learning-Angeboten darstellt. Zum Abschluss des Abschnitts folgt eine Beschreibung von Quiz-Recap, Lernzielen, mobilen Anwendungen, spielerischem Lernen und dem Helpdesk.

Der nächste Teil dieses Kapitels beschäftigt sich mit den Admin-Funktionen. Die Funktionalitätsbeschreibung beschränkt sich auf Neuigkeiten und Ankündigungen, Dashboards und Statistiken, Berichtsfunktionen, Forschungsoptionen mit A/B-Tests, den Kurs-Feed und das TransPipe-Tool zur Unterstützung beim Erstellen von automatischen oder manuellen Untertiteln. Die Plattform unterstützt außerdem eine Vielzahl zusätzlicher Funktionen, doch eine detaillierte Beschreibung dieser Funktionen würde den Rahmen des Berichts sprengen.
Das Kapitel geht dann auf gängige Kursformate und openHPI-Lehrveranstaltungen am HPI ein, bevor es mit einigen Best Practices für die Gestaltung und Durchführung von Kursen schließt.
Zum Abschluss des technischen Berichts gibt das letzte Kapitel eine Zusammenfassung und einen Ausblick auf die Zukunft der digitalen Bildung. 

Ein besonderes Merkmal des openHPI-Projekts ist die bewusste Entscheidung, die komplette Anwendung von den physischen Netzwerkkomponenten bis zur Plattformentwicklung eigenständig zu betreiben. Bei der vorliegenden deutschen Variante handelt es sich um eine gekürzte Übersetzung des technischen Berichts 148, bei der kein Einblick in die Technologien und Architektur von openHPI gegeben wird. Interessierte Leser:innen können im technischen Bericht 148 (vollständige englische Version) detaillierte Informationen zum Rechenzentrum und den Geräten, der Cloud-Software und dem openHPI Cloud Service aber auch zu Infrastruktur-Anwendungskomponenten wie Entwicklungstools, Automatisierung, Deployment-Pipeline und Monitoring erhalten. Außerdem finden sich dort weitere Informationen über den Technologiestack und konkrete Implementierungsdetails der Plattform inklusive der serviceorientierten Ruby on Rails-Anwendung, die Kommunikation zwischen den Diensten, öffentliche APIs, sowie Designsystem und -komponenten. Der Abschnitt schließt mit einer Diskussion über die ursprüngliche Microservice-Architektur und die Migration zu einer monolithischen Anwendung.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 148 
KW  - openHPI
KW  - MOOC
KW  - digital learning platform
KW  - digital enlightenment
KW  - lifelong learning
KW  - openHPI
KW  - MOOC
KW  - digitale Lernplattform
KW  - digitale Aufklärung
KW  - lebenslanges Lernen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-560208
SN  - 978-3-86956-544-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 148
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Ihde, Sven
A1  - Pufahl, Luise
A1  - Völker, Maximilian
A1  - Goel, Asvin
A1  - Weske, Mathias
T1  - A framework for modeling and executing task-specific resource allocations in business processes
JF  - Computing : archives for informatics and numerical computation
N2  - As resources are valuable assets, organizations have to decide which resources to allocate to business process tasks in a way that the process is executed not only effectively but also efficiently. Traditional role-based resource allocation leads to effective process executions, since each task is performed by a resource that has the required skills and competencies to do so. However, the resulting allocations are typically not as efficient as they could be, since optimization techniques have yet to find their way into traditional business process management scenarios. On the other hand, operations research provides a rich set of analytical methods for supporting problem-specific decisions on resource allocation. This paper provides a novel framework for creating transparency on existing tasks and resources, supporting individualized allocations for each activity in a process, and the possibility to integrate problem-specific analytical methods of the operations research domain. To validate the framework, the paper reports on the design and prototypical implementation of a software architecture, which extends a traditional process engine with a dedicated resource management component. This component allows us to define specific resource allocation problems at design time, and it also facilitates optimized resource allocation at run time. The framework is evaluated using a real-world parcel delivery process. The evaluation shows that the quality of the allocation results increases significantly with a technique from operations research in contrast to the traditionally applied rule-based approach.
KW  - Process Execution
KW  - Business Process Management
KW  - Resource Allocation
KW  - Resource Management
KW  - Activity-oriented Optimization
Y1  - 2022
U6  - https://doi.org/10.1007/s00607-022-01093-2
SN  - 0010-485X
SN  - 1436-5057
VL  - 104
SP  - 2405
EP  - 2429
PB  - Springer
CY  - Wien
ER  - 
TY  - JOUR
A1  - Roostapour, Vahid
A1  - Neumann, Aneta
A1  - Neumann, Frank
A1  - Friedrich, Tobias
T1  - Pareto optimization for subset selection with dynamic cost constraints
JF  - Artificial intelligence
N2  - We consider the subset selection problem for function f with constraint bound B that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a $\phi = (\alpha_f/2)(1 - 1/e^{\alpha_f})$-approximation, where $\alpha_f$ is the submodularity ratio of f, for each possible constraint bound $b \le B$. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with polynomial expected time guarantee to maintain a $\phi$-approximation ratio, and NSGA-II with two different population sizes as advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as well as NSGA-II under linear constraint, while EAMC performs significantly worse than all considered algorithms in most cases.
KW  - Subset selection
KW  - Submodular function
KW  - Multi-objective optimization
KW  - Runtime analysis
Y1  - 2022
U6  - https://doi.org/10.1016/j.artint.2021.103597
SN  - 0004-3702
SN  - 1872-7921
VL  - 302
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - CHAP
A1  - Krasnova, Hanna
A1  - Gundlach, Jana
A1  - Baumann, Annika
T1  - Coming back for more
BT  - the effect of news feed serendipity on social networking site usage
T2  - PACIS 2022 proceedings
N2  - Recent spikes in social networking site (SNS) usage times have launched investigations into reasons for excessive SNS usage. Extending research on social factors (i.e., fear of missing out), this study considers the News Feed setup. More specifically, we suggest that the order of the News Feed (chronological vs. algorithmically assembled posts) affects usage behaviors. Against the background of the variable reward schedule, this study hypothesizes that the different orders exert serendipity differently. Serendipity, termed as unexpected lucky encounters with information, resembles variable rewards. Studies have evidenced a relation between variable rewards and excessive behaviors. Similarly, we hypothesize that order-induced serendipitous encounters affect SNS usage times and explore this link in a two-wave survey with an experimental setup (users using either chronological or algorithmic News Feeds). While theoretically extending explanations for increased SNS usage times by considering the News Feed order, practically the study will offer recommendations for relevant stakeholders.
Y1  - 2022
UR  - https://aisel.aisnet.org/pacis2022/271
SN  - 9781958200018
PB  - AIS Electronic Library (AISeL)
CY  - [Erscheinungsort nicht ermittelbar]
ER  - 
TY  - JOUR
A1  - Ndashimye, Felix
A1  - Hebie, Oumarou
A1  - Tjaden, Jasper
T1  - Effectiveness of WhatsApp for measuring migration in follow-up phone surveys
BT  - lessons from a mode experiment in two low-income countries during COVID contact restrictions
JF  - Social science computer review
N2  - Phone surveys have increasingly become important data collection tools in developing countries, particularly in the context of sudden contact restrictions due to the COVID-19 pandemic. So far, there is limited evidence regarding the potential of the messenger service WhatsApp for remote data collection despite its large global coverage and expanding membership. WhatsApp may offer advantages in terms of reducing panel attrition and cutting survey costs. WhatsApp may offer additional benefits to migration scholars interested in cross-border migration behavior which is notoriously difficult to measure using conventional face-to-face surveys. In this field experiment, we compared the response rates between WhatsApp and interactive voice response (IVR) modes using a sample of 8446 contacts in Senegal and Guinea. At 12%, WhatsApp survey response rates were nearly eight percentage points lower than IVR survey response rates. However, WhatsApp offers higher survey completion rates, substantially lower costs and does not introduce more sample selection bias compared to IVR. We discuss the potential of WhatsApp surveys in low-income contexts and provide practical recommendations for field implementation.
KW  - WhatsApp
KW  - survey mode
KW  - migration
KW  - Covid
KW  - phone
Y1  - 2022
U6  - https://doi.org/10.1177/08944393221111340
SN  - 0894-4393
SN  - 1552-8286
PB  - Sage
CY  - Thousand Oaks
ER  - 
TY  - JOUR
A1  - Spiekermann, Sarah
A1  - Krasnova, Hanna
A1  - Hinz, Oliver
A1  - Baumann, Annika
A1  - Benlian, Alexander
A1  - Gimpel, Henner
A1  - Heimbach, Irina
A1  - Koester, Antonia
A1  - Maedche, Alexander
A1  - Niehaves, Bjoern
A1  - Risius, Marten
A1  - Trenz, Manuel
T1  - Values and ethics in information systems
BT  - a state-of-the-art analysis and avenues for future research
JF  - Business & information systems engineering
Y1  - 2022
U6  - https://doi.org/10.1007/s12599-021-00734-8
SN  - 2363-7005
SN  - 1867-0202
VL  - 64
IS  - 2
SP  - 247
EP  - 264
PB  - Springer Gabler
CY  - Wiesbaden
ER  - 
TY  - BOOK
A1  - Gerken, Stefanie
A1  - Uebernickel, Falk
A1  - de Paula, Danielly
T1  - Design Thinking: a Global Study on Implementation Practices in Organizations
T1  - Design Thinking: eine globale Studie über Implementierungspraktiken in Organisationen
BT  - Past - Present - Future
BT  - Vergangenheit - Gegenwart - Zukunft
N2  - These days design thinking is no longer a “new approach”. Among practitioners, as well as academics, interest in the topic has gathered pace over the last two decades. However, opinions are divided over the longevity of the phenomenon: whether design thinking is merely “old wine in new bottles,” a passing trend, or still evolving as it is being spread to an increasing number of organizations and industries. Despite its growing relevance and the diffusion of design thinking, knowledge on the actual status quo in organizations remains scarce. With a new study, the research team of Prof. Uebernickel and Stefanie Gerken investigates temporal developments and changes in design thinking practices in organizations over the past six years comparing the results of the 2015 “Parts without a whole” study with current practices and future developments. Companies of all sizes and from different parts of the world participated in the survey. The findings from qualitative interviews with experts, i.e., people who have years of experience with design thinking, were cross-checked with the results from an exploratory analysis of the survey data. This analysis uncovers significant variances and similarities in how design thinking is interpreted and applied in businesses.
N2  - Heutzutage ist Design Thinking kein "neuer Ansatz" mehr. Unter Praktikern und Akademikern hat das Interesse an diesem Thema in den letzten zwei Jahrzehnten stark zugenommen. Die Meinungen sind jedoch geteilt, ob Design Thinking lediglich "alter Wein in neuen Schläuchen" ist, ein vorübergehender Trend, oder ein sich weiterentwickelndes Phänomen, welches in immer mehr Organisationen und Branchen Fuß fasst. Trotz der wachsenden Relevanz und Verbreitung von Design Thinking ist das Wissen über den tatsächlichen Status quo in Organisationen nach wie vor spärlich. Mit einer neuen Studie untersucht das Forschungsteam von Prof. Uebernickel, Stefanie Gerken und Dr. Danielly de Paula die zeitlichen Entwicklungen und Veränderungen von Design Thinking Praktiken in Organisationen über die letzten sechs Jahre und vergleicht die Ergebnisse der Studie "Parts without a whole" aus dem Jahr 2015 mit aktuellen Praktiken und perspektivischen Entwicklungen. An der Studie haben Unternehmen aller Größen und aus verschiedenen Teilen der Welt teilgenommen. Um dem komplexen Untersuchungsgegenstand gerecht zu werden, wurde ein Mixed-Method-Ansatz gewählt: Die Erkenntnisse aus qualitativen Experteninterviews, d.h. Personen, die sich seit Jahren mit dem Thema Design Thinking in der Praxis beschäftigen, wurden mit den Ergebnissen einer quantitativen Analyse von Umfragedaten abgeglichen. Die vorliegende Studie erörtert signifikante Unterschiede und Gemeinsamkeiten bei der Interpretation und Anwendung von Design Thinking in Unternehmen.
KW  - Design Thinking
KW  - Agile
KW  - Implementation in Organizations
KW  - life-centered
KW  - human-centered
KW  - Innovation
KW  - Behavior change
KW  - Problem Solving
KW  - Creative
KW  - Solution Space
KW  - Process
KW  - Mindset
KW  - Tools
KW  - Wicked Problems
KW  - VUCA-World
KW  - Ambiguity
KW  - Interdisciplinary Teams
KW  - Multidisciplinary Teams
KW  - Impact
KW  - Measurement
KW  - Ideation
KW  - Agilität
KW  - agil
KW  - Ambiguität
KW  - Verhaltensänderung
KW  - Kreativität
KW  - Design Thinking
KW  - Ideenfindung
KW  - Auswirkungen
KW  - Implementierung in Organisationen
KW  - Innovation
KW  - interdisziplinäre Teams
KW  - Messung
KW  - Denkweise
KW  - multidisziplinäre Teams
KW  - Problemlösung
KW  - Prozess
KW  - Lösungsraum
KW  - Werkzeuge
KW  - Aktivitäten
KW  - verzwickte Probleme
KW  - menschenzentriert
KW  - lebenszentriert
KW  - VUCA-World
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-534668
SN  - 978-3-86956-525-5
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Flotterer, Boris
A1  - Maximova, Maria
A1  - Schneider, Sven
A1  - Dyck, Johannes
A1  - Zöllner, Christian
A1  - Giese, Holger
A1  - Hély, Christelle
A1  - Gaucherel, Cédric
T1  - Modeling and Formal Analysis of Meta-Ecosystems with Dynamic Structure using Graph Transformation
N2  - The dynamics of ecosystems is of crucial importance. Various model-based approaches exist to understand and analyze their internal effects. In this paper, we model the space structure dynamics and ecological dynamics of meta-ecosystems using the formal technique of Graph Transformation (short GT). We build GT models to describe how a meta-ecosystem (modeled as a graph) can evolve over time (modeled by GT rules) and to analyze these GT models with respect to qualitative properties such as the existence of structural stabilities. As a case study, we build three GT models describing the space structure dynamics and ecological dynamics of three different savanna meta-ecosystems. The first GT model considers a savanna meta-ecosystem that is limited in space to two ecosystem patches, whereas the other two GT models consider two savanna meta-ecosystems that are unlimited in the number of ecosystem patches and only differ in one GT rule describing how the space structure of the meta-ecosystem grows. In the first two GT models, the space structure dynamics and ecological dynamics of the meta-ecosystem show two main structural stabilities: the first one based on grassland-savanna-woodland transitions and the second one based on grassland-desert transitions. The transition between these two structural stabilities is driven by high-intensity fires affecting the tree components. In the third GT model, the GT rule for savanna regeneration induces desertification and therefore a collapse of the meta-ecosystem. We believe that GT models provide a complementary avenue to that of existing approaches to rigorously study ecological phenomena.
N2  - Die Dynamik von Ökosystemen ist von entscheidender Bedeutung. Es gibt verschiedene modellbasierte Ansätze, um ihre internen Effekte zu verstehen und zu analysieren. In diesem Beitrag modellieren wir die Raumstrukturdynamik und ökologische Dynamik von Metaökosystemen mit der formalen Technik der Graphtransformation (kurz GT). Wir bauen GT-Modelle, um zu beschreiben, wie sich ein Meta-Ökosystem (modelliert als Graph) im Laufe der Zeit entwickeln kann (modelliert durch GT-Regeln) und analysieren diese GT-Modelle hinsichtlich qualitativer Eigenschaften wie das Vorhandensein struktureller Stabilitäten. Als Fallstudie bauen wir drei GT-Modelle, die die Dynamik der Raumstruktur und die ökologische Dynamik von drei verschiedenen Savannen-Meta-Ökosystemen beschreiben. Das erste GT-Modell betrachtet ein Savannen-Meta-Ökosystem, das räumlich auf zwei Ökosystem-Abschnitte begrenzt ist, während die anderen beiden GT-Modelle zwei Savannen-Meta-Ökosysteme betrachten, die in der Anzahl von Ökosystem-Abschnitten uneingeschränkt sind und sich nur in einer GT-Regel unterscheiden, die beschreibt, wie die Raumstruktur des Meta-Ökosystems wächst. In den ersten beiden GT-Modellen zeigen die Raumstrukturdynamik und die ökologische Dynamik des Metaökosystems zwei Hauptstrukturstabilitäten: die erste basiert auf Grasland-Savannen-Wald-Übergängen und die zweite basiert auf Grasland-Wüsten-Übergängen. Der Übergang zwischen diesen beiden strukturellen Stabilitäten wird durch hochintensive Brände angetrieben, die die Baumkomponenten beeinträchtigen. Beim dritten GT-Modell führt die Savannenregeneration beschreibende GT-Regel zur Wüstenbildung und damit zum Kollaps des Meta-Ökosystems. Wir glauben, dass GT-Modelle eine gute Ergänzung zu bestehenden Ansätzen darstellen, um ökologische Phänomene rigoros zu untersuchen.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 147 
KW  - dynamic systems
KW  - discrete-event model
KW  - qualitative model
KW  - savanna
KW  - trajectories
KW  - desertification
KW  - dynamische Systeme
KW  - diskretes Ereignismodell
KW  - qualitatives Modell
KW  - Savanne
KW  - Trajektorien
KW  - Wüstenbildung
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-547643
SN  - 978-3-86956-533-0
SN  - 1613-5652
SN  - 2191-1665
IS  - 147
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Stauffer, Maxime
A1  - Mengesha, Isaak
A1  - Seifert, Konrad
A1  - Krawczuk, Igor
A1  - Fischer, Jens
A1  - Serugendo, Giovanna Di Marzo
T1  - A computational turn in policy process studies
BT  - coevolving network dynamics of policy change
JF  - Complexity
N2  - The past three decades of policy process studies have seen the emergence of a clear intellectual lineage with regard to complexity. Implicitly or explicitly, scholars have employed complexity theory to examine the intricate dynamics of collective action in political contexts. However, the methodological counterparts to complexity theory, such as computational methods, are rarely used and, even if they are, they are often detached from established policy process theory. Building on a critical review of the application of complexity theory to policy process studies, we present and implement a baseline model of policy processes using the logic of coevolving networks. Our model suggests that an actor's influence depends on their environment and on exogenous events facilitating dialogue and consensus-building. Our results validate previous opinion dynamics models and generate novel patterns. Our discussion provides ground for further research and outlines the path for the field to achieve a computational turn.
Y1  - 2022
U6  - https://doi.org/10.1155/2022/8210732
SN  - 1076-2787
SN  - 1099-0526
VL  - 2022
PB  - Wiley-Hindawi
CY  - London
ER  - 
TY  - JOUR
A1  - Wendering, Philipp
A1  - Nikoloski, Zoran
T1  - COMMIT
BT  - Consideration of metabolite leakage and community composition improves microbial community reconstructions
JF  - PLoS Computational Biology : a new community journal / publ. by the Public Library of Science (PLoS) in association with the International Society for Computational Biology (ISCB)
N2  - Composition and functions of microbial communities affect important traits in diverse hosts, from crops to humans. Yet, mechanistic understanding of how metabolism of individual microbes is affected by the community composition and metabolite leakage is lacking. Here, we first show that the consensus of automatically generated metabolic reconstructions improves the quality of the draft reconstructions, measured by comparison to reference models. We then devise an approach for gap filling, termed COMMIT, that considers metabolites for secretion based on their permeability and the composition of the community. By applying COMMIT with two soil communities from the Arabidopsis thaliana culture collection, we could significantly reduce the gap-filling solution in comparison to filling gaps in individual reconstructions without affecting the genomic support. Inspection of the metabolic interactions in the soil communities allows us to identify microbes with community roles of helpers and beneficiaries. Therefore, COMMIT offers a versatile fully automated solution for large-scale modelling of microbial communities for diverse biotechnological applications. Author summary: Microbial communities are important in ecology, human health, and crop productivity. However, detailed information on the interactions within natural microbial communities is hampered by the community size, lack of detailed information on the biochemistry of single organisms, and the complexity of interactions between community members. Metabolic models are comprised of biochemical reaction networks based on the genome annotation, and can provide mechanistic insights into community functions. Previous analyses of microbial community models have been performed with high-quality reference models or models generated using a single reconstruction pipeline.
However, these models do not contain information on the composition of the community that determines the metabolites exchanged between the community members. In addition, the quality of metabolic models is affected by the reconstruction approach used, with direct consequences on the inferred interactions between community members. Here, we use fully automated consensus reconstructions from four approaches to arrive at functional models with improved genomic support while considering the community composition. We applied our pipeline to two soil communities from the Arabidopsis thaliana culture collection, providing only genome sequences. Finally, we show that the obtained models have 90% genomic support and demonstrate that the derived interactions are corroborated by independent computational predictions.
Y1  - 2022
U6  - https://doi.org/10.1371/journal.pcbi.1009906
SN  - 1553-734X
SN  - 1553-7358
VL  - 18
IS  - 3
PB  - Public Library of Science
CY  - San Francisco
ER  - 
TY  - GEN
A1  - Benlian, Alexander
A1  - Wiener, Martin
A1  - Cram, W. Alec
A1  - Krasnova, Hanna
A1  - Maedche, Alexander
A1  - Mohlmann, Mareike
A1  - Recker, Jan
A1  - Remus, Ulrich
T1  - Algorithmic management
BT  - Bright and dark sides, practical implications, and research opportunities
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 174 
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-607112
SN  - 2363-7005
SN  - 1867-0202
SN  - 1867-5808
IS  - 6
ER  - 
TY  - JOUR
A1  - Benlian, Alexander
A1  - Wiener, Martin
A1  - Cram, W. Alec
A1  - Krasnova, Hanna
A1  - Maedche, Alexander
A1  - Mohlmann, Mareike
A1  - Recker, Jan
A1  - Remus, Ulrich
T1  - Algorithmic management
BT  - bright and dark sides, practical implications, and research opportunities
JF  - Business and information systems engineering
Y1  - 2022
U6  - https://doi.org/10.1007/s12599-022-00764-w
SN  - 2363-7005
SN  - 1867-0202
VL  - 64
IS  - 6
SP  - 825
EP  - 839
PB  - Springer Gabler
CY  - Wiesbaden
ER  - 
TY  - CHAP
A1  - Sultanow, Eldar
A1  - Chircu, Alina
A1  - Wüstemann, Stefanie
A1  - Schwan, André
A1  - Lehmann, Andreas
A1  - Sept, André
A1  - Szymaski, Oliver
A1  - Venkatesan, Sripriya
A1  - Ritterbusch, Georg David
A1  - Teichmann, Malte Rolf
T1  - Metaverse opportunities for the public sector
T2  - International Conference on Information Systems 2022 : Special Interest Group on Big Data : Proceedings
N2  - The metaverse is envisioned as a virtual shared space facilitated by emerging technologies such as virtual reality (VR), augmented reality (AR), the Internet of Things (IoT), 5G, artificial intelligence (AI), big data, spatial computing, and digital twins (Allam et al., 2022; Dwivedi et al., 2022; Ravenscraft, 2022; Wiles, 2022). While still a nascent concept, the metaverse has the potential to “transform the physical world, as well as transport or extend physical activities to a virtual world” (Wiles, 2022). Big data technologies will also be essential in managing the enormous amounts of data created in the metaverse (Sun et al., 2022). Metaverse technologies can offer the public sector a host of benefits, such as simplified information exchange, stronger communication with citizens, better access to public services, or benefiting from a new virtual economy. Implementations are underway in several cities around the world (Geraghty et al., 2022). In this paper, we analyze metaverse opportunities for the public sector and explore their application in the context of Germany’s Federal Employment Agency. Based on an analysis of academic literature and practical examples, we create a capability map for potential metaverse business capabilities for different areas of the public sector (broadly defined). 
These include education (virtual training and simulation, digital campuses that offer not just online instruction but a holistic university campus experience, etc.), tourism (virtual travel to remote locations and museums, virtual festival participation, etc.), health (employee training – as for emergency situations, virtual simulations for patient treatment – for example, for depression or anxiety, etc.), military (virtual training to experience operational scenarios without being exposed to real-world threats, practice strategic decision-making, or gain technical knowledge for operating and repairing equipment, etc.), administrative services (document processing, virtual consultations for citizens, etc.), judiciary (AI decision-making aids, virtual proceedings, etc.), public safety (virtual training for procedural issues, special operations, or unusual situations, etc.), emergency management (training for natural disasters, etc.), and city planning (visualization of future development projects and interactive feedback, traffic management, attraction gamification, etc.), among others. We further identify several metaverse application areas for Germany's Federal Employment Agency. These applications can help it realize the goals of the German government for digital transformation that enables faster, more effective, and innovative government services. They include training of employees, training of customers, and career coaching for customers. These applications can be implemented using interactive learning games with AI agents, virtual representations of the organizational spaces, and avatars interacting with each other in these spaces. Metaverse applications will both use big data (to design the virtual environments) and generate big data (from virtual interactions). 
Issues related to data availability, quality, storage, processing (and related computing power requirements), interoperability, sharing, privacy and security will need to be addressed in these emerging metaverse applications (Sun et al., 2022). Special attention is needed to understand the potential for power inequities (wealth inequity, algorithmic bias, digital exclusion) due to technologies such as VR (Egliston & Carter, 2021), harmful surveillance practices (Bibri & Allam, 2022), and undesirable user behavior or negative psychological impacts (Dwivedi et al., 2022). The results of this exploratory study can inform public sector organizations of emerging metaverse opportunities and enable them to develop plans for action as more of the metaverse technologies become a reality. While the metaverse body of research is still small and research agendas are only now starting to emerge (Dwivedi et al., 2022), this study offers a building block for future development and analysis of metaverse applications.
Y1  - 2022
UR  - https://aisel.aisnet.org/sigbd2022/5/
PB  - AIS
CY  - Atlanta
ER  - 
TY  - GEN
A1  - Seewann, Lena
A1  - Verwiebe, Roland
A1  - Buder, Claudia
A1  - Fritsch, Nina-Sophie
T1  - “Broadcast your gender.”
BT  - A comparison of four text-based classification methods of German YouTube channels
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2  - Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as sociodemographic characteristics of agents are often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. By using the example of a random sample of 200 YouTube channels, we compare several classification methods, namely (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These different methods identify gender attributes based on YouTube channel names and descriptions in German but are adaptable to other languages. Our contribution will evaluate the share of identifiable channels, accuracy and meaningfulness of classification, as well as limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels as well as related to reinforcing stereotypes and ethical implications.
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 152 
KW  - text based classification methods
KW  - gender
KW  - YouTube
KW  - machine learning
KW  - authorship attribution
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566287
SN  - 1867-5808
IS  - 152
ER  - 
TY  - JOUR
A1  - Seewann, Lena
A1  - Verwiebe, Roland
A1  - Buder, Claudia
A1  - Fritsch, Nina-Sophie
T1  - “Broadcast your gender.”
BT  - A comparison of four text-based classification methods of German YouTube channels
JF  - Frontiers in Big Data
N2  - Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as sociodemographic characteristics of agents are often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. By using the example of a random sample of 200 YouTube channels, we compare several classification methods, namely (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These different methods identify gender attributes based on YouTube channel names and descriptions in German but are adaptable to other languages. Our contribution will evaluate the share of identifiable channels, accuracy and meaningfulness of classification, as well as limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels as well as related to reinforcing stereotypes and ethical implications.
KW  - text based classification methods
KW  - gender
KW  - YouTube
KW  - machine learning
KW  - authorship attribution
Y1  - 2022
U6  - https://doi.org/10.3389/fdata.2022.908636
SN  - 2624-909X
IS  - 5
PB  - Frontiers
CY  - Lausanne, Schweiz
ER  - 
TY  - JOUR
A1  - Chen, Junchao
A1  - Lange, Thomas
A1  - Andjelkovic, Marko
A1  - Simevski, Aleksandar
A1  - Lu, Li
A1  - Krstić, Miloš
T1  - Solar particle event and single event upset prediction from SRAM-based monitor and supervised machine learning
JF  - IEEE transactions on emerging topics in computing / IEEE Computer Society, Institute of Electrical and Electronics Engineers
N2  - The intensity of cosmic radiation may differ over five orders of magnitude within a few hours or days during the Solar Particle Events (SPEs), thus increasing by several orders of magnitude the probability of Single Event Upsets (SEUs) in space-borne electronic systems. Therefore, it is vital to enable the early detection of the SEU rate changes in order to ensure timely activation of dynamic radiation hardening measures. In this paper, an embedded approach for the prediction of SPEs and SRAM SEU rate is presented. The proposed solution combines the real-time SRAM-based SEU monitor, the offline-trained machine learning model and online learning algorithm for the prediction. With respect to the state-of-the-art, our solution brings the following benefits: (1) Use of existing on-chip data storage SRAM as a particle detector, thus minimizing the hardware and power overhead, (2) Prediction of SRAM SEU rate one hour in advance, with the fine-grained hourly tracking of SEU variations during SPEs as well as under normal conditions, (3) Online optimization of the prediction model for enhancing the prediction accuracy during run-time, (4) Negligible cost of hardware accelerator design for the implementation of selected machine learning model and online learning algorithm. The proposed design is intended for a highly dependable and self-adaptive multiprocessing system employed in space applications, allowing the radiation mitigation mechanisms to be triggered before the onset of high radiation levels.
KW  - Machine learning
KW  - Single event upsets
KW  - Random access memory
KW  - monitoring
KW  - machine learning algorithms
KW  - predictive models
KW  - space missions
KW  - solar particle event
KW  - single event upset
KW  - machine learning
KW  - online learning
KW  - hardware accelerator
KW  - reliability
KW  - self-adaptive multiprocessing system
Y1  - 2022
U6  - https://doi.org/10.1109/TETC.2022.3147376
SN  - 2168-6750
VL  - 10
IS  - 2
SP  - 564
EP  - 580
PB  - Institute of Electrical and Electronics Engineers
CY  - [New York, NY]
ER  - 
TY  - BOOK
A1  - Rana, Kaushik
A1  - Mohapatra, Durga Prasad
A1  - Sidorova, Julia
A1  - Lundberg, Lars
A1  - Sköld, Lars
A1  - Lopes Grim, Luís Fernando
A1  - Sampaio Gradvohl, André Leon
A1  - Cremerius, Jonas
A1  - Siegert, Simon
A1  - Weltzien, Anton von
A1  - Baldi, Annika
A1  - Klessascheck, Finn
A1  - Kalancha, Svitlana
A1  - Lichtenstein, Tom
A1  - Shaabani, Nuhad
A1  - Meinel, Christoph
A1  - Friedrich, Tobias
A1  - Lenzner, Pascal
A1  - Schumann, David
A1  - Wiese, Ingmar
A1  - Sarna, Nicole
A1  - Wiese, Lena
A1  - Tashkandi, Araek Sami
A1  - van der Walt, Estée
A1  - Eloff, Jan H. P.
A1  - Schmidt, Christopher
A1  - Hügle, Johannes
A1  - Horschig, Siegfried
A1  - Uflacker, Matthias
A1  - Najafi, Pejman
A1  - Sapegin, Andrey
A1  - Cheng, Feng
A1  - Stojanovic, Dragan
A1  - Stojnev Ilić, Aleksandra
A1  - Djordjevic, Igor
A1  - Stojanovic, Natalija
A1  - Predic, Bratislav
A1  - González-Jiménez, Mario
A1  - de Lara, Juan
A1  - Mischkewitz, Sven
A1  - Kainz, Bernhard
A1  - van Hoorn, André
A1  - Ferme, Vincenzo
A1  - Schulz, Henning
A1  - Knigge, Marlene
A1  - Hecht, Sonja
A1  - Prifti, Loina
A1  - Krcmar, Helmut
A1  - Fabian, Benjamin
A1  - Ermakova, Tatiana
A1  - Kelkel, Stefan
A1  - Baumann, Annika
A1  - Morgenstern, Laura
A1  - Plauth, Max
A1  - Eberhard, Felix
A1  - Wolff, Felix
A1  - Polze, Andreas
A1  - Cech, Tim
A1  - Danz, Noel
A1  - Noack, Nele Sina
A1  - Pirl, Lukas
A1  - Beilharz, Jossekin Jakob
A1  - De Oliveira, Roberto C. L.
A1  - Soares, Fábio Mendes
A1  - Juiz, Carlos
A1  - Bermejo, Belen
A1  - Mühle, Alexander
A1  - Grüner, Andreas
A1  - Saxena, Vageesh
A1  - Gayvoronskaya, Tatiana
A1  - Weyand, Christopher
A1  - Krause, Mirko
A1  - Frank, Markus
A1  - Bischoff, Sebastian
A1  - Behrens, Freya
A1  - Rückin, Julius
A1  - Ziegler, Adrian
A1  - Vogel, Thomas
A1  - Tran, Chinh
A1  - Moser, Irene
A1  - Grunske, Lars
A1  - Szárnyas, Gábor
A1  - Marton, József
A1  - Maginecz, János
A1  - Varró, Dániel
A1  - Antal, János Benjamin
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2018
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from, but not limited to, the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2018. Selected projects presented their results on April 17th and November 14th 2018 at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftler:innen eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen Systeme, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler:innen in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2018 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 17. April und 14. November 2018 im Rahmen des Future SOC Lab Tags vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 151 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - in-memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-563712
SN  - 978-3-86956-547-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 151
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Grüner, Andreas
T1  - Towards practical and trust-enhancing attribute aggregation for self-sovereign identity
N2  - Identity management is at the forefront of applications’ security posture. It separates the unauthorised user from the legitimate individual. Identity management models have evolved from the isolated to the centralised paradigm and identity federations. Within this advancement, the identity provider emerged as a trusted third party that holds a powerful position. Allen postulated the novel self-sovereign identity paradigm to establish a new balance. Thus, extensive research is required to comprehend its virtues and limitations. Analysing the new paradigm, initially, we investigate the blockchain-based self-sovereign identity concept structurally. Moreover, we examine trust requirements in this context by reference to patterns. These patterns comprise major entities linked by a decentralised identity provider. By comparison to the traditional models, we conclude that the need for trust in credential management and authentication is removed. Trust-enhancing attribute aggregation based on multiple attribute providers provokes a further trust shift. Subsequently, we formalise attribute assurance trust modelling by a metaframework. It encompasses the attestation and trust network as well as the trust decision process, including the trust function, as central components. A secure attribute assurance trust model depends on the security of the trust function. The trust function should consider high trust values and several attribute authorities. Furthermore, we evaluate classification, conceptual study, practical analysis and simulation as assessment strategies of trust models. For realising trust-enhancing attribute aggregation, we propose a probabilistic approach. The method builds on the principal characteristics of correctness and validity. These values are combined for one provider and subsequently for multiple issuers. We embed this trust function in a model within the self-sovereign identity ecosystem.
To practically apply the trust function and solve several challenges for the service provider that arise from adopting self-sovereign identity solutions, we conceptualise and implement an identity broker. The mediator applies a component-based architecture to abstract from a single solution. Standard identity and access management protocols build the interface for applications. We can conclude that the broker’s usage at the side of the service provider does not undermine self-sovereign principles, but fosters the advancement of the ecosystem. The identity broker is applied to sample web applications with distinct attribute requirements to showcase usefulness for authentication and attribute-based access control within a case study.
N2  - Das Identitätsmanagement ist Kernbestandteil der Sicherheitsfunktionen von Applikationen. Es unterscheidet berechtigte Benutzung von illegitimer Verwendung. Die Modelle des Identitätsmanagements haben sich vom isolierten zum zentralisierten Paradigma und darüber hinaus zu Identitätsverbünden weiterentwickelt. Im Rahmen dieser Evolution ist der Identitätsanbieter zu einer mächtigen vertrauenswürdigen dritten Partei aufgestiegen. Zur Etablierung eines bis jetzt noch unvorstellbaren Machtgleichgewichts wurde der Grundgedanke der selbstbestimmten Identität proklamiert. Eine tiefgehende Analyse des neuen Konzepts unterstützt auf essentielle Weise das generelle Verständnis der Vorzüge und Defizite. Bei der Analyse des Modells untersuchen wir zu Beginn strukturelle Komponenten des selbstbestimmten Identitätsmanagements basierend auf der Blockchain-Technologie. Anschließend erforschen wir Vertrauensanforderungen in diesem Kontext anhand von Mustern. Diese schematischen Darstellungen illustrieren das Verhältnis der Hauptakteure im Verbund mit einem dezentralisierten Identitätsanbieter. Im Vergleich zu den traditionellen Paradigmen können wir feststellen, dass kein Vertrauen mehr in das Verwalten von Anmeldeinformationen und die korrekte Authentifizierung benötigt wird. Zusätzlich bewirkt die Verwendung von vertrauensfördernder Attributaggregation eine weitere Transformation der Vertrauenssituation. Darauffolgend formalisieren wir die Darstellung von Vertrauensmodellen in Attribute Assurance mit Hilfe eines Meta-Frameworks. Als zentrale Komponenten sind das Attestierungs- und Vertrauensnetzwerk sowie der Vertrauensentscheidungsprozess, einschließlich der Vertrauensfunktion, enthalten. Ein sicheres Vertrauensmodell beruht auf der Sicherheit der Vertrauensfunktion. Hohe Vertrauenswerte sowie mehrere Attributaussteller sollten dafür berücksichtigt werden.
Des Weiteren evaluieren wir Klassifikation, die konzeptionelle und praktische Analyse sowie die Simulation als Untersuchungsansätze für Vertrauensmodelle. Für die Umsetzung der vertrauensfördernden Attributaggregation schlagen wir einen wahrscheinlichkeitstheoretischen Ansatz vor. Die entwickelte Methode basiert auf den primären Charakteristiken der Korrektheit und Gültigkeit von Attributen. Diese Indikatoren werden für einen und anschließend für mehrere Merkmalsanbieter kombiniert. Zusätzlich betten wir die daraus entstehende Vertrauensfunktion in ein vollständiges Modell auf Basis des Ökosystems von selbstbestimmten Identitäten ein. Für die praktische Anwendung der Vertrauensfunktion und die Überwindung mehrerer Herausforderungen für den Dienstanbieter bei der Einführung selbstbestimmter Identitätslösungen konzipieren und implementieren wir einen Identitätsbroker. Dieser Vermittler besteht aus einer komponentenbasierten Architektur, um von einer dedizierten selbstbestimmten Identitätslösung zu abstrahieren. Zusätzlich bilden etablierte Identitäts- und Zugriffsverwaltungsprotokolle die Schnittstelle zu herkömmlichen Anwendungen. Der Einsatz des Brokers auf der Seite des Dienstanbieters unterminiert nicht die Grundsätze der selbstbestimmten Identität. Im Gegenteil wird die Weiterentwicklung des entsprechenden Ökosystems gefördert. Innerhalb einer Fallstudie wird die Verwendung des Identitätsbrokers bei Anwendungen mit unterschiedlichen Anforderungen an Benutzerattribute betrachtet, um die Nützlichkeit bei der Authentifizierung und attributbasierten Zugriffskontrolle zu demonstrieren.
KW  - identity
KW  - self-sovereign identity
KW  - trust
KW  - attribute assurance
KW  - Identität
KW  - selbst-souveräne Identitäten
KW  - Vertrauen
KW  - Attributsicherung
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-567450
ER  - 
TY  - GEN
A1  - Panzer, Marcel
A1  - Bender, Benedict
A1  - Gronau, Norbert
T1  - Neural agent-based production planning and control
BT  - an architectural review
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2  - Nowadays, production planning and control must cope with mass customization, increased fluctuations in demand, and high competition pressures. Despite prevailing market risks, planning accuracy and increased adaptability in the event of disruptions or failures must be ensured, while simultaneously optimizing key process indicators. To manage that complex task, neural networks that can process large quantities of high-dimensional data in real time have been widely adopted in recent years. Although these are already extensively deployed in production systems, a systematic review of applications and implemented agent embeddings and architectures has not yet been conducted. The main contribution of this paper is to provide researchers and practitioners with an overview of applications and applied embeddings and to motivate further research in neural agent-based production. Findings indicate that neural agents are not only deployed in diverse applications, but are also increasingly implemented in multi-agent environments or in combination with conventional methods — leveraging performances compared to benchmarks and reducing dependence on human experience. This not only implies a more sophisticated focus on distributed production resources, but also broadening the perspective from a local to a global scale. Nevertheless, future research must further increase scalability and reproducibility to guarantee a simplified transfer of results to reality.
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 172 
KW  - production planning and control
KW  - machine learning
KW  - neural networks
KW  - systematic literature review
KW  - taxonomy
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604777
SN  - 1867-5808
ER  - 
TY  - JOUR
A1  - Panzer, Marcel
A1  - Bender, Benedict
A1  - Gronau, Norbert
T1  - Neural agent-based production planning and control
BT  - an architectural review
JF  - Journal of Manufacturing Systems
N2  - Nowadays, production planning and control must cope with mass customization, increased fluctuations in demand, and high competition pressures. Despite prevailing market risks, planning accuracy and increased adaptability in the event of disruptions or failures must be ensured, while simultaneously optimizing key process indicators. To manage that complex task, neural networks that can process large quantities of high-dimensional data in real time have been widely adopted in recent years. Although these are already extensively deployed in production systems, a systematic review of applications and implemented agent embeddings and architectures has not yet been conducted. The main contribution of this paper is to provide researchers and practitioners with an overview of applications and applied embeddings and to motivate further research in neural agent-based production. Findings indicate that neural agents are not only deployed in diverse applications, but are also increasingly implemented in multi-agent environments or in combination with conventional methods — leveraging performances compared to benchmarks and reducing dependence on human experience. This not only implies a more sophisticated focus on distributed production resources, but also broadening the perspective from a local to a global scale. Nevertheless, future research must further increase scalability and reproducibility to guarantee a simplified transfer of results to reality.
KW  - production planning and control
KW  - machine learning
KW  - neural networks
KW  - systematic literature review
KW  - taxonomy
Y1  - 2022
U6  - https://doi.org/10.1016/j.jmsy.2022.10.019
SN  - 0278-6125
SN  - 1878-6642
VL  - 65
SP  - 743
EP  - 766
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - GEN
A1  - Monti, Remo
A1  - Rautenstrauch, Pia
A1  - Ghanbari, Mahsa
A1  - Rani James, Alva
A1  - Kirchler, Matthias
A1  - Ohler, Uwe
A1  - Konigorski, Stefan
A1  - Lippert, Christoph
T1  - Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 16 
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-586078
IS  - 16
ER  - 
TY  - JOUR
A1  - Monti, Remo
A1  - Rautenstrauch, Pia
A1  - Ghanbari, Mahsa
A1  - Rani James, Alva
A1  - Kirchler, Matthias
A1  - Ohler, Uwe
A1  - Konigorski, Stefan
A1  - Lippert, Christoph
T1  - Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes
JF  - Nature Communications
N2  - Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
Y1  - 2022
U6  - https://doi.org/10.1038/s41467-022-32864-2
SN  - 2041-1723
VL  - 13
PB  - Nature Publishing Group UK
CY  - London
ER  - 
TY  - CHAP
A1  - Hagemann, Linus
A1  - Abramova, Olga
T1  - Crafting audience engagement in social media conversations
BT  - evidence from the U.S. 2020 presidential elections
T2  - Proceedings of the 55th Hawaii International Conference on System Sciences
N2  - Observing inconsistent results in prior studies, this paper applies the elaboration likelihood model to investigate the impact of affective and cognitive cues embedded in social media messages on audience engagement during a political event. Leveraging a rich dataset in the context of the 2020 U.S. presidential elections containing more than 3 million tweets, we found the prominence of both cue types. For the overall sample, positivity and sentiment are negatively related to engagement. In contrast, the post-hoc sub-sample analysis of tweets from famous users shows that emotionally charged content is more engaging. The role of sentiment decreases when the number of followers grows and ultimately becomes insignificant for Twitter participants with a vast number of followers. Prosocial orientation (“we-talk”) is consistently associated with more likes, comments, and retweets in the overall sample and sub-samples.
KW  - mediated conversation
KW  - big data
KW  - engagement
KW  - sentiment analysis
KW  - social media
Y1  - 2022
SN  - 978-0-9981331-5-7
SP  - 3222
EP  - 3231
PB  - HICSS Conference Office University of Hawaii at Manoa
CY  - Honolulu
ER  - 
TY  - THES
A1  - Jiang, Lan
T1  - Discovering metadata in data files
N2  - It is estimated that data scientists spend up to 80% of the time exploring, cleaning, and transforming their data. A major reason for that expenditure is the lack of knowledge about the used data, which are often from different sources and have heterogeneous structures. As a means to describe various properties of data, metadata can help data scientists understand and prepare their data, saving time for innovative and valuable data analytics. However, metadata do not always exist: some data file formats are not capable of storing them; metadata were deleted for privacy concerns; legacy data may have been produced by systems that were not designed to store and handle metadata. As data are being produced at an unprecedentedly fast pace and stored in diverse formats, manually creating metadata is not only impractical but also error-prone, demanding automatic approaches for metadata detection.

In this thesis, we are focused on detecting metadata in CSV files – a type of plain-text file that, similar to spreadsheets, may contain different types of content at arbitrary positions. We propose a taxonomy of metadata in CSV files and specifically address the discovery of three different metadata: line and cell type, aggregations, and primary keys and foreign keys.

Data are organized in an ad-hoc manner in CSV files, and do not follow a fixed structure, which is assumed by common data processing tools. Detecting the structure of such files is a prerequisite for extracting information from them, which can be addressed by detecting the semantic type, such as header, data, derived, or footnote, of each line or each cell. We propose the supervised-learning approach Strudel to detect the type of lines and cells. CSV files may also include aggregations. An aggregation represents the arithmetic relationship between a numeric cell and a set of other numeric cells. Our proposed AggreCol algorithm is capable of detecting aggregations of five arithmetic functions in CSV files. Note that stylistic features, such as font style and cell background color, do not exist in CSV files. Our proposed algorithms address the respective problems by using only content, contextual, and computational features.

Storing a relational table is also a common usage of CSV files. Primary keys and foreign keys are important metadata for relational databases, which are usually not present for database instances dumped as plain-text files. We propose the HoPF algorithm to holistically detect both constraints in relational databases. Our approach is capable of distinguishing true primary and foreign keys from a great amount of spurious unique column combinations and inclusion dependencies, which can be detected by state-of-the-art data profiling algorithms.
N2  - Schätzungen zufolge verbringen Datenwissenschaftler bis zu 80% ihrer Zeit mit der Erkundung, Bereinigung und Umwandlung ihrer Daten. Ein Hauptgrund für diesen Aufwand ist das fehlende Wissen über die verwendeten Daten, die oft aus unterschiedlichen Quellen stammen und heterogene Strukturen aufweisen.
Als Mittel zur Beschreibung verschiedener Dateneigenschaften können Metadaten Datenwissenschaftlern dabei helfen, ihre Daten zu verstehen und aufzubereiten, und so wertvolle Zeit für die Datenanalysen selbst sparen.
Metadaten sind jedoch nicht immer vorhanden: Zum Beispiel sind einige Dateiformate nicht in der Lage, sie zu speichern; Metadaten können aus Datenschutzgründen gelöscht worden sein; oder ältere Daten wurden möglicherweise von Systemen erzeugt, die nicht für die Speicherung und Verarbeitung von Metadaten konzipiert waren. Da Daten in einem noch nie dagewesenen Tempo produziert und in verschiedenen Formaten gespeichert werden, ist die manuelle Erstellung von Metadaten nicht nur unpraktisch, sondern auch fehleranfällig, so dass automatische Ansätze zur Metadatenerkennung erforderlich sind.

In dieser Arbeit konzentrieren wir uns auf die Erkennung von Metadaten in CSV-Dateien - einer Art von Klartextdateien, die, ähnlich wie Tabellenkalkulationen, verschiedene Arten von Inhalten an beliebigen Positionen enthalten können. Wir schlagen eine Taxonomie der Metadaten in CSV-Dateien vor und befassen uns speziell mit der Erkennung von drei verschiedenen Metadaten: semantischer Typ von Zeilen und Zellen, Aggregationen sowie Primärschlüssel und Fremdschlüssel.

Die Daten sind in CSV-Dateien ad-hoc organisiert und folgen keiner festen Struktur, wie sie von gängigen Datenverarbeitungsprogrammen angenommen wird. Die Erkennung der Struktur solcher Dateien ist eine Voraussetzung für die Extraktion von Informationen aus ihnen, die durch die Erkennung des semantischen Typs jeder Zeile oder jeder Zelle, wie z. B. Kopfzeile, Daten, abgeleitete Daten oder Fußnote, angegangen werden kann. Wir schlagen den Ansatz des überwachten Lernens, genannt „Strudel“, vor, um den strukturellen Typ von Zeilen und Zellen zu klassifizieren. CSV-Dateien können auch Aggregationen enthalten. Eine Aggregation stellt die arithmetische Beziehung zwischen einer numerischen Zelle und einer Reihe anderer numerischer Zellen dar. Der von uns vorgeschlagene „AggreCol“-Algorithmus ist in der Lage, Aggregationen von fünf arithmetischen Funktionen in CSV-Dateien zu erkennen. Da stilistische Merkmale wie Schriftart und Zellhintergrundfarbe in CSV-Dateien nicht vorhanden sind, lösen die von uns vorgeschlagenen Algorithmen die entsprechenden Probleme, indem sie nur die Merkmale Inhalt, Kontext und Berechnungen verwenden.

Die Speicherung einer relationalen Tabelle ist ebenfalls eine häufige Verwendung von CSV-Dateien. Primär- und Fremdschlüssel sind wichtige Metadaten für relationale Datenbanken, die bei Datenbankinstanzen, die als reine Textdateien gespeichert werden, normalerweise nicht vorhanden sind. Wir schlagen den „HoPF“-Algorithmus vor, um beide Constraints in relationalen Datenbanken ganzheitlich zu erkennen. Unser Ansatz ist in der Lage, echte Primär- und Fremdschlüssel von einer großen Menge an falschen eindeutigen Spaltenkombinationen und Einschlussabhängigkeiten zu unterscheiden, die von modernen Data-Profiling-Algorithmen erkannt werden können.
KW  - data preparation
KW  - metadata detection
KW  - data wrangling
KW  - Datenaufbereitung
KW  - Datentransformation
KW  - Erkennung von Metadaten
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566204
ER  - 
TY  - JOUR
A1  - Rosin, Paul L.
A1  - Lai, Yu-Kun
A1  - Mould, David
A1  - Yi, Ran
A1  - Berger, Itamar
A1  - Doyle, Lars
A1  - Lee, Seungyong
A1  - Li, Chuan
A1  - Liu, Yong-Jin
A1  - Semmo, Amir
A1  - Shamir, Ariel
A1  - Son, Minjung
A1  - Winnemöller, Holger
T1  - NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits
JF  - Computational visual media
N2  - Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset.
KW  - non-photorealistic rendering (NPR)
KW  - image stylization
KW  - style transfer
KW  - portrait
KW  - evaluation
KW  - benchmark
Y1  - 2022
U6  - https://doi.org/10.1007/s41095-021-0255-3
SN  - 2096-0433
SN  - 2096-0662
VL  - 8
IS  - 3
SP  - 445
EP  - 465
PB  - Springer Nature
CY  - London
ER  - 
TY  - JOUR
A1  - Taleb, Aiham
A1  - Rohrer, Csaba
A1  - Bergner, Benjamin
A1  - De Leon, Guilherme
A1  - Rodrigues, Jonas Almeida
A1  - Schwendicke, Falk
A1  - Lippert, Christoph
A1  - Krois, Joachim
T1  - Self-supervised learning methods for label-efficient dental caries classification
JF  - Diagnostics : open access journal
N2  - High annotation costs are a substantial bottleneck in applying deep learning architectures to clinically relevant use cases, substantiating the need for algorithms to learn from unlabeled data. 

In this work, we propose employing self-supervised methods. To that end, we trained with three self-supervised algorithms on a large corpus of unlabeled dental images, which contained 38K bitewing radiographs (BWRs). We then applied the learned neural network representations on tooth-level dental caries classification, for which we utilized labels extracted from electronic health records (EHRs). Finally, a holdout test-set was established, which consisted of 343 BWRs and was annotated by three dental professionals and approved by a senior dentist. 

This test-set was used to evaluate the fine-tuned caries classification models. Our experimental results demonstrate the gains obtained by pretraining models using self-supervised algorithms. These include improved caries classification performance (6 p.p. increase in sensitivity) and, most importantly, improved label-efficiency.
In other words, the resulting models can be fine-tuned using few labels (annotations). 

Our results show that using as few as 18 annotations can produce >= 45% sensitivity, which is comparable to human-level diagnostic performance. 
This study shows that self-supervision can provide gains in medical image analysis, particularly when obtaining labels is costly.
KW  - unsupervised methods
KW  - self-supervised learning
KW  - representation learning
KW  - dental caries classification
KW  - data driven approaches
KW  - annotation
KW  - efficient deep learning
Y1  - 2022
U6  - https://doi.org/10.3390/diagnostics12051237
SN  - 2075-4418
VL  - 12
IS  - 5
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Wiemker, Veronika
A1  - Bunova, Anna
A1  - Neufeld, Maria
A1  - Gornyi, Boris
A1  - Yurasova, Elena
A1  - Konigorski, Stefan
A1  - Kalinina, Anna
A1  - Kontsevaya, Anna
A1  - Ferreira-Borges, Carina
A1  - Probst, Charlotte
T1  - Pilot study to evaluate usability and acceptability of the 'Animated Alcohol Assessment Tool' in Russian primary healthcare
JF  - Digital health
N2  - Background and aims: Accurate and user-friendly assessment tools quantifying alcohol consumption are a prerequisite to effective prevention and treatment programmes, including Screening and Brief Intervention. Digital tools offer new potential in this field. We developed the ‘Animated Alcohol Assessment Tool’ (AAA-Tool), a mobile app providing an interactive version of the World Health Organization's Alcohol Use Disorders Identification Test (AUDIT) that facilitates the description of individual alcohol consumption via culturally informed animation features. This pilot study evaluated the Russia-specific version of the Animated Alcohol Assessment Tool with regard to (1) its usability and acceptability in a primary healthcare setting, (2) the plausibility of its alcohol consumption assessment results and (3) the adequacy of its Russia-specific vessel and beverage selection. Methods: Convenience samples of 55 patients (47% female) and 15 healthcare practitioners (80% female) in 2 Russian primary healthcare facilities self-administered the Animated Alcohol Assessment Tool and rated their experience on the Mobile Application Rating Scale – User Version. Usage data was automatically collected during app usage, and additional feedback on regional content was elicited in semi-structured interviews. Results: On average, patients completed the Animated Alcohol Assessment Tool in 6:38 min (SD = 2.49, range = 3.00–17.16). User satisfaction was good, with all subscale Mobile Application Rating Scale – User Version scores averaging >3 out of 5 points. A majority of patients (53%) and practitioners (93%) would recommend the tool to ‘many people’ or ‘everyone’. Assessed alcohol consumption was plausible, with a low number (14%) of logically impossible entries. Most patients reported the Animated Alcohol Assessment Tool to reflect all vessels (78%) and all beverages (71%) they typically used. 
Conclusion: High acceptability ratings by patients and healthcare practitioners, acceptable completion time, plausible alcohol usage assessment results and perceived adequacy of region-specific content underline the Animated Alcohol Assessment Tool's potential to provide a novel approach to alcohol assessment in primary healthcare. After its validation, the Animated Alcohol Assessment Tool might contribute to reducing alcohol-related harm by facilitating Screening and Brief Intervention implementation in Russia and beyond.
KW  - Alcohol use assessment
KW  - Alcohol Use Disorders Identification Test
KW  - screening tools
KW  - digital health
KW  - mobile applications
KW  - Russia
KW  - primary healthcare
KW  - usability
KW  - acceptability
Y1  - 2022
U6  - https://doi.org/10.1177/20552076211074491
SN  - 2055-2076
VL  - 8
PB  - Sage Publications
CY  - London
ER  - 
TY  - JOUR
A1  - Ulrich, Jens-Uwe
A1  - Lutfi, Ahmad
A1  - Rutzen, Kilian
A1  - Renard, Bernhard Y.
T1  - ReadBouncer
BT  - precise and scalable adaptive sampling for nanopore sequencing
JF  - Bioinformatics
N2  - Motivation: 
Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundance sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads as usually analyzed in adaptive sampling applications.

Results: 
Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low-abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
Y1  - 2022
U6  - https://doi.org/10.1093/bioinformatics/btac223
SN  - 1367-4803
SN  - 1367-4811
VL  - 38
IS  - SUPPL 1
SP  - 153
EP  - 160
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Wittig, Alice
A1  - Miranda, Fabio Malcher
A1  - Hölzer, Martin
A1  - Altenburg, Tom
A1  - Bartoszewicz, Jakub Maciej
A1  - Beyvers, Sebastian
A1  - Dieckmann, Marius Alfred
A1  - Genske, Ulrich
A1  - Giese, Sven Hans-Joachim
A1  - Nowicka, Melania
A1  - Richard, Hugues
A1  - Schiebenhoefer, Henning
A1  - Schmachtenberg, Anna-Juliane
A1  - Sieben, Paul
A1  - Tang, Ming
A1  - Tembrockhaus, Julius
A1  - Renard, Bernhard Y.
A1  - Fuchs, Stephan
T1  - CovRadar
BT  - continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance
JF  - Bioinformatics
N2  - The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus, and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast.
Y1  - 2022
U6  - https://doi.org/10.1093/bioinformatics/btac411
SN  - 1367-4803
SN  - 1367-4811
VL  - 38
IS  - 17
SP  - 4223
EP  - 4225
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Schladebach, Marcus
T1  - Satelliten-Megakonstellationen im Weltraumrecht
JF  - Kommunikation & Recht : K & R / Beihefter
Y1  - 2022
SN  - 1434-6354
IS  - 2
SP  - 26
EP  - 29
PB  - dfv-Mediengruppe
CY  - Frankfurt am Main
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Willems, Christian
A1  - Staubitz, Thomas
A1  - Sauer, Dominic
A1  - Hagedorn, Christiane
T1  - openHPI
T1  - openHPI
BT  - 10 Jahre MOOCs am Hasso-Plattner-Institut
BT  - 10 Years of MOOCs at the Hasso Plattner Institute
N2  - Anlässlich des 10-jährigen Jubiläums von openHPI informiert dieser technische Bericht über die HPI-MOOC-Plattform einschließlich ihrer Kernfunktionen, Technologie und Architektur.
In einer Einleitung wird die Plattformfamilie mit allen Partnerplattformen vorgestellt; diese belaufen sich inklusive openHPI aktuell auf neun Plattformen. In diesem Abschnitt wird außerdem gezeigt, wie openHPI als Berater und Forschungspartner in verschiedenen Projekten fungiert. 

Im zweiten Kapitel werden die Funktionalitäten und gängigen Kursformate der Plattform präsentiert. Die Funktionalitäten sind in Lerner- und Admin-Funktionen unterteilt. Der Bereich Lernerfunktionen bietet detaillierte Informationen zu Leistungsnachweisen, Kursen und den Lernmaterialien, aus denen sich ein Kurs zusammensetzt: Videos, Texte und Quiz. Darüber hinaus können die Lernmaterialien durch externe Übungstools angereichert werden, die über den Standard Learning Tools Interoperability (LTI) mit der HPI MOOC-Plattform kommunizieren. Das Konzept der Peer-Assessments rundet die möglichen Lernmaterialien ab.
Der Abschnitt geht dann weiter auf das Diskussionsforum ein, das einen grundlegenden Unterschied von MOOCs im Vergleich zu traditionellen E-Learning-Angeboten darstellt. Zum Abschluss des Abschnitts folgt eine Beschreibung von Quiz-Recap, Lernzielen, mobilen Anwendungen, spielerischem Lernen und dem Helpdesk.

Der nächste Teil dieses Kapitels beschäftigt sich mit den Admin-Funktionen. Die Funktionalitätsbeschreibung beschränkt sich auf Neuigkeiten und Ankündigungen, Dashboards und Statistiken, Berichtsfunktionen, Forschungsoptionen mit A/B-Tests, den Kurs-Feed und das TransPipe-Tool zur Unterstützung beim Erstellen von automatischen oder manuellen Untertiteln. Die Plattform unterstützt außerdem eine Vielzahl zusätzlicher Funktionen, doch eine detaillierte Beschreibung dieser Funktionen würde den Rahmen des Berichts sprengen.
Das Kapitel geht dann auf gängige Kursformate und openHPI-Lehrveranstaltungen am HPI ein, bevor es mit einigen Best Practices für die Gestaltung und Durchführung von Kursen schließt.
Zum Abschluss des technischen Berichts gibt das letzte Kapitel eine Zusammenfassung und einen Ausblick auf die Zukunft der digitalen Bildung. 

Ein besonderes Merkmal des openHPI-Projekts ist die bewusste Entscheidung, die komplette Anwendung von den physischen Netzwerkkomponenten bis zur Plattformentwicklung eigenständig zu betreiben. Bei der vorliegenden deutschen Variante handelt es sich um eine gekürzte Übersetzung des technischen Berichts 148, bei der kein Einblick in die Technologien und Architektur von openHPI gegeben wird. Interessierte Leser:innen können im technischen Bericht 148 (vollständige englische Version) detaillierte Informationen zum Rechenzentrum und den Geräten, der Cloud-Software und dem openHPI Cloud Service aber auch zu Infrastruktur-Anwendungskomponenten wie Entwicklungstools, Automatisierung, Deployment-Pipeline und Monitoring erhalten. Außerdem finden sich dort weitere Informationen über den Technologiestack und konkrete Implementierungsdetails der Plattform inklusive der serviceorientierten Ruby on Rails-Anwendung, die Kommunikation zwischen den Diensten, öffentliche APIs, sowie Designsystem und -komponenten. Der Abschnitt schließt mit einer Diskussion über die ursprüngliche Microservice-Architektur und die Migration zu einer monolithischen Anwendung.
N2  - On the occasion of the 10th openHPI anniversary, this technical report provides information about the HPI MOOC platform, including its core features, technology, and architecture.

In an introduction, the platform family with all partner platforms is presented; these now amount to nine platforms, including openHPI. This section introduces openHPI as an advisor and research partner in various projects. 

In the second chapter, the functionalities and common course formats of the platform are presented. The functionalities are divided into learner and admin features. The learner features section provides detailed information about performance records, courses, and the learning materials of which a course is composed: videos, texts, and quizzes. In addition, the learning materials can be enriched by adding external exercise tools that communicate with the HPI MOOC platform via the Learning Tools Interoperability (LTI) standard. Furthermore, the concept of peer assessments completes the possible learning materials.
The section then proceeds with further information on the discussion forum, a fundamental concept of MOOCs compared to traditional e-learning offers. The section is concluded with a description of the quiz recap, learning objectives, mobile applications, gameful learning, and the help desk.

The next part of this chapter deals with the admin features. The description is restricted to news and announcements, dashboards and statistics, reporting capabilities, research options with A/B testing, the course feed, and the TransPipe tool, which supports the process of creating automated or manual subtitles. The platform supports a large variety of additional features, but a detailed description of these features goes beyond the scope of this report.
The chapter then elaborates on common course formats and openHPI teaching activities at the HPI. The chapter concludes with some best practices for course design and delivery.

The third chapter provides insights into the technology and architecture behind openHPI. A special characteristic of the openHPI project is the conscious decision to operate the complete application from bare metal to platform development. Hence, the chapter starts with a section about the openHPI Cloud, including detailed information about the data center and devices, the cloud software used (OpenStack and Ceph), as well as the openHPI Cloud Service provided for the HPI.

Afterward, a section on the application technology stack and development tooling describes the application infrastructure components, the automation used, the deployment pipeline, and the tools used for monitoring and alerting. The chapter is concluded with detailed information about the technology stack and concrete platform implementation details. The section describes the service-oriented Ruby on Rails application, inter-service communication, and public APIs. It also provides more information on the design system and components used in the application. The section concludes with a discussion of the original microservice architecture, where we share our insights and reasoning for migrating back to a monolithic application.

The last chapter provides a summary and an outlook on the future of digital education.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 150 
KW  - openHPI
KW  - MOOC
KW  - digitale Lernplattform
KW  - digitale Aufklärung
KW  - lebenslanges Lernen
KW  - openHPI
KW  - MOOC
KW  - digital learning platform
KW  - digital enlightenment
KW  - lifelong learning
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-561792
SN  - 978-3-86956-546-0
SN  - 1613-5652
SN  - 2191-1665
IS  - 150
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Dehnert, Maik
T1  - Studies on the Digital Transformation of Incumbent Organizations
T1  - Studien zur Digitalen Transformation traditioneller Organisationen
BT  - Causes, Effects and Solutions for Banking
BT  - Ursachen, Wirkungen und Lösungen für das Bankwesen
N2  - Traditional organizations are strongly encouraged by emerging digital customer behavior and digital competition to transform their businesses for the digital age. Incumbents are particularly exposed to the field of tension between maintaining and renewing their business model. Banking is one of the industries most affected by digitalization, with a large stream of digital innovations around Fintech. Most research contributions focus on digital innovations, such as Fintech, but there are only a few studies on the related challenges and perspectives of incumbent organizations, such as traditional banks. Against this background, this dissertation examines the specific causes, effects and solutions for traditional banks in digital transformation − an underrepresented research area so far.

The first part of the thesis examines how digitalization has changed the latent customer expectations in banking and studies the underlying technological drivers of evolving business-to-consumer (B2C) business models. Online consumer reviews are systematized to identify latent concepts of customer behavior and future decision paths as strategic digitalization effects. Furthermore, the service attribute preferences, the impact of influencing factors and the underlying customer segments are uncovered for checking accounts in a discrete choice experiment. The dissertation contributes here to customer behavior research in digital transformation, moving beyond the technology acceptance model. In addition, the dissertation systematizes value proposition types in the evolving discourse around smart products and services as key drivers of business models and market power in the platform economy.

The second part of the thesis focuses on the effects of digital transformation on the strategy development of financial service providers, which are classified along with their firm performance levels. Standard types are derived based on fuzzy-set qualitative comparative analysis (fsQCA), with facade digitalization as one typical standard type for low-performing incumbent banks that lack a holistic strategic response to digital transformation. Based on this, the contradictory impact of digitalization measures on key business figures is examined for German savings banks, confirming that the shift towards digital customer interaction was not accompanied by new revenue models, which diminished bank profitability. The dissertation further contributes to the discourse on digitalized work designs and the consequences for job perceptions in banking customer advisory. The threefold impact of the IT support perceived in customer interaction on the job satisfaction of customer advisors is disentangled.

In the third part of the dissertation, solutions are developed in a design-oriented manner for the core action areas of digitalized business models, i.e., data and platforms. A consolidated taxonomy for data-driven business models and a future reference model for digital banking have been developed. The impact of the platform economy is demonstrated here using the example of the market entry by Bigtech. The role-based e3-value modeling is extended by meta-roles and role segments and linked to value co-creation mapping in VDML. In this way, the dissertation extends enterprise modeling research on platform ecosystems and value co-creation using the example of banking.
N2  - Traditionelle Unternehmen sehen sich angesichts des zunehmend digitalen Kundenverhaltens und gesteigerten digitalen Wettbewerbs damit konfrontiert, ihr Geschäftsmodell adäquat für das digitale Zeitalter weiterzuentwickeln. Insbesondere etablierte Unternehmen befinden sich dabei in einem Spannungsfeld aus Bewahrung und Erneuerung. Der Großteil jüngerer Forschungsbeiträge zum Bankwesen fokussiert sich auf digitale Fintech-Innovationen, nur wenige Studien befassen sich mit Herausforderungen und Perspektiven traditioneller Banken. Vor diesem Hintergrund untersucht die Dissertation die Ursachen und Wirkungen der Digitalen Transformation im Bankwesen und zeigt Lösungswege für traditionelle Banken auf.

Der erste Teil der Dissertation untersucht die Ursachen der Digitalen Transformation im Banking. Neuartige Einflussfaktoren und Entscheidungspfade im Kundenverhalten werden als strategische Digitalisierungstreiber für Banken identifiziert. Darauf aufbauend werden in einem Discrete-Choice-Experiment die Präferenzen deutscher Bankkunden hinsichtlich digitaler und nicht-digitaler Dienstleistungsattribute am Beispiel von Girokonten untersucht. Die Arbeit leistet einen über das Technologieakzeptanzmodell hinausgehenden Beitrag zur Erforschung des Kundenverhaltens in der Digitalen Transformation. Ein weiterer Forschungsbeitrag systematisiert anschließend wesentliche Charakteristika smarter Produkte und Dienstleistungen als Treiber von Geschäftsmodellen und Marktmacht in der Plattformökonomie.

Der zweite Teil der Arbeit befasst sich zunächst mit den Auswirkungen der Digitalen Transformation auf die Strategieentwicklung von traditionellen Finanzdienstleistern, die mittels Fallstudien entlang ihres Finanzerfolgs typologisiert werden. Die Fassadendigitalisierung wird als Standardtyp traditioneller Anbieter systematisiert, die zwar zunehmend auf digitale Kundeninteraktion setzen, aber die Geschäftsmodelldimension der Digitalen Transformation vernachlässigen. Darauf aufbauend werden in Panelregressionsanalysen die Auswirkungen der Digitalisierung auf deutsche Sparkassen auf betriebswirtschaftliche Kennzahlen untersucht. Eine weitere quantitative Studie untersucht die Wirkungen neuartiger IT-Beratungswerkzeuge auf die Arbeitszufriedenheit von Bankkundenberatern. Die Dissertation leistet hiermit einen Beitrag zur Transformationsforschung in den Bereichen Bankstrategie und Arbeitsprozesse.

Im dritten Teil der Dissertation werden gestaltungsorientiert Lösungsartefakte für die zentralen Handlungsfelder digitalisierter Geschäftsmodelle - Daten und Plattformen - entwickelt. Dies schließt einerseits eine konsolidierte Taxonomie für datengetriebene Geschäftsmodelle und andererseits ein Referenzmodell für zukünftige plattformbasierte Bankenökosysteme ein. Die rollenbasierte Referenzmodellierungsmethodik e3-value wird um Meta-Rollen und Rollensegmente erweitert, um die strategischen Auswirkungen plattformbasierter Geschäftsmodelle aufzuzeigen. Hiermit erweitert die Dissertation die Unternehmensmodellierungsforschung im Bereich digitaler Plattform-Ökosysteme am Beispiel des Bankwesens.
KW  - digital transformation
KW  - digitalization
KW  - digital strategy
KW  - consumer behavior
KW  - platform ecosystems
KW  - value co-creation
KW  - Fintech
KW  - incumbent
KW  - bank
KW  - Digitale Transformation
KW  - Digitalisierung
KW  - Digitalstrategie
KW  - Kundenverhalten
KW  - Plattform-Ökosysteme
KW  - Wertschöpfungskooperation
KW  - Fintech
KW  - traditionelle Unternehmen
KW  - Bank
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-548324
ER  - 
TY  - JOUR
A1  - Richly, Keven
A1  - Schlosser, Rainer
A1  - Boissier, Martin
T1  - Budget-conscious fine-grained configuration optimization for spatio-temporal applications
JF  - Proceedings of the VLDB Endowment
N2  - Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, the selection of cost and performance balancing configurations is challenging due to the vast number of possible setups consisting of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options on a fine-grained level. We propose different linear programming (LP) models addressing cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budgets. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Further, we demonstrate on a real-world dataset that our models can significantly reduce the memory footprint with equal performance or increase the performance with equal memory size compared to existing tuning heuristics.
KW  - General Earth and Planetary Sciences
KW  - Water Science and Technology
KW  - Geography, Planning and Development
Y1  - 2022
U6  - https://doi.org/10.14778/3565838.3565858
SN  - 2150-8097
VL  - 15
IS  - 13
SP  - 4079
EP  - 4092
PB  - Association for Computing Machinery (ACM)
CY  - [New York]
ER  - 
TY  - THES
A1  - Niephaus, Fabio
T1  - Exploratory tool-building platforms for polyglot virtual machines
N2  - Polyglot programming allows developers to use multiple programming languages within the same software project. While it is common to use more than one language in certain programming domains, developers also apply polyglot programming for other purposes such as to re-use software written in other languages. Although established approaches to polyglot programming come with significant limitations, for example, in terms of performance and tool support, developers still use them to be able to combine languages.
Polyglot virtual machines (VMs) such as GraalVM provide a new level of polyglot programming, allowing languages to directly interact with each other. This reduces the amount of glue code needed to combine languages, results in better performance, and enables tools such as debuggers to work across languages. However, only a little research has focused on novel tools that are designed to support developers in building software with polyglot VMs. One reason is that tool-building is often an expensive activity, another one is that polyglot VMs are still a moving target as their use cases and requirements are not yet well understood.
In this thesis, we present an approach that builds on existing self-sustaining programming systems such as Squeak/Smalltalk to enable exploratory programming, a practice for exploring and gathering software requirements, and re-use their extensive tool-building capabilities in the context of polyglot VMs. Based on TruffleSqueak, our implementation for the GraalVM, we further present five case studies that demonstrate how our approach helps tool developers to design and build tools for polyglot programming. We further show that TruffleSqueak can also be used by application developers to build and evolve polyglot applications at run-time and by language and runtime developers to understand the dynamic behavior of GraalVM languages and internals. Since our platform allows all these developers to apply polyglot programming, it can further help to better understand the advantages, use cases, requirements, and challenges of polyglot VMs. Moreover, we demonstrate that our approach can also be applied to other polyglot VMs and that insights gained through it are transferable to other programming systems.
We conclude that our research on tools for polyglot programming is an important step toward making polyglot VMs more approachable for developers in practice. With good tool support, we believe polyglot VMs can make it much more common for developers to take advantage of multiple languages and their ecosystems when building software.
N2  - Durch Polyglottes Programmieren können Softwareentwickler:innen mehrere Programmiersprachen für das Bauen von Software verwenden. Während diese Art von Programmierung in einigen Programmierdomänen üblich ist, wenden Entwickler:innen Polyglottes Programmieren auch aus anderen Gründen an, wie zum Beispiel, um Software über Programmiersprachen hinweg wiederverwenden zu können. Obwohl die bestehenden Ansätze zum Polyglotten Programmieren mit erheblichen Einschränkungen verbunden sind, wie beispielsweise in Bezug zur Laufzeitperformance oder der Unterstützung durch Programmierwerkzeuge, werden sie dennoch von Entwickler:innen genutzt, um Sprachen kombinieren zu können.
Mehrsprachige Ausführungsumgebungen wie zum Beispiel GraalVM bieten Polyglottes Programmieren auf einer neuen Ebene an, welche es Sprachen erlaubt, direkt miteinander zu interagieren. Dadurch wird die Menge an notwendigem Glue Code beim Kombinieren von Sprachen reduziert und die Laufzeitperformance verbessert. Außerdem können Debugger und andere Programmierwerkzeuge über mehrere Sprachen hinweg verwendet werden. Jedoch hat sich bisher nur wenig wissenschaftliche Arbeit mit neuartigen Werkzeugen beschäftigt, die darauf ausgelegt sind, Entwickler:innen beim Polyglotten Programmieren mit mehrsprachigen Ausführungsumgebungen zu unterstützen. Ein Grund dafür ist, dass das Bauen von Werkzeugen üblicherweise sehr aufwendig ist. Ein anderer Grund ist, dass sich mehrsprachige Ausführungsumgebungen immer noch ständig weiterentwickeln, da ihre Anwendungsfälle und Anforderungen noch nicht ausreichend verstanden sind.
In dieser Arbeit stellen wir einen Ansatz vor, der auf selbsttragenden Programmiersystemen wie zum Beispiel Squeak/Smalltalk aufbaut, um Exploratives Programmieren, eine Praktik zum Explorieren und Erfassen von Softwareanforderungen, sowie das Wiederverwenden ihrer umfangreichen Fähigkeiten zum Bauen von Werkzeugen im Rahmen von mehrsprachigen Ausführungsumgebungen zu ermöglichen. Basierend auf TruffleSqueak, unserer Implementierung für die GraalVM, zeigen wir anhand von fünf Fallstudien, wie unser Ansatz Werkzeugentwickler:innen dabei hilft, neue Werkzeuge zum Polyglotten Programmieren zu entwerfen und zu bauen. Außerdem demonstrieren wir, dass TruffleSqueak auch von Anwendungsentwickler:innen zum Bauen und Erweitern von polyglotten Anwendungen zur Laufzeit genutzt werden kann und Sprach- sowie Laufzeitentwickler:innen dabei hilft, das dynamische Verhalten von GraalVM-Sprachen und -Interna zu verstehen. Da unsere Plattform dabei all diesen Entwickler:innen Polyglottes Programmieren erlaubt, trägt sie außerdem dazu bei, dass Vorteile, Anwendungsfälle, Anforderungen und Herausforderungen von mehrsprachigen Ausführungsumgebungen besser verstanden werden können. Darüber hinaus zeigen wir, dass unser Ansatz auch auf andere mehrsprachige Ausführungsumgebungen angewandt werden kann und dass die Erkenntnisse, die man durch unseren Ansatz gewinnen kann, auch auf andere Programmiersysteme übertragbar sind.
Wir schlussfolgern, dass unsere Forschung an Werkzeugen zum Polyglotten Programmieren ein wichtiger Schritt ist, um mehrsprachige Ausführungsumgebungen zugänglicher für Entwickler:innen in der Praxis zu machen. Wir sind davon überzeugt, dass diese Ausführungsumgebungen mit guter Werkzeugunterstützung dazu führen können, dass Softwareentwickler:innen häufiger von den Vorteilen der Verwendung mehrerer Programmiersprachen zum Bauen von Software profitieren wollen.
KW  - polyglot programming
KW  - polyglottes Programmieren
KW  - programming tools
KW  - Programmierwerkzeuge
KW  - Smalltalk
KW  - Smalltalk
KW  - GraalVM
KW  - GraalVM
KW  - virtual machines
KW  - virtuelle Maschinen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-571776
ER  - 
TY  - JOUR
A1  - Rojahn, Marcel
A1  - Weber, Edzard
A1  - Gronau, Norbert
T1  - Towards a standardization in scheduling models
BT  - assessing the variety of homonyms
JF  - International journal of industrial and systems engineering
N2  - Terminology is a critical instrument for each researcher. Different terminologies for the same research object may arise in different research communities. Because of this inconsistency, many synergistic effects are lost. Theories and models will be more understandable and reusable if a common terminology is applied. This paper examines the terminological (in)consistency in the research field of job-shop scheduling by means of a literature review. There is an enormous variety in the choice of terms and mathematical notation for the same concept. The comparability, reusability and combinability of scheduling methods are unnecessarily hampered by the arbitrary use of homonyms and synonyms. The acceptance in the community of the variables and notation forms used is shown by means of a compliance quotient. This is proven by the evaluation of 240 scientific publications on planning methods.
KW  - job-shop scheduling
KW  - JSP
KW  - terminology
KW  - notation
KW  - standardization
Y1  - 2023
UR  - https://publications.waset.org/10013137/pdf
SN  - 1748-5037
SN  - 1748-5045
VL  - 17
IS  - 6
SP  - 401
EP  - 408
PB  - Inderscience Enterprises
CY  - Genève
ER  - 
TY  - GEN
A1  - Ritterbusch, Georg David
A1  - Teichmann, Malte Rolf
T1  - Defining the metaverse
BT  - A systematic literature review
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2  - The term Metaverse is emerging as a result of the late push by multinational technology conglomerates and a recent surge of interest in Web 3.0, Blockchain, NFT, and Cryptocurrencies. From a scientific point of view, there is no definite consensus on what the Metaverse will be like. This paper collects, analyzes, and synthesizes scientific definitions and the accompanying major characteristics of the Metaverse using the methodology of a Systematic Literature Review (SLR). Two revised definitions for the Metaverse are presented, both condensing the key attributes, where the first one is rather simplistic and holistic, describing “a three-dimensional online environment in which users represented by avatars interact with each other in virtual spaces decoupled from the real physical world”. In contrast, the second definition is specified in a more detailed manner in the paper and further discussed. These comprehensive definitions offer specialized and general scholars an application within and beyond the scientific context of system science, information system science, computer science, and business informatics, while also introducing open research challenges. Furthermore, an outlook on the social, economic, and technical implications is given, and the preconditions that are necessary for a successful implementation are discussed.
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 159 
KW  - Metaverse
KW  - Systematics
KW  - Bibliometrics
KW  - Augmented reality
KW  - Taxonomy
KW  - Semantic Web
KW  - Second Life
KW  - Blockchains
KW  - Economics
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-588799
SN  - 1867-5808
IS  - 159
SP  - 12368
EP  - 12377
ER  - 
TY  - JOUR
A1  - Ritterbusch, Georg David
A1  - Teichmann, Malte Rolf
T1  - Defining the metaverse
BT  - A systematic literature review
JF  - IEEE Access
N2  - The term Metaverse is emerging as a result of the late push by multinational technology conglomerates and a recent surge of interest in Web 3.0, Blockchain, NFT, and Cryptocurrencies. From a scientific point of view, there is no definite consensus on what the Metaverse will be like. This paper collects, analyzes, and synthesizes scientific definitions and the accompanying major characteristics of the Metaverse using the methodology of a Systematic Literature Review (SLR). Two revised definitions for the Metaverse are presented, both condensing the key attributes, where the first one is rather simplistic and holistic, describing “a three-dimensional online environment in which users represented by avatars interact with each other in virtual spaces decoupled from the real physical world”. In contrast, the second definition is specified in a more detailed manner in the paper and further discussed. These comprehensive definitions offer specialized and general scholars an application within and beyond the scientific context of system science, information system science, computer science, and business informatics, while also introducing open research challenges. Furthermore, an outlook on the social, economic, and technical implications is given, and the preconditions that are necessary for a successful implementation are discussed.
KW  - Metaverse
KW  - Systematics
KW  - Bibliometrics
KW  - Augmented reality
KW  - Taxonomy
KW  - Semantic Web
KW  - Second Life
KW  - Blockchains
KW  - Economics
Y1  - 2023
U6  - https://doi.org/10.1109/ACCESS.2023.3241809
SN  - 2169-3536
VL  - 11
SP  - 12368
EP  - 12377
PB  - Institute of Electrical and Electronics Engineers
CY  - New York, NY
ER  - 
TY  - THES
A1  - Bano, Dorina
T1  - Discovering data models from event logs
T1  - Entdecken von Datenmodellen aus Ereignisprotokollen
N2  - In the last two decades, process mining has developed from a niche discipline to a significant research area with considerable impact on academia and industry. Process mining enables organizations to identify the running business processes from historical execution data. The first requirement of any process mining technique is an event log, an artifact that represents concrete business process executions in the form of a sequence of events. These logs can be extracted from the organization's information systems and are used by process experts to retrieve deep insights from the organization's running processes. Considering the events pertaining to such logs, the process models can be automatically discovered and enhanced or annotated with performance-related information. Besides behavioral information, event logs contain domain-specific data, albeit implicitly. However, such data are usually overlooked and, thus, not utilized to their full potential.

Within the process mining area, we address in this thesis the research gap of discovering, from event logs, the contextual information that cannot be captured by applying existing process mining techniques. Within this research gap, we identify four key problems and tackle them by looking at an event log from different angles. First, we address the problem of deriving an event log in the absence of proper database access and domain knowledge. The second problem is related to the under-utilization of the implicit domain knowledge present in an event log that can increase the understandability of the discovered process model. Next, there is a lack of a holistic representation of the historical data manipulation at the process model level of abstraction. Last but not least, each process model is presumed to be independent of other process models when discovered from an event log, thus ignoring possible data dependencies between processes within an organization. 

For each of the problems mentioned above, this thesis proposes a dedicated method. The first method extracts an event log solely from the transactions performed on the database, which are stored in the form of redo logs. The second method discovers the underlying data model that is implicitly embedded in the event log, thus complementing the discovered process model with important domain knowledge. The third method captures, on the process model level, how the data affects the running process instances. Lastly, the fourth method discovers the relations between business processes (i.e., how they exchange data) from a set of event logs and explicitly represents such complex interdependencies in a business process architecture.

All the methods introduced in this thesis are implemented as prototypes, and their feasibility is demonstrated by applying them to real-life event logs.
N2  - In den letzten zwei Jahrzehnten hat sich Process Mining von einer Nischendisziplin zu einem bedeutenden Forschungsgebiet mit erheblichen Auswirkungen auf Wissenschaft und Industrie entwickelt. Process Mining ermöglicht es Unternehmen, die laufenden Geschäftsprozesse anhand historischer Ausführungsdaten zu identifizieren. Die erste Voraussetzung für jede Process-Mining-Technik ist ein Ereignisprotokoll (Event Log), ein Artefakt, das konkrete Geschäftsprozessausführungen in Form einer Abfolge von Ereignissen darstellt. Diese Protokolle (Logs) können aus den Informationssystemen der Unternehmen extrahiert werden und ermöglichen es Prozessexperten, tiefe Einblicke in die laufenden Unternehmensprozesse zu gewinnen. Unter Berücksichtigung der Abfolge der Ereignisse in diesen Protokollen (Logs) können Prozessmodelle automatisch entdeckt und mit leistungsbezogenen Informationen erweitert werden. Neben verhaltensbezogenen Informationen enthalten Ereignisprotokolle (Event Logs) auch domänenspezifische Daten, wenn auch nur implizit. Solche Daten werden jedoch in der Regel nicht in vollem Umfang genutzt. Diese Arbeit befasst sich im Bereich Process Mining mit der Forschungslücke der Extraktion von Kontextinformationen aus Ereignisprotokollen (Event Logs), die von bestehenden Process-Mining-Techniken nicht erfasst werden.

Innerhalb dieser Forschungslücke identifizieren wir vier Schlüsselprobleme, bei denen wir die Ereignisprotokolle (Event Logs) aus verschiedenen Perspektiven betrachten. Zunächst befassen wir uns mit dem Problem der Erfassung eines Ereignisprotokolls (Event Logs) ohne hinreichenden Datenbankzugang. Das zweite Problem ist die unzureichende Nutzung des in Ereignisprotokollen (Event Logs) enthaltenen Domänenwissens, das zum besseren Verständnis der generierten Prozessmodelle beitragen kann. Außerdem mangelt es an einer ganzheitlichen Darstellung der historischen Datenmanipulation auf Prozessmodellebene. Nicht zuletzt werden Prozessmodelle häufig unabhängig von anderen Prozessmodellen betrachtet, wenn sie aus Ereignisprotokollen (Event Logs) ermittelt wurden. Dadurch können mögliche Datenabhängigkeiten zwischen Prozessen innerhalb einer Organisation übersehen werden.

Für jedes der oben genannten Probleme schlägt diese Arbeit eine eigene Methode vor. Die erste Methode ermöglicht es, ein Ereignisprotokoll (Event Log) ausschließlich anhand der Historie der auf einer Datenbank durchgeführten Transaktionen zu extrahieren, die in Form von Redo-Logs gespeichert ist. Die zweite Methode befasst sich mit der Entdeckung des zugrundeliegenden Datenmodells, das implizit in dem jeweiligen Ereignisprotokoll (Event Log) eingebettet ist, und ergänzt somit das entdeckte Prozessmodell um wichtige, domänenspezifische Informationen. Bei der dritten Methode wird auf der Ebene des Prozessmodells erfasst, wie sich die Daten auf die laufenden Prozessinstanzen auswirken. Die vierte Methode befasst sich schließlich mit der Entdeckung der Beziehungen zwischen Geschäftsprozessen (d.h. deren Datenaustausch) auf Basis der jeweiligen Ereignisprotokolle (Event Logs) sowie mit der expliziten Darstellung solcher komplexen Abhängigkeiten in einer Geschäftsprozessarchitektur.

Alle in dieser Arbeit vorgestellten Methoden sind als Prototyp implementiert und ihre Anwendbarkeit wird anhand ihrer Anwendung auf reale Ereignisprotokolle (Event Logs) nachgewiesen.
KW  - process mining
KW  - data models
KW  - business process architectures
KW  - Datenmodelle
KW  - Geschäftsprozessarchitekturen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-585427
ER  - 
TY  - CHAP
A1  - Rojahn, Marcel
A1  - Ambros, Maximilian
A1  - Biru, Tibebu
A1  - Krallmann, Hermann
A1  - Gronau, Norbert
A1  - Grum, Marcus
ED  - Rutkowski, Leszek
ED  - Scherer, Rafał
ED  - Korytkowski, Marcin
ED  - Pedrycz, Witold
ED  - Tadeusiewicz, Ryszard
ED  - Zurada, Jacek M.
T1  - Adequate basis for the data-driven and machine-learning-based identification
T2  - Artificial intelligence and soft computing
N2  - Process mining (PM) has established itself in recent years as a main method for visualizing and analyzing processes. However, the identification of knowledge has not been addressed adequately because PM aims solely at the data-driven discovery, monitoring, and improvement of real-world processes from event logs available in various information systems. This paper therefore outlines a novel systematic analysis view on tools for data-driven and machine learning (ML)-based identification of knowledge-intensive target processes. To support the effectiveness of the identification process, the main contributions of this study are (1) to design a procedure for a systematic review and analysis for the selection of relevant dimensions, (2) to identify different categories of dimensions as evaluation metrics to select source systems, algorithms, and tools for PM and ML and to include them in a multi-dimensional grid box model, (3) to select and assess the most relevant dimensions of the model, (4) to identify and assess source systems, algorithms, and tools in order to find evidence for the selected dimensions, and (5) to assess the relevance and applicability of the conceptualization and design procedure for tool selection in data-driven and ML-based process mining research.
KW  - data mining
KW  - knowledge engineering
KW  - various applications
Y1  - 2023
SN  - 978-3-031-42504-2
SN  - 978-3-031-42505-9
U6  - https://doi.org/10.1007/978-3-031-42505-9_48
SP  - 570
EP  - 588
PB  - Springer
CY  - Cham
ER  - 
TY  - THES
A1  - Sakizloglou, Lucas
T1  - Evaluating temporal queries over history-aware architectural runtime models
T1  - Ausführung temporaler Anfragen über geschichtsbewusste Architektur-Laufzeitmodelle
N2  - In model-driven engineering, the adaptation of large software systems with dynamic structure is enabled by architectural runtime models. Such a model represents an abstract state of the system as a graph of interacting components. Every relevant change in the system is mirrored in the model and triggers an evaluation of model queries, which search the model for structural patterns that should be adapted. This thesis focuses on a type of runtime models where the expressiveness of the model and model queries is extended to capture past changes and their timing. These history-aware models and temporal queries enable more informed decision-making during adaptation, as they support the formulation of requirements on the evolution of the pattern that should be adapted. However, evaluating temporal queries during adaptation poses significant challenges. First, it implies the capability to specify and evaluate requirements on the structure, as well as the ordering and timing in which structural changes occur. Then, query answers have to reflect that the history-aware model represents the architecture of a system whose execution may be ongoing, and thus answers may depend on future changes. Finally, query evaluation needs to be adequately fast and memory-efficient despite the increasing size of the history, especially for models that are altered by numerous, rapid changes.

The thesis presents a query language and a querying approach for the specification and evaluation of temporal queries. These contributions aim to cope with the challenges of evaluating temporal queries at runtime, a prerequisite for history-aware architectural monitoring and adaptation which has not been systematically treated by prior model-based solutions. The distinguishing features of our contributions are: the specification of queries based on a temporal logic which encodes structural patterns as graphs; the provision of formally precise query answers which account for timing constraints and ongoing executions; the incremental evaluation which avoids the re-computation of query answers after each change; and the option to discard history that is no longer relevant to queries. The query evaluation searches the model for occurrences of a pattern whose evolution satisfies a temporal logic formula. Therefore, besides model-driven engineering, another related research community is runtime verification. The approach differs from prior logic-based runtime verification solutions by supporting the representation and querying of structure via graphs and graph queries, respectively, which is more efficient for queries with complex patterns. We present a prototypical implementation of the approach and measure its speed and memory consumption in monitoring and adaptation scenarios from two application domains, with executions of an increasing size. We assess scalability by a comparison to the state-of-the-art from both related research communities. The implementation yields promising results, which pave the way for sophisticated history-aware self-adaptation solutions and indicate that the approach constitutes a highly effective technique for runtime monitoring on an architectural level.
N2  - In der modellgetriebenen Entwicklung wird die Adaptation großer Softwaresysteme mit dynamischer Struktur durch Architektur-Laufzeitmodelle ermöglicht. Ein solches Modell stellt einen abstrakten Zustand des Systems als einen Graphen von interagierenden Komponenten dar. Jede relevante Änderung im System spiegelt sich im Modell wider und löst eine Ausführung von Modellanfragen aus, die das Modell nach zu adaptierenden Strukturmustern durchsuchen. Diese Arbeit konzentriert sich auf eine Art von Laufzeitmodellen, bei denen die Ausdruckskraft des Modells und der Modellanfragen erweitert wird, um vergangene Änderungen und deren Zeitpunkt zu erfassen. Diese geschichtsbewussten Modelle und temporalen Anfragen ermöglichen eine fundiertere Entscheidungsfindung während der Adaptation, da sie die Formulierung von Anforderungen an die Entwicklung des Musters, das adaptiert werden soll, unterstützen. Die Ausführung von temporalen Anfragen während der Adaptation stellt jedoch eine große Herausforderung dar. Zunächst müssen Anforderungen an die Struktur sowie an die Reihenfolge und den Zeitpunkt von Strukturänderungen spezifiziert und evaluiert werden. Weiterhin müssen die Antworten auf die Anfragen berücksichtigen, dass das geschichtsbewusste Modell die Architektur eines Systems darstellt, dessen Ausführung fortlaufend sein kann, sodass die Antworten von zukünftigen Änderungen abhängen können. Schließlich muss die Anfrageausführung trotz der zunehmenden Größe der Historie hinreichend schnell und speichereffizient sein, insbesondere bei Modellen, die durch zahlreiche, schnelle Änderungen verändert werden.

In dieser Arbeit werden eine Sprache für die Spezifikation von temporalen Anfragen sowie eine Technik für deren Ausführung vorgestellt. Diese Beiträge zielen darauf ab, die Herausforderungen bei der Ausführung temporaler Anfragen zur Laufzeit zu bewältigen, eine Voraussetzung für ein geschichtsbewusstes Architekturmonitoring und geschichtsbewusste Architekturadaptation, die von früheren modellbasierten Lösungen nicht systematisch behandelt wurde. Die besonderen Merkmale unserer Beiträge sind: die Spezifikation von Anfragen auf der Basis einer temporalen Logik, die strukturelle Muster als Graphen kodiert; die Bereitstellung formal präziser Anfrageantworten, die temporale Einschränkungen und laufende Ausführungen berücksichtigen; die inkrementelle Ausführung, die die Neuberechnung von Abfrageantworten nach jeder Änderung vermeidet; und die Option, Historie zu verwerfen, die für Abfragen nicht mehr relevant ist. Bei der Anfrageausführung wird das Modell nach dem Auftreten eines Musters durchsucht, dessen Entwicklung eine temporallogische Formel erfüllt. Neben der modellgetriebenen Entwicklung ist daher die Laufzeitverifikation ein weiteres verwandtes Forschungsgebiet. Der Ansatz unterscheidet sich von bisherigen logikbasierten Lösungen zur Laufzeitverifikation, indem er die Darstellung und Abfrage von Strukturen über Graphen bzw. Graphanfragen unterstützt, was bei Anfragen mit komplexen Mustern effizienter ist. Wir stellen eine prototypische Implementierung des Ansatzes vor und messen seine Laufzeit und seinen Speicherverbrauch in Monitoring- und Adaptationsszenarien aus zwei Anwendungsdomänen mit Ausführungen von zunehmender Größe. Wir bewerten die Skalierbarkeit durch einen Vergleich mit dem Stand der Technik aus beiden verwandten Forschungsgebieten. Die Implementierung liefert vielversprechende Ergebnisse, die den Weg für anspruchsvolle geschichtsbewusste Selbstadaptationslösungen ebnen und darauf hindeuten, dass der Ansatz eine effektive Technik für das Laufzeitmonitoring auf Architekturebene darstellt.
KW  - architectural adaptation
KW  - history-aware runtime models
KW  - incremental graph query evaluation
KW  - model-driven software engineering
KW  - temporal graph queries
KW  - Architekturadaptation
KW  - geschichtsbewusste Laufzeit-Modelle
KW  - inkrementelle Ausführung von Graphanfragen
KW  - modellgetriebene Softwaretechnik
KW  - temporale Graphanfragen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604396
ER  -