TY - JOUR A1 - Bläsius, Thomas A1 - Friedrich, Tobias A1 - Krejca, Martin S. A1 - Molitor, Louise T1 - The impact of geometry on monochrome regions in the flip Schelling process JF - Computational geometry N2 - Schelling's classical segregation model gives a coherent explanation for the wide-spread phenomenon of residential segregation. We introduce an agent-based saturated open-city variant, the Flip Schelling Process (FSP), in which agents, placed on a graph, have one out of two types and, based on the predominant type in their neighborhood, decide whether to change their types; similar to a new agent arriving as soon as another agent leaves the vertex. We investigate the probability that an edge {u,v} is monochrome, i.e., that both vertices u and v have the same type in the FSP, and we provide a general framework for analyzing the influence of the underlying graph topology on residential segregation. In particular, for two adjacent vertices, we show that a highly decisive common neighborhood, i.e., a common neighborhood where the absolute value of the difference between the number of vertices with different types is high, supports segregation and, moreover, that large common neighborhoods are more decisive. As an application, we study the expected behavior of the FSP on two common random graph models with and without geometry: (1) For random geometric graphs, we show that the existence of an edge {u,v} makes a highly decisive common neighborhood for u and v more likely. Based on this, we prove the existence of a constant c>0 such that the expected fraction of monochrome edges after the FSP is at least 1/2+c. (2) For Erdős–Rényi graphs we show that large common neighborhoods are unlikely and that the expected fraction of monochrome edges after the FSP is at most 1/2+o(1). Our results indicate that the cluster structure of the underlying graph has a significant impact on the obtained segregation strength. 
KW - Agent-based model KW - Schelling segregation KW - Spin system Y1 - 2022 U6 - https://doi.org/10.1016/j.comgeo.2022.101902 SN - 0925-7721 SN - 1879-081X VL - 108 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Ruipérez-Valiente, José A. A1 - Staubitz, Thomas A1 - Jenner, Matt A1 - Halawa, Sherif A1 - Zhang, Jiayin A1 - Despujol, Ignacio A1 - Maldonado-Mahauad, Jorge A1 - Montoro, German A1 - Peffer, Melanie A1 - Rohloff, Tobias A1 - Lane, Jenny A1 - Turro, Carlos A1 - Li, Xitong A1 - Pérez-Sanagustín, Mar A1 - Reich, Justin T1 - Large scale analytics of global and regional MOOC providers: Differences in learners' demographics, preferences, and perceptions JF - Computers & education N2 - Massive Open Online Courses (MOOCs) remarkably attracted global media attention, but the spotlight has been concentrated on a handful of English-language providers. While Coursera, edX, Udacity, and FutureLearn received most of the attention and scrutiny, an entirely new ecosystem of local MOOC providers was growing in parallel. This ecosystem is harder to study than the major players: they are spread around the world, have less staff devoted to maintaining research data, and operate in multiple languages with university and corporate regional partners. To better understand how online learning opportunities are expanding through this regional MOOC ecosystem, we created a research partnership among 15 different MOOC providers from nine countries. We gathered data from over eight million learners in six thousand MOOCs, and we conducted a large-scale survey with more than 10 thousand participants. From our analysis, we argue that these regional providers may be better positioned to meet the goals of expanding access to higher education in their regions than the better-known global providers. 
To make this claim we highlight three trends: first, regional providers attract a larger local population with more inclusive demographic profiles; second, students predominantly choose their courses based on topical interest, and regional providers do a better job at catering to those needs; and third, many students feel more at ease learning from institutions they already know and have references from. Our work raises the importance of local education in the global MOOC ecosystem, while calling for additional research and conversations across the diversity of MOOC providers. KW - Learning analytics KW - Educational data mining KW - Massive open online courses KW - Large scale analytics KW - Cultural factors KW - Equity KW - Distance learning Y1 - 2022 U6 - https://doi.org/10.1016/j.compedu.2021.104426 SN - 0360-1315 SN - 1873-782X VL - 180 PB - Elsevier CY - Oxford ER - TY - THES A1 - Haskamp, Thomas T1 - Products design organizations T1 - Produkte designen Organisationen BT - how industrial-aged companies accomplish digital product innovation BT - wie etablierte Industrieunternehmen digitale Produktinnovationen erreichen N2 - The automotive industry is a prime example of digital technologies reshaping mobility. Connected, autonomous, shared, and electric (CASE) trends lead to new emerging players that threaten existing industrial-aged companies. To respond, incumbents need to bridge the gap between contrasting product architecture and organizational principles in the physical and digital realms. Over-the-air (OTA) technology, which enables seamless software updates and on-demand feature additions for customers, is an example of CASE-driven digital product innovation. Through an extensive longitudinal case study of an OTA initiative by an industrial-aged automaker, this dissertation explores how incumbents accomplish digital product innovation.
Building on modularity, liminality, and the mirroring hypothesis, it presents a process model that explains the triggers, mechanisms, and outcomes of this process. In contrast to the literature, the findings emphasize the primacy of addressing product architecture challenges over organizational ones and highlight the managerial implications for success. N2 - Die Entwicklung neuer digitaler Produktinnovation erfordert in etablierten Industrieunternehmen die Integration von digitalen und physischen Elementen. Dies ist besonders in der Automobilindustrie sichtbar, wo der Trend zu vernetzter, autonomer, gemeinsam genutzter und elektrischer Mobilität zu einem neuen Wettbewerb führt, welcher etablierte Marktteilnehmer bedroht. Diese müssen lernen wie die Integration von gegensätzlichen Produktarchitekturen und Organisationsprinzipien aus der digitalen und physischen Produktentwicklung funktioniert. Die vorliegende Dissertation widmet sich diesem Problem. Basierend auf einer Fallstudie einer digitalen Produktinnovationsinitiative eines Premiummobilitätsanbieters rund um die Integration von Over-the-Air-Technologie für Software-Updates liefert sie wichtige Erkenntnisse. Erstens, etablierte Organisationen müssen Ihre Produktarchitektur befähigen, um verschiedene Produktarchitekturprinzipien in Einklang zu bringen. Zweitens, verschiedene Produktentwicklungsprozesse pro Produktebene müssen aufeinander abgestimmt werden. Drittens, die Organisationsstruktur muss erweitert werden, um die verschiedenen Produktebenen abzubilden. Darüber hinaus müssen auch Ressourcenallokationsprozesse auf die Entwicklungsprozesse abgestimmt werden. Basierend auf diesen Erkenntnissen und mit der bestehenden Fachliteratur wird in der Dissertation ein Prozessmodell entwickelt, welches erklären soll, wie etablierte Industrieunternehmen digitale Produktinnovation erreichen. Kernauslöser sind externer Marktdruck sowie existierende Architekturprinzipien. 
Wechselseitige Mechanismen wie die Befähigung der Produktarchitektur, die Erweiterung der Organisationstruktur, die Anpassung der Produktentwicklungsprozesse und die Anpassung der Ressourcenallokationsprozesse erklären den Prozess welcher in einer neuen Produktarchitektur sowie einer erweiterten Organisationsstruktur mündet. Der Forschungsbeitrag der Arbeit liegt im Bereich der digitalen Produktinnovation. Sie verlagert den Forschungsfokus auf Fragen der Produktarchitektur und verbindet diese durch Konzepte der Modularität mit organisatorischen Fragestellungen. Für die Praxis ergeben sich vier Hebel die Entscheidungsträger/innen nutzen können, um die Fähigkeiten zur digitalen Produktinnovation zu stärken. KW - digital product innovation KW - digital transformation KW - digital innovation KW - digitale Produktinnovation KW - digitale Transformation KW - digitale Innovation Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-646954 ER - TY - THES A1 - Lagodzinski, Julius Albert Gregor T1 - Counting homomorphisms over fields of prime order T1 - Zählen von Homomorphismen über Körper mit Primzahlordnung N2 - Homomorphisms are a fundamental concept in mathematics expressing the similarity of structures. They provide a framework that captures many of the central problems of computer science with close ties to various other fields of science. Thus, many studies over the last four decades have been devoted to the algorithmic complexity of homomorphism problems. Despite their generality, it has been found that non-uniform homomorphism problems, where the target structure is fixed, frequently feature complexity dichotomies. Exploring the limits of these dichotomies represents the common goal of this line of research. We investigate the problem of counting homomorphisms to a fixed structure over a finite field of prime order and its algorithmic complexity. 
Our emphasis is on graph homomorphisms and the resulting problem #_{p}Hom[H] for a graph H and a prime p. The main research question is how counting over a finite field of prime order affects the complexity. In the first part of this thesis, we tackle the research question in its generality and develop a framework for studying the complexity of counting problems based on category theory. In the absence of problem-specific details, results in the language of category theory provide a clear picture of the properties needed and highlight common ground between different branches of science. The proposed problem #Mor^{C}[B] of counting the number of morphisms to a fixed object B of C is abstract in nature and encompasses important problems like constraint satisfaction problems, which serve as a leading example for all our results. We find explanations and generalizations for a plethora of results in counting complexity. Our main technical result is that specific matrices of morphism counts are non-singular. The strength of this result lies in its algebraic nature. First, our proofs rely on carefully constructed systems of linear equations, which we know to be uniquely solvable. Second, by replacing the field over which the matrix is defined with a finite field of order p, we obtain analogous results for modular counting. For the latter, cancellations are implied by automorphisms of order p, but intriguingly we find that these present the only obstacle to translating our results from exact counting to modular counting. If we restrict our attention to reduced objects without automorphisms of order p, we obtain results analogous to those for exact counting. This is underscored by a confluent reduction that allows this restriction by constructing a reduced object for any given object.
We emphasize the strength of the categorial perspective by applying the duality principle, which yields immediate consequences for the dual problem of counting the number of morphisms from a fixed object. In the second part of this thesis, we focus on graphs and the problem #_{p}Hom[H]. We conjecture that automorphisms of order p capture all possible cancellations and that, for a reduced graph H, the problem #_{p}Hom[H] features a complexity dichotomy analogous to the one given for exact counting by Dyer and Greenhill. This serves as a generalization of the conjecture by Faben and Jerrum for the modulus 2. The criterion for tractability is that H is a collection of complete bipartite and reflexive complete graphs. From the findings of part one, we show that the conjectured dichotomy implies dichotomies for all quantum homomorphism problems, in particular counting vertex surjective homomorphisms and compactions modulo p. Since the tractable cases in the dichotomy are solved by trivial computations, the study of the intractable cases remains. As an initial problem in a series of reductions capable of implying hardness, we employ the problem of counting weighted independent sets in a bipartite graph modulo a prime p. A dichotomy for this problem is shown, stating that the trivial cases occurring when a weight is congruent modulo p to 0 are the only tractable cases. We reduce the possible structure of H to the bipartite case by a reduction to the restricted homomorphism problem #_{p}Hom^{bip}[H] of counting modulo p the number of homomorphisms between bipartite graphs that maintain a given order of bipartition. This reduction does not have an impact on the accessibility of the technical results, thanks to the generality of the findings of part one. In order to prove the conjecture, it suffices to show that for a connected bipartite graph that is not complete, #_{p}Hom^{bip}[H] is #_{p}P-hard.
Through a rigorous structural study of bipartite graphs, we establish this result for the rich class of bipartite graphs that are (K_{3,3}\{e}, domino)-free. This overcomes in particular the substantial hurdle imposed by squares, which leads us to explore the global structure of H and prove the existence of explicit structures that imply hardness. N2 - Homomorphismen sind ein grundlegendes Konzept der Mathematik, das die Ähnlichkeit von Strukturen ausdrückt. Sie bieten einen Rahmen, der viele der zentralen Probleme der Informatik umfasst und enge Verbindungen zu verschiedenen Wissenschaftsbereichen aufweist. Aus diesem Grund haben sich in den letzten vier Jahrzehnten viele Studien mit der algorithmischen Komplexität von Homomorphismusproblemen beschäftigt. Trotz ihrer Allgemeingültigkeit wurden Komplexitätsdichotomien häufig für nicht-uniforme Homomorphismusprobleme nachgewiesen, bei denen die Zielstruktur fixiert ist. Die Grenzen dieser Dichotomien zu erforschen, ist das gemeinsame Ziel dieses Forschungskalküls. Wir untersuchen das Problem und seine algorithmische Komplexität, Homomorphismen zu einer festen Struktur über einem endlichen Körper mit Primzahlordnung zu zählen. Wir konzentrieren uns auf Graphenhomomorphismen und das daraus resultierende Problem #_{p}Hom[H] für einen Graphen H und eine Primzahl p. Die Hauptforschungsfrage ist, wie das Zählen über einem endlichen Körper mit Primzahlordnung die Komplexität beeinflusst. Im ersten Teil wird die Forschungsfrage in ihrer Allgemeinheit behandelt und ein Rahmen für die Untersuchung der Komplexität von Zählproblemen auf der Grundlage der Kategorientheorie entwickelt. Losgelöst von problemspezifischen Details liefern die Ergebnisse in der Sprache der Kategorientheorie ein klares Bild der benötigten Eigenschaften und zeigen Gemeinsamkeiten zwischen verschiedenen Wissenschaftsgebieten auf. 
Das vorgeschlagene Problem #Mor^{C}[B] des Zählens der Anzahl von Morphismen zu einem festen Objekt B von C ist abstrakter Natur und umfasst wichtige Probleme wie Constraint Satisfaction Problems, die als leitendes Beispiel für alle unsere Ergebnisse dienen. Wir finden Erklärungen und Verallgemeinerungen für eine Vielzahl von Ergebnissen in der Komplexitätstheorie von Zählproblemen. Unser wichtigstes technisches Ergebnis ist, dass bestimmte Matrizen von Morphismenzahlen nicht singulär sind. Die Stärke dieses Ergebnisses liegt in seiner algebraischen Natur. Erstens basieren unsere Beweise auf sorgfältig konstruierten linearen Gleichungssystemen, von denen wir wissen, dass sie eindeutig lösbar sind. Zweitens, indem wir den Körper, über dem die Matrix definiert ist, durch einen endlichen Körper der Ordnung p ersetzen, erhalten wir analoge Ergebnisse für das modulare Zählen. Für letztere sind Annullierungen durch Automorphismen der Ordnung p impliziert, aber faszinierenderweise stellen diese das einzige Hindernis für die Übertragung unserer Ergebnisse von der exakten auf die modulare Zählung dar. Wenn wir unsere Aufmerksamkeit auf reduzierte Objekte ohne Automorphismen der Ordnung p beschränken, erhalten wir Ergebnisse, die zu denen des exakten Zählens analog sind. Dies wird durch eine konfluente Reduktion unterstrichen, die für jedes beliebige Objekt ein reduziertes Objekt konstruiert. Wir heben die Stärke der kategorialen Perspektive durch die Anwendung des Dualitätsprinzips hervor, das direkte Konsequenzen für das duale Problem des Zählens der Anzahl der Morphismen von einem fixen Objekts aus liefert. Im zweiten Teil konzentrieren wir uns auf Graphen und das Problem #_{p}Hom[H]. Wir stellen die Vermutung auf, dass Automorphismen der Ordnung p alle möglichen Annullierungen erklären und dass das Problem #_{p}Hom[H] für einen reduzierten Graphen H eine Komplexitätsdichotomie analog zu der aufweist, die von Dyer und Greenhill für das exakte Zählen bewiesen wurde. 
Dies stellt eine Verallgemeinerung der Vermutung von Faben und Jerrum für den Modulus 2 dar. Das Kriterium für die effiziente Lösbarkeit ist, dass H lediglich aus vollständigen bipartiten und reflexiven vollständigen Graphen besteht. Basierend auf den Ergebnisse des ersten Teils zeigen wir, dass die Vermutung Dichotomien für alle Quantenhomomorphismenprobleme impliziert, insbesondere für das Zählen modulo p von Homomorphismen surjektiv auf Knoten und von Verdichtungen. Da die effizient lösbaren Fälle in der Dichotomie durch triviale Berechnungen gelöst werden, bleibt es, die unlösbaren Fälle zu untersuchen. Als erstes Problem in einer Reihe von Reduktionen, deren Ziel es ist, Härte zu implizieren, verwenden wir das Problem des Zählens gewichteter unabhängiger Mengen in einem bipartiten Graphen modulo p. Für dieses Problem beweisen wir eine Dichotomie, die besagt, dass nur die trivialen Fälle effizient lösbar sind. Diese treten auf, wenn ein Gewicht kongruent modulo p zu 0 ist. Durch eine Reduktion auf das eingeschränkte Homomorphismusproblem #_{p}Hom^{bip}[H] reduzieren wir die mögliche Struktur von H auf den bipartiten Fall. Hierbei handelt es sich um das Problem des Zählens modulo p der Homomorphismen zwischen bipartiten Graphen, die eine gegebene Ordnung der Bipartition erhalten. Dank der Allgemeingültigkeit der Ergebnisse des ersten Teils hat diese Reduktion keinen Einfluss auf die Verfügbarkeit der technischen Ergebnisse. Für einen Beweis der Vermutung genügt es zu zeigen, dass #_{p}Hom^{bip}[H] für einen zusammenhängenden und nicht vollständigen bipartiten Graphen #_{p}P-schwer ist. Durch eine rigorose Untersuchung der Struktur von bipartiten Graphen beweisen wir dieses Ergebnis für die umfangreiche Klasse von bipartiten Graphen, die (K_{3,3}\{e}, domino)-frei sind. 
Dies überwindet insbesondere die substantielle Hürde, die durch Quadrate gegeben ist und uns dazu veranlasst, die globale Struktur von H zu untersuchen und die Existenz expliziter Strukturen zu beweisen, die Härte implizieren. KW - complexity theory KW - (modular) counting KW - relational structures KW - categories KW - homomorphisms KW - Zählen KW - Kategorien KW - Komplexitätstheorie KW - Homomorphismen KW - relationale Strukturen Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-646037 ER - TY - JOUR A1 - Essen, Anna A1 - Stern, Ariel Dora A1 - Haase, Christoffer Bjerre A1 - Car, Josip A1 - Greaves, Felix A1 - Paparova, Dragana A1 - Vandeput, Steven A1 - Wehrens, Rik A1 - Bates, David W. T1 - Health app policy BT - international comparison of nine countries' approaches JF - npj digital medicine N2 - An abundant and growing supply of digital health applications (apps) exists in the commercial tech-sector, which can be bewildering for clinicians, patients, and payers. A growing challenge for the health care system is therefore to facilitate the identification of safe and effective apps for health care practitioners and patients to generate the most health benefit as well as guide payer coverage decisions. Nearly all developed countries are attempting to define policy frameworks to improve decision-making, patient care, and health outcomes in this context. This study compares the national policy approaches currently in development/use for health apps in nine countries. We used secondary data, combined with a detailed review of policy and regulatory documents, and interviews with key individuals and experts in the field of digital health policy to collect data about implemented and planned policies and initiatives. We found that most approaches aim for centralized pipelines for health app approvals, although some countries are adding decentralized elements. 
While the countries studied are taking diverse paths, there is nevertheless broad, international convergence in terms of requirements in the areas of transparency, health content, interoperability, and privacy and security. The sheer number of apps on the market in most countries represents a challenge for clinicians and patients. Our analyses of the relevant policies identified challenges in areas such as reimbursement, safety, and privacy and suggest that more regulatory work is needed in the areas of operationalization, implementation and international transferability of approvals. Cross-national efforts are needed around regulation and for countries to realize the benefits of these technologies. Y1 - 2022 U6 - https://doi.org/10.1038/s41746-022-00573-1 SN - 2398-6352 VL - 5 IS - 1 PB - Macmillan Publishers Limited CY - Basingstoke ER - TY - JOUR A1 - Kühne, Katharina A1 - Herbold, Erika A1 - Bendel, Oliver A1 - Zhou, Yuefang A1 - Fischer, Martin H. T1 - “Ick bin een Berlina” BT - dialect proficiency impacts a robot’s trustworthiness and competence evaluation JF - Frontiers in robotics and AI N2 - Background: Robots are increasingly used as interaction partners with humans. Social robots are designed to follow expected behavioral norms when engaging with humans and are available with different voices and even accents. Some studies suggest that people prefer robots to speak in the user’s dialect, while others indicate a preference for different dialects. Methods: Our study examined the impact of the Berlin dialect on perceived trustworthiness and competence of a robot. One hundred and twenty German native speakers (Mage = 32 years, SD = 12 years) watched an online video featuring a NAO robot speaking either in the Berlin dialect or standard German and assessed its trustworthiness and competence. Results: We found a positive relationship between participants’ self-reported Berlin dialect proficiency and trustworthiness in the dialect-speaking robot. 
Only when controlling for demographic factors was there a positive association between participants’ dialect proficiency, dialect performance, and their assessment of the robot’s competence for the standard German-speaking robot. Participants’ age, gender, length of residency in Berlin, and device used to respond also influenced assessments. Finally, the robot’s competence positively predicted its trustworthiness. Discussion: Our results inform the design of social robots and emphasize the importance of device control in online experiments. KW - competence KW - dialect KW - human-robot interaction KW - robot voice KW - social robot KW - trust Y1 - 2024 U6 - https://doi.org/10.3389/frobt.2023.1241519 SN - 2296-9144 VL - 10 PB - Frontiers Media S.A. CY - Lausanne ER - TY - JOUR A1 - Dressler, Falko A1 - Chiasserini, Carla Fabiana A1 - Fitzek, Frank H. P. A1 - Karl, Holger A1 - Cigno, Renato Lo A1 - Capone, Antonio A1 - Casetti, Claudio A1 - Malandrino, Francesco A1 - Mancuso, Vincenzo A1 - Klingler, Florian A1 - Rizzo, Gianluca T1 - V-Edge BT - virtual edge computing as an enabler for novel microservices and cooperative computing JF - IEEE network N2 - As we move from 5G to 6G, edge computing is one of the concepts that needs revisiting. Its core idea is still intriguing: Instead of sending all data and tasks from an end user's device to the cloud, possibly covering thousands of kilometers and introducing delays lower-bounded by propagation speed, edge servers deployed in close proximity to the user (e.g., at some base station) serve as a proxy for the cloud. This is particularly interesting for upcoming machine-learning-based intelligent services, which require substantial computational and networking performance for continuous model training. However, this promising idea is hampered by the limited number of such edge servers. In this article, we discuss a way forward, namely the V-Edge concept.
V-Edge helps bridge the gap between cloud, edge, and fog by virtualizing all available resources including the end users' devices and making these resources widely available. Thus, V-Edge acts as an enabler for novel microservices as well as cooperative computing solutions in next-generation networks. We introduce the general V-Edge architecture, and we characterize some of the key research challenges to overcome in order to enable wide-spread and intelligent edge services. KW - Training KW - Performance evaluation KW - Cloud computing KW - Microservice KW - architectures KW - Computer architecture KW - Delays KW - Servers Y1 - 2022 U6 - https://doi.org/10.1109/MNET.001.2100491 SN - 0890-8044 SN - 1558-156X VL - 36 IS - 3 SP - 24 EP - 31 PB - Inst. of Electr. and Electronics Engineers CY - Piscataway ER - TY - JOUR A1 - Ehrig, Lukas A1 - Wagner, Ann-Christin A1 - Wolter, Heike A1 - Correll, Christoph U. A1 - Geisel, Olga A1 - Konigorski, Stefan T1 - FASDetect as a machine learning-based screening app for FASD in youth with ADHD JF - npj Digital Medicine N2 - Fetal alcohol-spectrum disorder (FASD) is underdiagnosed and often misdiagnosed as attention-deficit/hyperactivity disorder (ADHD). Here, we develop a screening tool for FASD in youth with ADHD symptoms. To develop the prediction model, medical record data from a German University outpatient unit are assessed including 275 patients aged 0-19 years old with FASD with or without ADHD and 170 patients with ADHD without FASD aged 0-19 years old. We train 6 machine learning models based on 13 selected variables and evaluate their performance. Random forest models yield the best prediction models with a cross-validated AUC of 0.92 (95% confidence interval [0.84, 0.99]). Follow-up analyses indicate that a random forest model with 6 variables - body length and head circumference at birth, IQ, socially intrusive behaviour, poor memory and sleep disturbance - yields equivalent predictive accuracy. 
We implement the prediction model in a web-based app called FASDetect - a user-friendly, clinically scalable FASD risk calculator that is freely available at https://fasdetect.dhc-lab.hpi.de. KW - Medical research KW - Psychiatric disorders Y1 - 2023 U6 - https://doi.org/10.1038/s41746-023-00864-1 SN - 2398-6352 VL - 6 IS - 1 PB - Macmillan Publishers Limited CY - Basingstoke ER - TY - JOUR A1 - Slosarek, Tamara A1 - Ibing, Susanne A1 - Schormair, Barbara A1 - Heyne, Henrike A1 - Böttinger, Erwin A1 - Andlauer, Till A1 - Schurmann, Claudia T1 - Implementation and evaluation of personal genetic testing as part of genomics analysis courses in German universities JF - BMC Medical Genomics N2 - Purpose: Due to the increasing application of genome analysis and interpretation in medical disciplines, professionals require adequate education. Here, we present the implementation of personal genotyping as an educational tool in two genomics courses targeting Digital Health students at the Hasso Plattner Institute (HPI) and medical students at the Technical University of Munich (TUM). Methods: We compared and evaluated the courses and the students' perceptions of the course setup using questionnaires. Results: During the course, students changed their attitudes towards genotyping (HPI: 79% [15 of 19], TUM: 47% [25 of 53]). Predominantly, students became more critical of personal genotyping (HPI: 73% [11 of 15], TUM: 72% [18 of 25]) and most students stated that genetic analyses should not be allowed without genetic counseling (HPI: 79% [15 of 19], TUM: 70% [37 of 53]). Students found the personal genotyping component useful (HPI: 89% [17 of 19], TUM: 92% [49 of 53]) and recommended its inclusion in future courses (HPI: 95% [18 of 19], TUM: 98% [52 of 53]). Conclusion: Students perceived the personal genotyping component as valuable in the described genomics courses. The implementation described here can serve as an example for future courses in Europe.
KW - Genomics education KW - Personal genotyping KW - Personalized medicine Y1 - 2023 U6 - https://doi.org/10.1186/s12920-023-01503-0 SN - 1755-8794 VL - 16 IS - 1 PB - BMC CY - London ER - TY - THES A1 - Taleb, Aiham T1 - Self-supervised deep learning methods for medical image analysis T1 - Selbstüberwachte Deep Learning Methoden für die medizinische Bildanalyse N2 - Deep learning has seen widespread application in many domains, mainly for its ability to learn data representations from raw input data. Nevertheless, its success has so far been coupled with the availability of large annotated (labelled) datasets. This is a requirement that is difficult to fulfil in several domains, such as in medical imaging. Annotation costs form a barrier in extending deep learning to clinically-relevant use cases. The labels associated with medical images are scarce, since the generation of expert annotations of multimodal patient data at scale is non-trivial, expensive, and time-consuming. This substantiates the need for algorithms that learn from the increasing amounts of unlabeled data. Self-supervised representation learning algorithms offer a pertinent solution, as they allow solving real-world (downstream) deep learning tasks with fewer annotations. Self-supervised approaches leverage unlabeled samples to acquire generic features about different concepts, enabling annotation-efficient downstream task solving subsequently. Nevertheless, medical images present multiple unique and inherent challenges for existing self-supervised learning approaches, which we seek to address in this thesis: (i) medical images are multimodal, and their multiple modalities are heterogeneous in nature and imbalanced in quantities, e.g. 
MRI and CT; (ii) medical scans are multi-dimensional, often in 3D instead of 2D; (iii) disease patterns in medical scans are numerous and their incidence exhibits a long-tail distribution, so it is oftentimes essential to fuse knowledge from different data modalities, e.g. genomics or clinical data, to capture disease traits more comprehensively; (iv) medical scans usually exhibit more uniform color density distributions, e.g. in dental X-Rays, than natural images. Our proposed self-supervised methods meet these challenges, besides significantly reducing the amounts of required annotations. We evaluate our self-supervised methods on a wide array of medical imaging applications and tasks. Our experimental results demonstrate the obtained gains in both annotation-efficiency and performance; our proposed methods outperform many approaches from related literature. Additionally, in case of fusion with genetic modalities, our methods also allow for cross-modal interpretability. In this thesis, we not only show that self-supervised learning is capable of mitigating manual annotation costs, but our proposed solutions also demonstrate how to better utilize it in the medical imaging domain. Progress in self-supervised learning has the potential to extend the application of deep learning algorithms to clinical scenarios. N2 - Deep Learning findet in vielen Bereichen breite Anwendung, vor allem wegen seiner Fähigkeit, Datenrepräsentationen aus rohen Eingabedaten zu lernen. Dennoch war der Erfolg bisher an die Verfügbarkeit großer annotierter Datensätze geknüpft. Dies ist eine Anforderung, die in verschiedenen Bereichen, z. B. in der medizinischen Bildgebung, schwer zu erfüllen ist. Die Kosten für die Annotation stellen ein Hindernis für die Ausweitung des Deep Learning auf klinisch relevante Anwendungsfälle dar.
Die mit medizinischen Bildern verbundenen Annotationen sind rar, da die Erstellung von Expertenannotationen für multimodale Patientendaten in großem Umfang nicht trivial, teuer und zeitaufwändig ist. Dies unterstreicht den Bedarf an Algorithmen, die aus den wachsenden Mengen an unbeschrifteten Daten lernen. Selbstüberwachte Algorithmen für das Repräsentationslernen bieten eine mögliche Lösung, da sie die Lösung realer (nachgelagerter) Deep-Learning-Aufgaben mit weniger Annotationen ermöglichen. Selbstüberwachte Ansätze nutzen unannotierte Stichproben, um generische Eigenschaften über verschiedene Konzepte zu erlangen und ermöglichen so eine annotationseffiziente Lösung nachgelagerter Aufgaben. Medizinische Bilder stellen mehrere einzigartige und inhärente Herausforderungen für existierende selbstüberwachte Lernansätze dar, die wir in dieser Arbeit angehen wollen: (i) medizinische Bilder sind multimodal, und ihre verschiedenen Modalitäten sind von Natur aus heterogen und in ihren Mengen unausgewogen, z. B. MRT und CT; (ii) medizinische Scans sind mehrdimensional, oft in 3D statt in 2D; (iii) Krankheitsmuster in medizinischen Scans sind zahlreich und ihre Häufigkeit weist eine Long-Tail-Verteilung auf, so dass es oft unerlässlich ist, Wissen aus verschiedenen Datenmodalitäten, z. B. Genomik oder klinische Daten, zu verschmelzen, um Krankheitsmerkmale umfassender zu erfassen; (iv) medizinische Scans weisen in der Regel eine gleichmäßigere Farbdichteverteilung auf, z. B. in zahnmedizinischen Röntgenaufnahmen, als natürliche Bilder. Die von uns vorgeschlagenen selbstüberwachten Methoden adressieren diese Herausforderungen und reduzieren zudem die Menge der erforderlichen Annotationen erheblich. Wir evaluieren unsere selbstüberwachten Methoden in verschiedenen Anwendungen und Aufgaben der medizinischen Bildgebung. 
Unsere experimentellen Ergebnisse zeigen, dass die von uns vorgeschlagenen Methoden sowohl die Effizienz der Annotation als auch die Leistung steigern und viele Ansätze aus der verwandten Literatur übertreffen. Darüber hinaus ermöglichen unsere Methoden im Falle der Fusion mit genetischen Modalitäten auch eine modalübergreifende Interpretierbarkeit. In dieser Arbeit zeigen wir nicht nur, dass selbstüberwachtes Lernen in der Lage ist, die Kosten für manuelle Annotationen zu senken, sondern auch, wie man es in der medizinischen Bildgebung besser nutzen kann. Fortschritte beim selbstüberwachten Lernen haben das Potenzial, die Anwendung von Deep-Learning-Algorithmen auf klinische Szenarien auszuweiten. KW - Artificial Intelligence KW - machine learning KW - unsupervised learning KW - representation learning KW - Künstliche Intelligenz KW - maschinelles Lernen KW - Representationlernen KW - selbstüberwachtes Lernen Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-644089 ER - TY - JOUR A1 - Shams, Boshra A1 - Wang, Ziqian A1 - Roine, Timo A1 - Aydogan, Dogu Baran A1 - Vajkoczy, Peter A1 - Lippert, Christoph A1 - Picht, Thomas A1 - Fekonja, Lucius Samo T1 - Machine learning-based prediction of motor status in glioma patients using diffusion MRI metrics along the corticospinal tract JF - Brain communications N2 - Shams et al. report that glioma patients' motor status is predicted accurately by diffusion MRI metrics along the corticospinal tract based on support vector machine method, reaching an overall accuracy of 77%. They show that these metrics are more effective than demographic and clinical variables. Along tract statistics enables white matter characterization using various diffusion MRI metrics. These diffusion models reveal detailed insights into white matter microstructural changes with development, pathology and function. 
Here, we aim to assess the clinical utility of diffusion MRI metrics along the corticospinal tract, investigating whether motor glioma patients can be classified with respect to their motor status. We retrospectively included 116 brain tumour patients suffering from either left or right supratentorial, unilateral World Health Organization Grades II, III and IV gliomas with a mean age of 53.51 +/- 16.32 years. Around 37% of patients presented with preoperative motor function deficits according to the Medical Research Council scale. At group level comparison, the highest non-overlapping diffusion MRI differences were detected in the superior portion of the tracts' profiles. Fractional anisotropy and fibre density decrease, while apparent diffusion coefficient, axial diffusivity and radial diffusivity increase. To predict motor deficits, we developed a method based on a support vector machine using histogram-based features of diffusion MRI tract profiles (e.g. mean, standard deviation, kurtosis and skewness), following a recursive feature elimination method. Our model achieved high performance (74% sensitivity, 75% specificity, 74% overall accuracy and 77% area under the curve). We found that apparent diffusion coefficient, fractional anisotropy and radial diffusivity contributed more than other features to the model. Incorporating the patient demographics and clinical features such as age, tumour World Health Organization grade, tumour location, gender and resting motor threshold did not affect the model's performance, revealing that these features were not as effective as microstructural measures. These results shed light on the potential patterns of tumour-related microstructural white matter changes in the prediction of functional deficits. 
KW - machine learning KW - support vector machine KW - tractography KW - diffusion MRI KW - corticospinal tract Y1 - 2022 U6 - https://doi.org/10.1093/braincomms/fcac141 SN - 2632-1297 VL - 4 IS - 3 PB - Oxford University Press CY - Oxford ER - TY - JOUR A1 - Ring, Raphaela M. A1 - Eisenmann, Clemens A1 - Kandil, Farid A1 - Steckhan, Nico A1 - Demmrich, Sarah A1 - Klatte, Caroline A1 - Kessler, Christian S. A1 - Jeitler, Michael A1 - Boschmann, Michael A1 - Michalsen, Andreas A1 - Blakeslee, Sarah B. A1 - Stöckigt, Barbara A1 - Stritter, Wiebke A1 - Koppold-Liebscher, Daniela A. T1 - Mental and behavioural responses to Bahá’í fasting: Looking behind the scenes of a religiously motivated intermittent fast using a mixed methods approach JF - Nutrients N2 - Background/Objective: Historically, fasting has been practiced not only for medical but also for religious reasons. Baha'is follow an annual religious intermittent dry fast of 19 days. We inquired into motivation behind and subjective health impacts of Baha'i fasting. Methods: A convergent parallel mixed methods design was embedded in a clinical single arm observational study. Semi-structured individual interviews were conducted before (n = 7), during (n = 8), and after fasting (n = 8). Three months after the fasting period, two focus group interviews were conducted (n = 5/n = 3). A total of 146 Baha'i volunteers answered an online survey at five time points before, during, and after fasting. Results: Fasting was found to play a central role for the religiosity of interviewees, implying changes in daily structures, spending time alone, engaging in religious practices, and experiencing social belonging. Results show an increase in mindfulness and well-being, which were accompanied by behavioural changes and experiences of self-efficacy and inner freedom. Survey scores point to an increase in mindfulness and well-being during fasting, while stress, anxiety, and fatigue decreased. 
Mindfulness remained elevated even three months after the fast. Conclusion: Baha'i fasting seems to enhance participants' mindfulness and well-being, lowering stress levels and reducing fatigue. Some of these effects lasted more than three months after fasting. KW - intermittent food restriction KW - mindfulness KW - self-efficacy KW - well-being KW - mixed methods KW - health behaviour KW - coping ability KW - religiously motivated KW - dry fasting Y1 - 2022 U6 - https://doi.org/10.3390/nu14051038 SN - 2072-6643 VL - 14 IS - 5 PB - MDPI CY - Basel ER - TY - BOOK A1 - Kuban, Robert A1 - Rotta, Randolf A1 - Nolte, Jörg A1 - Chromik, Jonas A1 - Beilharz, Jossekin Jakob A1 - Pirl, Lukas A1 - Friedrich, Tobias A1 - Lenzner, Pascal A1 - Weyand, Christopher A1 - Juiz, Carlos A1 - Bermejo, Belen A1 - Sauer, Joao A1 - Coelh, Leandro dos Santos A1 - Najafi, Pejman A1 - Pünter, Wenzel A1 - Cheng, Feng A1 - Meinel, Christoph A1 - Sidorova, Julia A1 - Lundberg, Lars A1 - Vogel, Thomas A1 - Tran, Chinh A1 - Moser, Irene A1 - Grunske, Lars A1 - Elsaid, Mohamed Esameldin Mohamed A1 - Abbas, Hazem M. A1 - Rula, Anisa A1 - Sejdiu, Gezim A1 - Maurino, Andrea A1 - Schmidt, Christopher A1 - Hügle, Johannes A1 - Uflacker, Matthias A1 - Nozza, Debora A1 - Messina, Enza A1 - Hoorn, André van A1 - Frank, Markus A1 - Schulz, Henning A1 - Alhosseini Almodarresi Yasin, Seyed Ali A1 - Nowicki, Marek A1 - Muite, Benson K. A1 - Boysan, Mehmet Can A1 - Bianchi, Federico A1 - Cremaschi, Marco A1 - Moussa, Rim A1 - Abdel-Karim, Benjamin M. 
A1 - Pfeuffer, Nicolas A1 - Hinz, Oliver A1 - Plauth, Max A1 - Polze, Andreas A1 - Huo, Da A1 - Melo, Gerard de A1 - Mendes Soares, Fábio A1 - Oliveira, Roberto Célio Limão de A1 - Benson, Lawrence A1 - Paul, Fabian A1 - Werling, Christian A1 - Windheuser, Fabian A1 - Stojanovic, Dragan A1 - Djordjevic, Igor A1 - Stojanovic, Natalija A1 - Stojnev Ilic, Aleksandra A1 - Weidmann, Vera A1 - Lowitzki, Leon A1 - Wagner, Markus A1 - Ifa, Abdessatar Ben A1 - Arlos, Patrik A1 - Megia, Ana A1 - Vendrell, Joan A1 - Pfitzner, Bjarne A1 - Redondo, Alberto A1 - Ríos Insua, David A1 - Albert, Justin Amadeus A1 - Zhou, Lin A1 - Arnrich, Bert A1 - Szabó, Ildikó A1 - Fodor, Szabina A1 - Ternai, Katalin A1 - Bhowmik, Rajarshi A1 - Campero Durand, Gabriel A1 - Shevchenko, Pavlo A1 - Malysheva, Milena A1 - Prymak, Ivan A1 - Saake, Gunter ED - Meinel, Christoph ED - Polze, Andreas ED - Beins, Karsten ED - Strotmann, Rolf ED - Seibold, Ulrich ED - Rödszus, Kurt ED - Müller, Jürgen T1 - HPI Future SOC Lab – Proceedings 2019 N2 - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners. The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies. This technical report presents results of research projects executed in 2019. Selected projects have presented their results on April 9th and November 12th 2019 at the Future SOC Lab Day events. 
N2 - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie. Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2019 vorgestellt. Ausgewählte Projekte stellten ihre Ergebnisse am 09. April und 12. November 2019 im Rahmen des Future SOC Lab Tags vor. 
T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 158 KW - Future SOC Lab KW - research projects KW - multicore architectures KW - in-memory technology KW - cloud computing KW - machine learning KW - artificial intelligence KW - Future SOC Lab KW - Forschungsprojekte KW - Multicore Architekturen KW - In-Memory Technologie KW - Cloud Computing KW - maschinelles Lernen KW - künstliche Intelligenz Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597915 SN - 978-3-86956-564-4 SN - 1613-5652 SN - 2191-1665 IS - 158 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - THES A1 - Richly, Keven T1 - Memory-efficient data management for spatio-temporal applications BT - workload-driven fine-grained configuration optimization for storing spatio-temporal data in columnar In-memory databases N2 - The wide distribution of location-acquisition technologies means that large volumes of spatio-temporal data are continuously being accumulated. Positioning systems such as GPS enable the tracking of various moving objects' trajectories, which are usually represented by a chronologically ordered sequence of observed locations. The analysis of movement patterns based on detailed positional information creates opportunities for applications that can improve business decisions and processes in a broad spectrum of industries (e.g., transportation, traffic control, or medicine). Due to the large data volumes generated in these applications, the cost-efficient storage of spatio-temporal data is desirable, especially when in-memory database systems are used to achieve interactive performance requirements. To efficiently utilize the available DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional index structures). 
By considering horizontal data partitioning, we can independently apply different tuning options on a fine-grained level. However, the selection of cost and performance-balancing configurations is challenging, due to the vast number of possible setups consisting of mutually dependent individual decisions. In this thesis, we introduce multiple approaches to improve spatio-temporal data management by automatically optimizing diverse tuning options for the application-specific access patterns and data characteristics. Our contributions are as follows: (1) We introduce a novel approach to determine fine-grained table configurations for spatio-temporal workloads. Our linear programming (LP) approach jointly optimizes the (i) data compression, (ii) ordering, (iii) indexing, and (iv) tiering. We propose different models which address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload, memory budgets, and data characteristics. To yield maintainable and robust configurations, we further extend our LP-based approach to incorporate reconfiguration costs as well as optimizations for multiple potential workload scenarios. (2) To optimize the storage layout of timestamps in columnar databases, we present a heuristic approach for the workload-driven combined selection of a data layout and compression scheme. By considering attribute decomposition strategies, we are able to apply application-specific optimizations that reduce the memory footprint and improve performance. (3) We introduce an approach that leverages past trajectory data to improve the dispatch processes of transportation network companies. Based on location probabilities, we developed risk-averse dispatch strategies that reduce critical delays. (4) Finally, we used the use case of a transportation network company to evaluate our database optimizations on a real-world dataset. 
We demonstrate that workload-driven fine-grained optimizations allow us to reduce the memory footprint (up to 71% at equal performance) or increase the performance (up to 90% at equal memory size) compared to established rule-based heuristics. Individually, our contributions provide novel approaches to the current challenges in spatio-temporal data mining and database research. Combining them allows in-memory databases to store and process spatio-temporal data more cost-efficiently. N2 - Durch die starke Verbreitung von Systemen zur Positionsbestimmung werden fortlaufend große Mengen an Bewegungsdaten mit einem räumlichen und zeitlichen Bezug gesammelt. Ortungssysteme wie GPS ermöglichen, die Bewegungen verschiedener Objekte (z. B. Personen oder Fahrzeuge) nachzuverfolgen. Diese werden in der Regel durch eine chronologisch geordnete Abfolge beobachteter Aufenthaltsorte repräsentiert. Die Analyse von Bewegungsmustern auf der Grundlage detaillierter Positionsinformationen schafft in unterschiedlichsten Branchen (z. B. Transportwesen, Verkehrssteuerung oder Medizin) die Möglichkeit, Geschäftsentscheidungen und -prozesse zu verbessern. Aufgrund der großen Datenmengen, die bei diesen Anwendungen auftreten, stellt die kosteneffiziente Speicherung von Bewegungsdaten eine Herausforderung dar. Dies ist insbesondere der Fall, wenn Hauptspeicherdatenbanken zur Speicherung eingesetzt werden, um die Anforderungen bezüglich interaktiver Antwortzeiten zu erfüllen. Um die verfügbaren Speicherkapazitäten effizient zu nutzen, unterstützen moderne Datenbanksysteme verschiedene Optimierungsmöglichkeiten, um den Speicherbedarf zu reduzieren (z. B. durch Datenkomprimierung) oder die Performance zu erhöhen (z. B. durch Indexstrukturen). Dabei ermöglicht eine horizontale Partitionierung der Daten, dass unabhängig voneinander verschiedene Optimierungen feingranular auf einzelnen Bereichen der Daten angewendet werden können. 
Die Auswahl von Konfigurationen, die sowohl die Kosten als auch Leistungsanforderungen berücksichtigen, ist jedoch aufgrund der großen Anzahl möglicher Kombinationen -- die aus voneinander abhängigen Einzelentscheidungen bestehen -- komplex. In dieser Dissertation präsentieren wir mehrere Ansätze zur Verbesserung der Datenverwaltung, indem wir die Auswahl verschiedener Datenbankoptimierungen automatisch für die anwendungsspezifischen Zugriffsmuster und Dateneigenschaften anpassen. Diesbezüglich leistet die vorliegende Dissertation die folgenden Beiträge: (1) Wir stellen einen neuen Ansatz vor, um feingranulare Tabellenkonfigurationen für räumlich-zeitliche Workloads zu bestimmen. In diesem Zusammenhang optimiert unser Linear Programming (LP) Ansatz gemeinsam (i) die Datenkompression, (ii) die Sortierung, (iii) die Indizierung und (iv) die Datenplatzierung. Hierzu schlagen wir verschiedene Modelle mit unterschiedlichen Kostenabhängigkeiten vor, um optimierte Konfigurationen für einen gegebenen Workload, ein Speicherbudget und die vorliegenden Dateneigenschaften zu berechnen. Durch die Erweiterung des LP-basierten Ansatzes zur Berücksichtigung von Modifikationskosten und verschiedener potentieller Workloads ist es möglich, die Wartbarkeit und Robustheit der bestimmten Tabellenkonfiguration zu erhöhen. (2) Um die Speicherung von Timestamps in spalten-orientierten Datenbanken zu optimieren, stellen wir einen heuristischen Ansatz für die kombinierte Auswahl eines Speicherlayouts und eines Kompressionsschemas vor. Zudem sind wir durch die Berücksichtigung von Strategien zur Aufteilung von Attributen in der Lage, anwendungsspezifische Optimierungen anzuwenden, die den Speicherbedarf reduzieren und die Performance verbessern. (3) Wir stellen einen Ansatz vor, der in der Vergangenheit beobachtete Bewegungsmuster nutzt, um die Zuweisungsprozesse von Vermittlungsdiensten zur Personenbeförderung zu verbessern. 
Auf der Grundlage von Standortwahrscheinlichkeiten haben wir verschiedene Strategien für die Vergabe von Fahraufträgen an Fahrer entwickelt, die kritische Verspätungen reduzieren. (4) Abschließend haben wir unsere Datenbankoptimierungen anhand eines realen Datensatzes eines Transportdienstleisters evaluiert. In diesem Zusammenhang zeigen wir, dass wir durch feingranulare workload-basierte Optimierungen den Speicherbedarf (um bis zu 71% bei vergleichbarer Performance) reduzieren oder die Performance (um bis zu 90% bei gleichem Speicherverbrauch) im Vergleich zu regelbasierten Heuristiken verbessern können. Die einzelnen Beiträge stellen neuartige Ansätze für aktuelle Herausforderungen im Bereich des Data Mining und der Datenbankforschung dar. In Kombination ermöglichen sie eine kosteneffizientere Speicherung und Verarbeitung von Bewegungsdaten in Hauptspeicherdatenbanken. KW - spatio-temporal data management KW - trajectory data KW - columnar databases KW - in-memory data management KW - database tuning KW - spaltenorientierte Datenbanken KW - Datenbankoptimierung KW - Hauptspeicher Datenmanagement KW - Datenverwaltung für Daten mit räumlich-zeitlichem Bezug KW - Trajektoriendaten Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635473 ER - TY - JOUR A1 - Rosin, Paul L. A1 - Lai, Yu-Kun A1 - Mould, David A1 - Yi, Ran A1 - Berger, Itamar A1 - Doyle, Lars A1 - Lee, Seungyong A1 - Li, Chuan A1 - Liu, Yong-Jin A1 - Semmo, Amir A1 - Shamir, Ariel A1 - Son, Minjung A1 - Winnemöller, Holger T1 - NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits JF - Computational visual media N2 - Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). 
However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset. KW - non-photorealistic rendering (NPR) KW - image stylization KW - style transfer KW - portrait KW - evaluation KW - benchmark Y1 - 2022 U6 - https://doi.org/10.1007/s41095-021-0255-3 SN - 2096-0433 SN - 2096-0662 VL - 8 IS - 3 SP - 445 EP - 465 PB - Springer Nature CY - London ER - TY - JOUR A1 - Vitagliano, Gerardo A1 - Hameed, Mazhar A1 - Jiang, Lan A1 - Reisener, Lucas A1 - Wu, Eugene A1 - Naumann, Felix T1 - Pollock: a data loading benchmark JF - Proceedings of the VLDB Endowment N2 - Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de-facto standard format to distribute and consume raw data is CSV. Yet, the plain text and flexible nature of this format make such files often difficult to parse and correctly load their content, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard CSV formats and with structural inconsistencies. 
First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic “pollution” process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the CSV format. To guide pollution, we have surveyed thousands of real-world, publicly available CSV files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular CSV parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool. Y1 - 2023 U6 - https://doi.org/10.14778/3594512.3594518 SN - 2150-8097 VL - 16 IS - 8 SP - 1870 EP - 1882 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Wiemker, Veronika A1 - Bunova, Anna A1 - Neufeld, Maria A1 - Gornyi, Boris A1 - Yurasova, Elena A1 - Konigorski, Stefan A1 - Kalinina, Anna A1 - Kontsevaya, Anna A1 - Ferreira-Borges, Carina A1 - Probst, Charlotte T1 - Pilot study to evaluate usability and acceptability of the 'Animated Alcohol Assessment Tool' in Russian primary healthcare JF - Digital health N2 - Background and aims: Accurate and user-friendly assessment tools quantifying alcohol consumption are a prerequisite to effective prevention and treatment programmes, including Screening and Brief Intervention. Digital tools offer new potential in this field. We developed the ‘Animated Alcohol Assessment Tool’ (AAA-Tool), a mobile app providing an interactive version of the World Health Organization's Alcohol Use Disorders Identification Test (AUDIT) that facilitates the description of individual alcohol consumption via culturally informed animation features. 
This pilot study evaluated the Russia-specific version of the Animated Alcohol Assessment Tool with regard to (1) its usability and acceptability in a primary healthcare setting, (2) the plausibility of its alcohol consumption assessment results and (3) the adequacy of its Russia-specific vessel and beverage selection. Methods: Convenience samples of 55 patients (47% female) and 15 healthcare practitioners (80% female) in 2 Russian primary healthcare facilities self-administered the Animated Alcohol Assessment Tool and rated their experience on the Mobile Application Rating Scale – User Version. Usage data was automatically collected during app usage, and additional feedback on regional content was elicited in semi-structured interviews. Results: On average, patients completed the Animated Alcohol Assessment Tool in 6:38 min (SD = 2.49, range = 3.00–17.16). User satisfaction was good, with all subscale Mobile Application Rating Scale – User Version scores averaging >3 out of 5 points. A majority of patients (53%) and practitioners (93%) would recommend the tool to ‘many people’ or ‘everyone’. Assessed alcohol consumption was plausible, with a low number (14%) of logically impossible entries. Most patients reported the Animated Alcohol Assessment Tool to reflect all vessels (78%) and all beverages (71%) they typically used. Conclusion: High acceptability ratings by patients and healthcare practitioners, acceptable completion time, plausible alcohol usage assessment results and perceived adequacy of region-specific content underline the Animated Alcohol Assessment Tool's potential to provide a novel approach to alcohol assessment in primary healthcare. After its validation, the Animated Alcohol Assessment Tool might contribute to reducing alcohol-related harm by facilitating Screening and Brief Intervention implementation in Russia and beyond. 
KW - Alcohol use assessment KW - Alcohol Use Disorders Identification Test KW - screening tools KW - digital health KW - mobile applications KW - Russia KW - primary healthcare KW - usability KW - acceptability Y1 - 2022 U6 - https://doi.org/10.1177/20552076211074491 SN - 2055-2076 VL - 8 PB - Sage Publications CY - London ER - TY - JOUR A1 - Fehr, Jana A1 - Piccininni, Marco A1 - Kurth, Tobias A1 - Konigorski, Stefan T1 - Assessing the transportability of clinical prediction models for cognitive impairment using causal models JF - BMC medical research methodology N2 - Background Machine learning models promise to support diagnostic predictions, but may not perform well in new settings. Selecting the best model for a new setting without available data is challenging. We aimed to investigate the transportability by calibration and discrimination of prediction models for cognitive impairment in simulated external settings with different distributions of demographic and clinical characteristics. Methods We mapped and quantified relationships between variables associated with cognitive impairment using causal graphs, structural equation models, and data from the ADNI study. These estimates were then used to generate datasets and evaluate prediction models with different sets of predictors. We measured transportability to external settings under guided interventions on age, APOE ε4, and tau-protein, using performance differences between internal and external settings measured by calibration metrics and area under the receiver operating curve (AUC). Results Calibration differences indicated that models predicting with causes of the outcome were more transportable than those predicting with consequences. AUC differences indicated inconsistent trends of transportability between the different external settings. 
Models predicting with consequences tended to show higher AUC in the external settings compared to internal settings, while models predicting with parents or all variables showed similar AUC. Conclusions We demonstrated with a practical prediction task example that predicting with causes of the outcome results in better transportability compared to anti-causal predictions when considering calibration differences. We conclude that calibration performance is crucial when assessing model transportability to external settings. KW - Alzheimer's Disease KW - Clinical risk prediction KW - DAG KW - Causality KW - Transportability Y1 - 2023 U6 - https://doi.org/10.1186/s12874-023-02003-6 SN - 1471-2288 VL - 23 IS - 1 PB - BMC CY - London ER - TY - JOUR A1 - Garrels, Tim A1 - Khodabakhsh, Athar A1 - Renard, Bernhard Y. A1 - Baum, Katharina T1 - LazyFox: fast and parallelized overlapping community detection in large graphs JF - PEERJ Computer Science N2 - The detection of communities in graph datasets provides insight into a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, FOX, that detects such overlapping communities. FOX measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LAZYFOX, a multi-threaded adaptation of the FOX algorithm, which provides even faster detection without an impact on community quality. This allows for the analyses of significantly larger and more complex datasets. 
LAZYFOX enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LAZYFOX's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox. KW - Overlapping community detection KW - Large networks KW - Weighted clustering coefficient KW - Heuristic triangle estimation KW - Parallelized algorithm KW - C++ tool KW - Runtime improvement KW - Open source KW - Graph algorithm KW - Community analysis Y1 - 2023 U6 - https://doi.org/10.7717/peerj-cs.1291 SN - 2376-5992 VL - 9 PB - PeerJ Inc. CY - London ER - TY - JOUR A1 - Kappattanavar, Arpita Mallikarjuna A1 - Hecker, Pascal A1 - Moontaha, Sidratul A1 - Steckhan, Nico A1 - Arnrich, Bert T1 - Food choices after cognitive load BT - an affective computing approach JF - Sensors N2 - Psychology and nutritional science research has highlighted the impact of negative emotions and cognitive load on calorie consumption behaviour using subjective questionnaires. Isolated studies in other domains objectively assess cognitive load without considering its effects on eating behaviour. This study aims to explore the potential for developing an integrated eating behaviour assistant system that incorporates cognitive load factors. Two experimental sessions were conducted using custom-developed experimentation software to induce different stimuli. During these sessions, we collected 30 h of physiological, food consumption, and affective states questionnaires data to automatically detect cognitive load and analyse its effect on food choice. Utilising grid search optimisation and leave-one-subject-out cross-validation, a support vector machine model achieved a mean classification accuracy of 85.12% for the two cognitive load tasks using eight relevant features. Statistical analysis was performed on calorie consumption and questionnaire data. 
Furthermore, 75% of the subjects with higher negative affect significantly increased consumption of specific foods after high-cognitive-load tasks. These findings offer insights into the intricate relationship between cognitive load, affective states, and food choice, paving the way for an eating behaviour assistant system to manage food choices during cognitive load. Future research should enhance system capabilities and explore real-world applications. KW - cognitive load KW - eating behaviour KW - machine learning KW - physiological signals KW - photoplethysmography KW - electrodermal activity KW - sensors Y1 - 2023 U6 - https://doi.org/10.3390/s23146597 SN - 1424-8220 VL - 23 IS - 14 PB - MDPI CY - Basel ER - TY - JOUR A1 - Cohen, Sarel A1 - Hershcovitch, Moshik A1 - Taraz, Martin A1 - Kissig, Otto A1 - Issac, Davis A1 - Wood, Andrew A1 - Waddington, Daniel A1 - Chin, Peter A1 - Friedrich, Tobias T1 - Improved and optimized drug repurposing for the SARS-CoV-2 pandemic JF - PLoS one N2 - The active global SARS-CoV-2 pandemic caused more than 426 million cases and 5.8 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process. Despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on knowledge graphs, that condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. 
The best aggregation strategy derives a top-100 ranking of 8,070 candidate drugs, 32 of which are currently being tested in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using new emerging hardware: we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking. Y1 - 2023 U6 - https://doi.org/10.1371/journal.pone.0266572 SN - 1932-6203 VL - 18 IS - 3 PB - PLoS CY - San Francisco ER - TY - JOUR A1 - Piro, Vitor C. A1 - Renard, Bernhard Y. T1 - Contamination detection and microbiome exploration with GRIMER JF - GigaScience N2 - Background: Contamination detection is an important step that should be carefully considered in early stages when designing and performing microbiome studies to avoid biased outcomes. Detecting and removing true contaminants is challenging, especially in low-biomass samples or in studies lacking proper controls. Interactive visualizations and analysis platforms are crucial to better guide this step, to help to identify and detect noisy patterns that could potentially be contamination. Additionally, external evidence, like aggregation of several contamination detection methods and the use of common contaminants reported in the literature, could help to discover and mitigate contamination. Results: We propose GRIMER, a tool that performs automated analyses and generates a portable and interactive dashboard integrating annotation, taxonomy, and metadata. It unifies several sources of evidence to help detect contamination. 
GRIMER is independent of quantification methods and directly analyzes contingency tables to create an interactive and offline report. Reports can be created in seconds and are accessible for nonspecialists, providing an intuitive set of charts to explore data distribution among observations and samples and its connections with external sources. Further, we compiled and used an extensive list of possible external contaminant taxa and common contaminants with 210 genera and 627 species reported in 22 published articles. Conclusion: GRIMER enables visual data exploration and analysis, supporting contamination detection in microbiome studies. The tool and data presented are open source and available at https://gitlab.com/dacs-hpi/grimer. KW - Contamination KW - Microbiome KW - Visualization KW - Taxonomy Y1 - 2023 U6 - https://doi.org/10.1093/gigascience/giad017 SN - 2047-217X VL - 12 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Gärtner, Thomas A1 - Schneider, Juliana A1 - Arnrich, Bert A1 - Konigorski, Stefan T1 - Comparison of Bayesian Networks, G-estimation and linear models to estimate causal treatment effects in aggregated N-of-1 trials with carry-over effects JF - BMC Medical Research Methodology N2 - Background The aggregation of a series of N-of-1 trials presents an innovative and efficient study design, as an alternative to traditional randomized clinical trials. Challenges for the statistical analysis arise when there is carry-over or complex dependencies of the treatment effect of interest. Methods In this study, we evaluate and compare methods for the analysis of aggregated N-of-1 trials in different scenarios with carry-over and complex dependencies of treatment effects on covariates. For this, we simulate data of a series of N-of-1 trials for Chronic Nonspecific Low Back Pain based on assumed causal relationships parameterized by directed acyclic graphs. 
In addition to existing statistical methods such as regression models, Bayesian Networks, and G-estimation, we introduce a carry-over adjusted parametric model (COAPM). Results The results show that all evaluated existing models have a good performance when there is no carry-over and no treatment dependence. When there is carry-over, COAPM yields unbiased and more efficient estimates while all other methods show some bias in the estimation. When there is known treatment dependence, all approaches that are capable of modeling it yield unbiased estimates. Finally, the efficiency of all methods decreases slightly when there are missing values, and the bias in the estimates can also increase. Conclusions This study presents a systematic evaluation of existing and novel approaches for the statistical analysis of a series of N-of-1 trials. We derive practical recommendations on which methods may be best in which scenarios. KW - N-of-1 trials KW - Randomized clinical trials KW - Bayesian Networks KW - G-estimation KW - Linear model KW - Simulation study KW - Chronic Nonspecific Low Back Pain Y1 - 2023 U6 - https://doi.org/10.1186/s12874-023-02012-5 SN - 1471-2288 VL - 23 IS - 1 PB - BMC CY - London ER - TY - JOUR A1 - Lewkowicz, Daniel A1 - Böttinger, Erwin A1 - Siegel, Martin T1 - Economic evaluation of digital therapeutic care apps for unsupervised treatment of low back pain BT - Monte Carlo Simulation JF - JMIR mhealth and uhealth N2 - Background: Digital therapeutic care (DTC) programs are unsupervised app-based treatments that provide video exercises and educational material to patients with nonspecific low back pain during episodes of pain and functional disability. German statutory health insurance has been able to reimburse DTC programs since 2019, but evidence on efficacy and reasonable pricing remains scarce. This paper presents a probabilistic sensitivity analysis (PSA) to evaluate the efficacy and cost-utility of a DTC app against treatment as usual (TAU) in Germany. 
Objective: The aim of this study was to perform a PSA in the form of a Monte Carlo simulation based on the deterministic base case analysis to account for model assumptions and parameter uncertainty. We also intend to explore to what extent the results in this probabilistic analysis differ from the results in the base case analysis and to what extent a shortage of outcome data concerning quality-of-life (QoL) metrics impacts the overall results. Methods: The PSA builds upon a state-transition Markov chain with a 4-week cycle length over a model time horizon of 3 years from a recently published deterministic cost-utility analysis. A Monte Carlo simulation with 10,000 iterations and a cohort size of 10,000 was employed to evaluate the cost-utility from a societal perspective. Quality-adjusted life years (QALYs) were derived from Veterans RAND 6-Dimension (VR-6D) and Short-Form 6-Dimension (SF-6D) single utility scores. Finally, we also simulated reducing the price for a 3-month app prescription to analyze at which price threshold DTC would result in being the dominant strategy over TAU in Germany. Results: The Monte Carlo simulation yielded on average a €135.97 (a currency exchange rate of EUR €1 = US $1.069 is applicable) incremental cost and 0.004 incremental QALYs per person and year for the unsupervised DTC app strategy compared to in-person physiotherapy in Germany. The corresponding incremental cost-utility ratio (ICUR) amounts to an additional €34,315.19 per additional QALY. DTC yielded more QALYs in 54.96% of the iterations. DTC dominates TAU in 24.04% of the iterations for QALYs. Reducing the app price in the simulation from currently €239.96 to €164.61 for a 3-month prescription could yield a negative ICUR and thus make DTC the dominant strategy, even though the estimated probability of DTC being more effective than TAU is only 54.96%. 
Conclusions: Decision-makers should be cautious when considering the reimbursement of DTC apps since no significant treatment effect was found, and the probability of cost-effectiveness remains below 60% even for an infinite willingness-to-pay threshold. More app-based studies involving the utilization of QoL outcome parameters are urgently needed to account for the low and limited precision of the available QoL input parameters, which are crucial to making profound recommendations concerning the cost-utility of novel apps. KW - cost-utility analysis KW - cost KW - probabilistic sensitivity analysis KW - Monte Carlo simulation KW - low back pain KW - pain KW - economic KW - cost-effectiveness KW - Markov model KW - digital therapy KW - digital health app KW - mHealth KW - mobile health KW - health app KW - mobile app KW - orthopedic KW - QUALY KW - DALY KW - quality-adjusted life years KW - disability-adjusted life years KW - time horizon KW - veteran KW - statistics Y1 - 2023 U6 - https://doi.org/10.2196/44585 SN - 2291-5222 VL - 11 PB - JMIR Publications CY - Toronto ER - TY - JOUR A1 - Moontaha, Sidratul A1 - Schumann, Franziska Elisabeth Friederike A1 - Arnrich, Bert T1 - Online learning for wearable EEG-Based emotion classification JF - Sensors N2 - Giving emotional intelligence to machines can facilitate the early detection and prediction of mental diseases and symptoms. Electroencephalography (EEG)-based emotion recognition is widely applied because it measures electrical correlates directly from the brain rather than indirect measurement of other physiological responses initiated by the brain. Therefore, we used non-invasive and portable EEG sensors to develop a real-time emotion classification pipeline. The pipeline trains different binary classifiers for Valence and Arousal dimensions from an incoming EEG data stream achieving a 23.9% (Arousal) and 25.8% (Valence) higher F1-Score on the state-of-art AMIGOS dataset than previous work. 
Afterward, the pipeline was applied to the curated dataset from 15 participants using two consumer-grade EEG devices while watching 16 short emotional videos in a controlled environment. Mean F1-Scores of 87% (Arousal) and 82% (Valence) were achieved for an immediate label setting. Additionally, the pipeline proved to be fast enough to achieve predictions in real-time in a live scenario with delayed labels while continuously being updated. The significant discrepancy from the readily available labels on the classification scores leads to future work to include more data. Thereafter, the pipeline is ready to be used for real-time applications of emotion classification. KW - online learning KW - real-time KW - emotion classification KW - AMIGOS dataset KW - wearable EEG (muse and neurosity crown) KW - psychopy experiments Y1 - 2023 U6 - https://doi.org/10.3390/s23052387 SN - 1424-8220 VL - 23 IS - 5 PB - MDPI CY - Basel ER - TY - JOUR A1 - Kirchler, Matthias A1 - Konigorski, Stefan A1 - Norden, Matthias A1 - Meltendorf, Christian A1 - Kloft, Marius A1 - Schurmann, Claudia A1 - Lippert, Christoph T1 - transferGWAS BT - GWAS of images using deep transfer learning JF - Bioinformatics N2 - Motivation: Medical images can provide rich information about diseases and their biology. However, investigating their association with genetic variation requires non-standard methods. We propose transferGWAS, a novel approach to perform genome-wide association studies directly on full medical images. First, we learn semantically meaningful representations of the images based on a transfer learning task, during which a deep neural network is trained on independent but similar data. Then, we perform genetic association tests with these representations. Results: We validate the type I error rates and power of transferGWAS in simulation studies of synthetic images. Then we apply transferGWAS in a genome-wide association study of retinal fundus images from the UK Biobank. 
This first-of-a-kind GWAS of full imaging data yielded 60 genomic regions associated with retinal fundus images, of which 7 are novel candidate loci for eye-related traits and diseases. Y1 - 2022 U6 - https://doi.org/10.1093/bioinformatics/btac369 SN - 1367-4803 SN - 1460-2059 VL - 38 IS - 14 SP - 3621 EP - 3628 PB - Oxford Univ. Press CY - Oxford ER - TY - THES A1 - Lorson, Annalena T1 - Understanding early stage evolution of digital innovation units in manufacturing companies T1 - Verständnis der frühphasigen Entwicklung digitaler Innovationseinheiten in Fertigungsunternehmen N2 - The dynamic landscape of digital transformation entails an impact on industrial-age manufacturing companies that goes beyond product offerings, changing operational paradigms, and requiring an organization-wide metamorphosis. An initiative to address the given challenges is the creation of Digital Innovation Units (DIUs) – departments or distinct legal entities that use new structures and practices to develop digital products, services, and business models and support or drive incumbents’ digital transformation. With more than 300 units in German-speaking countries alone and an increasing number of scientific publications, DIUs have become a widespread phenomenon in both research and practice. This dissertation examines the evolution process of DIUs in the manufacturing industry during their first three years of operation, through an extensive longitudinal single-case study and several cross-case syntheses of seven DIUs. Building on the lenses of organizational change and development, time, and socio-technical systems, this research provides insights into the fundamentals, temporal dynamics, socio-technical interactions, and relational dynamics of a DIU’s evolution process. 
Thus, the dissertation promotes a dynamic understanding of DIUs and adds a two-dimensional perspective to the often one-dimensional view of these units and their interactions with the main organization throughout the startup and growth phases of a DIU. Furthermore, the dissertation constructs a phase model that depicts the early stages of DIU evolution based on these findings and by incorporating literature from information systems research. As a result, it illustrates the progressive intensification of collaboration between the DIU and the main organization. After being implemented, the DIU sparks initial collaboration and instigates change within (parts of) the main organization. Over time, it adapts to the corporate environment to some extent, responding to changing circumstances in order to contribute to long-term transformation. Temporally, the DIU drives the early phases of cooperation and adaptation in particular, while the main organization triggers the first major evolutionary step and realignment of the DIU. Overall, the thesis identifies DIUs as malleable organizational structures that are crucial for digital transformation. Moreover, it provides guidance for practitioners on the process of building a new DIU from scratch or optimizing an existing one. N2 - Die digitale Transformation produzierender Unternehmen geht über die bloße Veränderung des Produktangebots hinaus; sie durchdringt operative Paradigmen und erfordert eine umfassende, unternehmensweite Metamorphose. Eine Initiative, den damit verbundenen Herausforderungen zu begegnen, ist der Aufbau einer Digital Innovation Unit (DIU) (zu deutsch: digitale Innovationseinheit) – eine Abteilung oder separate rechtliche Einheit, die neue organisationale Strukturen und Arbeitspraktiken nutzt, um digitale Produkte, Dienstleistungen und Geschäftsmodelle zu entwickeln und die digitale Transformation von etablierten Unternehmen zu unterstützen oder voranzutreiben. 
Mit mehr als 300 Einheiten allein im deutschsprachigen Raum und einer wachsenden Zahl wissenschaftlicher Publikationen sind DIUs sowohl in der Forschung als auch in der Praxis ein weit verbreitetes Phänomen. Auf Basis einer umfassenden Längsschnittstudie und mehrerer Querschnittsanalysen von sieben Fertigungsunternehmen und ihren DIUs untersucht diese Dissertation den Entwicklungsprozess von DIUs in den ersten drei Betriebsjahren. Gestützt auf theoretische Perspektiven zu organisatorischem Wandel, Zeit und sozio-technischen Systemen bietet sie Einblicke in die Grundlagen, die zeitlichen Dynamiken, die sozio-technischen Interaktionen und die Beziehungsdynamiken des Entwicklungsprozesses von DIUs. Die Dissertation erweitert somit das dynamische Verständnis von DIUs und fügt der oft eindimensionalen Sichtweise auf diese Einheiten und ihre Interaktionen mit der Hauptorganisation eine zweidimensionale Perspektive entlang der Gründungs- und Wachstumsphasen einer DIU hinzu. Darüber hinaus konstruiert die Dissertation ein Phasenmodell, das die frühen Phasen der DIU-Entwicklung auf der Grundlage dieser Erkenntnisse und unter Einbeziehung von Literatur aus der Wirtschaftsinformatikforschung abbildet. Es veranschaulicht die schrittweise Intensivierung der Zusammenarbeit zwischen der DIU und der Hauptorganisation. Nach ihrer Implementierung initiiert die DIU die anfängliche Zusammenarbeit und stößt Veränderungen innerhalb (von Teilen) der Hauptorganisation an. Im Laufe der Zeit passt sich die DIU bis zu einem gewissen Grad dem Unternehmensumfeld an und reagiert auf sich verändernde Umstände, um zu einer langfristigen Veränderung beizutragen. Zeitlich gesehen treibt die DIU vor allem die frühen Phasen der Zusammenarbeit und Anpassung voran, während die Hauptorganisation den ersten großen Entwicklungsschritt und die Neuausrichtung der DIU auslöst. 
Insgesamt identifiziert die Dissertation DIUs als anpassungsfähige Organisationsstrukturen, die für die digitale Transformation entscheidend sind. Darüber hinaus bietet sie Praktikern einen Leitfaden für den Aufbau einer neuen oder die Optimierung einer bestehenden DIU. KW - digital transformation KW - digital innovation units KW - evolution of digital innovation units KW - manufacturing companies KW - digitale Transformation KW - digitale Innovationseinheit KW - Entwicklung digitaler Innovationseinheiten KW - Fertigungsunternehmen Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-639141 ER - TY - JOUR A1 - Cseh, Agnes A1 - Faenza, Yuri A1 - Kavitha, Telikepalli A1 - Powers, Vladlena T1 - Understanding popular matchings via stable matchings JF - SIAM journal on discrete mathematics N2 - An instance of the marriage problem is given by a graph G = (A ∪ B, E), together with, for each vertex of G, a strict preference order over its neighbors. A matching M of G is popular in the marriage instance if M does not lose a head-to-head election against any matching where vertices are voters. Every stable matching is a min-size popular matching; another subclass of popular matchings that always exists and can be easily computed is the set of dominant matchings. A popular matching M is dominant if M wins the head-to-head election against any larger matching. Thus, every dominant matching is a max-size popular matching, and it is known that the set of dominant matchings is the linear image of the set of stable matchings in an auxiliary graph. Results from the literature seem to suggest that stable and dominant matchings behave, from a complexity theory point of view, in a very similar manner within the class of popular matchings. The goal of this paper is to show that there are instead differences in the tractability of stable and dominant matchings and to investigate further their importance for popular matchings. 
First, we show that it is easy to check if all popular matchings are also stable; however, it is co-NP hard to check if all popular matchings are also dominant. Second, we show how some new and recent hardness results on popular matching problems can be deduced from the NP-hardness of certain problems on stable matchings, also studied in this paper, thus showing that stable matchings can be employed to show not only positive results on popular matchings (as is known) but also most negative ones. Problems for which we show new hardness results include finding a min-size (resp., max-size) popular matching that is not stable (resp., dominant). A known result for which we give a new and simple proof is the NP-hardness of finding a popular matching when G is nonbipartite. KW - popular matching KW - stable matching KW - complexity KW - dominant matching Y1 - 2022 U6 - https://doi.org/10.1137/19M124770X SN - 0895-4801 SN - 1095-7146 VL - 36 IS - 1 SP - 188 EP - 213 PB - Society for Industrial and Applied Mathematics CY - Philadelphia ER - TY - THES A1 - Huegle, Johannes T1 - Causal discovery in practice: Non-parametric conditional independence testing and tooling for causal discovery T1 - Kausale Entdeckung in der Praxis: Nichtparametrische bedingte Unabhängigkeitstests und Werkzeuge für die Kausalentdeckung N2 - Knowledge about causal structures is crucial for decision support in various domains. For example, in discrete manufacturing, identifying the root causes of failures and quality deviations that interrupt the highly automated production process requires causal structural knowledge. However, in practice, root cause analysis is usually built upon individual expert knowledge about associative relationships. But, "correlation does not imply causation", and misinterpreting associations often leads to incorrect conclusions. Recent developments in methods for causal discovery from observational data have opened the opportunity for a data-driven examination. 
Despite its potential for data-driven decision support, omnipresent challenges impede causal discovery in real-world scenarios. In this thesis, we make a threefold contribution to improving causal discovery in practice. (1) The growing interest in causal discovery has led to a broad spectrum of methods with specific assumptions on the data and various implementations. Hence, application in practice requires careful consideration of existing methods, which becomes laborious when dealing with various parameters, assumptions, and implementations in different programming languages. Additionally, evaluation is challenging due to the lack of ground truth in practice and limited benchmark data that reflect real-world data characteristics. To address these issues, we present a platform-independent modular pipeline for causal discovery and a ground truth framework for synthetic data generation that provides comprehensive evaluation opportunities, e.g., to examine the accuracy of causal discovery methods in case of inappropriate assumptions. (2) Applying constraint-based methods for causal discovery requires selecting a conditional independence (CI) test, which is particularly challenging in mixed discrete-continuous data omnipresent in many real-world scenarios. In this context, inappropriate assumptions on the data or the commonly applied discretization of continuous variables reduce the accuracy of CI decisions, leading to incorrect causal structures. Therefore, we contribute a non-parametric CI test leveraging k-nearest neighbors methods and prove its statistical validity and power in mixed discrete-continuous data, as well as the asymptotic consistency when used in constraint-based causal discovery. An extensive evaluation of synthetic and real-world data shows that the proposed CI test outperforms state-of-the-art approaches in the accuracy of CI testing and causal discovery, particularly in settings with low sample sizes. 
(3) To show the applicability and opportunities of causal discovery in practice, we examine our contributions in real-world discrete manufacturing use cases. For example, we showcase how causal structural knowledge helps to understand unforeseen production downtimes or adds decision support in case of failures and quality deviations in automotive body shop assembly lines. N2 - Kenntnisse über die Strukturen zugrundeliegender kausaler Mechanismen sind eine Voraussetzung für die Entscheidungsunterstützung in verschiedenen Bereichen. In der Fertigungsindustrie beispielsweise erfordert die Fehler-Ursachen-Analyse von Störungen und Qualitätsabweichungen, die den hochautomatisierten Produktionsprozess unterbrechen, kausales Strukturwissen. In der Praxis stützt sich die Fehler-Ursachen-Analyse in der Regel jedoch auf individuelles Expertenwissen über assoziative Zusammenhänge. Aber "Korrelation impliziert nicht Kausalität", und die Fehlinterpretation assoziativer Zusammenhänge führt häufig zu falschen Schlussfolgerungen. Neueste Entwicklungen von Methoden des kausalen Strukturlernens haben die Möglichkeit einer datenbasierten Betrachtung eröffnet. Trotz seines Potenzials zur datenbasierten Entscheidungsunterstützung wird das kausale Strukturlernen in der Praxis jedoch durch allgegenwärtige Herausforderungen erschwert. In dieser Dissertation leisten wir einen dreifachen Beitrag zur Verbesserung des kausalen Strukturlernens in der Praxis. (1) Das wachsende Interesse an kausalem Strukturlernen hat zu einer Vielzahl von Methoden mit spezifischen statistischen Annahmen über die Daten und verschiedenen Implementierungen geführt. Daher erfordert die Anwendung in der Praxis eine sorgfältige Prüfung der vorhandenen Methoden, was eine Herausforderung darstellt, wenn verschiedene Parameter, Annahmen und Implementierungen in unterschiedlichen Programmiersprachen betrachtet werden. 
Hierbei wird die Evaluierung von Methoden des kausalen Strukturlernens zusätzlich durch das Fehlen von "Ground Truth" in der Praxis und begrenzten Benchmark-Daten, welche die Eigenschaften realer Datencharakteristiken widerspiegeln, erschwert. Um diese Probleme zu adressieren, stellen wir eine plattformunabhängige modulare Pipeline für kausales Strukturlernen und ein Tool zur Generierung synthetischer Daten vor, die umfassende Evaluierungsmöglichkeiten bieten, z.B. um Ungenauigkeiten von Methoden des Lernens kausaler Strukturen bei falschen Annahmen an die Daten aufzuzeigen. (2) Die Anwendung von constraint-basierten Methoden des kausalen Strukturlernens erfordert die Wahl eines bedingten Unabhängigkeitstests (CI-Test), was insbesondere bei gemischten diskreten und kontinuierlichen Daten, die in vielen realen Szenarien allgegenwärtig sind, die Anwendung erschwert. Beispielsweise führen falsche Annahmen der CI-Tests oder die Diskretisierung kontinuierlicher Variablen zu einer Verschlechterung der Korrektheit der Testentscheidungen, was in fehlerhaften kausalen Strukturen resultiert. Um diese Probleme zu adressieren, stellen wir einen nicht-parametrischen CI-Test vor, der auf Nächste-Nachbar-Methoden basiert, und beweisen dessen statistische Validität und Trennschärfe bei gemischten diskreten und kontinuierlichen Daten, sowie dessen asymptotische Konsistenz in constraint-basiertem kausalem Strukturlernen. Eine umfangreiche Evaluation auf synthetischen und realen Daten zeigt, dass der vorgeschlagene CI-Test bestehende Verfahren hinsichtlich der Korrektheit der Testentscheidung und gelernter kausaler Strukturen übertrifft, insbesondere bei geringen Stichprobengrößen. (3) Um die Anwendbarkeit und Möglichkeiten kausalen Strukturlernens in der Praxis aufzuzeigen, untersuchen wir unsere Beiträge in realen Anwendungsfällen aus der Fertigungsindustrie. 
Wir zeigen an mehreren Beispielen aus der automobilen Karosseriefertigung, wie kausales Strukturwissen helfen kann, unvorhergesehene Produktionsausfälle zu verstehen oder eine Entscheidungsunterstützung bei Störungen und Qualitätsabweichungen zu geben. KW - causal discovery KW - causal structure learning KW - causal AI KW - non-parametric conditional independence testing KW - manufacturing KW - causal reasoning KW - mixed data KW - kausale KI KW - kausale Entdeckung KW - kausale Schlussfolgerung KW - kausales Strukturlernen KW - Fertigung KW - gemischte Daten KW - nicht-parametrische bedingte Unabhängigkeitstests Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635820 ER - TY - JOUR A1 - Casel, Katrin A1 - Fernau, Henning A1 - Ghadikolaei, Mehdi Khosravian A1 - Monnot, Jerome A1 - Sikora, Florian T1 - On the complexity of solution extension of optimization problems JF - Theoretical computer science : the journal of the EATCS N2 - The question of whether a given partial solution to a problem can be extended reasonably occurs in many algorithmic approaches for optimization problems. For instance, when enumerating minimal vertex covers of a graph G = (V, E), one usually arrives at the problem of deciding, for a vertex set U ⊆ V (pre-solution), whether there exists a minimal vertex cover S (i.e., a vertex cover S ⊆ V such that no proper subset of S is a vertex cover) with U ⊆ S (minimal extension of U). We propose a general, partial-order based formulation of such extension problems which allows modeling parameterization and approximation aspects of extension, and also highlights relationships between extension tasks for different specific problems. As examples, we study a number of specific problems which can be expressed and related in this framework. In particular, we discuss extension variants of the problems dominating set and feedback vertex/edge set. 
All these problems are shown to be NP-complete even when restricted to bipartite graphs of bounded degree, with the exception of our extension version of feedback edge set on undirected graphs which is shown to be solvable in polynomial time. For the extension variants of dominating and feedback vertex set, we also show NP-completeness for the restriction to planar graphs of bounded degree. As a non-graph problem, we also study an extension version of the bin packing problem. We further consider the parameterized complexity of all these extension variants, where the parameter is a measure of the pre-solution as defined by our framework. KW - extension problems KW - NP-hardness KW - parameterized complexity Y1 - 2022 U6 - https://doi.org/10.1016/j.tcs.2021.10.017 SN - 0304-3975 SN - 1879-2294 VL - 904 SP - 48 EP - 65 PB - Elsevier CY - Amsterdam [u.a.] ER - TY - JOUR A1 - Coupette, Corinna A1 - Hartung, Dirk A1 - Beckedorf, Janis A1 - Böther, Maximilian A1 - Katz, Daniel Martin T1 - Law smells BT - defining and detecting problematic patterns in legal drafting JF - Artificial intelligence and law N2 - Building on the computer science concept of code smells, we initiate the study of law smells, i.e., patterns in legal texts that pose threats to the comprehensibility and maintainability of the law. With five intuitive law smells as running examples (namely, duplicated phrase, long element, large reference tree, ambiguous syntax, and natural language obsession), we develop a comprehensive law smell taxonomy. This taxonomy classifies law smells by when they can be detected, which aspects of law they relate to, and how they can be discovered. We introduce text-based and graph-based methods to identify instances of law smells, confirming their utility in practice using the United States Code as a test case. 
Our work demonstrates how ideas from software engineering can be leveraged to assess and improve the quality of legal code, thus drawing attention to an understudied area at the intersection of law and computer science and highlighting the potential of computational legal drafting. KW - Refactoring KW - Software engineering KW - Law KW - Natural language processing KW - Network analysis Y1 - 2022 U6 - https://doi.org/10.1007/s10506-022-09315-w SN - 0924-8463 SN - 1572-8382 VL - 31 SP - 335 EP - 368 PB - Springer CY - Dordrecht ER - TY - JOUR A1 - Tang, Mitchell A1 - Nakamoto, Carter H. A1 - Stern, Ariel Dora A1 - Mehrotra, Ateev T1 - Trends in remote patient monitoring use in traditional Medicare JF - JAMA Internal Medicine N2 - This cross-sectional study uses traditional Medicare claims data to assess trends in general remote patient monitoring from January 2018 through September 2021. Y1 - 2022 U6 - https://doi.org/10.1001/jamainternmed.2022.3043 SN - 2168-6106 SN - 2168-6114 VL - 182 IS - 9 SP - 1005 EP - 1006 PB - American Medical Association CY - Chicago ER - TY - JOUR A1 - Cseh, Ágnes A1 - Juhos, Attila T1 - Pairwise preferences in the stable marriage problem JF - ACM Transactions on Economics and Computation / Association for Computing Machinery N2 - We study the classical, two-sided stable marriage problem under pairwise preferences. In the most general setting, agents are allowed to express their preferences as comparisons of any two of their edges, and they also have the right to declare a draw or even withdraw from such a comparison. This freedom is then gradually restricted as we specify six stages of orderedness in the preferences, ending with the classical case of strictly ordered lists. We study all cases occurring when combining the three known notions of stability (weak, strong, and super-stability) under the assumption that each side of the bipartite market obtains one of the six degrees of orderedness. 
By designing three polynomial algorithms and two NP-completeness proofs, we determine the complexity of all cases not yet known and thus give an exact boundary in terms of preference structure between tractable and intractable cases. KW - Stable marriage KW - intransitivity KW - acyclic preferences KW - poset KW - weakly stable matching KW - strongly stable matching KW - super stable matching Y1 - 2021 U6 - https://doi.org/10.1145/3434427 SN - 2167-8375 SN - 2167-8383 VL - 9 IS - 1 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Cseh, Ágnes A1 - Kavitha, Telikepalli T1 - Popular matchings in complete graphs JF - Algorithmica : an international journal in computer science N2 - Our input is a complete graph G on n vertices where each vertex has a strict ranking of all other vertices in G. The goal is to construct a matching in G that is popular. A matching M is popular if M does not lose a head-to-head election against any matching M': here each vertex casts a vote for the matching in {M, M'} in which it gets a better assignment. Popular matchings need not exist in the given instance G and the popular matching problem is to decide whether one exists or not. The popular matching problem in G is easy to solve for odd n. Surprisingly, the problem becomes NP-complete for even n, as we show here. This is one of the few graph theoretic problems efficiently solvable when n has one parity and NP-complete when n has the other parity. 
KW - Popular matching KW - Complexity KW - Stable matching Y1 - 2021 U6 - https://doi.org/10.1007/s00453-020-00791-7 SN - 0178-4617 SN - 1432-0541 VL - 83 IS - 5 SP - 1493 EP - 1523 PB - Springer CY - New York ER - TY - JOUR A1 - Genske, Ulrich A1 - Jahnke, Paul T1 - Human observer net BT - a platform tool for human observer studies of image data JF - Radiology N2 - Background: Current software applications for human observer studies of images lack flexibility in study design, platform independence, multicenter use, and assessment methods and are not open source, limiting accessibility and expandability. Purpose: To develop a user-friendly software platform that enables efficient human observer studies in medical imaging with flexibility of study design. Materials and Methods: Software for human observer imaging studies was designed as an open-source web application to facilitate access, platform-independent usability, and multicenter studies. Different interfaces for study creation, participation, and management of results were implemented. The software was evaluated in human observer experiments between May 2019 and March 2021, in which duration of observer responses was tracked. Fourteen radiologists evaluated and graded software usability using the 100-point system usability scale. The application was tested in Chrome, Firefox, Safari, and Edge browsers. Results: Software function was designed to allow visual grading analysis (VGA), multiple-alternative forced-choice (m-AFC), receiver operating characteristic (ROC), localization ROC, free-response ROC, and customized designs. The mean duration of reader responses per image or per image set was 6.2 seconds ± 4.8 (standard deviation), 5.8 seconds ± 4.7, 8.7 seconds ± 5.7, and 6.0 seconds ± 4.5 in four-AFC with 160 image quartets per reader, four-AFC with 640 image quartets per reader, localization ROC, and experimental studies, respectively. The mean system usability scale score was 83 ± 11 (out of 100). 
The documented code and a demonstration of the application are available online (https://github.com/genskeu/HON, https://hondemo.pythonanywhere.com/). Conclusion: A user-friendly and efficient open-source application was developed for human reader experiments that enables study design versatility, as well as platform-independent and multicenter usability. Y1 - 2022 U6 - https://doi.org/10.1148/radiol.211832 SN - 0033-8419 SN - 1527-1315 VL - 303 IS - 3 SP - 524 EP - 530 PB - Radiological Society of North America CY - Oak Brook, Ill. ER - TY - JOUR A1 - Puri, Manish A1 - Varde, Aparna S. A1 - Melo, Gerard de T1 - Commonsense based text mining on urban policy JF - Language resources and evaluation N2 - Local laws on urban policy, i.e., ordinances directly affect our daily life in various ways (health, business etc.), yet in practice, for many citizens they remain impervious and complex. This article focuses on an approach to make urban policy more accessible and comprehensible to the general public and to government officials, while also addressing pertinent social media postings. Due to the intricacies of the natural language, ranging from complex legalese in ordinances to informal lingo in tweets, it is practical to harness human judgment here. To this end, we mine ordinances and tweets via reasoning based on commonsense knowledge so as to better account for pragmatics and semantics in the text. Ours is pioneering work in ordinance mining, and thus there is no prior labeled training data available for learning. This gap is filled by commonsense knowledge, a prudent choice in situations involving a lack of adequate training data. The ordinance mining can be beneficial to the public in fathoming policies and to officials in assessing policy effectiveness based on public reactions. This work contributes to smart governance, leveraging transparency in governing processes via public involvement. 
We focus significantly on ordinances contributing to smart cities, hence an important goal is to assess how well an urban region heads towards a smart city as per its policies mapping with smart city characteristics, and the corresponding public satisfaction. KW - Commonsense reasoning KW - Opinion mining KW - Ordinances KW - Smart cities KW - Social media KW - Text mining Y1 - 2022 U6 - https://doi.org/10.1007/s10579-022-09584-6 SN - 1574-020X SN - 1574-0218 VL - 57 SP - 733 EP - 763 PB - Springer CY - Dordrecht [u.a.] ER - TY - JOUR A1 - Bonnet, Philippe A1 - Dong, Xin Luna A1 - Naumann, Felix A1 - Tözün, Pınar T1 - VLDB 2021 BT - Designing a hybrid conference JF - SIGMOD record N2 - The 47th International Conference on Very Large Databases (VLDB'21) was held on August 16-20, 2021 as a hybrid conference. It attracted 180 in-person attendees in Copenhagen and 840 remote attendees. In this paper, we describe our key decisions as general chairs and program committee chairs and share the lessons we learned. Y1 - 2021 U6 - https://doi.org/10.1145/3516431.3516447 SN - 0163-5808 SN - 1943-5835 VL - 50 IS - 4 SP - 50 EP - 53 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Hagedorn, Christiane A1 - Serth, Sebastian A1 - Meinel, Christoph T1 - The mysterious adventures of Detective Duke BT - how storified programming MOOCs support learners in achieving their learning goals JF - Frontiers in education N2 - About 15 years ago, the first Massive Open Online Courses (MOOCs) appeared and revolutionized online education with more interactive and engaging course designs. Yet, keeping learners motivated and ensuring high satisfaction is one of the challenges today's course designers face. Therefore, many MOOC providers employed gamification elements that only boost extrinsic motivation briefly and are limited to platform support. 
In this article, we introduce and evaluate a gameful learning design we used in several iterations on computer science education courses. For each of the courses on the fundamentals of the Java programming language, we developed a self-contained, continuous story that accompanies learners through their learning journey and helps visualize key concepts. Furthermore, we share our approach to creating the surrounding story in our MOOCs and provide a guideline for educators to develop their own stories. Our data and the long-term evaluation spanning four Java courses between 2017 and 2021 indicate the openness of learners toward storified programming courses in general and highlight those elements that had the highest impact. While only a few learners did not like the story at all, most learners consumed the additional story elements we provided. However, learners' interest in influencing the story through majority voting was negligible and did not show a considerable positive impact, so we continued with a fixed story instead. We did not find evidence that learners just participated in the narrative because they worked on all materials. Instead, for 10-16% of learners, the story was their main course motivation. We also investigated differences in the presentation format and concluded that several longer audio-book style videos were most preferred by learners in comparison to animated videos or different textual formats. Surprisingly, the availability of a coherent story embedding examples and providing a context for the practical programming exercises also led to a slightly higher ranking in the perceived quality of the learning material (by 4%). With our research in the context of storified MOOCs, we advance gameful learning designs, foster learner engagement and satisfaction in online courses, and help educators ease knowledge transfer for their learners. 
KW - gameful learning KW - storytelling KW - programming KW - learner engagement KW - course design KW - MOOCs KW - content gamification KW - narrative Y1 - 2023 U6 - https://doi.org/10.3389/feduc.2022.1016401 SN - 2504-284X VL - 7 PB - Frontiers Media CY - Lausanne ER - TY - THES A1 - Halfpap, Stefan T1 - Integer linear programming-based heuristics for partially replicated database clusters and selecting indexes T1 - Auf ganzzahliger linearer Optimierung basierende Heuristiken für partiell-replizierte Datenbankcluster und das Auswählen von Indizes N2 - Column-oriented database systems can efficiently process transactional and analytical queries on a single node. However, increasing or peak analytical loads can quickly saturate single-node database systems. Then, a common scale-out option is using a database cluster with a single primary node for transaction processing and read-only replicas. Using (the naive) full replication, queries are distributed among nodes independently of the accessed data. This approach is relatively expensive because all nodes must store all data and apply all data modifications caused by inserts, deletes, or updates. In contrast to full replication, partial replication is a more cost-efficient implementation: Instead of duplicating all data to all replica nodes, partial replicas store only a subset of the data while being able to process a large workload share. Besides lower storage costs, partial replicas enable (i) better scaling because replicas must potentially synchronize only subsets of the data modifications and thus have more capacity for read-only queries and (ii) better elasticity because replicas have to load less data and can be set up faster. However, splitting the overall workload evenly among the replica nodes while optimizing the data allocation is a challenging assignment problem. 
The calculation of optimized data allocations in a partially replicated database cluster can be modeled using integer linear programming (ILP). ILP is a common approach for solving assignment problems, also in the context of database systems. Because ILP is not scalable, existing approaches (also for calculating partial allocations) often fall back to simple (e.g., greedy) heuristics for larger problem instances. Simple heuristics may work well but can lose optimization potential. In this thesis, we present optimal and ILP-based heuristic programming models for calculating data fragment allocations for partially replicated database clusters. Using ILP, we can flexibly extend our models to (i) consider data modifications and reallocations and (ii) increase the robustness of allocations to compensate for node failures and workload uncertainty. We evaluate our approaches for TPC-H, TPC-DS, and a real-world accounting workload and compare the results to state-of-the-art allocation approaches. Our evaluations show significant improvements for various allocation properties: Compared to existing approaches, we can, for example, (i) almost halve the amount of allocated data, (ii) improve the throughput in case of node failures and workload uncertainty while using even less memory, (iii) halve the costs of data modifications, and (iv) reallocate less than 90% of data when adding a node to the cluster. Importantly, we can calculate the corresponding ILP-based heuristic solutions within a few seconds. Finally, we demonstrate that the ideas of our ILP-based heuristics are also applicable to the index selection problem. N2 - Spaltenorientierte Datenbanksysteme können transaktionale und analytische Abfragen effizient auf einem einzigen Rechenknoten verarbeiten. Steigende Lasten oder Lastspitzen können Datenbanksysteme mit nur einem Rechenknoten jedoch schnell überlasten. 
Dann besteht eine gängige Skalierungsmöglichkeit darin, einen Datenbankcluster mit einem einzigen Rechenknoten für die Transaktionsverarbeitung und Replikatknoten für lesende Datenbankanfragen zu verwenden. Bei der (naiven) vollständigen Replikation werden Anfragen unabhängig von den Daten, auf die zugegriffen wird, auf die Knoten verteilt. Dieser Ansatz ist relativ teuer, da alle Knoten alle Daten speichern und alle Datenänderungen anwenden müssen, die durch das Einfügen, Löschen oder Aktualisieren von Datenbankeinträgen verursacht werden. Im Gegensatz zur vollständigen Replikation ist die partielle Replikation eine kostengünstige Alternative: Anstatt alle Daten auf alle Replikationsknoten zu duplizieren, speichern partielle Replikate nur eine Teilmenge der Daten und können gleichzeitig einen großen Anteil der Anfragelast verarbeiten. Neben niedrigeren Speicherkosten ermöglichen partielle Replikate (i) eine bessere Skalierung, da Replikate potenziell nur Teilmengen der Datenänderungen synchronisieren müssen und somit mehr Kapazität für lesende Anfragen haben, und (ii) eine bessere Elastizität, da Replikate weniger Daten laden müssen und daher schneller eingesetzt werden können. Die gleichmäßige Lastbalancierung auf die Replikatknoten bei gleichzeitiger Optimierung der Datenzuweisung ist jedoch ein schwieriges Zuordnungsproblem. Die Berechnung einer optimierten Datenverteilung in einem Datenbankcluster mit partiellen Replikaten kann mithilfe der ganzzahligen linearen Optimierung (engl. integer linear programming, ILP) durchgeführt werden. ILP ist ein gängiger Ansatz zur Lösung von Zuordnungsproblemen, auch im Kontext von Datenbanksystemen. Da ILP nicht skalierbar ist, greifen bestehende Ansätze (auch zur Berechnung von partiellen Replikationen) für größere Probleminstanzen oft auf einfache Heuristiken (z.B. Greedy-Algorithmen) zurück. Einfache Heuristiken können gut funktionieren, aber auch Optimierungspotenzial einbüßen. 
In dieser Arbeit stellen wir optimale und ILP-basierte heuristische Ansätze zur Berechnung von Datenzuweisungen für partiell-replizierte Datenbankcluster vor. Mithilfe von ILP können wir unsere Ansätze flexibel erweitern, um (i) Datenänderungen und -umverteilungen zu berücksichtigen und (ii) die Robustheit von Zuweisungen zu erhöhen, um Knotenausfälle und Unsicherheiten bezüglich der Anfragelast zu kompensieren. Wir evaluieren unsere Ansätze für TPC-H, TPC-DS und eine reale Buchhaltungsanfragelast und vergleichen die Ergebnisse mit herkömmlichen Verteilungsansätzen. Unsere Auswertungen zeigen signifikante Verbesserungen für verschiedene Eigenschaften der berechneten Datenzuordnungen: Im Vergleich zu bestehenden Ansätzen können wir beispielsweise (i) die Menge der gespeicherten Daten in Cluster fast halbieren, (ii) den Anfragedurchsatz bei Knotenausfällen und unsicherer Anfragelast verbessern und benötigen dafür auch noch weniger Speicher, (iii) die Kosten von Datenänderungen halbieren, und (iv) weniger als 90 % der Daten umverteilen, wenn ein Rechenknoten zum Cluster hinzugefügt wird. Wichtig ist, dass wir die entsprechenden ILP-basierten heuristischen Lösungen innerhalb weniger Sekunden berechnen können. Schließlich demonstrieren wir, dass die Ideen von unseren ILP-basierten Heuristiken auch auf das Indexauswahlproblem anwendbar sind. 
KW - database systems KW - integer linear programming KW - partial replication KW - index selection KW - load balancing KW - Datenbanksysteme KW - Indexauswahl KW - ganzzahlige lineare Optimierung KW - Lastverteilung KW - partielle Replikation Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-633615 ER - TY - JOUR A1 - Bläsius, Thomas A1 - Friedrich, Tobias A1 - Lischeid, Julius A1 - Meeks, Kitty A1 - Schirneck, Friedrich Martin T1 - Efficiently enumerating hitting sets of hypergraphs arising in data profiling JF - Journal of computer and system sciences : JCSS N2 - The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m^{|X|+1} n) and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases. KW - Data profiling KW - Enumeration algorithm KW - Minimal hitting set KW - Transversal hypergraph KW - Unique column combination KW - W[3]-Completeness Y1 - 2022 U6 - https://doi.org/10.1016/j.jcss.2021.10.002 SN - 0022-0000 SN - 1090-2724 VL - 124 SP - 192 EP - 213 PB - Elsevier CY - San Diego ER - TY - JOUR A1 - Schlosser, Rainer A1 - Chenavaz, Régis Y. 
A1 - Dimitrov, Stanko T1 - Circular economy BT - joint dynamic pricing and recycling investments JF - International journal of production economics N2 - In a circular economy, the use of recycled resources in production is a key performance indicator for management. Yet, academic studies are still unable to inform managers on appropriate recycling and pricing policies. We develop an optimal control model integrating a firm's recycling rate, which can use both virgin and recycled resources in the production process. Our model accounts for recycling influence on both the supply and demand sides. The positive effect of a firm's use of recycled resources diminishes over time but may increase through investments. Using general formulations for demand and cost, we analytically examine joint dynamic pricing and recycling investment policies in order to determine their optimal interplay over time. We provide numerical experiments to assess the existence of a steady state and to perform sensitivity analyses with respect to various model parameters. The analysis shows how to dynamically adapt jointly optimized controls to reach sustainability in the production process. Our results pave the way to sounder sustainable practices for firms operating within a circular economy. 
KW - Dynamic pricing KW - Recycling investments KW - Optimal control KW - General demand function KW - Circular economy Y1 - 2021 U6 - https://doi.org/10.1016/j.ijpe.2021.108117 SN - 0925-5273 SN - 1873-7579 VL - 236 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Thienen, Julia von A1 - Weinstein, Theresa Julia A1 - Meinel, Christoph T1 - Creative metacognition in design thinking BT - exploring theories, educational practices, and their implications for measurement JF - Frontiers in psychology N2 - Design thinking is a well-established practical and educational approach to fostering high-level creativity and innovation, which has been refined since the 1950s with the participation of experts like Joy Paul Guilford and Abraham Maslow. Through real-world projects, trainees learn to optimize their creative outcomes by developing and practicing creative cognition and metacognition. This paper provides a holistic perspective on creativity, enabling the formulation of a comprehensive theoretical framework of creative metacognition. It focuses on the design thinking approach to creativity and explores the role of metacognition in four areas of creativity expertise: Products, Processes, People, and Places. The analysis includes task-outcome relationships (product metacognition), the monitoring of strategy effectiveness (process metacognition), an understanding of individual or group strengths and weaknesses (people metacognition), and an examination of the mutual impact between environments and creativity (place metacognition). It also reviews measures taken in design thinking education, including a distribution of cognition and metacognition, to support students in their development of creative mastery. On these grounds, we propose extended methods for measuring creative metacognition with the goal of enhancing comprehensive assessments of the phenomenon. 
Proposed methodological advancements include accuracy sub-scales, experimental tasks where examinees explore problem and solution spaces, combinations of naturalistic observations with capability testing, as well as physiological assessments as indirect measures of creative metacognition. KW - accuracy KW - creativity KW - design thinking KW - education KW - measurement KW - metacognition KW - innovation KW - framework Y1 - 2023 U6 - https://doi.org/10.3389/fpsyg.2023.1157001 SN - 1664-1078 VL - 14 PB - Frontiers Research Foundation CY - Lausanne ER - TY - JOUR A1 - Belaid, Mohamed Karim A1 - Rabus, Maximilian A1 - Krestel, Ralf T1 - CrashNet BT - an encoder-decoder architecture to predict crash test outcomes JF - Data mining and knowledge discovery N2 - Destructive car crash tests are an elaborate, time-consuming, and expensive necessity of the automotive development process. Today, finite element method (FEM) simulations are used to reduce costs by simulating car crashes computationally. We propose CrashNet, an encoder-decoder deep neural network architecture that reduces costs further and models specific outcomes of car crashes very accurately. We achieve this by formulating car crash events as time series prediction enriched with a set of scalar features. Traditional sequence-to-sequence models are usually composed of convolutional neural network (CNN) and CNN transpose layers. We propose to concatenate those with an MLP capable of learning how to inject the given scalars into the output time series. In addition, we replace the CNN transpose with 2D CNN transpose layers in order to force the model to process the hidden state of the set of scalars as one time series. The proposed CrashNet model can be trained efficiently and is able to process scalars and time series as input in order to infer the results of crash tests. CrashNet produces results faster and at a lower cost compared to destructive tests and FEM simulations. 
Moreover, it represents a novel approach in the car safety management domain. KW - Predictive models KW - Time series analysis KW - Supervised deep neural networks KW - Car safety management Y1 - 2021 U6 - https://doi.org/10.1007/s10618-021-00761-9 SN - 1384-5810 SN - 1573-756X VL - 35 IS - 4 SP - 1688 EP - 1709 PB - Springer CY - Dordrecht ER - TY - GEN A1 - Benson, Lawrence A1 - Makait, Hendrik A1 - Rabl, Tilmann T1 - Viper BT - An Efficient Hybrid PMem-DRAM Key-Value Store T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance. 
T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 20 KW - memory Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-559664 SN - 2150-8097 IS - 9 ER - TY - GEN A1 - Kruse, Sebastian A1 - Kaoudi, Zoi A1 - Contreras-Rojas, Bertty A1 - Chawla, Sanjay A1 - Naumann, Felix A1 - Quiané-Ruiz, Jorge-Arnulfo T1 - RHEEMix in the data jungle BT - a cost-based optimizer for cross-platform systems T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform. 
T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 22 KW - cross-platform KW - polystore KW - query optimization KW - data processing Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-519443 IS - 6 ER - TY - THES A1 - Limberger, Daniel T1 - Concepts and techniques for 3D-embedded treemaps and their application to software visualization T1 - Konzepte und Techniken für 3D-eingebettete Treemaps und ihre Anwendung auf Softwarevisualisierung N2 - This thesis addresses concepts and techniques for interactive visualization of hierarchical data using treemaps. It explores (1) how treemaps can be embedded in 3D space to improve their information content and expressiveness, (2) how the readability of treemaps can be improved using level-of-detail and degree-of-interest techniques, and (3) how to design and implement a software framework for the real-time web-based rendering of treemaps embedded in 3D. With a particular emphasis on their application, use cases from software analytics are taken to test and evaluate the presented concepts and techniques. Concerning the first challenge, this thesis shows that a 3D attribute space offers enhanced possibilities for the visual mapping of data compared to classical 2D treemaps. In particular, embedding in 3D allows for improved implementation of visual variables (e.g., by sketchiness and color weaving), provision of new visual variables (e.g., by physically based materials and in situ templates), and integration of visual metaphors (e.g., by reference surfaces and renderings of natural phenomena) into the three-dimensional representation of treemaps. 
For the second challenge, the readability of an information visualization, the work shows that the generally higher visual clutter and increased cognitive load typically associated with three-dimensional information representations can be kept low in treemap-based representations of both small and large hierarchical datasets. By introducing an adaptive level-of-detail technique, we can not only declutter the visualization results, thereby reducing cognitive load and mitigating occlusion problems, but also summarize and highlight relevant data. Furthermore, this approach facilitates automatic labeling, supports the emphasis on data outliers, and allows visual variables to be adjusted via degree-of-interest measures. The third challenge is addressed by developing a real-time rendering framework with WebGL and accumulative multi-frame rendering. The framework removes hardware constraints and graphics API requirements, reduces interaction response times, and simplifies high-quality rendering. At the same time, the implementation effort for a web-based deployment of treemaps is kept reasonable. The presented visualization concepts and techniques are applied and evaluated for use cases in software analysis. In this domain, data about software systems, especially about the state and evolution of the source code, does not have a descriptive appearance or natural geometric mapping, making information visualization a key technology here. In particular, software source code can be visualized with treemap-based approaches because of its inherently hierarchical structure. With treemaps embedded in 3D, we can create interactive software maps that visually map software metrics, software developer activities, or information about the evolution of software systems alongside their hierarchical module structure. Discussions on remaining challenges and opportunities for future research for 3D-embedded treemaps and their applications conclude the thesis. 
N2 - Diese Doktorarbeit behandelt Konzepte und Techniken zur interaktiven Visualisierung hierarchischer Daten mit Hilfe von Treemaps. Sie untersucht (1), wie Treemaps im 3D-Raum eingebettet werden können, um ihre Informationsinhalte und Ausdrucksfähigkeit zu verbessern, (2) wie die Lesbarkeit von Treemaps durch Techniken wie Level-of-Detail und Degree-of-Interest verbessert werden kann, und (3) wie man ein Software-Framework für das Echtzeit-Rendering von Treemaps im 3D-Raum entwirft und implementiert. Dabei werden Anwendungsfälle aus der Software-Analyse besonders betont und zur Verprobung und Bewertung der Konzepte und Techniken verwendet. Hinsichtlich der ersten Herausforderung zeigt diese Arbeit, dass ein 3D-Attributraum im Vergleich zu klassischen 2D-Treemaps verbesserte Möglichkeiten für die visuelle Kartierung von Daten bietet. Insbesondere ermöglicht die Einbettung in 3D eine verbesserte Umsetzung von visuellen Variablen (z.B. durch Skizzenhaftigkeit und Farbverwebungen), die Bereitstellung neuer visueller Variablen (z.B. durch physikalisch basierte Materialien und In-situ-Vorlagen) und die Integration visueller Metaphern (z.B. durch Referenzflächen und Darstellungen natürlicher Phänomene) in die dreidimensionale Darstellung von Treemaps. Für die zweite Herausforderung – die Lesbarkeit von Informationsvisualisierungen – zeigt die Arbeit, dass die allgemein höhere visuelle Unübersichtlichkeit und die damit einhergehende, erhöhte kognitive Belastung, die typischerweise mit dreidimensionalen Informationsdarstellungen verbunden sind, in Treemap-basierten Darstellungen sowohl kleiner als auch großer hierarchischer Datensätze niedrig gehalten werden können. Durch die Einführung eines adaptiven Level-of-Detail-Verfahrens lassen sich nicht nur die Visualisierungsergebnisse übersichtlicher gestalten, die kognitive Belastung reduzieren und Verdeckungsprobleme verringern, sondern auch relevante Daten zusammenfassen und hervorheben. 
Darüber hinaus erleichtert dieser Ansatz eine automatische Beschriftung, unterstützt die Hervorhebung von Daten-Ausreißern und ermöglicht die Anpassung von visuellen Variablen über Degree-of-Interest-Maße. Die dritte Herausforderung wird durch die Entwicklung eines Echtzeit-Rendering-Frameworks mit WebGL und akkumulativem Multi-Frame-Rendering angegangen. Das Framework hebt mehrere Hardwarebeschränkungen und Anforderungen an die Grafik-API auf, verkürzt die Reaktionszeiten auf Interaktionen und vereinfacht qualitativ hochwertiges Rendering. Gleichzeitig wird der Implementierungsaufwand für einen webbasierten Einsatz von Treemaps geringgehalten. Die vorgestellten Visualisierungskonzepte und -techniken werden für Anwendungsfälle in der Softwareanalyse eingesetzt und evaluiert. In diesem Bereich haben Daten über Softwaresysteme, insbesondere über den Zustand und die Evolution des Quellcodes, keine anschauliche Erscheinung oder natürliche geometrische Zuordnung, so dass die Informationsvisualisierung hier eine Schlüsseltechnologie darstellt. Insbesondere Softwarequellcode kann aufgrund seiner inhärenten hierarchischen Struktur mit Hilfe von Treemap-basierten Ansätzen visualisiert werden. Mit in 3D-eingebetteten Treemaps können wir interaktive Softwarelagekarten erstellen, die z.B. Softwaremetriken, Aktivitäten von Softwareentwickler*innen und Informationen über die Evolution von Softwaresystemen in ihrer hierarchischen Modulstruktur abbilden und veranschaulichen. Diskussionen über verbleibende Herausforderungen und Möglichkeiten für zukünftige Forschung zu 3D-eingebetteten Treemaps und deren Anwendungen schließen die Arbeit ab. 
KW - treemaps KW - software visualization KW - software analytics KW - web-based rendering KW - degree-of-interest techniques KW - labeling KW - 3D-embedding KW - interactive visualization KW - progressive rendering KW - hierarchical data KW - 3D-Einbettung KW - Interessengrad-Techniken KW - hierarchische Daten KW - interaktive Visualisierung KW - Beschriftung KW - progressives Rendering KW - Softwareanalytik KW - Softwarevisualisierung KW - Treemaps KW - Web-basiertes Rendering Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-632014 ER - TY - CHAP A1 - Corazza, Giovanni Emanuele A1 - Thienen, Julia von ED - Glăveanu, Vlad Petre T1 - Invention T2 - The Palgrave encyclopedia of the possible N2 - This entry addresses invention from five different perspectives: (i) definition of the term, (ii) mechanisms underlying invention processes, (iii) (pre-)history of human inventions, (iv) intellectual property protection vs open innovation, and (v) case studies of great inventors. Regarding the definition, an invention is the outcome of a creative process taking place within a technological milieu, which is recognized as successful in terms of its effectiveness as an original technology. In the process of invention, a technological possibility becomes realized. Inventions are distinct from either discovery or innovation. In human creative processes, seven mechanisms of invention can be observed, yielding characteristic outcomes: (1) basic inventions, (2) invention branches, (3) invention combinations, (4) invention toolkits, (5) invention exaptations, (6) invention values, and (7) game-changing inventions. The development of humanity has been strongly shaped by inventions ever since early stone tools and the conception of agriculture. An “explosion of creativity” has been associated with Homo sapiens, and inventions in all fields of human endeavor have followed suit, engendering an exponential growth of cumulative culture. 
This culture development emerges essentially through a reuse of previous inventions, their revision, amendment and rededication. In sociocultural terms, humans have increasingly regulated processes of invention and invention-reuse through concepts such as intellectual property, patents, open innovation and licensing methods. Finally, three case studies of great inventors are considered: Edison, Marconi, and Montessori, next to a discussion of human invention processes as collaborative endeavors. KW - invention KW - creativity KW - invention mechanism KW - cumulative culture KW - technology KW - innovation KW - patent KW - open innovation Y1 - 2023 SN - 978-3-030-90912-3 SN - 978-3-030-90913-0 U6 - https://doi.org/10.1007/978-3-030-90913-0_14 SP - 806 EP - 814 PB - Springer International Publishing CY - Cham ER - TY - JOUR A1 - Hiort, Pauline A1 - Schlaffner, Christoph N. A1 - Steen, Judith A. A1 - Renard, Bernhard Y. A1 - Steen, Hanno T1 - multiFLEX-LF: a computational approach to quantify the modification stoichiometries in label-free proteomics data sets JF - Journal of proteome research N2 - In liquid-chromatography-tandem-mass-spectrometry-based proteomics, information about the presence and stoichiometry of protein modifications is not readily available. To overcome this problem, we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant, which detects modified peptide precursors and quantifies their modification extent by monitoring the differences between observed and expected intensities of the unmodified precursors. multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given precursor relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics data sets in a precursor-centric manner without preselecting a protein of interest. 
To analyze modification dynamics and coregulated modifications, we hierarchically clustered the precursors of all proteins based on their computed relative modification scores. We applied multiFLEX-LF to a data-independent-acquisition-based data set acquired using the anaphase-promoting complex/cyclosome (APC/C) isolated at various time points during mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modification events. Overall, multiFLEX-LF enables the fast identification of potentially differentially modified peptide precursors and the quantification of their differential modification extent in large data sets using a personal computer. Additionally, multiFLEX-LF can drive the large-scale investigation of the modification dynamics of peptide precursors in time-series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf. KW - bioinformatics tool KW - label-free quantification KW - LC-MS KW - MS KW - post-translational modification KW - modification stoichiometry KW - PTM KW - quantification Y1 - 2022 U6 - https://doi.org/10.1021/acs.jproteome.1c00669 SN - 1535-3893 SN - 1535-3907 VL - 21 IS - 4 SP - 899 EP - 909 PB - American Chemical Society CY - Washington ER - TY - JOUR A1 - Wittig, Alice A1 - Miranda, Fabio Malcher A1 - Hölzer, Martin A1 - Altenburg, Tom A1 - Bartoszewicz, Jakub Maciej A1 - Beyvers, Sebastian A1 - Dieckmann, Marius Alfred A1 - Genske, Ulrich A1 - Giese, Sven Hans-Joachim A1 - Nowicka, Melania A1 - Richard, Hugues A1 - Schiebenhoefer, Henning A1 - Schmachtenberg, Anna-Juliane A1 - Sieben, Paul A1 - Tang, Ming A1 - Tembrockhaus, Julius A1 - Renard, Bernhard Y. 
A1 - Fuchs, Stephan T1 - CovRadar BT - continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance JF - Bioinformatics N2 - The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast. Y1 - 2022 U6 - https://doi.org/10.1093/bioinformatics/btac411 SN - 1367-4803 SN - 1367-4811 VL - 38 IS - 17 SP - 4223 EP - 4225 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Omolaoye, Temidayo S. A1 - Omolaoye, Victor Adelakun A1 - Kandasamy, Richard K. A1 - Hachim, Mahmood Yaseen A1 - Du Plessis, Stefan S. T1 - Omics and male infertility BT - highlighting the application of transcriptomic data JF - Life : open access journal N2 - Male infertility is a multifaceted disorder affecting approximately 50% of male partners in infertile couples. Over the years, male infertility has been diagnosed mainly through semen analysis, hormone evaluations, medical records and physical examinations, which are of course fundamental yet inefficient, because 30% of male infertility cases remain idiopathic. This dilemmatic status of the unknown needs to be addressed with more sophisticated and result-driven technologies and/or techniques. 
Genetic alterations have been linked with male infertility, thereby unveiling the practicality of investigating this disorder from the "omics" perspective. Omics aims at analyzing the structure and functions of a whole constituent of a given biological function at different levels, including the molecular gene level (genomics), transcript level (transcriptomics), protein level (proteomics) and metabolite level (metabolomics). In the current study, an overview of the four branches of omics and their roles in male infertility is briefly discussed; the potential usefulness of assessing transcriptomic data to understand this pathology is also elucidated. After assessing the publicly obtainable transcriptomic data for datasets on male infertility, a total of 1385 datasets were retrieved, of which 10 datasets met the inclusion criteria and were used for further analysis. These datasets were classified into groups according to the disease or cause of male infertility. The groups include non-obstructive azoospermia (NOA), obstructive azoospermia (OA), non-obstructive and obstructive azoospermia (NOA and OA), spermatogenic dysfunction, sperm dysfunction, and Y chromosome microdeletion. Findings revealed that 8 genes (LDHC, PDHA2, TNP1, TNP2, ODF1, ODF2, SPINK2, PCDHB3) were commonly differentially expressed between all disease groups. Likewise, 56 genes were common between NOA versus NOA and OA (ADAD1, BANF2, BCL2L14, C12orf50, C20orf173, C22orf23, C6orf99, C9orf131, C9orf24, CABS1, CAPZA3, CCDC187, CCDC54, CDKN3, CEP170, CFAP206, CRISP2, CT83, CXorf65, FAM209A, FAM71F1, FAM81B, GALNTL5, GTSF1, H1FNT, HEMGN, HMGB4, KIF2B, LDHC, LOC441601, LYZL2, ODF1, ODF2, PCDHB3, PDHA2, PGK2, PIH1D2, PLCZ1, PROCA1, RIMBP3, ROPN1L, SHCBP1L, SMCP, SPATA16, SPATA19, SPINK2, TEX33, TKTL2, TMCO2, TMCO5A, TNP1, TNP2, TSPAN16, TSSK1B, TTLL2, UBQLN3). 
These genes, particularly the above-mentioned 8 genes, are involved in diverse biological processes such as germ cell development, spermatid development, spermatid differentiation, regulation of proteolysis, spermatogenesis and metabolic processes. Owing to the stage-specific expression of these genes, any mal-expression can ultimately lead to male infertility. Therefore, currently available data on all branches of omics relating to male fertility can be used to identify biomarkers for diagnosing male infertility, which can potentially help in unravelling some idiopathic cases. KW - male infertility KW - omics KW - genomics KW - transcriptomics KW - proteomics KW - metabolomics Y1 - 2022 U6 - https://doi.org/10.3390/life12020280 SN - 2075-1729 VL - 12 IS - 2 PB - MDPI CY - Basel ER -