004 Data processing; Computer science
Data preparation is a cornerstone of data science workflows, consuming a significant portion, approximately 80%, of a data scientist's time. This time is spent primarily because data scientists must devise tailored solutions for downstream tasks. The complexity is further magnified by the inadequate availability of metadata, the often ad-hoc nature of preparation tasks, and the need to work with a diverse range of sophisticated tools, each with its own intricacies and demands for proficiency.
Previous research in data management has traditionally concentrated on preparing the content within columns and rows of a relational table, addressing tasks such as string disambiguation, date standardization, or numeric value normalization, commonly referred to as data cleaning. This focus assumes a perfectly structured input table. Consequently, these data cleaning tasks can be effectively applied only after the table has been successfully loaded into the respective data cleaning environment, typically in the later stages of the data processing pipeline.
While current data cleaning tools are well-suited for relational tables, extensive data repositories frequently contain data stored in plain text files, such as CSV files, due to their adaptable standard. Consequently, these files often exhibit tables with a flexible layout of rows and columns, lacking a relational structure. This flexibility often results in data being distributed across cells in arbitrary positions, typically guided by user-specified formatting guidelines.
Effectively extracting and leveraging these tables in subsequent processing stages necessitates accurate parsing. This thesis emphasizes what we define as the “structure” of a data file—the fundamental characters within a file essential for parsing and comprehending its content. Concentrating on the initial stages of the data preprocessing pipeline, this thesis addresses two crucial aspects: comprehending the structural layout of a table within a raw data file and automatically identifying and rectifying any structural issues that might hinder its parsing. Although these issues may not directly impact the table's content, they pose significant challenges in parsing the table within the file.
Our initial contribution comprises an extensive survey of commercially available data preparation tools. This survey thoroughly examines their distinct features, the lacking features, and the necessity for preliminary data processing despite these tools. The primary goal is to elucidate the current state-of-the-art in data preparation systems while identifying areas for enhancement. Furthermore, the survey explores the encountered challenges in data preprocessing, emphasizing opportunities for future research and improvement.
Next, we propose a novel data preparation pipeline designed for detecting and correcting structural errors. The aim of this pipeline is to assist users at the initial preprocessing stage by ensuring the correct loading of their data into their preferred systems. Our approach begins by introducing SURAGH, an unsupervised system that utilizes a pattern-based method to identify dominant patterns within a file, independent of external information, such as data types, row structures, or schemata. By identifying deviations from the dominant pattern, it detects ill-formed rows. Subsequently, our structure correction system, TASHEEH, gathers the identified ill-formed rows along with dominant patterns and employs a novel pattern transformation algebra to automatically rectify errors. Our pipeline serves as an end-to-end solution, transforming a structurally broken CSV file into a well-formatted one, usually suitable for seamless loading.
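The idea of detecting ill-formed rows as deviations from a dominant pattern can be illustrated with a minimal sketch. This is not the SURAGH algorithm, whose pattern language is not described in this abstract. It is a simplified stand-in that assumes a single dominant pattern per file and a hypothetical abstraction in which digit runs become `D`, letter runs become `L`, and every other character is kept as-is. The function names `abstract_row` and `find_ill_formed_rows` are illustrative only.

```python
from collections import Counter


def abstract_row(row: str) -> str:
    """Map a raw line to a coarse pattern (assumed abstraction, not SURAGH's).

    Runs of digits collapse to 'D', runs of letters collapse to 'L',
    and all other characters (delimiters, spaces, quotes) are kept verbatim.
    """
    pattern = []
    for ch in row:
        if ch.isdigit():
            symbol = "D"
        elif ch.isalpha():
            symbol = "L"
        else:
            symbol = ch
        # Collapse consecutive characters of the same class into one symbol.
        if symbol in ("D", "L") and pattern and pattern[-1] == symbol:
            continue
        pattern.append(symbol)
    return "".join(pattern)


def find_ill_formed_rows(lines):
    """Return the most frequent pattern and the indices of rows deviating from it."""
    patterns = [abstract_row(line) for line in lines]
    dominant, _count = Counter(patterns).most_common(1)[0]
    deviating = [i for i, p in enumerate(patterns) if p != dominant]
    return dominant, deviating


# Hypothetical sample file content: a comment line and a row with a
# different delimiter are mixed into otherwise regular rows.
sample = [
    "1,Alice,30",
    "2,Bob,25",
    "# exported on 2020",
    "3,Carol,41",
    "4;Dave;38",
]
dominant_pattern, ill_formed = find_ill_formed_rows(sample)
```

In this sketch the flagged rows would then be handed to a correction step. The abstract describes TASHEEH as using a pattern transformation algebra for that purpose, which is not reproduced here.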
Finally, we introduce MORPHER, a user-friendly GUI integrating the functionalities of both SURAGH and TASHEEH. This interface lets users access the pipeline's features through visual elements. Our extensive experiments demonstrate the effectiveness of our data preparation systems, which require no user involvement. Both SURAGH and TASHEEH significantly outperform existing state-of-the-art methods in both precision and recall.
Formal constraints on crossing dependencies have played a large role in research on the formal complexity of natural language grammars and parsing. Here we ask whether the apparent evidence for constraints on crossing dependencies in treebanks might arise because of independent constraints on trees, such as low arity and dependency length minimization. We address this question using two sets of experiments. In Experiment 1, we compare the distribution of formal properties of crossing dependencies, such as gap degree, between real trees and baseline trees matched for rate of crossing dependencies and various other properties. In Experiment 2, we model whether two dependencies cross, given certain psycholinguistic properties of the dependencies. We find surprisingly weak evidence for constraints originating from the mild context-sensitivity literature (gap degree and well-nestedness) beyond what can be explained by constraints on rate of crossing dependencies, topological properties of the trees, and dependency length. However, measures that have emerged from the parsing literature (e.g., edge degree, end-point crossings, and heads' depth difference) differ strongly between real and random trees. Modeling results show that cognitive metrics relating to information locality and working-memory limitations affect whether two dependencies cross or not, but they do not fully explain the distribution of crossing dependencies in natural languages. Together these results suggest that crossing constraints are better characterized by processing pressures than by mildly context-sensitive constraints.
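For readers unfamiliar with the notion, a minimal sketch of what it means for two dependencies to cross may help. It uses the standard definition: two arcs cross when exactly one endpoint of one arc lies strictly inside the span of the other. The head-list encoding (1-indexed word positions, 0 for the root) and the choice to ignore the artificial root arc are assumptions made for this illustration, not details taken from the study.

```python
from itertools import combinations


def arcs_cross(arc_a, arc_b):
    """Return True if two dependency arcs cross.

    Each arc is a (head, dependent) pair of word positions. Direction does
    not matter for crossing, so each arc is reduced to its (left, right) span.
    """
    a_left, a_right = sorted(arc_a)
    b_left, b_right = sorted(arc_b)
    return (a_left < b_left < a_right < b_right
            or b_left < a_left < b_right < a_right)


def crossing_pairs(heads):
    """List all pairs of crossing arcs in a dependency tree.

    heads[i] is the head position of word i + 1 (assumed encoding; 0 = root).
    Arcs from the artificial root are skipped, which is an assumption here.
    """
    arcs = [(head, dep) for dep, head in enumerate(heads, start=1) if head != 0]
    return [(a, b) for a, b in combinations(arcs, 2) if arcs_cross(a, b)]


# Hypothetical five-word tree: word 2 is the root, words 1, 4 and 5 depend
# on word 2, and word 3 depends on word 5, so arc 5->3 crosses arc 2->4.
non_projective = [2, 0, 5, 2, 2]
# Hypothetical three-word tree without crossings.
projective = [2, 0, 2]
```

Properties such as gap degree, well-nestedness, or edge degree are defined on top of this basic crossing relation and are not computed in this sketch.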
The purpose of this study was to examine the moderating effects of technology use for relationship maintenance on the longitudinal associations between self-isolation during the coronavirus disease 2019 (COVID-19) pandemic and romantic relationship quality among adolescents. Participants were 239 (120 female; M age = 16.69, standard deviation [SD] = 0.61; 60 percent Caucasian) 11th and 12th graders from three midwestern high schools. To qualify for this study, adolescents had to be in the same romantic relationship for the duration of the study, approximately 7 months (M length of relationship = 10.03 months). Data were collected in October of 2019 (Time 1) and again 7 months later in May of 2020 (Time 2). Adolescents completed a romantic relationship questionnaire at Time 1 and again at Time 2, along with questionnaires on frequency of self-isolation during the COVID-19 pandemic and use of technology for romantic relationship maintenance. Findings revealed that increases in self-isolation during the COVID-19 pandemic related positively to the use of technology for romantic relationship maintenance and negatively to Time 2 romantic relationship quality. High use of technology for romantic relationship maintenance buffered against the negative effects of self-isolation during the COVID-19 pandemic on adolescents' romantic relationship quality 7 months later, whereas low use strengthened the negative relationship between self-isolation during the COVID-19 pandemic and romantic relationship quality. These findings suggest the importance of considering the implications of societal crises or pandemics for adolescents' close relationships, particularly their romantic relationships.
Traditionally, business models and software designs have modeled the usage of artificial intelligence (AI) at a very specific point in the process or in a rather fixed, implemented application. Since applications can be based on AI, such as networked artificial neural networks (ANN) on top of which applications are installed, these on-top applications can be instructed directly from their underlying ANN compartments [1]. However, with the integration of several AI-based systems, their coordination is a highly relevant target factor for the operation and improvement of networked processes, such as those found in cross-organizational production contexts spanning multiple distributed locations. This work aims to extend prior research on managing artificial knowledge transfers among interlinked AIs as a coordination instrument by examining the effects of different activation types (respective activation rates and cycles) on ANN-instructed production machines. In a design-science-oriented way, this paper conceptualizes rhythmic state descriptions for dynamic systems and 14 associated experiment designs. Two experiments have been realized, analyzed, and evaluated with regard to the activities and processes they induce. Findings show that the simulator [2] used and the experiments designed and realized here (I) enable research on ANN activation types and (II) illustrate ANN-based production networks disrupted by activation types and clarify the need for harmonizing them. Further, (III) management interventions are derived for harmonizing interlinked ANNs. This study establishes the importance of site-specific coordination mechanisms and novel forms of management interventions as drivers of efficient artificial knowledge transfer.
With the further development of more and more production machines into cyber-physical systems, and their greater integration with artificial intelligence (AI) techniques, the coordination of intelligent systems is a highly relevant target factor for the operation and improvement of networked processes, such as those found in cross-organizational production contexts spanning multiple distributed locations. This work aims to extend prior research on managing their artificial knowledge transfers as a coordination instrument by examining the effects of different activation types (respective activation rates and cycles) on production machines instructed by artificial neural networks (ANNs). To this end, it provides a new integration type of ANN-based cyber-physical production system as a tool to research artificial knowledge transfers: in a design-science-oriented way, a prototype of a simulation system is constructed as an open-source information system, which will be used in follow-up research to (I) enable research on ANN activation types in production networks, (II) illustrate ANN-based production networks disrupted by activation types and clarify the need for harmonizing them, and (III) demonstrate conceptual management interventions. This simulator is intended to establish the importance of site-specific coordination mechanisms and novel forms of management interventions as drivers of efficient artificial knowledge transfer.
A remarkable peculiarity of videoconferencing (VC) applications – the self-view – a.k.a. digital mirror, is examined as a potential reason behind the voiced exhaustion among users. This work draws on technostress research and objective self-awareness theory and proposes the communication role (sender vs. receiver) as an interaction variable. We report the results of two studies among European employees (n1 = 176, n2 = 253) with a one-year time lag. A higher frequency of self-view in a VC when receiving a message, i.e., listening to others, indirectly increases negative affect (study 1 & 2) and exhaustion (study 2) via the increased state of public self-awareness. Self-viewing in the role of message sender, e.g., as an online presenter, also increases public self-awareness, but its overall effects are less harmful. As for individual differences, users predisposed to public self-consciousness were more concerned with how other VC participants perceived them. Gender effects were insignificant.
Organizations are investing billions in innovation and agility initiatives to stay competitive in their increasingly uncertain business environments. Design Thinking, an innovation approach based on human-centered exploration, ideation and experimentation, has gained increasing popularity. The market for Design Thinking, including software products and general services, is projected to reach USD 2,500 million by 2028. A dispersed set of positive outcomes has been attributed to Design Thinking. However, there is no clear understanding of what exactly comprises the impact of Design Thinking and how it is created. To support a billion-dollar market, it is essential to understand the value Design Thinking brings to organizations, not only to justify large investments but also to continuously improve the approach and its application.
Following a qualitative research approach combined with results from a systematic literature review, the results presented in this dissertation offer a structured understanding of Design Thinking impact. The results are structured along two main perspectives of impact: the individual and the organizational perspective. First, insights from qualitative data analysis demonstrate that measuring and assessing the impact of Design Thinking is currently one central challenge for Design Thinking practitioners in organizations. Second, the interview data revealed several effects Design Thinking has on individuals, demonstrating how Design Thinking can impact boundary management behaviors and enable employees to craft their jobs more actively.
Contributing to innovation management research, the work presented in this dissertation systematically explains the impact of Design Thinking, allowing other researchers to both locate and integrate their work better. The results of this research advance the theoretical rigor of Design Thinking impact research, offering multiple theoretical underpinnings that explain the variety of Design Thinking impact. Furthermore, this dissertation contains three specific propositions on how Design Thinking creates an impact: through integration, enablement, and engagement. Integration refers to how Design Thinking enables organizations by effectively combining things, for example by fostering balance between exploitation and exploration activities. Through engagement, Design Thinking impacts organizations by involving users and other relevant stakeholders in their work. Moreover, Design Thinking creates impact through enablement, making it possible for individuals to enact a specific behavior or experience certain states.
By synthesizing multiple theoretical streams into these three overarching themes, the results of this research can help bridge disciplinary boundaries, for example between business, psychology and design, and enhance future collaborative research. Practitioners benefit from the results because this thesis details multiple desirable outcomes that can be expected from practicing Design Thinking, such as successful individual job crafting behaviors. This allows practitioners to make more evidence-based decisions concerning Design Thinking implementation. Overall, considering multiple levels of impact as well as a broad range of theoretical underpinnings is paramount to understanding and fostering Design Thinking impact.
Technology for humanity
(2023)
The usage of data to improve or create business models has become vital for companies in the 21st century. However, to extract value from data it is important to understand the business model. Taxonomies for data-driven business models (DDBM) aim to provide guidance for the development and ideation of new business models relying on data. In IS research, however, different taxonomies have emerged in recent years, partly redundant, partly contradictory. Thus, there is a need to synthesize the common ground of these taxonomies within IS research. Based on 26 IS-related taxonomies and 30 cases, we derive and define 14 generic building blocks of DDBM to develop a consolidated taxonomy that represents the current state-of-the-art. Thus, we integrate existing research on DDBM and provide avenues for further exploration of data-induced potentials for business models as well as for the development and analysis of general or industry-specific DDBM.
Due to changing customer behavior in the course of digitalization, banks are urged to change their traditional value creation in order to improve interaction with customers. New digital technologies such as core banking solutions change organizational structures to provide organizational and individual affordances in IT-supported personal advisory. Based on adaptive structuration theory and with qualitative data from 24 German banks, we identify first-, second-, and third-order issues of organizational change in value creation, which are connected with a set of affordances and constraints as the outcomes for customer interaction.