004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Article (168)
- Monograph/Edited Volume (157)
- Doctoral Thesis (148)
- Postprint (49)
- Conference Proceeding (29)
- Master's Thesis (8)
- Other (7)
- Part of a Book (2)
- Bachelor Thesis (1)
- Habilitation Thesis (1)
Language
- English (466)
- German (104)
- Multiple languages (2)
Is part of the Bibliography
- yes (572) (remove)
Keywords
- machine learning (16)
- answer set programming (13)
- Hasso-Plattner-Institut (10)
- cloud computing (10)
- Cloud Computing (9)
- Hasso Plattner Institute (9)
- Forschungskolleg (8)
- Klausurtagung (8)
- Machine Learning (8)
- Maschinelles Lernen (8)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (194)
- Institut für Informatik und Computational Science (159)
- Hasso-Plattner-Institut für Digital Engineering GmbH (106)
- Fachgruppe Betriebswirtschaftslehre (27)
- Mathematisch-Naturwissenschaftliche Fakultät (23)
- Wirtschaftswissenschaften (18)
- Extern (17)
- Bürgerliches Recht (12)
- Digital Engineering Fakultät (8)
- Institut für Physik und Astronomie (8)
Column-oriented database systems can efficiently process transactional and analytical queries on a single node. However, increasing or peak analytical loads can quickly saturate single-node database systems. Then, a common scale-out option is using a database cluster with a single primary node for transaction processing and read-only replicas. Using (the naive) full replication, queries are distributed among nodes independently of the accessed data. This approach is relatively expensive because all nodes must store all data and apply all data modifications caused by inserts, deletes, or updates.
In contrast to full replication, partial replication is a more cost-efficient implementation: Instead of duplicating all data to all replica nodes, partial replicas store only a subset of the data while being able to process a large workload share. Besides lower storage costs, partial replicas enable (i) better scaling because replicas must potentially synchronize only subsets of the data modifications and thus have more capacity for read-only queries and (ii) better elasticity because replicas have to load less data and can be set up faster. However, splitting the overall workload evenly among the replica nodes while optimizing the data allocation is a challenging assignment problem.
The calculation of optimized data allocations in a partially replicated database cluster can be modeled using integer linear programming (ILP). ILP is a common approach for solving assignment problems, also in the context of database systems. Because ILP is not scalable, existing approaches (also for calculating partial allocations) often fall back to simple (e.g., greedy) heuristics for larger problem instances. Simple heuristics may work well but can lose optimization potential.
In this thesis, we present optimal and ILP-based heuristic programming models for calculating data fragment allocations for partially replicated database clusters. Using ILP, we are flexible to extend our models to (i) consider data modifications and reallocations and (ii) increase the robustness of allocations to compensate for node failures and workload uncertainty. We evaluate our approaches for TPC-H, TPC-DS, and a real-world accounting workload and compare the results to state-of-the-art allocation approaches. Our evaluations show significant improvements for varied allocation’s properties: Compared to existing approaches, we can, for example, (i) almost halve the amount of allocated data, (ii) improve the throughput in case of node failures and workload uncertainty while using even less memory, (iii) halve the costs of data modifications, and (iv) reallocate less than 90% of data when adding a node to the cluster. Importantly, we can calculate the corresponding ILP-based heuristic solutions within a few seconds. Finally, we demonstrate that the ideas of our ILP-based heuristics are also applicable to the index selection problem.
The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m(vertical bar X vertical bar+1) n) and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
Viper
(2021)
Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and efficient sequential-write performance PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
Selbstbestimmtes Lernen mit Onlinekursen findet zunehmend mehr Akzeptanz in unserer Gesellschaft. Lernende können mithilfe von Onlinekursen selbst festlegen, was sie wann lernen und Kurse können durch vielfältige Adaptionen an den Lernfortschritt der Nutzer angepasst und individualisiert werden. Auf der einen Seite ist eine große Zielgruppe für diese Lernangebote vorhanden. Auf der anderen Seite sind die Erstellung von Onlinekursen, ihre Bereitstellung, Wartung und Betreuung kostenintensiv, wodurch hochwertige Angebote häufig kostenpflichtig angeboten werden müssen, um als Anbieter zumindest kostenneutral agieren zu können. In diesem Beitrag erörtern und diskutieren wir ein offenes, nachhaltiges datengetriebenes zweiseitiges Geschäftsmodell zur Verwertung geprüfter Onlinekurse und deren kostenfreie Bereitstellung für jeden Lernenden. Kern des Geschäftsmodells ist die Nutzung der dabei entstehenden Verhaltensdaten, die daraus mögliche Ableitung von Persönlichkeitsmerkmalen und Interessen und deren Nutzung im kommerziellen Kontext. Dies ist eine bei der Websuche bereits weitläufig akzeptierte Methode, welche nun auf den Lernkontext übertragen wird. Welche Möglichkeiten, Herausforderungen, aber auch Barrieren überwunden werden müssen, damit das Geschäftsmodell nachhaltig und ethisch vertretbar funktioniert, werden zwei unabhängige, jedoch synergetisch verbundene Geschäftsmodelle vorgestellt und diskutiert. Zusätzlich wurde die Akzeptanz und Erwartung der Zielgruppe für das vorgestellte Geschäftsmodell untersucht, um notwendige Kernressourcen für die Praxis abzuleiten. Die Ergebnisse der Untersuchung zeigen, dass das Geschäftsmodell von den Nutzer*innen grundlegend akzeptiert wird. 10 % der Befragten würden es bevorzugen, mit virtuellen Assistenten – anstelle mit Tutor*innen zu lernen. Zudem ist der Großteil der Nutzer*innen sich nicht darüber bewusst, dass Persönlichkeitsmerkmale anhand des Nutzerverhaltens abgeleitet werden können.
Digital Platforms (DPs) has established themself in recent years as a central concept of the Information Technology Science. Due to the great diversity of digital platform concepts, clear definitions are still required. Furthermore, DPs are subject to dynamic changes from internal and external factors, which pose challenges for digital platform operators, developers and customers. Which current digital platform research directions should be taken to address these challenges remains open so far. The following paper aims to contribute to this by outlining a systematic literature review (SLR) on digital platform concepts in the context of the Industrial Internet of Things (IIoT) for manufacturing companies and provides a basis for (1) a selection of definitions of current digital platform and ecosystem concepts and (2) a selection of current digital platform research directions. These directions are diverted into (a) occurrence of digital platforms, (b) emergence of digital platforms, (c) evaluation of digital platforms, (d) development of digital platforms, and (e) selection of digital platforms.
The intensity of cosmic radiation may differ over five orders of magnitude within a few hours or days during the Solar Particle Events (SPEs), thus increasing for several orders of magnitude the probability of Single Event Upsets (SEUs) in space-borne electronic systems. Therefore, it is vital to enable the early detection of the SEU rate changes in order to ensure timely activation of dynamic radiation hardening measures. In this paper, an embedded approach for the prediction of SPEs and SRAM SEU rate is presented. The proposed solution combines the real-time SRAM-based SEU monitor, the offline-trained machine learning model and online learning algorithm for the prediction. With respect to the state-of-the-art, our solution brings the following benefits: (1) Use of existing on-chip data storage SRAM as a particle detector, thus minimizing the hardware and power overhead, (2) Prediction of SRAM SEU rate one hour in advance, with the fine-grained hourly tracking of SEU variations during SPEs as well as under normal conditions, (3) Online optimization of the prediction model for enhancing the prediction accuracy during run-time, (4) Negligible cost of hardware accelerator design for the implementation of selected machine learning model and online learning algorithm. The proposed design is intended for a highly dependable and self-adaptive multiprocessing system employed in space applications, allowing to trigger the radiation mitigation mechanisms before the onset of high radiation levels.
Invention
(2023)
This entry addresses invention from five different perspectives: (i) definition of the term, (ii) mechanisms underlying invention processes, (iii) (pre-)history of human inventions, (iv) intellectual property protection vs open innovation, and (v) case studies of great inventors. Regarding the definition, an invention is the outcome of a creative process taking place within a technological milieu, which is recognized as successful in terms of its effectiveness as an original technology. In the process of invention, a technological possibility becomes realized. Inventions are distinct from either discovery or innovation. In human creative processes, seven mechanisms of invention can be observed, yielding characteristic outcomes: (1) basic inventions, (2) invention branches, (3) invention combinations, (4) invention toolkits, (5) invention exaptations, (6) invention values, and (7) game-changing inventions. The development of humanity has been strongly shaped by inventions ever since early stone tools and the conception of agriculture. An “explosion of creativity” has been associated with Homo sapiens, and inventions in all fields of human endeavor have followed suit, engendering an exponential growth of cumulative culture. This culture development emerges essentially through a reuse of previous inventions, their revision, amendment and rededication. In sociocultural terms, humans have increasingly regulated processes of invention and invention-reuse through concepts such as intellectual property, patents, open innovation and licensing methods. Finally, three case studies of great inventors are considered: Edison, Marconi, and Montessori, next to a discussion of human invention processes as collaborative endeavors.
In control theory, to solve a finite-horizon sequential decision problem (SDP) commonly means to find a list of decision rules that result in an optimal expected total reward (or cost) when taking a given number of decision steps. SDPs are routinely solved using Bellman's backward induction. Textbook authors (e.g. Bertsekas or Puterman) typically give more or less formal proofs to show that the backward induction algorithm is correct as solution method for deterministic and stochastic SDPs. Botta, Jansson and Ionescu propose a generic framework for finite horizon, monadic SDPs together with a monadic version of backward induction for solving such SDPs. In monadic SDPs, the monad captures a generic notion of uncertainty, while a generic measure function aggregates rewards. In the present paper, we define a notion of correctness for monadic SDPs and identify three conditions that allow us to prove a correctness result for monadic backward induction that is comparable to textbook correctness proofs for ordinary backward induction. The conditions that we impose are fairly general and can be cast in category-theoretical terms using the notion of Eilenberg-Moore algebra. They hold in familiar settings like those of deterministic or stochastic SDPs, but we also give examples in which they fail. Our results show that backward induction can safely be employed for a broader class of SDPs than usually treated in textbooks. However, they also rule out certain instances that were considered admissible in the context of Botta et al. 's generic framework. Our development is formalised in Idris as an extension of the Botta et al. framework and the sources are available as supplementary material.
CovRadar
(2022)
The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousand sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast.
The management of knowledge in organizations considers both established long-term
processes and cooperation in agile project teams. Since knowledge can be both tacit and explicit, its transfer from the individual to the organizational knowledge base poses a challenge in organizations. This challenge increases when the fluctuation of knowledge carriers is exceptionally high. Especially in large projects in which external consultants are involved, there is a risk that critical, company-relevant knowledge generated in the project will leave the company with the external knowledge carrier and thus be lost. In this paper, we show the advantages of an early warning system for knowledge management to avoid this loss. In particular, the potential of visual analytics in the context of knowledge management systems is presented and discussed. We present a project for the development of a business-critical software system and discuss the first implementations and results.
Openness indicators for the evaluation of digital platforms between the launch and maturity phase
(2024)
In recent years, the evaluation of digital platforms has become an important focus in the field of information systems science. The identification of influential indicators that drive changes in digital platforms, specifically those related to openness, is still an unresolved issue. This paper addresses the challenge of identifying measurable indicators and characterizing the transition from launch to maturity in digital platforms. It proposes a systematic analytical approach to identify relevant openness indicators for evaluation purposes. The main contributions of this study are the following (1) the development of a comprehensive procedure for analyzing indicators, (2) the categorization of indicators as evaluation metrics within a multidimensional grid-box model, (3) the selection and evaluation of relevant indicators, (4) the identification and assessment of digital platform architectures during the launch-to-maturity transition, and (5) the evaluation of the applicability of the conceptualization and design process for digital platform evaluation.
The increasing demand for software engineers cannot completely be fulfilled by university education and conventional training approaches due to limited capacities. Accordingly, an alternative approach is necessary where potential software engineers are being educated in software engineering skills using new methods. We suggest micro tasks combined with theoretical lessons to overcome existing skill deficits and acquire fast trainable capabilities. This paper addresses the gap between demand and supply of software engineers by introducing an actionoriented and scenario-based didactical approach, which enables non-computer scientists to code. Therein, the learning content is provided in small tasks and embedded in learning factory scenarios. Therefore, different requirements for software engineers from the market side and from an academic viewpoint are analyzed and synthesized into an integrated, yet condensed skills catalogue. This enables the development of training and education units that focus on the most important skills demanded on the market. To achieve this objective, individual learning scenarios are developed. Of course, proper basic skills in coding cannot be learned over night but software programming is also no sorcery.
Um in der digitalisierten Wirtschaft mitzuspielen, müssen Unternehmen, Markt und insbesondere Kunden detailliert verstanden werden. Neben den „Big Playern“ aus dem Silicon Valley sieht der deutsche Mittelstand, der zu großen Teilen noch auf gewachsenen IT-Infrastrukturen und Prozessen agiert, oft alt aus. Um in den nächsten Jahren nicht gänzlich abgehängt zu werden, ist ein Umbruch notwendig. Sowohl Leistungserstellungsprozesse als auch Leistungsangebot müssen transparent und datenbasiert ausgerichtet werden. Nur so können Geschäftsvorfälle, das Marktgeschehen sowie Handeln der Akteure integrativ bewertet und fundierte Entscheidungen getroffen werden. In diesem Beitrag wird das Konzept der Data-Driven Organization vorgestellt und aufgezeigt, wie Unternehmen den eigenen Analyticsreifegrad ermitteln und in einem iterativen Transformationsprozess steigern können.
Machine learning for improvement of thermal conditions inside a hybrid ventilated animal building
(2021)
In buildings with hybrid ventilation, natural ventilation opening positions (windows), mechanical ventilation rates, heating, and cooling are manipulated to maintain desired thermal conditions. The indoor temperature is regulated solely by ventilation (natural and mechanical) when the external conditions are favorable to save external heating and cooling energy. The ventilation parameters are determined by a rule-based control scheme, which is not optimal. This study proposes a methodology to enable real-time optimum control of ventilation parameters. We developed offline prediction models to estimate future thermal conditions from the data collected from building in operation. The developed offline model is then used to find the optimal controllable ventilation parameters in real-time to minimize the setpoint deviation in the building. With the proposed methodology, the experimental building's setpoint deviation improved for 87% of time, on average, by 0.53 degrees C compared to the current deviations.
In virtual collaboration at the workplace, a growing number of teams apply supportive conversational agents (CAs). They take on different work-related tasks for teams and single users such as scheduling meetings or stimulating creativity. Previous research merely focused on these positive aspects of introducing CAs at the workplace, omitting ethical challenges faced by teams using these often artificial intelligence (AI)-enabled technologies. Thus, on the one hand, CAs can present themselves as benevolent teammates, but on the other hand, they can collect user data, reduce worker autonomy, or foster social isolation by their service. In this work, we conducted 15 expert interviews with senior researchers from the fields of ethics, collaboration, and computer science in order to derive ethical guidelines for introducing CAs in virtual team collaboration. We derived 14 guidelines and seven research questions to pave the way for future research on the dark sides of human–agent interaction in organizations.
Graphs play an important role in many areas of Computer Science. In particular, our work is motivated by model-driven software development and by graph databases. For this reason, it is very important to have the means to express and to reason about the properties that a given graph may satisfy. With this aim, in this paper we present a visual logic that allows us to describe graph properties, including navigational properties, i.e., properties about the paths in a graph. The logic is equipped with a deductive tableau method that we have proved to be sound and complete.
Observing inconsistent results in prior studies, this paper applies the elaboration likelihood model to investigate the impact of affective and cognitive cues embedded in social media messages on audience engagement during a political event. Leveraging a rich dataset in the context of the 2020 U.S. presidential elections containing more than 3 million tweets, we found the prominence of both cue types. For the overall sample, positivity and sentiment are negatively related to engagement. In contrast, the post-hoc sub-sample analysis of tweets from famous users shows that emotionally charged content is more engaging. The role of sentiment decreases when the number of followers grows and ultimately becomes insignificant for Twitter participants with a vast number of followers. Prosocial orientation (“we-talk”) is consistently associated with more likes, comments, and retweets in the overall sample and sub-samples.
Does a smile open all doors?
(2020)
Online photographs govern an individual’s choices across a variety of contexts. In sharing arrangements, facial appearance has been shown to affect the desire to collaborate, interest to explore a listing, and even willingness to pay for a stay. Because of the ubiquity of online images and their influence on social attitudes, it seems crucial to be able to control these aspects. The present study examines the effect of different photographic self-disclosures on the provider’s perceptions and willingness to accept a potential co-sharer. The findings from our experiment in the accommodation-sharing context suggest social attraction mediates the effect of photographic self-disclosures on willingness to host. Implications of the results for IS research and practitioners are discussed.
Despite the phenomenal growth of Big Data Analytics in the last few years, little research is done to explicate the relationship between Big Data Analytics Capability (BDAC) and indirect strategic value derived from such digital capabilities. We attempt to address this gap by proposing a conceptual model of the BDAC - Innovation relationship using dynamic capability theory. The work expands on BDAC business value research and extends the nominal research done on BDAC – innovation. We focus on BDAC's relationship with different innovation objects, namely product, business process, and business model innovation, impacting all value chain activities. The insights gained will stimulate academic and practitioner interest in explicating strategic value generated from BDAC and serve as a framework for future research on the subject