Refine
Year of publication
- 2020 (41) (remove)
Document Type
- Article (39)
- Doctoral Thesis (1)
- Review (1)
Language
- English (41) (remove)
Is part of the Bibliography
- yes (41)
Keywords
- machine learning (3)
- performance (3)
- run time analysis (3)
- theory (3)
- Algorithms (2)
- data wrangling (2)
- dynamic (2)
- genetic programming (2)
- heuristics (2)
- scalability (2)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (41) (remove)
Technology pivots were designed to help digital startups make adjustments to the technology underpinning their products and services. While academia and the media make liberal use of the term "technology pivot," they rarely align themselves to Ries' foundational conceptualization. Recent research suggests that a more granulated conceptualization of technology pivots is required. To scientifically derive a comprehensive conceptualization, we conduct a Delphi study with a panel of 38 experts drawn from academia and practice to explore their understanding of "technology pivots." Our study thus makes an important contribution to advance the seminal work by Ries on technology pivots.
Background:
Childhood and adolescence are critical stages of life for mental health and well-being. Schools are a key setting for mental health promotion and illness prevention. One in five children and adolescents have a mental disorder, about half of mental disorders beginning before the age of 14. Beneficial and explainable artificial intelligence can replace current paper- based and online approaches to school mental health surveys. This can enhance data acquisition, interoperability, data driven analysis, trust and compliance. This paper presents a model for using chatbots for non-obtrusive data collection and supervised machine learning models for data analysis; and discusses ethical considerations pertaining to the use of these models.
Methods:
For data acquisition, the proposed model uses chatbots which interact with students. The conversation log acts as the source of raw data for the machine learning. Pre-processing of the data is automated by filtering for keywords and phrases.
Existing survey results, obtained through current paper-based data collection methods, are evaluated by domain experts (health professionals). These can be used to create a test dataset to validate the machine learning models. Supervised learning
can then be deployed to classify specific behaviour and mental health patterns.
Results:
We present a model that can be used to improve upon current paper-based data collection and manual data analysis methods. An open-source GitHub repository contains necessary tools and components of this model. Privacy is respected through
rigorous observance of confidentiality and data protection requirements. Critical reflection on these ethics and law aspects is included in the project.
Conclusions:
This model strengthens mental health surveillance in schools. The same tools and components could be applied to other public health data. Future extensions of this model could also incorporate unsupervised learning to find clusters and patterns
of unknown effects.
In the stable marriage problem, a set of men and a set of women are given, each of whom has a strictly ordered preference list over the acceptable agents in the opposite class. A matching is called stable if it is not blocked by any pair of agents, who mutually prefer each other to their respective partner. Ties in the preferences allow for three different definitions for a stable matching: weak, strong and super-stability. Besides this, acceptable pairs in the instance can be restricted in their ability of blocking a matching or being part of it, which again generates three categories of restrictions on acceptable pairs. Forced pairs must be in a stable matching, forbidden pairs must not appear in it, and lastly, free pairs cannot block any matching.
Our computational complexity study targets the existence of a stable solution for each of the three stability definitions, in the presence of each of the three types of restricted pairs. We solve all cases that were still open. As a byproduct, we also derive that the maximum size weakly stable matching problem is hard even in very dense graphs, which may be of independent interest.
While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat.
In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without explicit bloat control mechanism.
More specifically, we analyzed the performance of the (1 + 1) GP on the two benchmark functions ORDER and MAJORITY. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of O(T-init + nlogn) iterations both for ORDER and MAJORITY. For the case without bloat control, the bounds O(T-init logT(i)(nit) + n(logn)(3)) and Omega(T-init + nlogn) (and Omega(T-init log T-init) for n = 1) hold for MAJORITY(1).
An unceasing problem of our prevailing society is the fair division of goods. The problem of proportional cake cutting focuses on dividing a heterogeneous and divisible resource, the cake, among n players who value pieces according to their own measure function. The goal is to assign each player a not necessarily connected part of the cake that the player evaluates at least as much as her proportional share. <br /> In this article, we investigate the problem of proportional division with unequal shares, where each player is entitled to receive a predetermined portion of the cake. Our main contribution is threefold. First we present a protocol for integer demands, which delivers a proportional solution in fewer queries than all known protocols. By giving a matching lower bound, we then show that our protocol is asymptotically the fastest possible. Finally, we turn to irrational demands and solve the proportional cake cutting problem by reducing it to the same problem with integer demands only. All results remain valid in a highly general cake cutting model, which can be of independent interest.
Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the solution space, which is updated iteratively, based on the quality of the solutions sampled according to the model. As previous works show, this iteration-based perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based cGA (sig-cGA) optimizes the commonly regarded benchmark functions OneMax (OM), LeadingOnes, and BinVal all in quasilinear time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed stable compact genetic algorithm-an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model-we prove that it optimizes OM only in a time exponential in its hypothetical population size. Similarly, we show that the convex search algorithm cannot optimize OM in polynomial time.
Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples.
In many businesses, firms are selling different types of products, which share mutual substitution effects in demand. To compute effective pricing strategies is challenging as the sales probabilities of each of a firm's products can also be affected by the prices of potential substitutes. In this paper, we analyze stochastic dynamic multi-product pricing models for the sale of perishable goods. To circumvent the limitations of time-consuming optimal solutions for highly complex models, we propose different relaxation techniques, which allow to reduce the size of critical model components, such as the state space, the action space, or the set of potential sales events. Our heuristics are able to decrease the size of those components by forming corresponding clusters and using subsets of representative elements. Using numerical examples, we verify that our heuristics make it possible to dramatically reduce the computation time while still obtaining close-to-optimal expected profits. Further, we show that our heuristics are (i) flexible, (ii) scalable, and (iii) can be arbitrarily combined in a mutually supportive way.
In many revenue management applications risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. In existing approaches utility functions, chance constraints, or (conditional) value at risk considerations are used to influence the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm, which allows to identify feedback policies that approximate predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and the flexibility of our approach for different dynamic pricing scenarios. (C) 2020 Elsevier Ltd. All rights reserved.
TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems.
This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly ifluenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.