Refine
Has Fulltext
- no (8)
Document Type
- Article (8) (remove)
Language
- English (8)
Is part of the Bibliography
- yes (8)
Keywords
- heuristics (3)
- dynamic (2)
- dynamic programming (2)
- risk aversion (2)
- Body sensor networks (1)
- Competition (1)
- Dynamic pricing (1)
- Dynamic pricing and advertising (1)
- E-commerce (1)
- Feedback heuristics (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (8) (remove)
Dynamic pricing is considered a possibility to gain an advantage over competitors in modern online markets. The past advancements in Reinforcement Learning (RL) provided more capable algorithms that can be used to solve pricing problems. In this paper, we study the performance of Deep Q-Networks (DQN) and Soft Actor Critic (SAC) in different market models. We consider tractable duopoly settings, where optimal solutions derived by dynamic programming techniques can be used for verification, as well as oligopoly settings, which are usually intractable due to the curse of dimensionality. We find that both algorithms provide reasonable results, while SAC performs better than DQN. Moreover, we show that under certain conditions, RL algorithms can be forced into collusion by their competitors without direct communication.
Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes. <br /> In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach.
In many revenue management applications risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. In existing approaches utility functions, chance constraints, or (conditional) value at risk considerations are used to influence the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm, which allows to identify feedback policies that approximate predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and the flexibility of our approach for different dynamic pricing scenarios. (C) 2020 Elsevier Ltd. All rights reserved.
Currently available wearables are usually based on a single sensor node with integrated capabilities for classifying different activities. The next generation of cooperative wearables could be able to identify not only activities, but also to evaluate them qualitatively using the data of several sensor nodes attached to the body, to provide detailed feedback for the improvement of the execution. Especially within the application domains of sports and health-care, such immediate feedback to the execution of body movements is crucial for (re-) learning and improving motor skills. To enable such systems for a broad range of activities, generalized approaches for human motion assessment within sensor networks are required. In this paper, we present a generalized trainable activity assessment chain (AAC) for the online assessment of periodic human activity within a wireless body area network. AAC evaluates the execution of separate movements of a prior trained activity on a fine-grained quality scale. We connect qualitative assessment with human knowledge by projecting the AAC on the hierarchical decomposition of motion performed by the human body as well as establishing the assessment on a kinematic evaluation of biomechanically distinct motion fragments. We evaluate AAC in a real-world setting and show that AAC successfully delimits the movements of correctly performed activity from faulty executions and provides detailed reasons for the activity assessment.
Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples.
In many businesses, firms are selling different types of products, which share mutual substitution effects in demand. To compute effective pricing strategies is challenging as the sales probabilities of each of a firm's products can also be affected by the prices of potential substitutes. In this paper, we analyze stochastic dynamic multi-product pricing models for the sale of perishable goods. To circumvent the limitations of time-consuming optimal solutions for highly complex models, we propose different relaxation techniques, which allow to reduce the size of critical model components, such as the state space, the action space, or the set of potential sales events. Our heuristics are able to decrease the size of those components by forming corresponding clusters and using subsets of representative elements. Using numerical examples, we verify that our heuristics make it possible to dramatically reduce the computation time while still obtaining close-to-optimal expected profits. Further, we show that our heuristics are (i) flexible, (ii) scalable, and (iii) can be arbitrarily combined in a mutually supportive way.
In dynamic decision problems, it is challenging to find the right balance between maximizing expected rewards and minimizing risks. In this paper, we consider NP-hard mean-variance (MV) optimization problems in Markov decision processes with a finite time horizon. We present a heuristic approach to solve MV problems, which is based on state-dependent risk aversion and efficient dynamic programming techniques. Our approach can also be applied to mean-semivariance (MSV) problems, which particularly focus on the downside risk. We demonstrate the applicability and the effectiveness of our heuristic for dynamic pricing applications. Using reproducible examples, we show that our approach outperforms existing state-of-the-art benchmark models for MV and MSV problems while also providing competitive runtimes. Further, compared to models based on constant risk levels, we find that state-dependent risk aversion allows to more effectively intervene in case sales processes deviate from their planned paths. Our concepts are domain independent, easy to implement and of low computational complexity.
We examine a special class of dynamic pricing and advertising models for the sale of perishable goods, including marginal unit costs and inventory holding costs. The time horizon is assumed to be finite and we allow several model parameters to be dependent on time. For the stochastic version of the model, we derive closed-form expressions of the value function as well as of the optimal pricing and advertising policy in feedback form. Moreover, we show that for small unit shares, the model converges to a deterministic version of the problem, whose explicit solution is characterized by an overage and an underage case. We quantify the close relationship between the open-loop solution of the deterministic model and the expected evolution of optimally controlled stochastic sales processes. For both models, we derive sensitivity results. We find that in the case of positive holding costs, on average, optimal prices increase in time and advertising rates decrease. Furthermore, we analytically verify the excellent quality of optimal feedback policies of deterministic models applied in stochastic models. (C) 2015 Elsevier B.V. All rights reserved.