Refine
Has Fulltext
- no (2)
Year of publication
- 2022 (2) (remove)
Document Type
- Article (2)
Language
- English (2)
Is part of the Bibliography
- yes (2) (remove)
Keywords
- Competition (1)
- Dynamic pricing (1)
- E-commerce (1)
- Markov decision process; (1)
- Price collusion (1)
- Reinforcement learning (1)
- dynamic pricing (1)
- dynamic programming (1)
- heuristics (1)
- mean-variance optimization (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (2) (remove)
In dynamic decision problems, it is challenging to find the right balance between maximizing expected rewards and minimizing risks. In this paper, we consider NP-hard mean-variance (MV) optimization problems in Markov decision processes with a finite time horizon. We present a heuristic approach to solve MV problems, which is based on state-dependent risk aversion and efficient dynamic programming techniques. Our approach can also be applied to mean-semivariance (MSV) problems, which particularly focus on the downside risk. We demonstrate the applicability and the effectiveness of our heuristic for dynamic pricing applications. Using reproducible examples, we show that our approach outperforms existing state-of-the-art benchmark models for MV and MSV problems while also providing competitive runtimes. Further, compared to models based on constant risk levels, we find that state-dependent risk aversion allows to more effectively intervene in case sales processes deviate from their planned paths. Our concepts are domain independent, easy to implement and of low computational complexity.
Dynamic pricing is considered a possibility to gain an advantage over competitors in modern online markets. The past advancements in Reinforcement Learning (RL) provided more capable algorithms that can be used to solve pricing problems. In this paper, we study the performance of Deep Q-Networks (DQN) and Soft Actor Critic (SAC) in different market models. We consider tractable duopoly settings, where optimal solutions derived by dynamic programming techniques can be used for verification, as well as oligopoly settings, which are usually intractable due to the curse of dimensionality. We find that both algorithms provide reasonable results, while SAC performs better than DQN. Moreover, we show that under certain conditions, RL algorithms can be forced into collusion by their competitors without direct communication.