TY  - JOUR
A1  - Kastius, Alexander
A1  - Schlosser, Rainer
T1  - Dynamic pricing under competition using reinforcement learning
JF  - Journal of revenue and pricing management
N2  - Dynamic pricing is considered a way to gain an advantage over competitors in modern online markets. Recent advances in Reinforcement Learning (RL) have provided more capable algorithms that can be used to solve pricing problems. In this paper, we study the performance of Deep Q-Networks (DQN) and Soft Actor Critic (SAC) in different market models. We consider tractable duopoly settings, where optimal solutions derived by dynamic programming techniques can be used for verification, as well as oligopoly settings, which are usually intractable due to the curse of dimensionality. We find that both algorithms provide reasonable results, while SAC performs better than DQN. Moreover, we show that under certain conditions, RL algorithms can be forced into collusion by their competitors without direct communication.
KW  - Dynamic pricing
KW  - Competition
KW  - Reinforcement learning
KW  - E-commerce
KW  - Price collusion
Y1  - 2021
U6  - https://doi.org/10.1057/s41272-021-00285-3
SN  - 1476-6930
SN  - 1477-657X
VL  - 21
IS  - 1
SP  - 50
EP  - 63
PB  - Springer Nature Switzerland AG
CY  - Cham
ER  - 
TY  - JOUR
A1  - Balta Beylergil, Sinem
A1  - Beck, Anne
A1  - Deserno, Lorenz
A1  - Lorenz, Robert C.
A1  - Rapp, Michael Armin
A1  - Schlagenhauf, Florian
A1  - Heinz, Andreas
A1  - Obermayer, Klaus
T1  - Dorsolateral prefrontal cortex contributes to the impaired behavioral adaptation in alcohol dependence
JF  - NeuroImage: Clinical : a journal of diseases affecting the nervous system
N2  - Substance-dependent individuals often lack the ability to adjust decisions flexibly in response to changes in reward contingencies. Prediction errors (PEs) are thought to mediate flexible decision-making by updating the reward values associated with available actions. In this study, we explored whether the neurobiological correlates of PEs are altered in alcohol dependence. Behavioral and functional magnetic resonance imaging (fMRI) data were simultaneously acquired from 34 abstinent alcohol-dependent patients (ADP) and 26 healthy controls (HC) during a probabilistic reward-guided decision-making task with dynamically changing reinforcement contingencies. A hierarchical Bayesian inference method was used to fit and compare learning models with different assumptions about the amount of task-related information subjects may have inferred during the experiment. Here, we observed that the best-fitting model was a modified Rescorla-Wagner type model, the “double-update” model, which assumes that subjects infer the knowledge that reward contingencies are anti-correlated, and integrate both actual and hypothetical outcomes into their decisions. Moreover, comparison of the best-fitting model's parameters showed that ADP were less sensitive to punishments than HC. Hence, decisions of ADP after punishments were loosely coupled with the expected reward values assigned to them. A correlation analysis between the model-generated PEs and the fMRI data revealed a reduced association between these PEs and the BOLD activity in the dorsolateral prefrontal cortex (DLPFC) of ADP. A hemispheric asymmetry was observed in the DLPFC when positive and negative PE signals were analyzed separately. The right DLPFC activity in ADP showed a reduced correlation with positive PEs. On the other hand, ADP, particularly the patients with high dependence severity, recruited the left DLPFC to a lesser extent than HC for processing negative PE signals.
These results suggest that the DLPFC, which has been linked to adaptive control of action selection, may play an important role in cognitive inflexibility observed in alcohol dependence when reinforcement contingencies change. Particularly, the left DLPFC may contribute to this impaired behavioral adaptation, possibly by impeding the extinction of the actions that no longer lead to a reward.
KW  - Alcohol dependence
KW  - Prediction error
KW  - Reinforcement learning
KW  - Reversal learning
KW  - Dorsolateral prefrontal cortex
KW  - Decision-making
Y1  - 2017
U6  - https://doi.org/10.1016/j.nicl.2017.04.010
SN  - 2213-1582
VL  - 15
SP  - 80
EP  - 94
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Sebold, Miriam Hannah
A1  - Garbusow, Maria
A1  - Jetzschmann, P.
A1  - Schad, Daniel
A1  - Nebe, S.
A1  - Schlagenhauf, Florian
A1  - Heinz, A.
A1  - Rapp, Michael Armin
A1  - Romanczuk-Seiferth, Nina
T1  - Reward and avoidance learning in the context of aversive environments and possible implications for depressive symptoms
JF  - Psychopharmacology
N2  - Background: Aversive stimuli in the environment influence human actions. This includes valence-dependent influences on action selection, e.g., increased avoidance but decreased approach behavior. However, it is as yet unclear how aversive stimuli interact with complex learning and decision-making in the reward and avoidance domain. Moreover, the underlying computational mechanisms of these decision-making biases are unknown. Methods: To elucidate these mechanisms, 54 healthy young male subjects performed a two-step sequential decision-making task, which allows computational modeling of different aspects of learning, e.g., model-free, habitual, and model-based, goal-directed learning. We used a within-subject design, crossing task valence (reward vs. punishment learning) with emotional context (aversive vs. neutral background stimuli). We analyzed choice data, applied a computational model, and performed simulations. Results: Whereas model-based learning was not affected, aversive stimuli interacted with model-free learning in a way that depended on task valence. Thus, aversive stimuli increased model-free avoidance learning but decreased model-free reward learning. The computational model confirmed this effect: the parameter lambda that indicates the influence of reward prediction errors on decision values was increased in the punishment condition but decreased in the reward condition when aversive stimuli were present. Further, by using the inferred computational parameters to simulate choice data, our effects were captured. Exploratory analyses revealed that the observed biases were associated with subclinical depressive symptoms. Conclusion: Our data show that aversive environmental stimuli affect complex learning and decision-making in a way that depends on task valence. Further, we provide a model of the underlying computations of this affective modulation.
Finally, our finding of increased decision-making biases in subjects reporting subclinical depressive symptoms matches recent reports of amplified Pavlovian influences on action selection in depression and suggests a potential vulnerability factor for mood disorders. We discuss our findings in the light of the involvement of the neuromodulators serotonin and dopamine.
KW  - Reward learning
KW  - Avoidance learning
KW  - Reinforcement learning
KW  - Computational psychiatry
KW  - Decision-making
KW  - Affective modulation
KW  - Depression symptoms
Y1  - 2019
U6  - https://doi.org/10.1007/s00213-019-05299-9
SN  - 0033-3158
SN  - 1432-2072
VL  - 236
IS  - 8
SP  - 2437
EP  - 2449
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Sebold, Miriam Hannah
A1  - Nebe, Stephan
A1  - Garbusow, Maria
A1  - Guggenmos, Matthias
A1  - Schad, Daniel
A1  - Beck, Anne
A1  - Kuitunen-Paul, Sören
A1  - Sommer, Christian
A1  - Frank, Robin
A1  - Neu, Peter
A1  - Zimmermann, Ulrich S.
A1  - Rapp, Michael Armin
A1  - Smolka, Michael N.
A1  - Huys, Quentin J. M.
A1  - Schlagenhauf, Florian
A1  - Heinz, Andreas
T1  - When Habits Are Dangerous: Alcohol Expectancies and Habitual Decision Making Predict Relapse in Alcohol Dependence
JF  - Biological psychiatry : a journal of psychiatric neuroscience and therapeutics ; a publication of the Society of Biological Psychiatry
N2  - BACKGROUND: Addiction is supposedly characterized by a shift from goal-directed to habitual decision making, thus facilitating automatic drug intake. The two-step task allows distinguishing between these mechanisms by computationally modeling goal-directed and habitual behavior as model-based and model-free control. In addicted patients, decision making may also strongly depend upon drug-associated expectations. Therefore, we investigated model-based versus model-free decision making and its neural correlates as well as alcohol expectancies in alcohol-dependent patients and healthy controls and assessed treatment outcome in patients. METHODS: Ninety detoxified, medication-free, alcohol-dependent patients and 96 age- and gender-matched control subjects underwent functional magnetic resonance imaging during the two-step task. Alcohol expectancies were measured with the Alcohol Expectancy Questionnaire. Over a follow-up period of 48 weeks, 37 patients remained abstinent and 53 patients relapsed as indicated by the Alcohol Timeline Followback method. RESULTS: Patients who relapsed displayed reduced medial prefrontal cortex activation during model-based decision making. Furthermore, high alcohol expectancies were associated with low model-based control in relapsers, while the opposite was observed in abstainers and healthy control subjects. However, reduced model-based control per se was not associated with subsequent relapse. CONCLUSIONS: These findings suggest that poor treatment outcome in alcohol dependence does not simply result from a shift from model-based to model-free control but is instead dependent on the interaction between high drug expectancies and low model-based decision making. Reduced model-based medial prefrontal cortex signatures in those who relapse point to a neural correlate of relapse risk. These observations suggest that therapeutic interventions should target subjective alcohol expectancies.
KW  - Alcohol dependence
KW  - Alcohol expectancy
KW  - Goal-directed control
KW  - Medial prefrontal cortex
KW  - Reinforcement learning
KW  - Treatment outcome
Y1  - 2017
U6  - https://doi.org/10.1016/j.biopsych.2017.04.019
SN  - 0006-3223
SN  - 1873-2402
VL  - 82
SP  - 847
EP  - 856
PB  - Elsevier
CY  - New York
ER  - 
TY  - JOUR
A1  - Friedel, Eva
A1  - Schlagenhauf, Florian
A1  - Beck, Anne
A1  - Dolan, Raymond J.
A1  - Huys, Quentin J. M.
A1  - Rapp, Michael Armin
A1  - Heinz, Andreas
T1  - The effects of life stress and neural learning signals on fluid intelligence
JF  - European archives of psychiatry and clinical neuroscience : official organ of the German Society for Biological Psychiatry
N2  - Fluid intelligence (fluid IQ), defined as the capacity for rapid problem solving and behavioral adaptation, is known to be modulated by learning and experience. Both stressful life events (SLES) and neural correlates of learning [specifically, a key mediator of adaptive learning in the brain, namely the ventral striatal representation of prediction errors (PE)] have been shown to be associated with individual differences in fluid IQ. Here, we examine the interaction between adaptive learning signals (using a well-characterized probabilistic reversal learning task in combination with fMRI) and SLES on fluid IQ measures. We find that the correlation between ventral striatal BOLD PE and fluid IQ, which we have previously reported, is quantitatively modulated by the amount of reported SLES. Thus, after experiencing adversity, basic neuronal learning signatures appear to align more closely with a general measure of flexible learning (fluid IQ), a finding complementing studies on the effects of acute stress on learning. The results suggest that an understanding of the neurobiological correlates of trait variables like fluid IQ needs to take socioemotional influences such as chronic stress into account.
KW  - Reinforcement learning
KW  - Prediction error signal
KW  - Ventral striatum
KW  - Stress
KW  - Intelligence
Y1  - 2015
U6  - https://doi.org/10.1007/s00406-014-0519-3
SN  - 0940-1334
SN  - 1433-8491
VL  - 265
IS  - 1
SP  - 35
EP  - 43
PB  - Springer
CY  - Heidelberg
ER  - 
TY  - JOUR
A1  - Sebold, Miriam Hannah
A1  - Deserno, Lorenz
A1  - Nebe, Stephan
A1  - Schad, Daniel
A1  - Garbusow, Maria
A1  - Haegele, Claudia
A1  - Keller, Juergen
A1  - Juenger, Elisabeth
A1  - Kathmann, Norbert
A1  - Smolka, Michael N.
A1  - Rapp, Michael Armin
A1  - Schlagenhauf, Florian
A1  - Heinz, Andreas
A1  - Huys, Quentin J. M.
T1  - Model-based and model-free decisions in alcohol dependence
JF  - Neuropsychobiology : international journal of experimental and clinical research in biological psychiatry, pharmacopsychiatry, Biological Psychology/Pharmacopsychology and Pharmacoelectroencephalography
N2  - Background: Human and animal work suggests a shift from goal-directed to habitual decision-making in addiction. However, the evidence for this in human alcohol dependence is as yet inconclusive. Methods: Twenty-six healthy controls and 26 recently detoxified alcohol-dependent patients underwent behavioral testing with a 2-step task designed to disentangle goal-directed and habitual response patterns. Results: Alcohol-dependent patients showed less evidence of goal-directed choices than healthy controls, particularly after losses. There was no difference in the strength of the habitual component. The group differences did not survive controlling for performance on the Digit Symbol Substitution Task. Conclusion: Chronic alcohol use appears to selectively impair goal-directed function, rather than promoting habitual responding. It appears to do so particularly after nonrewards, and this may be mediated by the effects of alcohol on more general cognitive functions subserved by the prefrontal cortex.
KW  - Alcohol dependence
KW  - Decision-making
KW  - Reinforcement learning
KW  - Dopamine
KW  - Computational psychiatry
Y1  - 2014
U6  - https://doi.org/10.1159/000362840
SN  - 0302-282X
SN  - 1423-0224
VL  - 70
IS  - 2
SP  - 122
EP  - 131
PB  - Karger
CY  - Basel
ER  -