Relaxation schemes for min-max generalization in batch-mode reinforcement learning
PDE and Applied Math Seminar
Speaker: | Prof. Quentin Louveaux, University of Liège, Belgium, and UC Davis |
Location: | 3106 MSB |
Start time: | Tue, May 6 2014, 3:10PM |
Reinforcement learning is a control paradigm in which an agent interacts with its environment in order to maximize a reward. We assume that the agent's environment is a discrete-time Markov process, our only knowledge of which comes from a batch collection of trajectories. In this talk, we are interested in providing a worst-case performance guarantee for a given policy. It has been shown that such a guarantee can be modeled as a quadratically constrained quadratic program (QCQP). We show that this problem is NP-hard and propose two tractable relaxation schemes to tackle it. The first relaxation scheme drops some constraints in order to obtain a problem that is solvable in polynomial time. The second, based on a Lagrangian relaxation in which all constraints are dualized, can also be solved in polynomial time. We prove theoretically, and illustrate empirically, that both relaxation schemes provide better results than those previously proposed in the literature. This is joint work with Raphaël Fonteneau, Bernard Boigelot and Damien Ernst.
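The abstract's second scheme dualizes every constraint of the QCQP. To give a flavor of what Lagrangian relaxation does, here is a minimal sketch on a toy QCQP (one quadratic constraint, hand-picked matrices) that is not the speaker's actual formulation: the dualized problem has a closed-form inner minimizer, and maximizing the resulting dual function over the multiplier yields a bound on the primal optimum.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy QCQP (illustrative only, not the seminar's formulation):
#   minimize    x^T Q0 x + c0^T x
#   subject to  x^T Q1 x + c1^T x + d1 <= 0
Q0 = np.array([[2.0, 0.0], [0.0, 1.0]])
c0 = np.array([-4.0, -4.0])
Q1 = np.eye(2)
c1 = np.zeros(2)
d1 = -1.0  # constraint reads ||x||^2 <= 1

def dual_value(lam):
    # Dualize the constraint:
    #   L(x, lam) = x^T (Q0 + lam*Q1) x + (c0 + lam*c1)^T x + lam*d1.
    # With lam >= 0 and Q0 + lam*Q1 positive definite, the inner minimum
    # over x is attained at x* = -0.5 (Q0 + lam*Q1)^{-1} (c0 + lam*c1).
    Q = Q0 + lam * Q1
    c = c0 + lam * c1
    x = -0.5 * np.linalg.solve(Q, c)
    return x @ Q @ x + c @ x + lam * d1

# By weak duality, dual_value(lam) lower-bounds the QCQP optimum for every
# lam >= 0; maximizing over lam gives the tightest Lagrangian bound.
res = minimize_scalar(lambda lam: -dual_value(lam),
                      bounds=(0.0, 10.0), method="bounded")
best_lam = res.x
lower_bound = dual_value(best_lam)
print(best_lam, lower_bound)
```

For this convex toy instance strong duality holds, so the bound is tight; the point of the talk is that for the (nonconvex, NP-hard) min-max problem the relaxation is only a tractable bound, not an exact solution.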