Relaxation schemes for min-max generalization in batch-mode reinforcement learning
PDE & Applied Mathematics
| Speaker: | Prof. Quentin Louveaux, Univ. Liège, Belgium + UC Davis |
| Location: | 3106 MSB |
| Start time: | Tue, May 6 2014, 3:10PM |
Description
Reinforcement learning is a control paradigm in which an agent interacts with its environment
in order to maximize a reward. We assume that the environment is a discrete-time Markov process
known only through a batch collection of trajectories. In this talk, we are interested in providing
a worst-case performance guarantee for a given policy. It has been shown that such a guarantee
can be computed by solving a quadratically constrained quadratic program. We show that this problem is
NP-hard and propose two tractable relaxation schemes to tackle it. The first relaxation scheme works by dropping
some constraints so as to obtain a problem solvable in polynomial time. The second relaxation scheme, based
on a Lagrangian relaxation in which all constraints are dualized, can also be solved in polynomial time. We also
prove theoretically, and illustrate empirically, that both relaxation schemes yield better results than those previously
proposed in the literature.
This is joint work with Raphaël Fonteneau, Bernard Boigelot and Damien Ernst.
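To give a flavor of the Lagrangian relaxation idea mentioned in the abstract, here is a minimal sketch on a toy one-dimensional nonconvex QCQP. This example is purely illustrative (it is not the quadratically constrained program from the talk): we dualize the single constraint, evaluate the resulting dual function on a grid of multipliers, and maximize it to obtain a lower bound on the primal optimum.

```python
import numpy as np

# Toy nonconvex QCQP (hypothetical illustration, not the talk's actual program):
#   minimize   -x^2          (concave objective, hence nonconvex)
#   subject to  x^2 <= 1
# The exact optimal value is -1, attained at x = +/-1.

def dual_function(lam):
    # Lagrangian: L(x, lam) = -x^2 + lam*(x^2 - 1) = (lam - 1)*x^2 - lam.
    # Dual function g(lam) = min_x L(x, lam):
    #   if lam >= 1, the x^2 coefficient is nonnegative, so the min (at x=0) is -lam;
    #   otherwise L is unbounded below in x, so g(lam) = -infinity.
    if lam >= 1.0:
        return -lam
    return -np.inf

# The Lagrangian dual is max_{lam >= 0} g(lam), a concave 1-D problem;
# here we simply evaluate g on a grid of multipliers and take the best bound.
lams = np.arange(0.0, 3.0, 0.25)
best_bound = max(dual_function(l) for l in lams)
print(best_bound)  # -1.0: the dual bound matches the true optimum (no duality gap here)
```

In this tiny instance the dual bound is tight; in general a Lagrangian relaxation only guarantees a lower bound on the minimum, which is exactly the kind of tractable worst-case certificate the abstract refers to.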
