Result: Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

Title:
Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes
Contributors:
Centre d'économie de la Sorbonne (CES), Université Paris 1 Panthéon-Sorbonne (UP1)-Centre National de la Recherche Scientifique (CNRS), Paris School of Economics (PSE), Université Paris 1 Panthéon-Sorbonne (UP1)-École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École des hautes études en sciences sociales (EHESS)-École nationale des ponts et chaussées (ENPC)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Groupe de recherche en économie mathématique et quantitative (GREMAQ), Université Toulouse Capitole (UT Capitole), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National de la Recherche Agronomique (INRA)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS), ANR-10-BLAN-0112,JEUDY,Comportement en temps long pour les jeux dynamiques à temps continu et discret(2010)
Publisher Information:
CCSD
Publication Year:
2016
Document Type:
Report
Language:
English
Relation:
https://hal.science/hal-01395429v1; info:eu-repo/semantics/altIdentifier/arxiv/1505.07495; ARXIV: 1505.07495
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.5220560
Database:
BASE

Further Information

In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, this shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage game, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, the strategy σ can be chosen such that under the long-run average payoff criterion, the decision-maker has more than the limit of the n-stage values.