Result: Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes
Title:
Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes
Authors:
Contributors:
Centre d'économie de la Sorbonne (CES), Université Paris 1 Panthéon-Sorbonne (UP1)-Centre National de la Recherche Scientifique (CNRS), Paris School of Economics (PSE), Université Paris 1 Panthéon-Sorbonne (UP1)-École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École des hautes études en sciences sociales (EHESS)-École nationale des ponts et chaussées (ENPC)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Groupe de recherche en économie mathématique et quantitative (GREMAQ), Université Toulouse Capitole (UT Capitole), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National de la Recherche Agronomique (INRA)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS), ANR-10-BLAN-0112,JEUDY,Comportement en temps long pour les jeux dynamiques à temps continu et discret(2010)
Source:
https://hal.science/hal-01302567 ; 2016.
Publisher Information:
CCSD
Publication Year:
2016
Subject Terms:
Document Type:
Report
Language:
English
Relation:
https://hal.science/hal-01395429v1; info:eu-repo/semantics/altIdentifier/arxiv/1505.07495; ARXIV: 1505.07495
Availability:
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.5220560
Database:
BASE
Further Information
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage game, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies that use randomization). Second, the strategy σ can be chosen such that, under the long-run average payoff criterion, the decision-maker obtains more than the limit of the n-stage values.
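
The two statements in the abstract can be written out roughly as follows. This is only a sketch in assumed notation, which is not taken from this record: r_m denotes the stage-m payoff, x_0 the initial state, v_n the n-stage value, and v^* the limit of the n-stage values. The second condition reads "more than the limit" as "up to ε", which is an interpretation and not a quotation of the report.

```latex
% Assumed notation (not from the record): r_m = payoff at stage m,
% x_0 = initial state, \sigma = a pure strategy of the decision-maker.

% n-stage value: best expected average payoff over the first n stages.
v_n(x_0) \;=\; \sup_{\sigma}\; \mathbb{E}^{x_0}_{\sigma}\!\left[ \frac{1}{n} \sum_{m=1}^{n} r_m \right],
\qquad
v^{*}(x_0) \;=\; \lim_{n \to \infty} v_n(x_0).

% First statement: for every \varepsilon > 0 there are a pure strategy \sigma
% and an n_0 such that \sigma is \varepsilon-optimal in every long enough game.
\forall n \ge n_0 : \quad
\mathbb{E}^{x_0}_{\sigma}\!\left[ \frac{1}{n} \sum_{m=1}^{n} r_m \right] \;\ge\; v_n(x_0) - \varepsilon .

% Second statement (one possible reading): the same \sigma also performs well
% under the long-run average payoff criterion.
\mathbb{E}^{x_0}_{\sigma}\!\left[ \liminf_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} r_m \right] \;\ge\; v^{*}(x_0) - \varepsilon .
```

The point of the second inequality is that the lim inf sits inside the expectation, so the guarantee concerns the realized average payoff along each path and not only the expected n-stage averages.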