Title:

Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

Authors:

Contributors:

Centre d'économie de la Sorbonne (CES), Université Paris 1 Panthéon-Sorbonne (UP1)-Centre National de la Recherche Scientifique (CNRS), Paris School of Economics (PSE), Université Paris 1 Panthéon-Sorbonne (UP1)-École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École des hautes études en sciences sociales (EHESS)-École nationale des ponts et chaussées (ENPC)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Groupe de recherche en économie mathématique et quantitative (GREMAQ), Université Toulouse Capitole (UT Capitole), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National de la Recherche Agronomique (INRA)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS), ANR-10-BLAN-0112,JEUDY,Comportement en temps long pour les jeux dynamiques à temps continu et discret(2010)

Source:

https://hal.science/hal-01302567 ; 2016.

Publisher Information:

CCSD

Publication Year:

2016

Subject Terms:

Dynamic programming, Markov decision processes, Partial Observation, Uniform value, Long-run average payoff, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], [SHS.ECO]Humanities and Social Sciences/Economics and Finance

Document Type:

Report

Language:

English

Relation:

https://hal.science/hal-01395429v1; info:eu-repo/semantics/altIdentifier/arxiv/1505.07495; ARXIV: 1505.07495

Availability:

https://hal.science/hal-01302567
https://hal.science/hal-01302567v1/document
https://hal.science/hal-01302567v1/file/1505.07495v2.pdf

Rights:

info:eu-repo/semantics/OpenAccess

Accession Number:

edsbas.5220560

Database:

BASE

Further Information:

In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy σ that is ε-optimal in every n-stage game, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies that use randomization). Second, the strategy σ can be chosen so that, under the long-run average payoff criterion, the decision-maker guarantees at least the limit of the n-stage values.
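The two properties in the abstract can be written out as formulas. This is a sketch only: the notation is assumed here and is not taken from the record. $z_1$ is the initial state, $r_t$ is the stage-$t$ payoff, $v_n(z_1)$ is the value of the $n$-stage problem, and $\mathbb{E}^{\sigma}_{z_1}$ is the expectation induced by strategy $\sigma$. The $\varepsilon$ slack in the second inequality is also an assumption, chosen to match the $\varepsilon$-optimal strategy in the first.

```latex
% Assumed notation (not from the record): z_1 initial state, r_t stage payoff,
% v_n(z_1) value of the n-stage problem, E^sigma_{z_1} expectation under sigma.
% Property 1: a pure strategy that is epsilon-optimal in every long enough n-stage game.
\forall \varepsilon > 0,\ \exists \sigma \text{ pure},\ \exists N \ge 1,\ \forall n \ge N:\quad
\mathbb{E}^{\sigma}_{z_1}\!\left[\frac{1}{n}\sum_{t=1}^{n} r_t\right] \;\ge\; v_n(z_1) - \varepsilon .

% Property 2: the same sigma, evaluated with the long-run average payoff criterion.
\mathbb{E}^{\sigma}_{z_1}\!\left[\liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} r_t\right]
\;\ge\; \lim_{n\to\infty} v_n(z_1) - \varepsilon .
```

The first inequality bounds the expected average of the first $n$ payoffs. The second bounds the expectation of the pathwise $\liminf$ of the averages, which is the "pathwise" part of the notion's name.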
