Title:
Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning.
Authors:
Stanković, Miloš S. (milos.stankovic@singidunum.ac.rs), Beko, Marko (beko.marko@ulusofona.pt), Ilić, Nemanja (nemili@etf.rs), Stanković, Srdjan S. (stankovic@etf.rs)
Source:
European Journal of Control, Vol. 74, November 2023.

Abstract:
In this paper, a new distributed multi-agent Actor-Critic algorithm for reinforcement learning is proposed for solving multi-agent multi-task optimization problems. The Critic takes the form of a Distributed Emphatic Temporal Difference, DETD(λ), algorithm, while the Actor is a complementary consensus-based policy-gradient algorithm, derived from a global objective function that plays the role of a scalarizing function in multi-objective optimization. It is demonstrated that the Feller-Markov properties hold for the newly derived Actor algorithm. A proof of weak convergence of the algorithm to the limit set of an associated ODE is given under mild conditions, using a specific decomposition between the Critic and the Actor algorithms together with two-time-scale stochastic approximation arguments. An experimental verification of the algorithm's properties is provided, showing that the algorithm can serve as an efficient tool in practice. [ABSTRACT FROM AUTHOR]
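To make the general shape of such a scheme concrete, below is a minimal, hypothetical Python sketch of a distributed Actor-Critic of this kind: each agent runs a local emphatic TD(λ) critic with linear features, periodically averages its critic parameters with its neighbours through a consensus weight matrix W, and updates a softmax policy on a slower timescale using the critic's TD error. The critic recursions follow the standard ETD(λ) of Sutton, Mahmood, and White, and the actor uses an Off-PAC-style update ρ·δ·∇log π; both are stand-ins, not the paper's exact DETD(λ) critic or scalarized multi-task actor gradient, and all names (AgentETD, consensus, actor_step) are illustrative.

```python
import numpy as np

class AgentETD:
    """One agent's local emphatic TD(lambda) critic with linear features.

    Hypothetical sketch: the follow-on trace F, emphasis M, and eligibility
    trace e follow the standard ETD(lambda) recursions, not necessarily the
    exact DETD(lambda) form of the paper.
    """

    def __init__(self, n_features, gamma=0.95, lam=0.8, alpha=0.05):
        self.theta = np.zeros(n_features)  # value-function weights
        self.e = np.zeros(n_features)      # eligibility trace
        self.F = 0.0                       # follow-on trace
        self.gamma, self.lam, self.alpha = gamma, lam, alpha

    def critic_step(self, phi, phi_next, reward, rho, interest=1.0):
        """ETD(lambda) update with importance-sampling ratio rho (off-policy)."""
        self.F = rho * self.gamma * self.F + interest
        M = self.lam * interest + (1.0 - self.lam) * self.F
        self.e = rho * (self.gamma * self.lam * self.e + M * phi)
        delta = reward + self.gamma * self.theta @ phi_next - self.theta @ phi
        self.theta = self.theta + self.alpha * delta * self.e
        return delta


def consensus(agents, W):
    """Linear consensus step: mix critic parameters across the network.

    W is assumed row-stochastic and compatible with the communication graph.
    """
    thetas = W @ np.stack([a.theta for a in agents])
    for agent, theta in zip(agents, thetas):
        agent.theta = theta


def actor_step(w, phi, action, delta, rho, beta=0.005):
    """Slow-timescale softmax policy-gradient step (beta << alpha).

    Uses rho * delta * grad log pi as in Off-PAC, a common stand-in for the
    paper's consensus-based multi-task actor gradient.
    """
    prefs = w @ phi                        # action preferences
    pi = np.exp(prefs - prefs.max())
    pi /= pi.sum()
    grad_log = -np.outer(pi, phi)          # d log pi(a|s) / d w
    grad_log[action] += phi
    return w + beta * rho * delta * grad_log


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_agents, n_features, n_actions = 3, 4, 2
    agents = [AgentETD(n_features) for _ in range(n_agents)]
    W = np.full((n_agents, n_agents), 1.0 / n_agents)  # complete graph
    policies = [np.zeros((n_actions, n_features)) for _ in range(n_agents)]
    for t in range(200):  # synthetic transitions stand in for each agent's task
        for i, agent in enumerate(agents):
            phi, phi_next = rng.standard_normal((2, n_features))
            action = rng.integers(n_actions)
            rho = 1.0  # behaviour = target policy in this toy demo
            delta = agent.critic_step(phi, phi_next, rng.standard_normal(), rho)
            policies[i] = actor_step(policies[i], phi, action, delta, rho)
        consensus(agents, W)
    print("consensus critic weights:", agents[0].theta)
```

The step-size separation beta << alpha mirrors the two-time-scale argument mentioned in the abstract: the critic parameters equilibrate on the fast timescale for an effectively frozen policy, while the actor drifts slowly along the estimated gradient.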