Robust Partially Observable Markov Decision Processes

Media type: E-Book

Title: Robust Partially Observable Markov Decision Processes

Contributor: Rasouli, Mohammad [Author]; Saghafian, Soroush [Other]

Imprint: [S.l.]: SSRN, [2018]

Extent: 1 online resource (32 p.)

Language: English

DOI: 10.2139/ssrn.3195310

Footnote: According to SSRN, the original version of the document was created on June 13, 2018

Description: In a variety of applications, decisions need to be made dynamically after receiving imperfect observations about the state of an underlying system. Partially Observable Markov Decision Processes (POMDPs) are widely used in such applications. To use a POMDP, however, a decision-maker must have access to reliable estimates of core state and observation transition probabilities under each possible state and action pair. This is often challenging, mainly because of a lack of ample data, especially when some actions are not taken frequently enough in practice. This significantly limits the application of POMDPs in real-world settings. In healthcare, for example, medical tests are typically subject to false-positive and false-negative errors, and hence the decision-maker has imperfect information about the health state of a patient. Furthermore, since some treatment options have not been recommended or explored in the past, data cannot be used to reliably estimate all the required transition probabilities regarding the health state of the patient. We introduce an extension of POMDPs, termed Robust POMDPs (RPOMDPs), which allows dynamic decision-making when there is ambiguity regarding transition probabilities. This extension enables robust decisions by reducing the reliance on a single probabilistic model of transitions, while still allowing for imperfect state observations. We develop dynamic programming equations for solving RPOMDPs, provide a sufficient statistic and an information state, discuss ways in which their computational complexity can be reduced, and connect them to stochastic zero-sum games with imperfect private monitoring.

Access State: Open Access
