AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::POMDP::QMDP Class Reference

This class implements the QMDP algorithm. More...

#include <AIToolbox/POMDP/Algorithms/QMDP.hpp>

Public Member Functions

 QMDP (unsigned horizon, double tolerance=0.001)
 Basic constructor. More...
 
template<IsModel M>
std::tuple< double, ValueFunction, MDP::QFunction > operator() (const M &m)
 This function applies the QMDP algorithm on the input POMDP. More...
 
void setTolerance (double t)
 This function sets the tolerance parameter. More...
 
void setHorizon (unsigned h)
 This function sets the horizon parameter. More...
 
double getTolerance () const
 This function returns the currently set tolerance parameter. More...
 
unsigned getHorizon () const
 This function returns the current horizon parameter. More...
 

Static Public Member Functions

static VList fromQFunction (size_t O, const MDP::QFunction &qfun)
 This function converts an MDP::QFunction into the equivalent POMDP VList. More...
 

Detailed Description

This class implements the QMDP algorithm.

QMDP is a particular way to approach a POMDP problem and solve it approximately. The idea is to compute a solution that disregards the partial observability for all timesteps but the next one. Thus, we assume that after the next action the agent will suddenly be able to see the true state of the environment, and act accordingly. Under this assumption, an MDP value function is all that is needed.

Remember that only the solution process acts this way. When it is time to act, the QMDP solution is simply applied at every timestep, each time assuming that the partial observability will last only one more step.

All in all, this class is pretty much a converter of an MDP::ValueFunction into a POMDP::ValueFunction.

Although the solution is approximate and overconfident (since we assume that partial observability is going to go away, we overestimate the reward we will obtain), it is still useful for obtaining a relatively tight upper bound on the true solution. This can be used, for example, to initialize the bounds of online methods, decreasing the time they take to converge.

The solution returned by QMDP will thus have only horizon 1, since the horizon requested is implicitly encoded in the MDP part of the solution.

Constructor & Destructor Documentation

◆ QMDP()

AIToolbox::POMDP::QMDP::QMDP ( unsigned  horizon,
double  tolerance = 0.001 
)

Basic constructor.

QMDP uses MDP::ValueIteration in order to solve the underlying MDP of the POMDP. Thus, its parameters (and bounds) are the same.

Parameters
  horizon    The maximum number of iterations to perform.
  tolerance  The tolerance factor to stop the value iteration loop.

Member Function Documentation

◆ fromQFunction()

static VList AIToolbox::POMDP::QMDP::fromQFunction ( size_t  O,
const MDP::QFunction &  qfun 
)
static

This function converts an MDP::QFunction into the equivalent POMDP VList.

This function directly converts a QFunction into the equivalent VList.

The function needs to know the observation space so that, if needed, the output can be used in a ValueFunction (and possibly with a Policy) without crashing.

Parameters
  O     The observation space of the POMDP to make a VList for.
  qfun  The MDP QFunction from which to create a VList.
Returns
A VList equivalent to the input QFunction.

◆ getHorizon()

unsigned AIToolbox::POMDP::QMDP::getHorizon ( ) const

This function returns the current horizon parameter.

Returns
The currently set horizon parameter.

◆ getTolerance()

double AIToolbox::POMDP::QMDP::getTolerance ( ) const

This function returns the currently set tolerance parameter.

Returns
The currently set tolerance parameter.

◆ operator()()

template<IsModel M>
std::tuple< double, ValueFunction, MDP::QFunction > AIToolbox::POMDP::QMDP::operator() ( const M &  m)

This function applies the QMDP algorithm on the input POMDP.

This function computes the MDP::QFunction of the underlying MDP of the input POMDP with the parameters set using ValueIteration.

It then converts this solution into the equivalent POMDP::ValueFunction. Finally it returns both (plus the variation for the last iteration of ValueIteration).

Note that no pruning is performed here, so some vectors might be dominated.

Parameters
  m  The POMDP to be solved.
Returns
A tuple containing the maximum variation for the ValueFunction, the computed ValueFunction and the equivalent MDP::QFunction.

◆ setHorizon()

void AIToolbox::POMDP::QMDP::setHorizon ( unsigned  h)

This function sets the horizon parameter.

Parameters
  h  The new horizon parameter.

◆ setTolerance()

void AIToolbox::POMDP::QMDP::setTolerance ( double  t)

This function sets the tolerance parameter.

The tolerance parameter must be >= 0.0, otherwise the function will throw an std::invalid_argument. The tolerance parameter sets the convergence criterion. A tolerance of 0.0 forces the internal ValueIteration to perform a number of iterations equal to the horizon specified. Otherwise, ValueIteration will stop as soon as the difference between two iterations is less than the tolerance specified.

Parameters
  t  The new tolerance parameter.

The documentation for this class was generated from the following file: AIToolbox/POMDP/Algorithms/QMDP.hpp