AIToolbox
A library that offers tools for AI problem solving.
This class implements the QMDP algorithm.
#include <AIToolbox/POMDP/Algorithms/QMDP.hpp>
Public Member Functions

    QMDP(unsigned horizon, double tolerance = 0.001)
        Basic constructor.

    template <IsModel M>
    std::tuple<double, ValueFunction, MDP::QFunction> operator()(const M & m)
        This function applies the QMDP algorithm on the input POMDP.

    void setTolerance(double t)
        This function sets the tolerance parameter.

    void setHorizon(unsigned h)
        This function sets the horizon parameter.

    double getTolerance() const
        This function returns the currently set tolerance parameter.

    unsigned getHorizon() const
        This function returns the current horizon parameter.

Static Public Member Functions

    static VList fromQFunction(size_t O, const MDP::QFunction & qfun)
        This function converts an MDP::QFunction into the equivalent POMDP VList.
This class implements the QMDP algorithm.
QMDP is a particular way to approach a POMDP problem and solve it approximately. The idea is to compute a solution that disregards the partial observability for all timesteps but the next one. Thus, we assume that after the next action the agent will suddenly be able to see the true state of the environment and act accordingly. In doing so, it can then use an MDP value function.
Remember that only the solution process acts this way. When it is time to act, the QMDP solution is simply applied at every timestep, each time assuming that the partial observability will last only one more step.
All in all, this class is pretty much a converter of an MDP::ValueFunction into a POMDP::ValueFunction.
Although the solution is approximate and overconfident (since we assume that the partial observability is going to go away, we expect more reward than is actually achievable), it is still useful for obtaining a close upper bound on the true solution. This can be used, for example, to boost bounds in online methods, decreasing the time they take to converge.
The solution returned by QMDP will thus have only horizon 1, since the horizon requested is implicitly encoded in the MDP part of the solution.
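To make the idea concrete, here is a self-contained sketch of the QMDP approach on a hypothetical two-state, two-action toy problem. This is plain C++ and does not use the library's API or types: we solve the underlying MDP with value iteration, then act on a belief by maximizing the expected Q-value under that belief.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t S = 2, A = 2;
using QTable = std::array<std::array<double, A>, S>; // Q[s][a]

// Hypothetical toy dynamics: action 0 stays in place, action 1 swaps states.
double transition(std::size_t s, std::size_t a, std::size_t s1) {
    if (a == 0) return s1 == s ? 1.0 : 0.0;
    return s1 == (1 - s) ? 1.0 : 0.0;
}

// Reward 1 for being in state 1, regardless of the action taken.
double reward(std::size_t s, std::size_t) { return s == 1 ? 1.0 : 0.0; }

// Value iteration on the underlying MDP, stopping when the maximum change
// between two successive iterations drops below the tolerance.
QTable solveMDP(double discount, double tolerance) {
    QTable q{};
    double variation;
    do {
        variation = 0.0;
        QTable q1{};
        for (std::size_t s = 0; s < S; ++s)
            for (std::size_t a = 0; a < A; ++a) {
                double v = reward(s, a);
                for (std::size_t s1 = 0; s1 < S; ++s1)
                    v += discount * transition(s, a, s1) *
                         std::max(q[s1][0], q[s1][1]);
                variation = std::max(variation, std::abs(v - q[s][a]));
                q1[s][a] = v;
            }
        q = q1;
    } while (variation > tolerance);
    return q;
}

// QMDP action selection on a belief b: argmax_a sum_s b(s) * Q(s, a).
std::size_t qmdpAction(const QTable & q, const std::array<double, S> & b) {
    std::size_t best = 0;
    double bestV = -1e300;
    for (std::size_t a = 0; a < A; ++a) {
        double v = 0.0;
        for (std::size_t s = 0; s < S; ++s) v += b[s] * q[s][a];
        if (v > bestV) { bestV = v; best = a; }
    }
    return best;
}
```

With discount 0.9, a belief concentrated on state 0 prefers the swap action (to reach the rewarding state), while a belief concentrated on state 1 prefers to stay.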
AIToolbox::POMDP::QMDP::QMDP(unsigned horizon, double tolerance = 0.001)
Basic constructor.
QMDP uses MDP::ValueIteration in order to solve the underlying MDP of the POMDP. Thus, its parameters (and bounds) are the same.
Parameters
    horizon    The maximum number of iterations to perform.
    tolerance  The tolerance factor to stop the value iteration loop.
static VList AIToolbox::POMDP::QMDP::fromQFunction(size_t O, const MDP::QFunction & qfun)
This function converts an MDP::QFunction into the equivalent POMDP VList.
This function directly converts a QFunction into the equivalent VList.
The function needs to know the size of the observation space so that, if needed, the output can be used in a ValueFunction, and possibly with a Policy, without crashing.
Parameters
    O     The observation space of the POMDP to make a VList for.
    qfun  The MDP QFunction from which to create a VList.
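The conversion itself is simple: each action's Q-column becomes one alpha-vector. The following sketch uses plain, hypothetical types (std::vector rather than the library's actual VList/Eigen types) to illustrate the idea, including why the observation space size is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch with hypothetical types, not AIToolbox's: each action a yields one
// alpha-vector whose s-th entry is Q(s, a).
struct Entry {
    std::vector<double> values;     // alpha-vector, one value per state
    std::size_t action;             // action this vector recommends
    std::vector<std::size_t> obs;   // successor ids, one per observation
};

std::vector<Entry> fromQ(std::size_t O,
                         const std::vector<std::vector<double>> & q) {
    const std::size_t S = q.size(), A = q[0].size();
    std::vector<Entry> out;
    for (std::size_t a = 0; a < A; ++a) {
        Entry e;
        e.values.resize(S);
        for (std::size_t s = 0; s < S; ++s) e.values[s] = q[s][a];
        e.action = a;
        // The solution has horizon 1, so there is no real successor; we
        // still point every observation at entry 0 so that code indexing
        // the strategy by observation does not crash.
        e.obs.assign(O, 0);
        out.push_back(e);
    }
    return out;
}
```

Calling `fromQ(3, {{1.0, 2.0}, {3.0, 4.0}})` (two states, two actions, three observations) yields two entries; the first holds the column Q(·, 0) = {1, 3} and an observation strategy of length 3.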
unsigned AIToolbox::POMDP::QMDP::getHorizon() const
This function returns the current horizon parameter.
double AIToolbox::POMDP::QMDP::getTolerance() const
This function returns the currently set tolerance parameter.
template <IsModel M>
std::tuple<double, ValueFunction, MDP::QFunction> AIToolbox::POMDP::QMDP::operator()(const M & m)
This function applies the QMDP algorithm on the input POMDP.
This function computes the MDP::QFunction of the underlying MDP of the input POMDP, using MDP::ValueIteration with the currently set parameters. It then converts this solution into the equivalent POMDP::ValueFunction. Finally, it returns both (plus the variation of the last ValueIteration iteration).
Note that no pruning is performed here, so some vectors might be dominated.
Parameters
    m    The POMDP to be solved.
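The returned ValueFunction can be used like any other set of alpha-vectors: the value of a belief is the maximum dot product between the belief and each vector. The dominated vectors mentioned above are harmless here, merely wasteful, since they never attain the maximum. A self-contained sketch using plain std::vector instead of the library's Eigen-based types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate a set of alpha-vectors at a belief: V(b) = max_alpha <b, alpha>.
// Dominated vectors are never selected by the max, so leaving them in
// changes nothing but the amount of work performed.
double evaluate(const std::vector<std::vector<double>> & alphas,
                const std::vector<double> & b) {
    double best = -1e300;
    for (const auto & alpha : alphas) {
        double v = 0.0;
        for (std::size_t s = 0; s < b.size(); ++s) v += b[s] * alpha[s];
        best = std::max(best, v);
    }
    return best;
}
```

For example, with vectors {1, 0} and {0, 1}, the uniform belief {0.5, 0.5} evaluates to 0.5, while the certain belief {1, 0} evaluates to 1.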
void AIToolbox::POMDP::QMDP::setHorizon(unsigned h)
This function sets the horizon parameter.
Parameters
    h    The new horizon parameter.
void AIToolbox::POMDP::QMDP::setTolerance(double t)
This function sets the tolerance parameter.
The tolerance parameter must be >= 0.0, otherwise the function will throw a std::invalid_argument. The tolerance parameter sets the convergence criterion: a tolerance of 0.0 forces the internal ValueIteration to perform a number of iterations equal to the specified horizon, while a positive tolerance makes ValueIteration stop as soon as the difference between two successive iterations is less than the tolerance.
Parameters
    t    The new tolerance parameter.
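The two stopping modes can be illustrated with a minimal sketch (not the library's code): value iteration on a single-state MDP with reward 1 and discount 0.9, whose true value is 1 / (1 - 0.9) = 10.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the stopping rule: with tolerance > 0 we stop as soon as two
// successive iterates differ by less than the tolerance; with tolerance == 0
// we run exactly `horizon` iterations.
double iterate(unsigned horizon, double tolerance) {
    double v = 0.0;
    for (unsigned i = 0; i < horizon; ++i) {
        const double v1 = 1.0 + 0.9 * v; // Bellman backup for this toy MDP
        if (tolerance > 0.0 && std::abs(v1 - v) < tolerance) return v1;
        v = v1;
    }
    return v;
}
```

With a tiny positive tolerance the loop converges to roughly 10 well before exhausting a large horizon; with tolerance 0.0 and horizon 3 it returns exactly the 3-step value 1 + 0.9 + 0.81 = 2.71.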