AIToolbox
A library that offers tools for AI problem solving.
This class implements the QMDP algorithm.
#include <AIToolbox/POMDP/Algorithms/QMDP.hpp>
Public Member Functions

    QMDP(unsigned horizon, double tolerance = 0.001)
        Basic constructor.

    template <IsModel M>
    std::tuple<double, ValueFunction, MDP::QFunction> operator()(const M & m)
        This function applies the QMDP algorithm on the input POMDP.

    void setTolerance(double t)
        This function sets the tolerance parameter.

    void setHorizon(unsigned h)
        This function sets the horizon parameter.

    double getTolerance() const
        This function returns the currently set tolerance parameter.

    unsigned getHorizon() const
        This function returns the current horizon parameter.

Static Public Member Functions

    static VList fromQFunction(size_t O, const MDP::QFunction & qfun)
        This function converts an MDP::QFunction into the equivalent POMDP VList.
This class implements the QMDP algorithm.
QMDP is a particular way to approach a POMDP problem and solve it approximately. The idea is to compute a solution that disregards the partial observability for all timesteps but the next one. Thus, we assume that after the next action the agent will suddenly be able to see the true state of the environment and act accordingly. In doing so, it can then use an MDP value function.
Remember that only the solution process acts this way. When it is time to act, the QMDP solution is simply applied at every timestep, each time assuming that the partial observability will last only one more step.
All in all, this class is pretty much a converter of an MDP::ValueFunction into a POMDP::ValueFunction.
Although the solution is approximate and overconfident (since we assume that the partial observability is going to go away, we expect more reward than is actually achievable), it is still useful for obtaining a close upper bound on the true solution. This can be used, for example, to boost bounds in online methods, decreasing the time they take to converge.
The solution returned by QMDP will thus have only horizon 1, since the horizon requested is implicitly encoded in the MDP part of the solution.
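To make the idea concrete, here is a self-contained sketch of the QMDP approach on a hypothetical two-state, two-action toy problem. This is plain C++ and does not use the library's API or types: we solve the underlying MDP with value iteration, then act on a belief by maximizing the expected Q-value under that belief.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t S = 2, A = 2;
using QTable = std::array<std::array<double, A>, S>; // Q[s][a]

// Hypothetical toy dynamics: action 0 stays in place, action 1 swaps states.
double transition(std::size_t s, std::size_t a, std::size_t s1) {
    if (a == 0) return s1 == s ? 1.0 : 0.0;
    return s1 == (1 - s) ? 1.0 : 0.0;
}

// Reward 1 for being in state 1, regardless of the action taken.
double reward(std::size_t s, std::size_t) { return s == 1 ? 1.0 : 0.0; }

// Value iteration on the underlying MDP, stopping when the maximum change
// between two successive iterations drops below the tolerance.
QTable solveMDP(double discount, double tolerance) {
    QTable q{};
    double variation;
    do {
        variation = 0.0;
        QTable q1{};
        for (std::size_t s = 0; s < S; ++s)
            for (std::size_t a = 0; a < A; ++a) {
                double v = reward(s, a);
                for (std::size_t s1 = 0; s1 < S; ++s1)
                    v += discount * transition(s, a, s1) *
                         std::max(q[s1][0], q[s1][1]);
                variation = std::max(variation, std::abs(v - q[s][a]));
                q1[s][a] = v;
            }
        q = q1;
    } while (variation > tolerance);
    return q;
}

// QMDP action selection on a belief b: argmax_a sum_s b(s) * Q(s, a).
std::size_t qmdpAction(const QTable & q, const std::array<double, S> & b) {
    std::size_t best = 0;
    double bestV = -1e300;
    for (std::size_t a = 0; a < A; ++a) {
        double v = 0.0;
        for (std::size_t s = 0; s < S; ++s) v += b[s] * q[s][a];
        if (v > bestV) { bestV = v; best = a; }
    }
    return best;
}
```

With discount 0.9, a belief concentrated on state 0 prefers the swap action (to reach the rewarding state), while a belief concentrated on state 1 prefers to stay.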
AIToolbox::POMDP::QMDP::QMDP(unsigned horizon, double tolerance = 0.001)
Basic constructor.
QMDP uses MDP::ValueIteration in order to solve the underlying MDP of the POMDP. Thus, its parameters (and bounds) are the same.
Parameters
    horizon    The maximum number of iterations to perform.
    tolerance  The tolerance factor to stop the value iteration loop.
static VList AIToolbox::POMDP::QMDP::fromQFunction(size_t O, const MDP::QFunction & qfun)
This function converts an MDP::QFunction into the equivalent POMDP VList.
This function directly converts a QFunction into the equivalent VList.
The function needs to know the size of the observation space so that, if needed, the output can be used in a ValueFunction, and possibly with a Policy, without crashing.
Parameters
    O     The observation space of the POMDP to make a VList for.
    qfun  The MDP QFunction from which to create a VList.
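The conversion itself is simple: each action's Q-column becomes one alpha-vector. The following sketch uses plain, hypothetical types (std::vector rather than the library's actual VList/Eigen types) to illustrate the idea, including why the observation space size is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch with hypothetical types, not AIToolbox's: each action a yields one
// alpha-vector whose s-th entry is Q(s, a).
struct Entry {
    std::vector<double> values;     // alpha-vector, one value per state
    std::size_t action;             // action this vector recommends
    std::vector<std::size_t> obs;   // successor ids, one per observation
};

std::vector<Entry> fromQ(std::size_t O,
                         const std::vector<std::vector<double>> & q) {
    const std::size_t S = q.size(), A = q[0].size();
    std::vector<Entry> out;
    for (std::size_t a = 0; a < A; ++a) {
        Entry e;
        e.values.resize(S);
        for (std::size_t s = 0; s < S; ++s) e.values[s] = q[s][a];
        e.action = a;
        // The solution has horizon 1, so there is no real successor; we
        // still point every observation at entry 0 so that code indexing
        // the strategy by observation does not crash.
        e.obs.assign(O, 0);
        out.push_back(e);
    }
    return out;
}
```

Calling `fromQ(3, {{1.0, 2.0}, {3.0, 4.0}})` (two states, two actions, three observations) yields two entries; the first holds the column Q(·, 0) = {1, 3} and an observation strategy of length 3.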
unsigned AIToolbox::POMDP::QMDP::getHorizon() const
This function returns the current horizon parameter.
double AIToolbox::POMDP::QMDP::getTolerance() const
This function returns the currently set tolerance parameter.
template <IsModel M>
std::tuple<double, ValueFunction, MDP::QFunction> AIToolbox::POMDP::QMDP::operator()(const M & m)
This function applies the QMDP algorithm on the input POMDP.
This function computes the MDP::QFunction of the underlying MDP of the input POMDP, using MDP::ValueIteration with the currently set parameters. It then converts this solution into the equivalent POMDP::ValueFunction. Finally, it returns both (plus the variation of the last ValueIteration iteration).
Note that no pruning is performed here, so some vectors might be dominated.
Parameters
    m    The POMDP to be solved.
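The returned ValueFunction can be used like any other set of alpha-vectors: the value of a belief is the maximum dot product between the belief and each vector. The dominated vectors mentioned above are harmless here, merely wasteful, since they never attain the maximum. A self-contained sketch using plain std::vector instead of the library's Eigen-based types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate a set of alpha-vectors at a belief: V(b) = max_alpha <b, alpha>.
// Dominated vectors are never selected by the max, so leaving them in
// changes nothing but the amount of work performed.
double evaluate(const std::vector<std::vector<double>> & alphas,
                const std::vector<double> & b) {
    double best = -1e300;
    for (const auto & alpha : alphas) {
        double v = 0.0;
        for (std::size_t s = 0; s < b.size(); ++s) v += b[s] * alpha[s];
        best = std::max(best, v);
    }
    return best;
}
```

For example, with vectors {1, 0} and {0, 1}, the uniform belief {0.5, 0.5} evaluates to 0.5, while the certain belief {1, 0} evaluates to 1.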
void AIToolbox::POMDP::QMDP::setHorizon(unsigned h)
This function sets the horizon parameter.
Parameters
    h    The new horizon parameter.
void AIToolbox::POMDP::QMDP::setTolerance(double t)
This function sets the tolerance parameter.
The tolerance parameter must be >= 0.0, otherwise the function will throw a std::invalid_argument. The tolerance parameter sets the convergence criterion: a tolerance of 0.0 forces the internal ValueIteration to perform a number of iterations equal to the specified horizon, while a positive tolerance makes ValueIteration stop as soon as the difference between two successive iterations is less than the tolerance.
Parameters
    t    The new tolerance parameter.
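The two stopping modes can be illustrated with a minimal sketch (not the library's code): value iteration on a single-state MDP with reward 1 and discount 0.9, whose true value is 1 / (1 - 0.9) = 10.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the stopping rule: with tolerance > 0 we stop as soon as two
// successive iterates differ by less than the tolerance; with tolerance == 0
// we run exactly `horizon` iterations.
double iterate(unsigned horizon, double tolerance) {
    double v = 0.0;
    for (unsigned i = 0; i < horizon; ++i) {
        const double v1 = 1.0 + 0.9 * v; // Bellman backup for this toy MDP
        if (tolerance > 0.0 && std::abs(v1 - v) < tolerance) return v1;
        v = v1;
    }
    return v;
}
```

With a tiny positive tolerance the loop converges to roughly 10 well before exhausting a large horizon; with tolerance 0.0 and horizon 3 it returns exactly the 3-step value 1 + 0.9 + 0.81 = 2.71.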