AIToolbox
A library that offers tools for AI problem solving.
|
This class implements the Point Based Value Iteration algorithm. More...
#include <AIToolbox/POMDP/Algorithms/PBVI.hpp>
Public Member Functions | |
PBVI (size_t nBeliefs, unsigned h, double tolerance) | |
Basic constructor. More... | |
void | setTolerance (double tolerance) |
This function sets the tolerance parameter. More... | |
void | setHorizon (unsigned h) |
This function sets a new horizon parameter. More... | |
void | setBeliefSize (size_t nBeliefs) |
This function sets a new number of support beliefs. More... | |
double | getTolerance () const |
This function returns the currently set tolerance parameter. More... | |
unsigned | getHorizon () const |
This function returns the currently set horizon parameter. More... | |
size_t | getBeliefSize () const |
This function returns the currently set number of support beliefs to use during a solve pass. More... | |
template<IsModel M> | |
std::tuple< double, ValueFunction > | operator() (const M &model, ValueFunction v={}) |
This function solves a POMDP::Model approximately. More... | |
template<IsModel M> | |
std::tuple< double, ValueFunction > | operator() (const M &model, const std::vector< Belief > &beliefs, ValueFunction v={}) |
This function solves a POMDP::Model approximately. More... | |
This class implements the Point Based Value Iteration algorithm.
The idea behind this algorithm is to solve a POMDP Model approximately. When computing a perfect solution, the main problem is pruning the resulting ValueFunction in order to contain only a parsimonious representation. What this means is that many vectors inside can be dominated by others, and so they do not add any additional information, while at the same time occupying memory and computational time.
The way this method tries to fix the problem is by solving the Model in a set of specified Beliefs. Doing so results in no need for pruning at all, since every belief uniquely identifies one of the optimal solution vectors (only uniqueness in the final set is required, but it is way cheaper than linear programming).
The Beliefs can be given as input, or stochastically computed as to cover as much as possible of the belief space, to ensure minimization of the final error. The final solution will be correct 100% in the Beliefs that have been selected, and will (possibly) undershoot in non-covered Beliefs.
In addition, the fact that we solve only for a fixed set of Beliefs guarantees that our final solution is limited in size, which is useful since even small POMDP true solutions can explode in size with high horizons, for very little gain.
There is no convergence guarantee of this method, but the error is bounded.
AIToolbox::POMDP::PBVI::PBVI | ( | size_t | nBeliefs, |
unsigned | h, | ||
double | tolerance | ||
) |
Basic constructor.
This constructor sets the default horizon/tolerance used to solve a POMDP::Model and the number of beliefs used to approximate the ValueFunction.
nBeliefs | The number of support beliefs to use. |
h | The horizon chosen. |
tolerance | The tolerance factor to stop the PBVI loop. |
size_t AIToolbox::POMDP::PBVI::getBeliefSize | ( | ) | const |
This function returns the currently set number of support beliefs to use during a solve pass.
unsigned AIToolbox::POMDP::PBVI::getHorizon | ( | ) | const |
This function returns the currently set horizon parameter.
double AIToolbox::POMDP::PBVI::getTolerance | ( | ) | const |
This function returns the currently set tolerance parameter.
std::tuple< double, ValueFunction > AIToolbox::POMDP::PBVI::operator() | ( | const M & | model, |
const std::vector< Belief > & | beliefs, | ||
ValueFunction | v = {} |
||
) |
This function solves a POMDP::Model approximately.
This function uses and evaluates the input beliefs.
The final solution will only contain ValueFunctions for those Beliefs and will interpolate them for points it did not solve for. Even though the resulting solution is approximate very often it is good enough, and this comes with an incredible increase in speed.
M | The type of POMDP model that needs to be solved. |
model | The POMDP model that needs to be solved. |
beliefs | The list of beliefs to evaluate. |
v | The ValueFunction to startup the process from, if needed. |
std::tuple< double, ValueFunction > AIToolbox::POMDP::PBVI::operator() | ( | const M & | model, |
ValueFunction | v = {} |
||
) |
This function solves a POMDP::Model approximately.
This function computes a set of beliefs for which to solve the input model. The beliefs are chosen stochastically, trying to cover as much as possible of the belief space in order to offer as precise a solution as possible. The final solution will only contain ValueFunctions for those Beliefs and will interpolate them for points it did not solve for. Even though the resulting solution is approximate very often it is good enough, and this comes with an incredible increase in speed.
Note that even in the beliefs sampled the solution is not guaranteed to be optimal. This is because a solution for horizon h can only be computed with the true solution from horizon h-1. If such a solution is approximate (and it is here), then the solution for h will not be optimal by definition.
M | The type of POMDP model that needs to be solved. |
model | The POMDP model that needs to be solved. |
v | The ValueFunction to startup the process from, if needed. |
void AIToolbox::POMDP::PBVI::setBeliefSize | ( | size_t | nBeliefs | ) |
This function sets a new number of support beliefs.
nBeliefs | The new number of support beliefs. |
void AIToolbox::POMDP::PBVI::setHorizon | ( | unsigned | h | ) |
This function sets a new horizon parameter.
h | The new horizon parameter. |
void AIToolbox::POMDP::PBVI::setTolerance | ( | double | tolerance | ) |
This function sets the tolerance parameter.
The tolerance parameter must be >= 0.0, otherwise the function will throw an std::runtime_error. The tolerance parameter sets the convergence criterion. A tolerance of 0.0 forces PBVI to perform a number of iterations equal to the horizon specified. Otherwise, PBVI will stop as soon as the difference between two iterations is less than the tolerance specified.
tolerance | The new tolerance parameter. |