AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::POMDP::FastInformedBound Class Reference

This class implements the Fast Informed Bound algorithm. More...

#include <AIToolbox/POMDP/Algorithms/FastInformedBound.hpp>

Public Member Functions

 FastInformedBound (unsigned horizon, double tolerance=0.001)
 Basic constructor. More...
 
template<IsModel M>
std::tuple< double, MDP::QFunction > operator() (const M &m, const MDP::QFunction &oldQ={})
 This function computes the Fast Informed Bound for the input POMDP. More...
 
template<IsModel M, typename SOSA >
std::tuple< double, MDP::QFunction > operator() (const M &m, const SOSA &sosa, MDP::QFunction oldQ={})
 This function computes the Fast Informed Bound for the input POMDP. More...
 
void setTolerance (double tolerance)
 This function sets the tolerance parameter. More...
 
void setHorizon (unsigned h)
 This function sets the horizon parameter. More...
 
double getTolerance () const
 This function returns the currently set tolerance parameter. More...
 
unsigned getHorizon () const
 This function returns the current horizon parameter. More...
 

Detailed Description

This class implements the Fast Informed Bound algorithm.

This class is useful in order to obtain a very simple upper bound for a POMDP.

This upper bound is computed as a simplification over the true ValueFunction POMDP update (via Bellman Equation).

I'm going to go through the whole derivation from scratch here, since it might prove useful to somebody reading this (and since I spent some time understanding it, I might as well write it up).

We first start the derivation through the basic Bellman Equation for POMDPs:

Q(b,a) = Sum_s R(s,a) * b(s) + gamma * Sum_o P(b'|b,a) * V(b')
Q(b,a) = Sum_s R(s,a) * b(s) + gamma * Sum_o P(o|b,a) * V(b')

This should be pretty self-explanatory: the value of a belief and an action is equal to the reward you get by acting, plus the future discounted rewards of whatever belief you end up in, times the probability of ending up there.

From here I just consider the second part (after gamma), since that's the interesting stuff anyway. The rest remains the same.

Sum_o P(o|b,a) * V(b')
Sum_o Sum_s P(o|s,a) * b(s) * V(b')

Here b' is implied since it can be computed from b,a,o.

So at this point we'd normally be done, but the problem is that for this formula to work it requires V(b) to be defined at every belief, and we'd have to update it at every belief. That's hard, since there are infinitely many beliefs to go over.

So the trick is that we know that the ValueFunction is piecewise-linear and convex, so we change the formula a bit in order to be able to use the previous-step alphavectors to do the update.

Sum_o max_prev_alpha Sum_s' [ Sum_s P(s',o|s,a) * b(s) ] * prev_alpha(s')

Here P(o|s,a) became P(s',o|s,a) simply because it has ended up inside the Sum_s'; since P(o|s,a) = Sum_s' P(s',o|s,a), its value stays the same (this is just probability marginalization).

So now, instead of V, for each observation we pick the best previous-step alphavector for it, and sum over its entries (since it's a vector).

Now to the Fast Informed Bound. What we do is a simple shuffling of terms, which increases the value:

Sum_o Sum_s max_prev_alpha Sum_s' [ P(s',o|s,a) * b(s) ] * prev_alpha(s')

By moving the max inside the Sum_s, we increase the value of the formula. From here on it's just algebra:

Sum_s b(s) Sum_o max_prev_alpha Sum_s' P(s',o|s,a) * prev_alpha(s')
Q(b,a) = Sum_s b(s) * [ R(s,a) + gamma * Sum_o max_prev_alpha Sum_s' P(s',o|s,a) * prev_alpha(s') ]

Finally, since this method produces Q(b,a), you'll always end up with exactly A alphavectors (one per action). So you can rewrite it as:

Q(b,a) = Sum_s b(s) * [ R(s,a) + gamma * Sum_o max_a' Sum_s' P(s',o|s,a) * Q(s',a') ]
Q(s,a) = R(s,a) + gamma * Sum_o max_a' Sum_s' P(s',o|s,a) * Q(s',a')

Which is the update we're doing in the code.

Constructor & Destructor Documentation

◆ FastInformedBound()

AIToolbox::POMDP::FastInformedBound::FastInformedBound ( unsigned  horizon,
double  tolerance = 0.001 
)

Basic constructor.

Parameters
horizon    The maximum number of iterations to perform.
tolerance  The tolerance factor to stop the value iteration loop.

Member Function Documentation

◆ getHorizon()

unsigned AIToolbox::POMDP::FastInformedBound::getHorizon ( ) const

This function returns the current horizon parameter.

Returns
The currently set horizon parameter.

◆ getTolerance()

double AIToolbox::POMDP::FastInformedBound::getTolerance ( ) const

This function returns the currently set tolerance parameter.

Returns
The currently set tolerance parameter.

◆ operator()() [1/2]

template<IsModel M>
std::tuple< double, MDP::QFunction > AIToolbox::POMDP::FastInformedBound::operator() ( const M &  m,
const MDP::QFunction &  oldQ = {} 
)

This function computes the Fast Informed Bound for the input POMDP.

This function returns a QFunction since it's easier to work with. If you want to use it to act within a POMDP, check out QMDP which can transform it into a VList, and from there into a ValueFunction.

This method creates a SOSA matrix for the input model, and uses it to create the bound.

Parameters
m     The POMDP to be solved.
oldQ  The QFunction to start iterating from.
Returns
A tuple containing the maximum variation for the QFunction and the computed QFunction.

◆ operator()() [2/2]

template<IsModel M, typename SOSA >
std::tuple< double, MDP::QFunction > AIToolbox::POMDP::FastInformedBound::operator() ( const M &  m,
const SOSA &  sosa,
MDP::QFunction  oldQ = {} 
)

This function computes the Fast Informed Bound for the input POMDP.

Internally, this method uses a SOSA matrix to improve its speed, since otherwise it'd need to multiply the transition and observation matrices over and over.

Since we don't usually store SOSA matrices, the other operator() computes it on the fly.

In case you already have a POMDP with a pre-computed SOSA matrix and don't need to recompute it, you can call this method directly.

You can use both sparse and dense Matrix4D for this method.

Parameters
m     The POMDP to be solved.
sosa  The SOSA matrix of the input POMDP.
oldQ  The QFunction to start iterating from.
Returns
A tuple containing the maximum variation for the QFunction and the computed QFunction.

◆ setHorizon()

void AIToolbox::POMDP::FastInformedBound::setHorizon ( unsigned  h)

This function sets the horizon parameter.

Parameters
h  The new horizon parameter.

◆ setTolerance()

void AIToolbox::POMDP::FastInformedBound::setTolerance ( double  tolerance)

This function sets the tolerance parameter.

The tolerance parameter must be >= 0.0, otherwise the function will throw an std::invalid_argument. The tolerance parameter sets the convergence criterion. A tolerance of 0.0 forces the internal loop to perform a number of iterations equal to the horizon specified. Otherwise, FastInformedBound will stop as soon as the difference between two iterations is less than the tolerance specified.

Parameters
tolerance  The new tolerance parameter.

The documentation for this class was generated from the following file: