AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::POMDP::rPOMCP< M, UseEntropy > Class Template Reference

This class represents the rPOMCP online planner.

#include <AIToolbox/POMDP/Algorithms/rPOMCP.hpp>

Public Types

using BNode = BeliefNode< UseEntropy >
 
using ANode = ActionNode< UseEntropy >
 
using HNode = HeadBeliefNode< UseEntropy >
 

Public Member Functions

 rPOMCP (const M &m, size_t beliefSize, unsigned iterations, double exp, unsigned k=500)
 Basic constructor.

size_t sampleAction (const Belief &b, unsigned horizon)
 This function resets the internal graph and samples for the provided belief and horizon.

size_t sampleAction (size_t a, size_t o, unsigned horizon)
 This function uses the internal graph to plan.

void setBeliefSize (size_t beliefSize)
 This function sets the new size for initial beliefs created from sampleAction().

void setIterations (unsigned iter)
 This function sets the number of rollouts performed by rPOMCP.

void setExploration (double exp)
 This function sets the new exploration constant for rPOMCP.

const M & getModel () const
 This function returns the POMDP generative model being used.

const HNode & getGraph () const
 This function returns a reference to the internal graph structure holding the results of rollouts.

size_t getBeliefSize () const
 This function returns the initial particle size for converted Beliefs.

unsigned getIterations () const
 This function returns the number of iterations performed to plan for an action.

double getExploration () const
 This function returns the currently set exploration constant.

Detailed Description

template<IsGenerativeModel M, bool UseEntropy>
class AIToolbox::POMDP::rPOMCP< M, UseEntropy >

This class represents the rPOMCP online planner.

rPOMCP works very similarly to POMCP. It is an approximate online planner that works by using particle beliefs in order to efficiently simulate future timesteps.

The main difference is that rPOMCP was made in order to work with belief-dependent reward functions.

This means that rPOMCP won't directly look at the reward of the model. Instead, it is assumed that its reward depends directly on its knowledge: rather than trying to steer the environment towards good states, it will try to steer it so as to increase its knowledge about the current state.

rPOMCP only supports two reward functions: max-of-belief and entropy.

With max-of-belief rPOMCP will act in order to maximize the maximum value of its belief. With entropy rPOMCP will act in order to minimize the entropy of its belief.
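Both rewards can be illustrated directly on a particle belief. The sketch below is not the library's internal code (the function names are illustrative); it estimates state probabilities from particle counts and computes the two quantities:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// A particle belief is just a list of sampled state indices; the
// probability of a state is estimated by its particle frequency.

// Max-of-belief: the estimated probability of the most likely state.
// rPOMCP's max-of-belief mode acts to maximize this quantity.
double maxOfBelief(const std::vector<size_t>& particles) {
    std::map<size_t, unsigned> counts;
    for (auto s : particles) ++counts[s];
    unsigned best = 0;
    for (const auto& [s, c] : counts)
        if (c > best) best = c;
    return static_cast<double>(best) / particles.size();
}

// Entropy of the estimated belief, H(b) = -sum_s b(s) log b(s).
// rPOMCP's entropy mode acts to minimize this quantity.
double beliefEntropy(const std::vector<size_t>& particles) {
    std::map<size_t, unsigned> counts;
    for (auto s : particles) ++counts[s];
    double h = 0.0;
    for (const auto& [s, c] : counts) {
        const double p = static_cast<double>(c) / particles.size();
        h -= p * std::log(p);
    }
    return h;
}
```

A fully concentrated belief has zero entropy and max-of-belief 1.0; a uniform belief maximizes entropy and minimizes max-of-belief, so both rewards push towards certainty about the current state.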

These two functions are hardcoded within the internals of rPOMCP, since supporting arbitrary belief-based reward functions is exceedingly hard.

In order to work with belief-based reward functions rPOMCP necessarily has to approximate all rewards, since it uses particle beliefs and not true beliefs.

rPOMCP also employs a different method than POMCP to backpropagate rewards within the exploration tree: rather than averaging obtained rewards, it refines them as the particle beliefs grow, updating old estimates throughout the tree by backpropagating carefully constructed fake rewards.

This refinement is only applied once enough particles have been gathered in a belief, to avoid wildly changing updates propagating back through the tree.
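The switch from averaging to a refined estimate, controlled by the constructor's `k` parameter, can be sketched as follows. This is a toy illustration of the idea only, not the library's actual node implementation: a node reports a running mean while it has few samples, and switches to the MAX over its children's estimates once it has gathered at least `k` samples.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy sketch (NOT the library's internal node code) of a value estimate
// that starts as a running mean and switches to MAX after k samples.
struct ToyNode {
    unsigned k;                    // samples required before switching to MAX
    unsigned n = 0;                // samples seen so far
    double mean = 0.0;             // running average of sampled values
    std::vector<double> children;  // current estimates of the child nodes

    // Incremental running-mean update for a newly sampled value.
    void addSample(double v) { ++n; mean += (v - mean) / n; }

    // Value estimate reported to the parent during backpropagation.
    double value() const {
        if (n < k || children.empty()) return mean;
        return *std::max_element(children.begin(), children.end());
    }
};
```

Setting `k` very high keeps the node on the mean estimate almost indefinitely, which matches the constructor documentation below: a very high `k` behaves nearly like a plain mean.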

Member Typedef Documentation

◆ ANode

template<IsGenerativeModel M, bool UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::ANode = ActionNode<UseEntropy>

◆ BNode

template<IsGenerativeModel M, bool UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::BNode = BeliefNode<UseEntropy>

◆ HNode

template<IsGenerativeModel M, bool UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::HNode = HeadBeliefNode<UseEntropy>

Constructor & Destructor Documentation

◆ rPOMCP()

template<IsGenerativeModel M, bool UseEntropy>
AIToolbox::POMDP::rPOMCP< M, UseEntropy >::rPOMCP ( const M &  m,
size_t  beliefSize,
unsigned  iterations,
double  exp,
unsigned  k = 500 
)

Basic constructor.

Parameters
m: The POMDP model that rPOMCP will operate upon.
beliefSize: The size of the initial particle belief.
iterations: The number of episodes to run before completion.
exp: The exploration constant. This parameter is VERY important in determining final rPOMCP performance.
k: The number of samples a belief node must gather before it switches to MAX; if set very high, this is nearly equivalent to using the mean.

Member Function Documentation

◆ getBeliefSize()

template<IsGenerativeModel M, bool UseEntropy>
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getBeliefSize

This function returns the initial particle size for converted Beliefs.

Returns
The initial particle count.

◆ getExploration()

template<IsGenerativeModel M, bool UseEntropy>
double AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getExploration

This function returns the currently set exploration constant.

Returns
The exploration constant.

◆ getGraph()

template<IsGenerativeModel M, bool UseEntropy>
const HeadBeliefNode< UseEntropy > & AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getGraph

This function returns a reference to the internal graph structure holding the results of rollouts.

Returns
The internal graph.

◆ getIterations()

template<IsGenerativeModel M, bool UseEntropy>
unsigned AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getIterations

This function returns the number of iterations performed to plan for an action.

Returns
The number of iterations.

◆ getModel()

template<IsGenerativeModel M, bool UseEntropy>
const M & AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getModel

This function returns the POMDP generative model being used.

Returns
The POMDP generative model.

◆ sampleAction() [1/2]

template<IsGenerativeModel M, bool UseEntropy>
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::sampleAction ( const Belief &  b,
unsigned  horizon 
)

This function resets the internal graph and samples for the provided belief and horizon.

In general it is better if the belief does not contain any terminal states; while not strictly necessary, this prevents unnecessary work from being performed.

Parameters
b: The initial belief for the environment.
horizon: The horizon to plan for.
Returns
The best action.

◆ sampleAction() [2/2]

template<IsGenerativeModel M, bool UseEntropy>
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::sampleAction ( size_t  a,
size_t  o,
unsigned  horizon 
)

This function uses the internal graph to plan.

This function can be called after a previous call to sampleAction with a Belief. Otherwise, it will invoke it anyway with a random belief.

If a graph is already present though, this function will select the branch defined by the input action and observation, and prune the rest. The search will be started using the existing graph: this should make search faster, and also not require any belief updates.

NOTE: Currently there is no particle reinvigoration implemented, so for long horizons you can expect progressively degrading performance.

Parameters
a: The action taken in the last timestep.
o: The observation received in the last timestep.
horizon: The horizon to plan for.
Returns
The best action.

◆ setBeliefSize()

template<IsGenerativeModel M, bool UseEntropy>
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setBeliefSize ( size_t  beliefSize)

This function sets the new size for initial beliefs created from sampleAction().

Note that this parameter does not bound particle beliefs created within the tree by result of rollouts: only the ones directly created from true Beliefs.

Parameters
beliefSize: The new particle belief size.

◆ setExploration()

template<IsGenerativeModel M, bool UseEntropy>
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setExploration ( double  exp)

This function sets the new exploration constant for rPOMCP.

This parameter is EXTREMELY important in determining rPOMCP performance and, ultimately, convergence. In general it is best to find it empirically, by testing some values and seeing which one performs best. Tune this parameter; it really matters!

Parameters
exp: The new exploration constant.
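Since rPOMCP is POMCP-like, it presumably selects actions inside the tree with a UCB-style rule, where this constant scales the exploration bonus. The sketch below illustrates that role using plain UCB1 with hypothetical names; it is not rPOMCP's internal code:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-action statistics gathered at a tree node (illustrative).
struct ArmStats { double meanValue; unsigned visits; };

// UCB1-style action selection: pick the action maximizing
// meanValue + exp * sqrt(log(totalVisits) / visits).
// A larger `exp` favors less-visited actions over greedy ones.
size_t pickAction(const std::vector<ArmStats>& arms,
                  unsigned totalVisits, double exp) {
    size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (size_t a = 0; a < arms.size(); ++a) {
        // Untried actions are always explored first.
        if (arms[a].visits == 0) return a;
        const double bonus =
            exp * std::sqrt(std::log(double(totalVisits)) / arms[a].visits);
        const double score = arms[a].meanValue + bonus;
        if (score > bestScore) { bestScore = score; best = a; }
    }
    return best;
}
```

With `exp = 0` the search is purely greedy; as `exp` grows, rarely tried actions win more often, which is why this constant so strongly affects both performance and convergence.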

◆ setIterations()

template<IsGenerativeModel M, bool UseEntropy>
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setIterations ( unsigned  iter)

This function sets the number of rollouts performed by rPOMCP.

Parameters
iter: The new number of rollouts.

The documentation for this class was generated from the following file:
AIToolbox/POMDP/Algorithms/rPOMCP.hpp