AIToolbox
A library that offers tools for AI problem solving.
This class represents the rPOMCP online planner.
#include <AIToolbox/POMDP/Algorithms/rPOMCP.hpp>
Public Types

    using BNode = BeliefNode<UseEntropy>
    using ANode = ActionNode<UseEntropy>
    using HNode = HeadBeliefNode<UseEntropy>

Public Member Functions

    rPOMCP(const M & m, size_t beliefSize, unsigned iterations, double exp, unsigned k = 500)
        Basic constructor.
    size_t sampleAction(const Belief & b, unsigned horizon)
        This function resets the internal graph and samples for the provided belief and horizon.
    size_t sampleAction(size_t a, size_t o, unsigned horizon)
        This function uses the internal graph to plan.
    void setBeliefSize(size_t beliefSize)
        This function sets the new size for initial beliefs created from sampleAction().
    void setIterations(unsigned iter)
        This function sets the number of performed rollouts in rPOMCP.
    void setExploration(double exp)
        This function sets the new exploration constant for rPOMCP.
    const M & getModel() const
        This function returns the POMDP generative model being used.
    const HNode & getGraph() const
        This function returns a reference to the internal graph structure holding the results of rollouts.
    size_t getBeliefSize() const
        This function returns the initial particle size for converted Beliefs.
    unsigned getIterations() const
        This function returns the number of iterations performed to plan for an action.
    double getExploration() const
        This function returns the currently set exploration constant.
This class represents the rPOMCP online planner.
rPOMCP works very similarly to POMCP. It is an approximate online planner that works by using particle beliefs in order to efficiently simulate future timesteps.
The main difference is that rPOMCP was made in order to work with belief-dependent reward functions.
This means that rPOMCP won't directly look at the reward of the model. Instead, it assumes that its reward depends directly on its knowledge: rather than trying to steer the environment towards good states, it will try to steer it so as to increase its knowledge about the current state.
rPOMCP only supports two reward functions: max-of-belief and entropy.
With max-of-belief rPOMCP will act in order to maximize the maximum value of its belief. With entropy rPOMCP will act in order to minimize the entropy of its belief.
These two functions are hardcoded within the internals of rPOMCP, since supporting arbitrary belief-based reward functions is exceedingly hard.
In order to work with belief-based reward functions rPOMCP necessarily has to approximate all rewards, since it uses particle beliefs and not true beliefs.
rPOMCP also employs a different method than POMCP to backpropagate rewards within the exploration tree: rather than averaging obtained rewards, it refines them as the particle beliefs grow, and it updates the old estimates of affected nodes throughout the tree by backpropagating carefully constructed fake rewards.
This is only done once enough particles have been gathered in a belief, in order to avoid wildly changing updates propagating back up the tree.
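As a quick orientation before the detailed member documentation, here is a minimal usage sketch. The model type (MyPOMDP), its construction, and the way actions and observations are exchanged with the environment are hypothetical placeholders; only the rPOMCP calls reflect the interface documented on this page.

    #include <AIToolbox/POMDP/Algorithms/rPOMCP.hpp>

    // `MyPOMDP` and `makeMyPOMDP()` are hypothetical stand-ins for any
    // generative POMDP model accepted by rPOMCP.
    MyPOMDP model = makeMyPOMDP();

    // UseEntropy = true  -> act to minimize belief entropy;
    // UseEntropy = false -> act to maximize the maximum value of the belief.
    AIToolbox::POMDP::rPOMCP<MyPOMDP, true> solver(
        model,
        1000,      // beliefSize: particles for initial beliefs
        10000,     // iterations: rollouts per planning call
        10.0       // exp: exploration constant (to be tuned empirically)
    );

    // Uniform initial belief; Belief is assumed to behave like an Eigen
    // vector here, and model.getS() is assumed to return the number of states.
    AIToolbox::POMDP::Belief b(model.getS());
    b.fill(1.0 / model.getS());

    size_t a = solver.sampleAction(b, 15);   // plan from scratch
    // ... perform `a`, receive observation `o` from the environment ...
    // a = solver.sampleAction(a, o, 14);    // keep planning on the pruned tree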
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::ANode = ActionNode<UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::BNode = BeliefNode<UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::HNode = HeadBeliefNode<UseEntropy>
AIToolbox::POMDP::rPOMCP< M, UseEntropy >::rPOMCP(const M & m, size_t beliefSize, unsigned iterations, double exp, unsigned k = 500)
Basic constructor.
Parameters:
    m           The POMDP model that rPOMCP will operate upon.
    beliefSize  The size of the initial particle belief.
    iterations  The number of episodes to run before completion.
    exp         The exploration constant. This parameter is VERY important to determine the final rPOMCP performance.
    k           The number of samples a belief node must have before it switches to MAX. If set very high, the behavior is nearly equal to using the mean.
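A hedged construction sketch follows; `model` stands for any already-built generative POMDP model (a placeholder), and the numeric values are only illustrative, not recommendations.

    // The second template argument selects the belief-based reward:
    // true = entropy, false = max-of-belief.
    AIToolbox::POMDP::rPOMCP<decltype(model), false> planner(
        model,
        500,      // beliefSize: particles used when converting a true Belief
        5000,     // iterations: episodes/rollouts per call to sampleAction()
        100.0,    // exp: exploration constant, crucial for final performance
        500       // k: samples a belief node needs before switching to MAX
    );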
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getBeliefSize() const
This function returns the initial particle size for converted Beliefs.
double AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getExploration() const
This function returns the currently set exploration constant.
const HeadBeliefNode< UseEntropy > & AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getGraph() const
This function returns a reference to the internal graph structure holding the results of rollouts.
unsigned AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getIterations() const
This function returns the number of iterations performed to plan for an action.
const M & AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getModel() const
This function returns the POMDP generative model being used.
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::sampleAction(const Belief & b, unsigned horizon)
This function resets the internal graph and samples for the provided belief and horizon.
In general it is better if the belief does not contain any terminal states; while not strictly necessary, this avoids performing unnecessary work.
Parameters:
    b        The initial belief for the environment.
    horizon  The horizon to plan for.
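Since beliefs without terminal states save work, one option is to zero out their probability mass and renormalize before calling this overload. A sketch, assuming Belief behaves like an Eigen vector and that isTerminal() and getCurrentBelief() are user-supplied (hypothetical) helpers:

    // isTerminal(s) is a hypothetical helper returning true for terminal states.
    AIToolbox::POMDP::Belief b = getCurrentBelief();   // hypothetical source of the belief
    for (size_t s = 0; s < static_cast<size_t>(b.size()); ++s)
        if (isTerminal(s)) b[s] = 0.0;
    if (b.sum() > 0.0) b /= b.sum();                   // renormalize

    size_t bestAction = solver.sampleAction(b, 10);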
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::sampleAction(size_t a, size_t o, unsigned horizon)
This function uses the internal graph to plan.
This function is meant to be called after a previous call to sampleAction with a Belief; if no such call was made, it will invoke that overload anyway with a random belief.
If a graph is already present, this function will select the branch defined by the input action and observation and prune the rest. The search is then started from the existing graph: this should make the search faster, and it also does not require any belief updates.
NOTE: Currently there is no particle reinvigoration implemented, so for long horizons you can expect progressively degrading performance.
Parameters:
    a        The action taken in the last timestep.
    o        The observation received in the last timestep.
    horizon  The horizon to plan for.
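The intended interaction pattern, sketched under the assumption of a hypothetical environment wrapper `env` and a solver and belief set up as in the constructor documentation:

    unsigned horizon = 30;

    size_t a = solver.sampleAction(b, horizon);        // first call: plan from a true Belief
    for (unsigned t = 1; t < horizon; ++t) {
        size_t o = env.step(a);                        // act and observe (placeholder call)
        // Reuse the existing tree: keep the (a, o) branch, prune the rest.
        a = solver.sampleAction(a, o, horizon - t);
    }
    // Without particle reinvigoration, very long loops may see the particle
    // beliefs degrade over time.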
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setBeliefSize(size_t beliefSize)
This function sets the new size for initial beliefs created from sampleAction().
Note that this parameter does not bound the particle beliefs created within the tree as a result of rollouts: only the ones directly created from true Beliefs.

Parameters:
    beliefSize  The new particle belief size.
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setExploration(double exp)
This function sets the new exploration constant for rPOMCP.
This parameter is EXTREMELY important in determining rPOMCP performance and, ultimately, convergence. In general it is best found empirically, by testing a few values and seeing which one performs best. Tune this parameter, it really matters!

Parameters:
    exp  The new exploration constant.
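Since the constant is best found empirically, a simple sweep like the following sketch can help; evaluatePolicy() is a hypothetical scoring routine (for example, average return or final belief entropy over a few episodes) that the user would supply.

    #include <limits>

    // double evaluatePolicy(...);  // hypothetical, user-supplied evaluation

    double bestScore = std::numeric_limits<double>::lowest();
    double bestExp   = 0.0;
    for (double c : {1.0, 10.0, 100.0, 1000.0}) {
        solver.setExploration(c);
        const double score = evaluatePolicy(solver);
        if (score > bestScore) { bestScore = score; bestExp = c; }
    }
    solver.setExploration(bestExp);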
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setIterations(unsigned iter)
This function sets the number of performed rollouts in rPOMCP.
Parameters:
    iter  The new number of rollouts.