AIToolbox
A library that offers tools for AI problem solving.
This class represents the POMCP online planner using UCB1.
#include <AIToolbox/POMDP/Algorithms/POMCP.hpp>
Classes

struct ActionNode
struct BeliefNode

Public Types

using SampleBelief = std::vector< size_t >
using BeliefNodes = std::unordered_map< size_t, BeliefNode >
using ActionNodes = std::vector< ActionNode >
Public Member Functions

POMCP (const M &m, size_t beliefSize, unsigned iterations, double exp)
    Basic constructor.

size_t sampleAction (const Belief &b, unsigned horizon)
    This function resets the internal graph and samples for the provided belief and horizon.

size_t sampleAction (size_t a, size_t o, unsigned horizon)
    This function uses the internal graph to plan.

void setBeliefSize (size_t beliefSize)
    This function sets the new size for initial beliefs created from sampleAction().

void setIterations (unsigned iter)
    This function sets the number of performed rollouts in POMCP.

void setExploration (double exp)
    This function sets the new exploration constant for POMCP.

const M & getModel () const
    This function returns the POMDP generative model being used.

const BeliefNode & getGraph () const
    This function returns a reference to the internal graph structure holding the results of rollouts.

size_t getBeliefSize () const
    This function returns the initial particle size for converted Beliefs.

unsigned getIterations () const
    This function returns the number of iterations performed to plan for an action.

double getExploration () const
    This function returns the currently set exploration constant.
This class represents the POMCP online planner using UCB1.
This algorithm is an online planner for POMDPs. As an online planner, it only needs a generative model of the problem: a way to sample transitions, observations and rewards from the model, without knowing their probability distributions explicitly.
POMCP plans for a single belief at a time. It follows the logic of Monte Carlo Tree Search, where a tree structure is built progressively and action values are estimated as averages of the rewards obtained over rollouts. If the number of sample episodes is high enough, it is guaranteed to converge to the optimal solution.
At each rollout, we follow actions and observations within the tree from the root towards the leaves. Along this path we choose actions using UCT (UCB1 applied to trees), which privileges the most promising actions while guaranteeing that, in the limit, every action is still tried an infinite number of times.
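As a concrete illustration of this selection step, the snippet below sketches the UCB1 score that a UCT-style planner maximizes when choosing among the actions of a node. The function and parameter names are illustrative only; they are not the members actually used by this class's ActionNode.

    #include <cmath>
    #include <limits>

    // Illustrative UCB1 score for one action: its mean value plus an exploration
    // bonus that shrinks as the action is tried more often relative to its parent.
    double ucb1Score(double meanValue, unsigned visits,
                     unsigned parentVisits, double exploration) {
        if (visits == 0)                     // untried actions are always preferred
            return std::numeric_limits<double>::infinity();
        return meanValue + exploration *
               std::sqrt(std::log(static_cast<double>(parentVisits)) / visits);
    }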
Once we arrive at a leaf of the tree, we expand it with a single new node, representing the observation we just collected. We then proceed outside the tree following a random policy, this time without tracking which actions and observations we take/obtain. The final reward obtained by this random rollout is used to update the value estimates of all the nodes that were visited inside the tree before leaving it.
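The out-of-tree phase can be pictured with the following sketch. It assumes a hypothetical generative model exposing getA(), getDiscount() and sampleSOR(state, action) returning a (state, observation, reward) tuple; this is not the library's actual rollout code.

    #include <cstddef>
    #include <random>

    // Hypothetical random-policy rollout from state s for `depth` steps.
    template <typename Model>
    double randomRollout(const Model & model, size_t s, unsigned depth, std::mt19937 & rng) {
        std::uniform_int_distribution<size_t> pickAction(0, model.getA() - 1);
        double totalReward = 0.0, discount = 1.0;
        for (unsigned t = 0; t < depth; ++t) {
            const size_t a = pickAction(rng);              // random policy outside the tree
            const auto [s1, o, r] = model.sampleSOR(s, a); // generative step (assumed interface)
            (void)o;                                       // observations are not tracked here
            totalReward += discount * r;
            discount *= model.getDiscount();
            s = s1;
        }
        return totalReward;
    }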
Since POMCP expands a tree, it can reuse the work it has already done when multiple action requests are made in sequence. To do so, it only needs to be told which action was performed and which observation was obtained; it then makes the corresponding branch the new root and starts again.
To avoid performing belief updates between each action/observation pair, which can be expensive, POMCP uses particle beliefs. These approximate the true belief at every step, and are used to sample the starting states of the rollouts.
A weakness of this implementation is that, like every particle approximation of a continuous distribution, it tends to lose particle diversity over time. One way to fight this is to implement a particle reinvigoration method, which introduces noise into the particle beliefs in order to keep them "fresh" (possibly using domain knowledge); a rough sketch of such a step is shown below.
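As an illustration of what such a reinvigoration step could look like (this is not provided by the class), the sketch below appends perturbed copies of existing particles to a SampleBelief; the noise function is a hypothetical, domain-specific component.

    #include <cstddef>
    #include <random>
    #include <vector>

    // Hypothetical particle reinvigoration: append noisy copies of randomly chosen
    // surviving particles. `noise(state, rng)` is a domain-specific perturbation.
    template <typename NoiseFn>
    void reinvigorate(std::vector<size_t> & particles, double fraction,
                      NoiseFn noise, std::mt19937 & rng) {
        if (particles.empty()) return;
        std::uniform_int_distribution<size_t> pick(0, particles.size() - 1);
        const auto toAdd = static_cast<size_t>(fraction * particles.size());
        for (size_t i = 0; i < toAdd; ++i)
            particles.push_back(noise(particles[pick(rng)], rng));
    }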
using AIToolbox::POMDP::POMCP< M >::ActionNodes = std::vector<ActionNode>

using AIToolbox::POMDP::POMCP< M >::BeliefNodes = std::unordered_map<size_t, BeliefNode>

using AIToolbox::POMDP::POMCP< M >::SampleBelief = std::vector<size_t>
AIToolbox::POMDP::POMCP< M >::POMCP (const M & m, size_t beliefSize, unsigned iterations, double exp)

Basic constructor.
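A minimal construction sketch, assuming some generative POMDP model type MyModel that satisfies the requirements of this class; the numeric values are arbitrary illustrative choices.

    #include <AIToolbox/POMDP/Algorithms/POMCP.hpp>

    // `MyModel` and `makeMyModel()` are hypothetical placeholders for a
    // user-provided generative POMDP model.
    MyModel model = makeMyModel();
    AIToolbox::POMDP::POMCP<MyModel> solver(
        model,   // generative model
        1000,    // beliefSize: particles used when converting a true Belief
        10000,   // iterations: rollouts performed per call to sampleAction()
        10.0     // exp: UCB1 exploration constant
    );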
size_t AIToolbox::POMDP::POMCP< M >::getBeliefSize () const

This function returns the initial particle size for converted Beliefs.

double AIToolbox::POMDP::POMCP< M >::getExploration () const

This function returns the currently set exploration constant.

const POMCP< M >::BeliefNode & AIToolbox::POMDP::POMCP< M >::getGraph () const

This function returns a reference to the internal graph structure holding the results of rollouts.

unsigned AIToolbox::POMDP::POMCP< M >::getIterations () const

This function returns the number of iterations performed to plan for an action.

const M & AIToolbox::POMDP::POMCP< M >::getModel () const

This function returns the POMDP generative model being used.
size_t AIToolbox::POMDP::POMCP< M >::sampleAction (const Belief & b, unsigned horizon)

This function resets the internal graph and samples for the provided belief and horizon.

In general it is better if the belief does not contain any terminal states; this is not strictly required, but it avoids unnecessary work.

Parameters
    b        The initial belief for the environment.
    horizon  The horizon to plan for.
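For instance, planning from a uniform initial belief might look like the sketch below; it reuses the hypothetical solver from the constructor example and assumes the model exposes getS() and that Belief behaves like a dense probability vector.

    // Uniform initial belief over the model's states (assumed interface).
    const size_t S = solver.getModel().getS();
    AIToolbox::POMDP::Belief b(S);
    b.fill(1.0 / S);

    const unsigned horizon = 20;                  // illustrative planning horizon
    const size_t action = solver.sampleAction(b, horizon);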
size_t AIToolbox::POMDP::POMCP< M >::sampleAction (size_t a, size_t o, unsigned horizon)

This function uses the internal graph to plan.

This function can be called after a previous call to sampleAction() with a Belief; if no graph has been built yet, that overload is invoked anyway with a random belief.

If a graph is already present, this function selects the branch defined by the input action and observation and prunes the rest. The search is then started from the existing graph: this should make the search faster, and it also avoids any belief updates.

NOTE: Currently there is no particle reinvigoration implemented, so for long horizons you can expect progressively degrading performance.

Parameters
    a        The action taken in the last timestep.
    o        The observation received in the last timestep.
    horizon  The horizon to plan for.
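Putting the two overloads together, a typical online loop keeps reusing the internal graph between timesteps. The call actAndObserve() below is a hypothetical placeholder for however the surrounding application executes an action and returns the resulting observation.

    size_t a = solver.sampleAction(b, horizon);        // first decision from the true belief
    for (unsigned t = 1; t < horizon; ++t) {
        const size_t o = actAndObserve(a);             // hypothetical environment interaction
        a = solver.sampleAction(a, o, horizon - t);    // reuse the branch selected by (a, o)
    }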
void AIToolbox::POMDP::POMCP< M >::setBeliefSize (size_t beliefSize)

This function sets the new size for initial beliefs created from sampleAction().

Note that this parameter does not bound the particle beliefs created within the tree as a result of rollouts; it only applies to those created directly from true Beliefs.

Parameters
    beliefSize  The new particle belief size.
void AIToolbox::POMDP::POMCP< M >::setExploration (double exp)

This function sets the new exploration constant for POMCP.

This parameter is extremely important for POMCP's performance and, ultimately, its convergence. In general it is best to determine it empirically, by testing a few values and seeing which one performs best; a common rule of thumb is to start from a value on the order of the range of returns obtainable in the problem. Tune this parameter, it really matters!

Parameters
    exp  The new exploration constant.
void AIToolbox::POMDP::POMCP< M >::setIterations (unsigned iter)

This function sets the number of rollouts performed by POMCP when planning for an action.

Parameters
    iter  The new number of rollouts.