AIToolbox
A library that offers tools for AI problem solving.
This class represents the rPOMCP online planner.
#include <AIToolbox/POMDP/Algorithms/rPOMCP.hpp>
Public Types

    using BNode = BeliefNode<UseEntropy>
    using ANode = ActionNode<UseEntropy>
    using HNode = HeadBeliefNode<UseEntropy>

Public Member Functions

    rPOMCP(const M & m, size_t beliefSize, unsigned iterations, double exp, unsigned k = 500)
        Basic constructor.
    size_t sampleAction(const Belief & b, unsigned horizon)
        This function resets the internal graph and samples for the provided belief and horizon.
    size_t sampleAction(size_t a, size_t o, unsigned horizon)
        This function uses the internal graph to plan.
    void setBeliefSize(size_t beliefSize)
        This function sets the new size for initial beliefs created from sampleAction().
    void setIterations(unsigned iter)
        This function sets the number of performed rollouts in rPOMCP.
    void setExploration(double exp)
        This function sets the new exploration constant for rPOMCP.
    const M & getModel() const
        This function returns the POMDP generative model being used.
    const HNode & getGraph() const
        This function returns a reference to the internal graph structure holding the results of rollouts.
    size_t getBeliefSize() const
        This function returns the initial particle size for converted Beliefs.
    unsigned getIterations() const
        This function returns the number of iterations performed to plan for an action.
    double getExploration() const
        This function returns the currently set exploration constant.
This class represents the rPOMCP online planner.
rPOMCP works very similarly to POMCP. It is an approximate online planner that works by using particle beliefs in order to efficiently simulate future timesteps.
The main difference is that rPOMCP was made in order to work with belief-dependent reward functions.
This means that rPOMCP won't directly look at the reward of the model. Instead, it assumes that its reward depends directly on its knowledge: rather than trying to steer the environment towards good states, it will try to steer it so as to increase its knowledge about the current state.
rPOMCP only supports two reward functions: max-of-belief and entropy.
With max-of-belief rPOMCP will act in order to maximize the maximum value of its belief. With entropy rPOMCP will act in order to minimize the entropy of its belief.
These two functions are hardcoded within the internals of rPOMCP, since supporting arbitrary belief-based reward functions is exceedingly hard.
In order to work with belief-based reward functions rPOMCP necessarily has to approximate all rewards, since it uses particle beliefs and not true beliefs.
rPOMCP also employs a different method than POMCP to backpropagate rewards within the exploration tree: rather than averaging obtained rewards, it refines them as the particle beliefs grow, and it updates the old estimates of affected nodes throughout the tree by backpropagating carefully constructed fake rewards.
This is only done once enough particles have been gathered in a belief, in order to avoid wildly changing updates propagating back up the tree.
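As a quick orientation before the detailed member documentation, here is a minimal usage sketch. The model type (MyPOMDP), its construction, and the way actions and observations are exchanged with the environment are hypothetical placeholders; only the rPOMCP calls reflect the interface documented on this page.

    #include <AIToolbox/POMDP/Algorithms/rPOMCP.hpp>

    // `MyPOMDP` and `makeMyPOMDP()` are hypothetical stand-ins for any
    // generative POMDP model accepted by rPOMCP.
    MyPOMDP model = makeMyPOMDP();

    // UseEntropy = true  -> act to minimize belief entropy;
    // UseEntropy = false -> act to maximize the maximum value of the belief.
    AIToolbox::POMDP::rPOMCP<MyPOMDP, true> solver(
        model,
        1000,      // beliefSize: particles for initial beliefs
        10000,     // iterations: rollouts per planning call
        10.0       // exp: exploration constant (to be tuned empirically)
    );

    // Uniform initial belief; Belief is assumed to behave like an Eigen
    // vector here, and model.getS() is assumed to return the number of states.
    AIToolbox::POMDP::Belief b(model.getS());
    b.fill(1.0 / model.getS());

    size_t a = solver.sampleAction(b, 15);   // plan from scratch
    // ... perform `a`, receive observation `o` from the environment ...
    // a = solver.sampleAction(a, o, 14);    // keep planning on the pruned tree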
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::ANode = ActionNode<UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::BNode = BeliefNode<UseEntropy>
using AIToolbox::POMDP::rPOMCP< M, UseEntropy >::HNode = HeadBeliefNode<UseEntropy>
AIToolbox::POMDP::rPOMCP< M, UseEntropy >::rPOMCP(const M & m, size_t beliefSize, unsigned iterations, double exp, unsigned k = 500)
Basic constructor.
Parameters:
    m           The POMDP model that rPOMCP will operate upon.
    beliefSize  The size of the initial particle belief.
    iterations  The number of episodes to run before completion.
    exp         The exploration constant. This parameter is VERY important to determine the final rPOMCP performance.
    k           The number of samples a belief node must have before it switches to MAX. If set very high, the behavior is nearly equal to using the mean.
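A hedged construction sketch follows; `model` stands for any already-built generative POMDP model (a placeholder), and the numeric values are only illustrative, not recommendations.

    // The second template argument selects the belief-based reward:
    // true = entropy, false = max-of-belief.
    AIToolbox::POMDP::rPOMCP<decltype(model), false> planner(
        model,
        500,      // beliefSize: particles used when converting a true Belief
        5000,     // iterations: episodes/rollouts per call to sampleAction()
        100.0,    // exp: exploration constant, crucial for final performance
        500       // k: samples a belief node needs before switching to MAX
    );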
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getBeliefSize() const
This function returns the initial particle size for converted Beliefs.
double AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getExploration() const
This function returns the currently set exploration constant.
const HeadBeliefNode< UseEntropy > & AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getGraph() const
This function returns a reference to the internal graph structure holding the results of rollouts.
unsigned AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getIterations() const
This function returns the number of iterations performed to plan for an action.
const M & AIToolbox::POMDP::rPOMCP< M, UseEntropy >::getModel() const
This function returns the POMDP generative model being used.
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::sampleAction(const Belief & b, unsigned horizon)
This function resets the internal graph and samples for the provided belief and horizon.
In general it is better if the belief does not contain any terminal states; while not strictly necessary, this avoids performing unnecessary work.
Parameters:
    b        The initial belief for the environment.
    horizon  The horizon to plan for.
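Since beliefs without terminal states save work, one option is to zero out their probability mass and renormalize before calling this overload. A sketch, assuming Belief behaves like an Eigen vector and that isTerminal() and getCurrentBelief() are user-supplied (hypothetical) helpers:

    // isTerminal(s) is a hypothetical helper returning true for terminal states.
    AIToolbox::POMDP::Belief b = getCurrentBelief();   // hypothetical source of the belief
    for (size_t s = 0; s < static_cast<size_t>(b.size()); ++s)
        if (isTerminal(s)) b[s] = 0.0;
    if (b.sum() > 0.0) b /= b.sum();                   // renormalize

    size_t bestAction = solver.sampleAction(b, 10);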
size_t AIToolbox::POMDP::rPOMCP< M, UseEntropy >::sampleAction(size_t a, size_t o, unsigned horizon)
This function uses the internal graph to plan.
This function is meant to be called after a previous call to sampleAction with a Belief; if no such call was made, it will invoke that overload anyway with a random belief.
If a graph is already present, this function will select the branch defined by the input action and observation and prune the rest. The search is then started from the existing graph: this should make the search faster, and it also does not require any belief updates.
NOTE: Currently there is no particle reinvigoration implemented, so for long horizons you can expect progressively degrading performance.
Parameters:
    a        The action taken in the last timestep.
    o        The observation received in the last timestep.
    horizon  The horizon to plan for.
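The intended interaction pattern, sketched under the assumption of a hypothetical environment wrapper `env` and a solver and belief set up as in the constructor documentation:

    unsigned horizon = 30;

    size_t a = solver.sampleAction(b, horizon);        // first call: plan from a true Belief
    for (unsigned t = 1; t < horizon; ++t) {
        size_t o = env.step(a);                        // act and observe (placeholder call)
        // Reuse the existing tree: keep the (a, o) branch, prune the rest.
        a = solver.sampleAction(a, o, horizon - t);
    }
    // Without particle reinvigoration, very long loops may see the particle
    // beliefs degrade over time.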
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setBeliefSize(size_t beliefSize)
This function sets the new size for initial beliefs created from sampleAction().
Note that this parameter does not bound the particle beliefs created within the tree as a result of rollouts: only the ones directly created from true Beliefs.

Parameters:
    beliefSize  The new particle belief size.
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setExploration(double exp)
This function sets the new exploration constant for rPOMCP.
This parameter is EXTREMELY important in determining rPOMCP performance and, ultimately, convergence. In general it is best found empirically, by testing a few values and seeing which one performs best. Tune this parameter, it really matters!

Parameters:
    exp  The new exploration constant.
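Since the constant is best found empirically, a simple sweep like the following sketch can help; evaluatePolicy() is a hypothetical scoring routine (for example, average return or final belief entropy over a few episodes) that the user would supply.

    #include <limits>

    // double evaluatePolicy(...);  // hypothetical, user-supplied evaluation

    double bestScore = std::numeric_limits<double>::lowest();
    double bestExp   = 0.0;
    for (double c : {1.0, 10.0, 100.0, 1000.0}) {
        solver.setExploration(c);
        const double score = evaluatePolicy(solver);
        if (score > bestScore) { bestScore = score; bestExp = c; }
    }
    solver.setExploration(bestExp);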
void AIToolbox::POMDP::rPOMCP< M, UseEntropy >::setIterations(unsigned iter)
This function sets the number of performed rollouts in rPOMCP.
Parameters:
    iter  The new number of rollouts.