AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::POMDP::Policy Class Reference

This class represents a POMDP Policy. More...

#include <AIToolbox/POMDP/Policies/Policy.hpp>

Inheritance diagram for AIToolbox::POMDP::Policy:
AIToolbox::PolicyInterface< size_t, Belief, size_t >

Public Types

using Base = PolicyInterface< size_t, Belief, size_t >
 

Public Member Functions

 Policy (size_t s, size_t a, size_t o)
 Basic constructor. More...
 
 Policy (size_t s, size_t a, size_t o, const ValueFunction &v)
 Basic constructor. More...
 
virtual size_t sampleAction (const Belief &b) const override
 This function chooses a random action for belief b, following the policy distribution. More...
 
std::tuple< size_t, size_t > sampleAction (const Belief &b, unsigned horizon) const
 This function chooses a random action for belief b when horizon steps are missing, following the policy distribution. More...
 
std::tuple< size_t, size_t > sampleAction (size_t id, size_t o, unsigned horizon) const
 This function chooses a random action after performing a sampled action and observing observation o, for a particular horizon. More...
 
virtual double getActionProbability (const Belief &b, const size_t &a) const override
 This function returns the probability of taking the specified action in the specified belief. More...
 
double getActionProbability (const Belief &b, size_t a, unsigned horizon) const
 This function returns the probability of taking the specified action in the specified belief. More...
 
size_t getO () const
 This function returns the number of observations possible for the agent. More...
 
size_t getH () const
 This function returns the highest horizon available within this Policy. More...
 
const ValueFunction & getValueFunction () const
 This function returns the internally stored ValueFunction. More...
 
- Public Member Functions inherited from AIToolbox::PolicyInterface< size_t, Belief, size_t >
 PolicyInterface (size_t s, size_t a)
 Basic constructor. More...
 
virtual ~PolicyInterface ()
 Basic virtual destructor. More...
 
virtual size_t sampleAction (const Belief &s) const=0
 This function chooses a random action for state s, following the policy distribution. More...
 
virtual double getActionProbability (const Belief &s, const size_t &a) const=0
 This function returns the probability of taking the specified action in the specified state. More...
 
const size_t & getS () const
 This function returns the number of states of the world. More...
 
const size_t & getA () const
 This function returns the number of available actions to the agent. More...
 

Friends

std::istream & operator>> (std::istream &is, Policy &p)
 This function reads a policy from a file. More...
 

Additional Inherited Members

- Protected Attributes inherited from AIToolbox::PolicyInterface< size_t, Belief, size_t >
size_t S
 
size_t A
 
RandomEngine rand_
 

Detailed Description

This class represents a POMDP Policy.

This class currently represents a basic Policy adaptor for a POMDP::ValueFunction. It extracts the policy tree contained within a POMDP::ValueFunction. The idea is that, at each horizon, the ValueFunction contains a set of applicable solutions (alpha vectors) for the POMDP. At each Belief point, only one of those vectors applies.

This class determines, for any given belief, which vector applies, and returns the appropriate action. At the same time, it provides facilities to follow the chosen vector along the tree (since future actions depend on the observations obtained by the agent).
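
A minimal usage sketch, assuming the POMDP dimensions s, a, o and a ValueFunction vf already computed by one of the library's solvers (the solver call itself is elided; Belief is assumed here to be the library's probability vector over states):

#include <AIToolbox/POMDP/Policies/Policy.hpp>

// Wrap the solver's output into a Policy.
AIToolbox::POMDP::Policy policy(s, a, o, vf);

// Sample an action for a uniform belief over all states, at the
// highest horizon the policy was computed for.
AIToolbox::POMDP::Belief b(s);
b.fill(1.0 / s);
size_t action = policy.sampleAction(b);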

Member Typedef Documentation

◆ Base

using AIToolbox::POMDP::Policy::Base = PolicyInterface< size_t, Belief, size_t >

Constructor & Destructor Documentation

◆ Policy() [1/2]

AIToolbox::POMDP::Policy::Policy (size_t s, size_t a, size_t o)

Basic constructor.

This constructor initializes the internal ValueFunction to contain only the horizon 0, no-values solution. This is most useful when the Policy needs to be read from a file.

Parameters
s: The number of states of the world.
a: The number of actions available to the agent.
o: The number of possible observations the agent could make.
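
For instance (a sketch; the filename is hypothetical):

#include <fstream>

// An empty policy of the right dimensions, then filled from a file
// previously written with operator<< (see operator>> below).
AIToolbox::POMDP::Policy policy(s, a, o);
std::ifstream input("policy.txt");
input >> policy;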

◆ Policy() [2/2]

AIToolbox::POMDP::Policy::Policy (size_t s, size_t a, size_t o, const ValueFunction & v)

Basic constructor.

This constructor copies the implied policy contained in a ValueFunction. Keep in mind that the policy stored within a ValueFunction is non-stochastic in nature, since for each state it can only save a single action.

Parameters
s: The number of states of the world.
a: The number of actions available to the agent.
o: The number of possible observations the agent could make.
v: The ValueFunction used as a basis for the Policy.

Member Function Documentation

◆ getActionProbability() [1/2]

virtual double AIToolbox::POMDP::Policy::getActionProbability (const Belief & b, const size_t & a) const override

This function returns the probability of taking the specified action in the specified belief.

Parameters
b: The selected belief.
a: The selected action.
Returns
The probability of taking the selected action in the specified belief.
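
Since the policy extracted from a ValueFunction is deterministic (see the second constructor above), in practice this returns either 0.0 or 1.0. A sketch, reusing the policy, belief b and action from the earlier example:

// 1.0 if 'action' is the one prescribed at belief b, 0.0 otherwise.
double p = policy.getActionProbability(b, action);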

◆ getActionProbability() [2/2]

double AIToolbox::POMDP::Policy::getActionProbability (const Belief & b, size_t a, unsigned horizon) const

This function returns the probability of taking the specified action in the specified belief.

Parameters
b: The selected belief.
a: The selected action.
horizon: The requested horizon, meaning the number of timesteps missing until the end of the "episode".
Returns
The probability of taking the selected action in the specified belief in the specified horizon.
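
A sketch of the horizon-aware overload, under the same assumptions as above (and assuming getH() >= 2):

// Probability of taking 'action' at belief b with 2 timesteps left.
double p2 = policy.getActionProbability(b, action, 2);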

◆ getH()

size_t AIToolbox::POMDP::Policy::getH ( ) const

This function returns the highest horizon available within this Policy.

Note that none of the functions that accept a horizon as a parameter check the bounds of that variable. In addition, note that while the getters for S, A and O return a number that exceeds the allowed values by 1 (since counting starts from 0), here the bound is actually included in the limit, as horizon 0 does not really do anything.

Example: getH() returns 5. This means that 5 is the highest allowed horizon parameter in any other Policy method.

Returns
The highest horizon covered by this Policy.
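
A sketch illustrating the bound, reusing the policy and belief b from above:

// getH() is itself a valid horizon argument, unlike getS(), getA()
// and getO(), whose return values are one past the last valid index.
auto maxHorizon = policy.getH();
auto result = policy.sampleAction(b, maxHorizon);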

◆ getO()

size_t AIToolbox::POMDP::Policy::getO ( ) const

This function returns the number of observations possible for the agent.

Returns
The total number of observations.

◆ getValueFunction()

const ValueFunction& AIToolbox::POMDP::Policy::getValueFunction ( ) const

This function returns the internally stored ValueFunction.

Returns
The internally stored ValueFunction.

◆ sampleAction() [1/3]

virtual size_t AIToolbox::POMDP::Policy::sampleAction (const Belief & b) const override

This function chooses a random action for belief b, following the policy distribution.

Note that this will sample from the highest horizon that the Policy was computed for.

Parameters
b: The belief to sample the action from.
Returns
The chosen action.

◆ sampleAction() [2/3]

std::tuple<size_t, size_t> AIToolbox::POMDP::Policy::sampleAction (const Belief & b, unsigned horizon) const

This function chooses a random action for belief b when horizon steps are missing, following the policy distribution.

There are a couple of differences between this sampling function and the simpler version. The first one is that this function is actually able to sample from different timesteps, since this class is able to maintain a full policy tree over time.

The second difference is that it returns two values. The first is the requested action. The second is an id that, if passed back to the Policy together with the obtained observation, allows the sampled action for the next timestep to be computed more efficiently.

Parameters
b: The belief to sample the action from.
horizon: The requested horizon, meaning the number of timesteps missing until the end of the "episode". A horizon of 0 will return a valid, non-specified action.
Returns
A tuple containing the chosen action, plus an id useful to sample an action more efficiently at the next timestep, if required.

◆ sampleAction() [3/3]

std::tuple<size_t, size_t> AIToolbox::POMDP::Policy::sampleAction (size_t id, size_t o, unsigned horizon) const

This function chooses a random action after performing a sampled action and observing observation o, for a particular horizon.

This sampling function is provided in case an already sampled action has been performed, an observation registered, and a new action is now needed for the next timestep. Using this function is highly recommended, as no belief update is necessary, and no lookup in a possibly very long list of VEntries is required.

Note that this function works if and only if the horizon is exactly 1 (one) less than the value used for the previous sampling; otherwise anything could happen. This does not mean that the calls depend on each other (the function is "pure" in that sense), just that the horizon must be decreased to obtain meaningful values back.

To keep things simple, the id does not internally store the needed horizon value; you are expected to keep track of it yourself.

An example of usage for this function would be:

unsigned horizon = 3;
// First sample at the initial horizon.
auto result = policy.sampleAction(belief, horizon);
// We do the action, something happens, we get an observation
// (performAction is a user-supplied function).
size_t observation = performAction(std::get<0>(result));
--horizon;
// We sample again, after reducing the horizon, with the previous id.
result = policy.sampleAction(std::get<1>(result), observation, horizon);
Parameters
id: An id returned from a previous call of sampleAction.
o: The observation obtained after performing a previously sampled action.
horizon: The new horizon, equal to the old sampled horizon - 1.
Returns
A tuple containing the chosen action, plus an id useful to sample an action more efficiently at the next timestep, if required.

Friends And Related Function Documentation

◆ operator>>

friend std::istream & operator>> (std::istream & is, Policy & p)

This function reads a policy from a file.

This function reads files that have been written through operator<<(std::ostream&, const Policy&). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks that the probability values are between 0 and 1.

Parameters
is: The stream the policy is being read from.
p: The policy that is being assigned.
Returns
The input stream.
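
A round-trip sketch (hypothetical filename), reusing the policy from the earlier examples:

#include <fstream>

// Write the policy out with operator<<...
std::ofstream out("policy.txt");
out << policy;
out.close();

// ...then read it back into a fresh Policy of matching dimensions.
AIToolbox::POMDP::Policy loaded(s, a, o);
std::ifstream in("policy.txt");
in >> loaded;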

The documentation for this class was generated from the following file:
AIToolbox/POMDP/Policies/Policy.hpp