AIToolbox
A library that offers tools for AI problem solving.
This class represents a POMDP Policy.
#include <AIToolbox/POMDP/Policies/Policy.hpp>
Public Types
using Base = PolicyInterface<size_t, Belief, size_t>

Public Member Functions
Policy(size_t s, size_t a, size_t o)
    Basic constructor.
Policy(size_t s, size_t a, size_t o, const ValueFunction & v)
    Basic constructor.
virtual size_t sampleAction(const Belief & b) const override
    This function chooses a random action for belief b, following the policy distribution.
std::tuple<size_t, size_t> sampleAction(const Belief & b, unsigned horizon) const
    This function chooses a random action for belief b when horizon steps are missing, following the policy distribution.
std::tuple<size_t, size_t> sampleAction(size_t id, size_t o, unsigned horizon) const
    This function chooses a random action after performing a sampled action and observing observation o, for a particular horizon.
virtual double getActionProbability(const Belief & b, const size_t & a) const override
    This function returns the probability of taking the specified action in the specified belief.
double getActionProbability(const Belief & b, size_t a, unsigned horizon) const
    This function returns the probability of taking the specified action in the specified belief.
size_t getO() const
    This function returns the number of observations possible for the agent.
size_t getH() const
    This function returns the highest horizon available within this Policy.
const ValueFunction & getValueFunction() const
    This function returns the internally stored ValueFunction.

Public Member Functions inherited from AIToolbox::PolicyInterface<size_t, Belief, size_t>
PolicyInterface(size_t s, size_t a)
    Basic constructor.
virtual ~PolicyInterface()
    Basic virtual destructor.
virtual size_t sampleAction(const Belief & s) const = 0
    This function chooses a random action for state s, following the policy distribution.
virtual double getActionProbability(const Belief & s, const size_t & a) const = 0
    This function returns the probability of taking the specified action in the specified state.
const size_t & getS() const
    This function returns the number of states of the world.
const size_t & getA() const
    This function returns the number of available actions to the agent.

Friends
std::istream & operator>>(std::istream & is, Policy & p)
    This function reads a policy from a file.

Additional Inherited Members
Protected Attributes inherited from AIToolbox::PolicyInterface<size_t, Belief, size_t>
size_t S
size_t A
RandomEngine rand_
This class represents a POMDP Policy.
This class currently represents a basic Policy adaptor for a POMDP::ValueFunction: it extracts the policy tree contained within the ValueFunction. The idea is that, at each horizon, the ValueFunction contains a set of applicable solutions (alpha vectors) for the POMDP, and at each Belief point only one of those vectors applies.
This class determines, for every belief, which vector applies and returns the corresponding action. At the same time, it provides facilities to follow the chosen vector along the tree, since future actions depend on the observations obtained by the agent.
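For example, a minimal sketch of the intended usage might look as follows (the ValueFunction is assumed to have been computed elsewhere, e.g. by one of the library's POMDP solvers, the uniform belief is purely illustrative, and the Eigen-based Belief type of recent library versions is assumed):

#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// `vf` is assumed to have been produced by a POMDP solver for a model with
// S states, A actions and O observations.
size_t chooseAction(size_t S, size_t A, size_t O, const ValueFunction & vf) {
    Policy policy(S, A, O, vf);      // extract the policy tree implied by vf
    Belief b(S);                     // a probability distribution over states
    b.fill(1.0 / S);                 // illustrative uniform belief
    return policy.sampleAction(b);   // sample at the highest available horizon
}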
using AIToolbox::POMDP::Policy::Base = PolicyInterface<size_t, Belief, size_t>
AIToolbox::POMDP::Policy::Policy(size_t s, size_t a, size_t o)
Basic constructor.
This constructor initializes the internal ValueFunction with only the default horizon 0 solution, which contains no values. This is most useful if the Policy needs to be read from a file.
    s - The number of states of the world.
    a - The number of actions available to the agent.
    o - The number of possible observations the agent could make.
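A rough sketch of that use case could look like this (the dimensions and the file name are assumptions made for the sake of the example):

#include <fstream>
#include <iostream>
#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

int main() {
    Policy policy(10, 4, 3);               // 10 states, 4 actions, 3 observations (assumed)
    std::ifstream input("policy.txt");     // hypothetical file written via operator<<
    if (!(input >> policy)) {
        std::cerr << "Could not read the policy; it was left unmodified.\n";
        return 1;
    }
    std::cout << "Loaded policy with horizon " << policy.getH() << '\n';
    return 0;
}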
AIToolbox::POMDP::Policy::Policy(size_t s, size_t a, size_t o, const ValueFunction & v)
Basic constructor.
This constructor copies the implied policy contained in a ValueFunction. Keep in mind that the policy stored within a ValueFunction is non-stochastic in nature, since for each state it can only save a single action.
    s - The number of states of the world.
    a - The number of actions available to the agent.
    o - The number of possible observations the agent could make.
    v - The ValueFunction used as a basis for the Policy.
virtual double AIToolbox::POMDP::Policy::getActionProbability(const Belief & b, const size_t & a) const [override, virtual]
This function returns the probability of taking the specified action in the specified belief.
    b - The selected belief.
    a - The selected action.
double AIToolbox::POMDP::Policy::getActionProbability(const Belief & b, size_t a, unsigned horizon) const
This function returns the probability of taking the specified action in the specified belief.
    b - The selected belief.
    a - The selected action.
    horizon - The requested horizon, meaning the number of timesteps missing until the end of the "episode".
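Since the policy stored within a ValueFunction is non-stochastic, the returned value is expected to be either 0.0 or 1.0. A small sketch of how this could be used (the helper name and all arguments are illustrative):

#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// Returns true if `action` is the action prescribed for belief `b` at the
// given horizon. The 0.5 threshold works because the stored policy is deterministic.
bool prescribes(const Policy & policy, const Belief & b, size_t action, unsigned horizon) {
    return policy.getActionProbability(b, action, horizon) > 0.5;
}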
size_t AIToolbox::POMDP::Policy::getH() const
This function returns the highest horizon available within this Policy.
Note that all functions that accept a horizon as a parameter DO NOT check the bounds of that variable. In addition, note that while the S, A and O getters return a value one greater than the maximum allowed index (since counting starts from 0), here the bound is actually included in the limit, as horizon 0 does not really do anything.
Example: getH() returns 5. This means that 5 is the highest allowed horizon parameter in any other Policy method.
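For instance, a sketch that samples one action per valid horizon while staying within the bounds described above (the belief is illustrative, and the helper name is not part of the library):

#include <tuple>
#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// No bounds are checked by the Policy itself, so we iterate from getH() down to 1.
void sampleAllHorizons(const Policy & policy, const Belief & b) {
    for (unsigned h = policy.getH(); h >= 1; --h) {
        const auto [action, id] = policy.sampleAction(b, h);
        (void)action; (void)id;   // use these as needed
    }
}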
size_t AIToolbox::POMDP::Policy::getO() const
This function returns the number of observations possible for the agent.
const ValueFunction & AIToolbox::POMDP::Policy::getValueFunction() const
This function returns the internally stored ValueFunction.
virtual size_t AIToolbox::POMDP::Policy::sampleAction(const Belief & b) const [override, virtual]
This function chooses a random action for belief b, following the policy distribution.
Note that this will sample from the highest horizon that the Policy was computed for.
    b - The sampled belief of the policy.
std::tuple<size_t, size_t> AIToolbox::POMDP::Policy::sampleAction(const Belief & b, unsigned horizon) const
This function chooses a random action for belief b when horizon steps are missing, following the policy distribution.
There are a couple of differences between this sampling function and the simpler version. The first one is that this function is actually able to sample from different timesteps, since this class is able to maintain a full policy tree over time.
The second difference is that it returns two values. The first is the requested action. The second is an id that, if provided to the Policy together with the obtained observation, allows it to compute the sampled action for the next timestep more efficiently.
    b - The sampled belief of the policy.
    horizon - The requested horizon, meaning the number of timesteps missing until the end of the "episode". Horizon 0 will return a valid but unspecified action.
std::tuple<size_t, size_t> AIToolbox::POMDP::Policy::sampleAction(size_t id, size_t o, unsigned horizon) const
This function chooses a random action after performing a sampled action and observing observation o, for a particular horizon.
This sampling function is provided in case an already sampled action has been performed, an observation has been registered, and a new action is now needed for the next timestep. Using this function is highly recommended, as no belief update is necessary and no lookup in a possibly very long list of VEntries is required.
Note that this function works correctly if and only if the horizon is exactly 1 (one) less than the value used in the previous sampling call; otherwise anything could happen. This does not mean that the calls depend on each other (the function is "pure" in that sense), just that the horizon must be decreased to obtain meaningful values.
To keep things simple, the id does not internally store the needed horizon value, so you are required to keep track of it yourself.
An example of usage for this function could look like the following sketch, where performAction() and getObservation() are hypothetical stand-ins for the agent's interaction with the environment:
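#include <tuple>
#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// Hypothetical environment hooks, assumed for the sake of the example.
void performAction(size_t) { /* act in the real environment */ }
size_t getObservation() { return 0; /* placeholder observation */ }

void followPolicy(const Policy & policy, const Belief & initialBelief) {
    unsigned horizon = policy.getH();
    // First step: sample from the belief directly, keeping the returned id.
    auto [a, id] = policy.sampleAction(initialBelief, horizon);
    while (horizon > 1) {
        performAction(a);
        const size_t o = getObservation();
        --horizon;                                          // one less step remaining
        // Follow the policy tree using the id and the observation; no belief
        // update and no search through the ValueFunction is needed.
        std::tie(a, id) = policy.sampleAction(id, o, horizon);
    }
    performAction(a);                                       // perform the last sampled action
}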
    id - An id returned from a previous call of sampleAction.
    o - The observation obtained after performing a previously sampled action.
    horizon - The new horizon, equal to the previously sampled horizon minus 1.
std::istream & operator>>(std::istream & is, Policy & p) [friend]
This function reads a policy from a file.
This function reads files that have been written through operator<<(std::ostream&, const Policy&). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.
    is - The stream where the policy is being read from.
    p - The policy that is being assigned.