AIToolbox
A library that offers tools for AI problem solving.
This class represents a Partially Observable Markov Decision Process.
#include <AIToolbox/POMDP/Model.hpp>
Public Types

using ObservationMatrix = Matrix3D

Public Member Functions

template<typename... Args>
Model (size_t o, Args &&... parameters)
    Basic constructor.

template<IsNaive3DMatrix ObFun, typename... Args>
Model (size_t o, ObFun &&of, Args &&... parameters)
    Basic constructor.

template<typename PM> requires IsModel<PM> && std::constructible_from<M, PM>
Model (const PM &model)
    Copy constructor from any valid POMDP model.

template<typename... Args>
Model (NoCheck, size_t o, ObservationMatrix &&ot, Args &&... parameters)
    Unchecked constructor.

template<IsNaive3DMatrix ObFun>
void setObservationFunction (const ObFun &of)
    This function replaces the Model observation function with the one provided.

void setObservationFunction (const ObservationMatrix &o)
    This function sets the observation function using an Eigen dense matrix.

std::tuple<size_t, size_t, double> sampleSOR (size_t s, size_t a) const
    This function samples the POMDP for the specified state-action pair.

std::tuple<size_t, double> sampleOR (size_t s, size_t a, size_t s1) const
    This function samples the POMDP for the specified transition.

double getObservationProbability (size_t s1, size_t a, size_t o) const
    This function returns the stored observation probability for the specified new state, action and observation.

const Matrix2D & getObservationFunction (size_t a) const
    This function returns the observation function for a given action.

size_t getO () const
    This function returns the number of possible observations.

const ObservationMatrix & getObservationFunction () const
    This function returns the observation matrix for inspection.
This class represents a Partially Observable Markov Decision Process.
This class inherits from any valid MDP model type, so that it can reuse its base methods and build on top of them. Templated inheritance was chosen over composition to improve performance and keep the code small.
A POMDP is an MDP where the agent, at each timestep, does not know in which state it is. Instead, after each action is performed, it obtains an "observation", which offers some information as to which new state the agent has transitioned to. This observation is determined by an "observation function", that maps S'xAxO to a probability: the probability of obtaining observation O after taking action A and landing in state S'.
Since its knowledge of the state is now imperfect, the agent is forced to represent it using Beliefs: probability distributions over states.
A Belief works as follows: after each action and observation, the agent can reason: given my previous Belief (distribution over states I thought I was in), what is now the probability that I transitioned to any particular state? This new Belief can be computed from the Model, since the agent knows the distributions of the transition and observation functions.
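Concretely, the update is b'(s') ∝ O(s',a,o) * Σ_s T(s,a,s') * b(s), followed by normalization. A minimal sketch to make the math concrete (not this class's own API; it assumes Belief is the library's Eigen vector type and that the MDP base exposes getS() and getTransitionProbability()); the library also ships its own belief-update utilities:

#include <AIToolbox/POMDP/Types.hpp>

// Hedged sketch: the Belief after performing action `a` and receiving
// observation `o`, starting from Belief `b`.
template <typename M>
AIToolbox::POMDP::Belief updateBeliefSketch(const M & model,
        const AIToolbox::POMDP::Belief & b, size_t a, size_t o) {
    const size_t S = model.getS();
    AIToolbox::POMDP::Belief br(S);
    for (size_t s1 = 0; s1 < S; ++s1) {
        // Probability of landing in s1 from anywhere, weighted by the old Belief...
        double sum = 0.0;
        for (size_t s = 0; s < S; ++s)
            sum += model.getTransitionProbability(s, a, s1) * b[s];
        // ...times the probability of seeing `o` from s1 after doing `a`.
        br[s1] = model.getObservationProbability(s1, a, o) * sum;
    }
    return br / br.sum(); // normalize; assumes `o` had nonzero probability
}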
It turns out that a POMDP can be viewed as an MDP with an infinite number of states, where each state is essentially a Belief. Since a Belief is a vector of real numbers, there are infinitely many of them, hence the infinite number of states. While POMDPs can be much more powerful than MDPs for modeling real world problems, where information is usually not perfect, this infinite-state property makes them much harder to solve exactly, and their solutions much more complex.
A POMDP solution is composed of several policies, which apply in different regions of the Belief space and suggest different actions depending on the observations received by the agent at each timestep. The values of those policies can, in the same way, be represented as a number of value vectors (called alpha vectors in the literature) that apply in those same regions of the Belief space. Each alpha vector is somewhat similar to an MDP ValueFunction.
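For background (standard POMDP notation, not part of this class's API), the value of a Belief b under a set Γ of alpha vectors is the best of the corresponding dot products:

V(b) = \max_{\alpha \in \Gamma} \alpha \cdot b = \max_{\alpha \in \Gamma} \sum_{s \in S} \alpha(s)\, b(s)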
Template Parameters
    M  The particular MDP type that we want to extend.
using AIToolbox::POMDP::Model<M>::ObservationMatrix = Matrix3D
template<typename... Args>
AIToolbox::POMDP::Model<M>::Model (size_t o, Args &&... parameters)
Basic constructor.
This constructor initializes the observation function so that all actions will return observation 0.
Template Parameters
    Args  All types of the parent constructor arguments.

Parameters
    o           The number of possible observations the agent could make.
    parameters  All arguments needed to build the parent Model.
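For example, a minimal construction sketch; it assumes the underlying MDP type is AIToolbox::MDP::Model, whose constructor takes the number of states and actions (forwarded here through `parameters`):

#include <cstddef>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 4, A = 2, O = 3;
    // `O` is consumed by the POMDP layer; `S, A` go to the MDP::Model base.
    // Every action initially returns observation 0 with probability 1.
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, S, A);
    return 0;
}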
template<IsNaive3DMatrix ObFun, typename... Args>
AIToolbox::POMDP::Model<M>::Model (size_t o, ObFun &&of, Args &&... parameters)
Basic constructor.
This constructor takes an arbitrary three dimensional container and tries to copy its contents into the observations matrix.
The container needs to support data access through operator[]. In addition, the dimensions of the container must match the ones provided as arguments both directly (o) and indirectly (s,a), in the order s, a, o.
This is important, as this constructor DOES NOT perform any size checks on the external containers.
Internal values of the containers will be converted to double, so these conversions must be possible.
In addition, the observation container must contain a valid observation function.
Template Parameters
    ObFun  The external observations container type.

Parameters
    o           The number of possible observations the agent could make.
    of          The observation probability matrix.
    parameters  All arguments needed to build the parent Model.
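A hedged sketch of such a container, using nested std::vectors indexed in the documented s, a, o order (MDP::Model as the base is an assumption, as above):

#include <cstddef>
#include <vector>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 2, A = 1, O = 2;
    // Naive 3D container indexed as of[s'][a][o]; each row over `o` must sum to 1.
    std::vector<std::vector<std::vector<double>>> of(S,
        std::vector<std::vector<double>>(A, std::vector<double>(O, 0.0)));
    of[0][0][0] = 0.9; of[0][0][1] = 0.1;  // state 0 is mostly observed as 0
    of[1][0][0] = 0.2; of[1][0][1] = 0.8;  // state 1 is mostly observed as 1

    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, of, S, A);
    return 0;
}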
template<typename PM> requires IsModel<PM> && std::constructible_from<M, PM>
AIToolbox::POMDP::Model<M>::Model (const PM &model)
Copy constructor from any valid POMDP model.
This allows copying from any other model. A nice use for this is to convert any model which computes probabilities on the fly into a POMDP::Model where all probabilities are stored for fast access. Of course such a conversion is only feasible when the number of states, actions and observations is not too big.
This constructor is available only if the underlying MDP Model can be constructed from the input as well.
Template Parameters
    PM  The type of the other model.

Parameters
    model  The model that needs to be copied.
template<typename... Args>
AIToolbox::POMDP::Model<M>::Model (NoCheck, size_t o, ObservationMatrix &&ot, Args &&... parameters)
Unchecked constructor.
This constructor takes ownership of the data it is passed, avoiding copies and additional work (sanity checks) in order to speed up the construction of a new Model as much as possible.
Note that to use it you have to explicitly use the NO_CHECK tag parameter first.
Parameters
    o           The number of possible observations the agent could make.
    ot          The observation probability matrix.
    parameters  All arguments needed to build the parent Model.
size_t AIToolbox::POMDP::Model<M>::getO () const
This function returns the number of possible observations.
const Model<M>::ObservationMatrix & AIToolbox::POMDP::Model<M>::getObservationFunction () const
This function returns the observation matrix for inspection.
const Matrix2D & AIToolbox::POMDP::Model<M>::getObservationFunction (size_t a) const
This function returns the observation function for a given action.
Parameters
    a  The action requested.
double AIToolbox::POMDP::Model<M>::getObservationProbability (size_t s1, size_t a, size_t o) const
This function returns the stored observation probability for the specified new state, action and observation.
Parameters
    s1  The final state of the transition.
    a   The action performed in the transition.
    o   The recorded observation for the transition.
std::tuple<size_t, double> AIToolbox::POMDP::Model<M>::sampleOR (size_t s, size_t a, size_t s1) const
This function samples the POMDP for the specified transition.
This function samples the model for simulated experience. The transition, observation and reward functions are used to produce, from the state, action and new state inserted as arguments, a possible new observation and reward. The observation and rewards are picked so that they are consistent with the specified new state.
Parameters
    s   The state that needs to be sampled.
    a   The action that needs to be sampled.
    s1  The resulting state of the s,a transition.
std::tuple<size_t, size_t, double> AIToolbox::POMDP::Model<M>::sampleSOR (size_t s, size_t a) const
This function samples the POMDP for the specified state-action pair.
This function samples the model for simulated experience. The transition, observation and reward functions are used to produce, from the state-action pair inserted as arguments, a possible new state with respective observation and reward. The new state is picked from all states the MDP allows transitioning to, each with probability equal to that transition's probability in the model. After a new state is picked, an observation is sampled from the observation function's distribution, and finally the reward is the corresponding reward contained in the reward function.
Parameters
    s  The state that needs to be sampled.
    a  The action that needs to be sampled.
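A short simulation sketch (construction mirrors the hypothetical example above; a real agent would also maintain a Belief and pick actions from a policy):

#include <cstddef>
#include <iostream>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 2, A = 1, O = 2;
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, S, A);

    // Simulate a short trajectory of experience starting from state 0.
    size_t s = 0;
    for (int t = 0; t < 5; ++t) {
        const size_t a = 0; // placeholder action choice
        const auto [s1, o, r] = model.sampleSOR(s, a);
        std::cout << "s=" << s << " a=" << a
                  << " -> s'=" << s1 << " o=" << o << " r=" << r << '\n';
        s = s1;
    }
    return 0;
}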
template<IsNaive3DMatrix ObFun>
void AIToolbox::POMDP::Model<M>::setObservationFunction (const ObFun &of)
This function replaces the Model observation function with the one provided.
The container needs to support data access through operator[]. In addition, the dimensions of the containers must match the ones provided as arguments (for three dimensions: s,a,o, in this order).
This is important, as this function DOES NOT perform any size checks on the external containers.
Internal values of the container will be converted to double, so these conversions must be possible.
Template Parameters
    ObFun  The external observations container type.

Parameters
    of  The external observations container.
void AIToolbox::POMDP::Model<M>::setObservationFunction (const ObservationMatrix &o)
This function sets the observation function using an Eigen dense matrix.
This function will throw an std::invalid_argument if the matrix provided does not contain valid probabilities.
The dimensions of the container must match the ones used during construction (for three dimensions: A, S, O). BE CAREFUL. The matrices MUST be SxO, while the std::vector containing them MUST be of size A.
This function DOES NOT perform any size checks on the input.
Parameters
    o  The external observations container.
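A hedged sketch of building such a container, assuming ObservationMatrix is a std::vector of per-action Eigen matrices (sized A, each matrix S x O, as the warning above describes):

#include <cstddef>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 2, A = 1, O = 2;
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, S, A);

    // One S x O matrix per action; each row must sum to one.
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model>::ObservationMatrix om(A);
    om[0].resize(S, O);
    om[0] << 0.9, 0.1,   // observations from new state 0
             0.2, 0.8;   // observations from new state 1

    // Throws std::invalid_argument if the probabilities are invalid.
    model.setObservationFunction(om);
    return 0;
}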