AIToolbox
A library that offers tools for AI problem solving.
This class represents a Partially Observable Markov Decision Process.
#include <AIToolbox/POMDP/Model.hpp>
Public Types

using ObservationMatrix = Matrix3D

Public Member Functions

template<typename... Args>
Model (size_t o, Args &&... parameters)
    Basic constructor.

template<IsNaive3DMatrix ObFun, typename... Args>
Model (size_t o, ObFun &&of, Args &&... parameters)
    Basic constructor.

template<typename PM> requires IsModel<PM> && std::constructible_from<M, PM>
Model (const PM &model)
    Copy constructor from any valid POMDP model.

template<typename... Args>
Model (NoCheck, size_t o, ObservationMatrix &&ot, Args &&... parameters)
    Unchecked constructor.

template<IsNaive3DMatrix ObFun>
void setObservationFunction (const ObFun &of)
    This function replaces the Model observation function with the one provided.

void setObservationFunction (const ObservationMatrix &o)
    This function sets the observation function using an Eigen dense matrix.

std::tuple<size_t, size_t, double> sampleSOR (size_t s, size_t a) const
    This function samples the POMDP for the specified state-action pair.

std::tuple<size_t, double> sampleOR (size_t s, size_t a, size_t s1) const
    This function samples the POMDP for the specified transition.

double getObservationProbability (size_t s1, size_t a, size_t o) const
    This function returns the stored observation probability for the specified new state, action and observation.

const Matrix2D & getObservationFunction (size_t a) const
    This function returns the observation function for a given action.

size_t getO () const
    This function returns the number of possible observations.

const ObservationMatrix & getObservationFunction () const
    This function returns the observation matrix for inspection.
This class represents a Partially Observable Markov Decision Process.
This class inherits from any valid MDP model type, so that it can reuse its base methods and build on top of them. Templated inheritance was chosen over composition to improve performance and keep the code small.
A POMDP is an MDP where the agent, at each timestep, does not know in which state it is. Instead, after each action is performed, it obtains an "observation", which offers some information as to which new state the agent has transitioned to. This observation is determined by an "observation function", that maps S'xAxO to a probability: the probability of obtaining observation O after taking action A and landing in state S'.
Since its knowledge of the state is now imperfect, the agent is forced to represent it using Beliefs: probability distributions over states.
A Belief works as follows: after each action and observation, the agent can reason: given my previous Belief (distribution over states I thought I was in), what is now the probability that I transitioned to any particular state? This new Belief can be computed from the Model, since the agent knows the distributions of the transition and observation functions.
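Concretely, the update is b'(s') ∝ O(s',a,o) * Σ_s T(s,a,s') * b(s), followed by normalization. A minimal sketch to make the math concrete (not this class's own API; it assumes Belief is the library's Eigen vector type and that the MDP base exposes getS() and getTransitionProbability()); the library also ships its own belief-update utilities:

#include <AIToolbox/POMDP/Types.hpp>

// Hedged sketch: the Belief after performing action `a` and receiving
// observation `o`, starting from Belief `b`.
template <typename M>
AIToolbox::POMDP::Belief updateBeliefSketch(const M & model,
        const AIToolbox::POMDP::Belief & b, size_t a, size_t o) {
    const size_t S = model.getS();
    AIToolbox::POMDP::Belief br(S);
    for (size_t s1 = 0; s1 < S; ++s1) {
        // Probability of landing in s1 from anywhere, weighted by the old Belief...
        double sum = 0.0;
        for (size_t s = 0; s < S; ++s)
            sum += model.getTransitionProbability(s, a, s1) * b[s];
        // ...times the probability of seeing `o` from s1 after doing `a`.
        br[s1] = model.getObservationProbability(s1, a, o) * sum;
    }
    return br / br.sum(); // normalize; assumes `o` had nonzero probability
}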
It turns out that a POMDP can be viewed as an MDP with an infinite number of states, where each state is essentially a Belief. Since a Belief is a vector of real numbers, there are infinitely many of them, hence the infinite number of states. While POMDPs can be much more powerful than MDPs for modeling real world problems, where information is usually not perfect, this infinite-state property makes them much harder to solve exactly, and their solutions much more complex.
A POMDP solution is composed of several policies, which apply in different regions of the Belief space and suggest different actions depending on the observations received by the agent at each timestep. The values of those policies can, in the same way, be represented as a number of value vectors (called alpha vectors in the literature) that apply in those same regions of the Belief space. Each alpha vector is somewhat similar to an MDP ValueFunction.
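For background (standard POMDP notation, not part of this class's API), the value of a Belief b under a set Γ of alpha vectors is the best of the corresponding dot products:

V(b) = \max_{\alpha \in \Gamma} \alpha \cdot b = \max_{\alpha \in \Gamma} \sum_{s \in S} \alpha(s)\, b(s)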
Template Parameters
    M  The particular MDP type that we want to extend.
using AIToolbox::POMDP::Model<M>::ObservationMatrix = Matrix3D
template<typename... Args>
AIToolbox::POMDP::Model<M>::Model (size_t o, Args &&... parameters)
Basic constructor.
This constructor initializes the observation function so that all actions will return observation 0.
Template Parameters
    Args  All types of the parent constructor arguments.

Parameters
    o           The number of possible observations the agent could make.
    parameters  All arguments needed to build the parent Model.
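For example, a minimal construction sketch; it assumes the underlying MDP type is AIToolbox::MDP::Model, whose constructor takes the number of states and actions (forwarded here through `parameters`):

#include <cstddef>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 4, A = 2, O = 3;
    // `O` is consumed by the POMDP layer; `S, A` go to the MDP::Model base.
    // Every action initially returns observation 0 with probability 1.
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, S, A);
    return 0;
}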
template<IsNaive3DMatrix ObFun, typename... Args>
AIToolbox::POMDP::Model<M>::Model (size_t o, ObFun &&of, Args &&... parameters)
Basic constructor.
This constructor takes an arbitrary three dimensional container and tries to copy its contents into the observations matrix.
The container needs to support data access through operator[]. In addition, the dimensions of the container must match the ones provided as arguments both directly (o) and indirectly (s,a), in the order s, a, o.
This is important, as this constructor DOES NOT perform any size checks on the external containers.
Internal values of the containers will be converted to double, so these conversions must be possible.
In addition, the observation container must contain a valid observation function.
Template Parameters
    ObFun  The external observations container type.

Parameters
    o           The number of possible observations the agent could make.
    of          The observation probability matrix.
    parameters  All arguments needed to build the parent Model.
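A hedged sketch of such a container, using nested std::vectors indexed in the documented s, a, o order (MDP::Model as the base is an assumption, as above):

#include <cstddef>
#include <vector>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 2, A = 1, O = 2;
    // Naive 3D container indexed as of[s'][a][o]; each row over `o` must sum to 1.
    std::vector<std::vector<std::vector<double>>> of(S,
        std::vector<std::vector<double>>(A, std::vector<double>(O, 0.0)));
    of[0][0][0] = 0.9; of[0][0][1] = 0.1;  // state 0 is mostly observed as 0
    of[1][0][0] = 0.2; of[1][0][1] = 0.8;  // state 1 is mostly observed as 1

    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, of, S, A);
    return 0;
}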
template<typename PM> requires IsModel<PM> && std::constructible_from<M, PM>
AIToolbox::POMDP::Model<M>::Model (const PM &model)
Copy constructor from any valid POMDP model.
This allows copying from any other model. A nice use for this is to convert any model which computes probabilities on the fly into a POMDP::Model where all probabilities are stored for fast access. Of course such a conversion is only feasible when the number of states, actions and observations is not too big.
This constructor is available only if the underlying MDP Model can be constructed from the input as well.
Template Parameters
    PM  The type of the other model.

Parameters
    model  The model that needs to be copied.
template<typename... Args>
AIToolbox::POMDP::Model<M>::Model (NoCheck, size_t o, ObservationMatrix &&ot, Args &&... parameters)
Unchecked constructor.
This constructor takes ownership of the data it is passed, avoiding copies and additional work (sanity checks) in order to speed up the construction of a new Model as much as possible.
Note that to use it you have to explicitly use the NO_CHECK tag parameter first.
Parameters
    o           The number of possible observations the agent could make.
    ot          The observation probability matrix.
    parameters  All arguments needed to build the parent Model.
size_t AIToolbox::POMDP::Model<M>::getO () const
This function returns the number of possible observations.
const Model<M>::ObservationMatrix & AIToolbox::POMDP::Model<M>::getObservationFunction () const
This function returns the observation matrix for inspection.
const Matrix2D & AIToolbox::POMDP::Model<M>::getObservationFunction (size_t a) const
This function returns the observation function for a given action.
Parameters
    a  The action requested.
double AIToolbox::POMDP::Model<M>::getObservationProbability (size_t s1, size_t a, size_t o) const
This function returns the stored observation probability for the specified new state, action and observation.
Parameters
    s1  The final state of the transition.
    a   The action performed in the transition.
    o   The recorded observation for the transition.
std::tuple<size_t, double> AIToolbox::POMDP::Model<M>::sampleOR (size_t s, size_t a, size_t s1) const
This function samples the POMDP for the specified transition.
This function samples the model for simulated experience. The transition, observation and reward functions are used to produce, from the state, action and new state inserted as arguments, a possible new observation and reward. The observation and rewards are picked so that they are consistent with the specified new state.
Parameters
    s   The state that needs to be sampled.
    a   The action that needs to be sampled.
    s1  The resulting state of the s,a transition.
std::tuple<size_t, size_t, double> AIToolbox::POMDP::Model<M>::sampleSOR (size_t s, size_t a) const
This function samples the POMDP for the specified state-action pair.
This function samples the model for simulated experience. The transition, observation and reward functions are used to produce, from the state-action pair inserted as arguments, a possible new state with respective observation and reward. The new state is picked from all states the MDP allows transitioning to, each with probability equal to that transition's probability in the model. After a new state is picked, an observation is sampled from the observation function's distribution, and finally the reward is the corresponding reward contained in the reward function.
Parameters
    s  The state that needs to be sampled.
    a  The action that needs to be sampled.
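A short simulation sketch (construction mirrors the hypothetical example above; a real agent would also maintain a Belief and pick actions from a policy):

#include <cstddef>
#include <iostream>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 2, A = 1, O = 2;
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, S, A);

    // Simulate a short trajectory of experience starting from state 0.
    size_t s = 0;
    for (int t = 0; t < 5; ++t) {
        const size_t a = 0; // placeholder action choice
        const auto [s1, o, r] = model.sampleSOR(s, a);
        std::cout << "s=" << s << " a=" << a
                  << " -> s'=" << s1 << " o=" << o << " r=" << r << '\n';
        s = s1;
    }
    return 0;
}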
template<IsNaive3DMatrix ObFun>
void AIToolbox::POMDP::Model<M>::setObservationFunction (const ObFun &of)
This function replaces the Model observation function with the one provided.
The container needs to support data access through operator[]. In addition, the dimensions of the containers must match the ones provided as arguments (for three dimensions: s,a,o, in this order).
This is important, as this function DOES NOT perform any size checks on the external containers.
Internal values of the container will be converted to double, so these conversions must be possible.
Template Parameters
    ObFun  The external observations container type.

Parameters
    of  The external observations container.
void AIToolbox::POMDP::Model<M>::setObservationFunction (const ObservationMatrix &o)
This function sets the observation function using an Eigen dense matrix.
This function will throw an std::invalid_argument if the matrix provided does not contain valid probabilities.
The dimensions of the container must match the ones used during construction (for three dimensions: A, S, O). BE CAREFUL. The matrices MUST be SxO, while the std::vector containing them MUST be of size A.
This function DOES NOT perform any size checks on the input.
Parameters
    o  The external observations container.
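A hedged sketch of building such a container, assuming ObservationMatrix is a std::vector of per-action Eigen matrices (sized A, each matrix S x O, as the warning above describes):

#include <cstddef>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/POMDP/Model.hpp>

int main() {
    constexpr size_t S = 2, A = 1, O = 2;
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model> model(O, S, A);

    // One S x O matrix per action; each row must sum to one.
    AIToolbox::POMDP::Model<AIToolbox::MDP::Model>::ObservationMatrix om(A);
    om[0].resize(S, O);
    om[0] << 0.9, 0.1,   // observations from new state 0
             0.2, 0.8;   // observations from new state 1

    // Throws std::invalid_argument if the probabilities are invalid.
    model.setObservationFunction(om);
    return 0;
}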