AIToolbox
A library that offers tools for AI problem solving.
This class represents a Markov Decision Process.
#include <AIToolbox/MDP/Model.hpp>
Public Types

using TransitionMatrix = Matrix3D
using RewardMatrix = Matrix2D

Public Member Functions
Model(size_t s, size_t a, double discount = 1.0)
    Basic constructor.
template <IsNaive3DMatrix T, IsNaive3DMatrix R>
Model(size_t s, size_t a, const T &t, const R &r, double d = 1.0)
    Basic constructor.
template <IsModel M>
Model(const M &model)
    Copy constructor from any valid MDP model.
Model(NoCheck, size_t s, size_t a, TransitionMatrix &&t, RewardMatrix &&r, double d)
    Unchecked constructor.
template <IsNaive3DMatrix T>
void setTransitionFunction(const T &t)
    This function replaces the Model transition function with the one provided.
void setTransitionFunction(const TransitionMatrix &t)
    This function sets the transition function using an Eigen dense matrix.
template <IsNaive3DMatrix R>
void setRewardFunction(const R &r)
    This function replaces the Model reward function with the one provided.
void setRewardFunction(const RewardMatrix &r)
    This function replaces the reward function with the one provided.
void setDiscount(double d)
    This function sets a new discount factor for the Model.
std::tuple<size_t, double> sampleSR(size_t s, size_t a) const
    This function samples the MDP with the specified state-action pair.
size_t getS() const
    This function returns the number of states of the world.
size_t getA() const
    This function returns the number of actions available to the agent.
double getDiscount() const
    This function returns the currently set discount factor.
double getTransitionProbability(size_t s, size_t a, size_t s1) const
    This function returns the stored transition probability for the specified transition.
double getExpectedReward(size_t s, size_t a, size_t s1) const
    This function returns the stored expected reward for the specified transition.
const TransitionMatrix &getTransitionFunction() const
    This function returns the transition matrix for inspection.
const Matrix2D &getTransitionFunction(size_t a) const
    This function returns the transition function for a given action.
const RewardMatrix &getRewardFunction() const
    This function returns the rewards matrix for inspection.
bool isTerminal(size_t s) const
    This function returns whether a given state is terminal.
This class represents a Markov Decision Process.
A Markov Decision Process (MDP) is a way to model decision making. The idea is that there is an agent situated in a stochastic environment which changes in discrete "timesteps". The agent can influence the way the environment changes via "actions". For each action the agent can perform, the environment will transition from a state "s" to a state "s1" following a certain transition function. The transition function specifies, for each triple SxAxS' the probability that such a transition will happen.
In addition, the agent can obtain rewards associated with transitions. Thus, if it acts well, the agent will obtain a higher reward than if it had performed badly. The reward obtained by the agent is additionally subject to a "discount" factor: at every timestep, the possible reward that the agent can collect is multiplied by this factor, which is a number between 0 and 1. The discount factor models the fact that it is often preferable to obtain something sooner rather than later.
Since all of this is governed by probabilities, it is possible to solve an MDP model in order to obtain an "optimal policy", which is a way to select an action from a state which will maximize the expected reward that the agent is going to collect during its life. The expected reward is computed as the sum of every reward the agent collects at every timestep, keeping in mind that at every timestep the reward is further and further discounted.
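In symbols, if r_t is the reward collected at timestep t and d is the discount factor, the discounted return is r_0 + d*r_1 + d^2*r_2 + ..., so a reward obtained k timesteps in the future is weighted by d^k.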
Solving an MDP in such a way is called "planning". Planning solutions often include a "horizon", which is the number of timesteps considered in an episode; it can be finite or infinite. The optimal policy changes with respect to the horizon, since a longer horizon may offer access to reward-gaining opportunities farther in the future.
An MDP policy (be it the optimal one or another) is associated with two functions: a ValueFunction and a QFunction. The ValueFunction represents the expected return for the agent from any initial state, given that actions are selected according to the policy. The QFunction is similar: it gives the expected return for a specific state-action pair, given that after the specified action one acts according to the policy.
Given that we are usually interested in the optimal policy, a couple of properties of the optimal policy's functions are worth noting. First, the optimal policy can be derived from the optimal QFunction: in a given state "s", it simply selects the action that maximizes the value of the QFunction. In the same way, the optimal ValueFunction can be computed from the optimal QFunction by taking the maximum with respect to the action.
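In symbols: pi*(s) = argmax_a Q*(s, a), and V*(s) = max_a Q*(s, a).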
Since so much information can be extracted from the QFunction, lots of methods (mostly in Reinforcement Learning) try to learn it.
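To make the above concrete, here is a minimal sketch (not part of the original documentation) that builds a tiny two-state, two-action MDP with the container-based constructor documented below and inspects it; the specific states, probabilities and rewards are made up purely for illustration.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 2;

        // t[s][a][s1] = probability of ending in s1 after doing a in s.
        // Every t[s][a] row must sum to 1 to form a valid transition function.
        std::vector<std::vector<std::vector<double>>> t(S,
            std::vector<std::vector<double>>(A, std::vector<double>(S, 0.0)));
        // r[s][a][s1] = reward associated with the transition (s, a, s1).
        auto r = t;

        // Action 0 keeps the agent where it is; action 1 moves it to the other
        // state with probability 0.8 and fails (stays put) with probability 0.2.
        for (std::size_t s = 0; s < S; ++s) {
            t[s][0][s]     = 1.0;
            t[s][1][1 - s] = 0.8;
            t[s][1][s]     = 0.2;
        }
        r[0][1][1] = 10.0; // reward for successfully reaching state 1 from state 0

        AIToolbox::MDP::Model model(S, A, t, r, 0.9);

        std::cout << model.getTransitionProbability(0, 1, 1) << '\n'; // 0.8
        std::cout << model.getExpectedReward(0, 1, 1)        << '\n'; // expected reward associated with (s=0, a=1)
    }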
AIToolbox::MDP::Model::Model(size_t s, size_t a, double discount = 1.0)
Basic constructor.
This constructor initializes the Model so that, for every action, all transitions have probability 0 except for the transition that brings the agent back to the same state, which has probability 1.
All rewards are set to 0. The discount parameter defaults to 1.
Parameters:
    s: The number of states of the world.
    a: The number of actions available to the agent.
    discount: The discount factor for the MDP.
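A tiny illustrative sketch (not from the original documentation) of what this default initialization implies:

    #include <AIToolbox/MDP/Model.hpp>
    #include <cassert>

    int main() {
        AIToolbox::MDP::Model model(3, 2); // 3 states, 2 actions, discount defaults to 1.0

        // Every action leaves the agent where it is, and yields no reward.
        assert(model.getTransitionProbability(1, 0, 1) == 1.0);
        assert(model.getTransitionProbability(1, 0, 2) == 0.0);
        assert(model.getExpectedReward(1, 0, 1) == 0.0);
        assert(model.getDiscount() == 1.0);
    }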
template <IsNaive3DMatrix T, IsNaive3DMatrix R>
AIToolbox::MDP::Model::Model(size_t s, size_t a, const T &t, const R &r, double d = 1.0)
Basic constructor.
This constructor takes two arbitrary three dimensional containers and tries to copy their contents into the transitions and rewards matrices respectively.
The containers need to support data access through operator[]. In addition, the dimensions of the containers must match the ones provided as arguments (for three dimensions: s,a,s).
This is important, as this constructor DOES NOT perform any size checks on the external containers.
Internal values of the containers will be converted to double, so these conversions must be possible.
In addition, the transition container must contain a valid transition function.
The discount parameter must be between 0 and 1 inclusive, otherwise the constructor will throw an std::invalid_argument.
Template Parameters:
    T: The external transition container type.
    R: The external rewards container type.

Parameters:
    s: The number of states of the world.
    a: The number of actions available to the agent.
    t: The external transitions container.
    r: The external rewards container.
    d: The discount factor for the MDP.
template <IsModel M>
AIToolbox::MDP::Model::Model(const M &model)
Copy constructor from any valid MDP model.
This allows copying from any other model. A nice use for this is to convert any model which computes probabilities on the fly into an MDP::Model, where all probabilities are stored for fast access. Of course, this is only practical when the number of states and actions is not too big.
Template Parameters:
    M: The type of the other model.

Parameters:
    model: The model that needs to be copied.
AIToolbox::MDP::Model::Model(NoCheck, size_t s, size_t a, TransitionMatrix &&t, RewardMatrix &&r, double d)
Unchecked constructor.
This constructor takes ownership of the data passed to it, avoiding copies and additional work (sanity checks), in order to speed up the process of building a new Model as much as possible.
Note that to use it you have to explicitly use the NO_CHECK tag parameter first.
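The sketch below shows one way this constructor might be used. It is illustrative only and rests on a few assumptions not spelled out on this page: that TransitionMatrix (Matrix3D) is an std::vector of A square SxS Eigen matrices (as described for setTransitionFunction below, with RewardMatrix being an alias of the same Matrix2D type), and that the NO_CHECK tag lives in the AIToolbox namespace.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <utility>

    int main() {
        using Model = AIToolbox::MDP::Model;
        const std::size_t S = 4, A = 2;

        // One S x S matrix per action; here every action is a self-loop.
        // Model::RewardMatrix is reused as the element type purely for
        // illustration, assuming it names the same dense type as Matrix2D.
        Model::TransitionMatrix t(A, Model::RewardMatrix::Identity(S, S));
        Model::RewardMatrix r = Model::RewardMatrix::Zero(S, A);

        // No sanity checks are performed: t and r must already be valid and
        // correctly sized. NO_CHECK is assumed to be the tag mentioned above.
        Model model(AIToolbox::NO_CHECK, S, A, std::move(t), std::move(r), 0.95);
    }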
size_t AIToolbox::MDP::Model::getA() const
This function returns the number of available actions to the agent.
double AIToolbox::MDP::Model::getDiscount() const
This function returns the currently set discount factor.
double AIToolbox::MDP::Model::getExpectedReward(size_t s, size_t a, size_t s1) const
This function returns the stored expected reward for the specified transition.
Parameters:
    s: The initial state of the transition.
    a: The action performed in the transition.
    s1: The final state of the transition.
const RewardMatrix &AIToolbox::MDP::Model::getRewardFunction() const
This function returns the rewards matrix for inspection.
size_t AIToolbox::MDP::Model::getS() const
This function returns the number of states of the world.
const TransitionMatrix &AIToolbox::MDP::Model::getTransitionFunction() const
This function returns the transition matrix for inspection.
const Matrix2D &AIToolbox::MDP::Model::getTransitionFunction(size_t a) const
This function returns the transition function for a given action.
Parameters:
    a: The action requested.
double AIToolbox::MDP::Model::getTransitionProbability(size_t s, size_t a, size_t s1) const
This function returns the stored transition probability for the specified transition.
Parameters:
    s: The initial state of the transition.
    a: The action performed in the transition.
    s1: The final state of the transition.
bool AIToolbox::MDP::Model::isTerminal(size_t s) const
This function returns whether a given state is a terminal state.
Parameters:
    s: The state examined.
std::tuple<size_t, double> AIToolbox::MDP::Model::sampleSR(size_t s, size_t a) const
This function samples the MDP with the specified state-action pair.
This function samples the model for simulated experience. The transition and reward functions are used to produce, from the state-action pair passed as arguments, a possible new state with its respective reward. The new state is picked from all states that the MDP allows transitioning to, each with probability equal to that transition's probability in the model. Once the new state is picked, the reward is the corresponding reward contained in the reward function.
Parameters:
    s: The state that needs to be sampled.
    a: The action that needs to be sampled.
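As an illustration (not part of the original documentation), a short episode can be simulated by repeatedly feeding the sampled state back into sampleSR. The tiny model below is made up; the discounting of the accumulated return follows the class description above.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>
    #include <tuple>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 1;

        // Single action: move to either state with probability 0.5,
        // collecting a reward of 1 on every transition.
        std::vector<std::vector<std::vector<double>>> t(S,
            std::vector<std::vector<double>>(A, std::vector<double>(S, 0.5)));
        std::vector<std::vector<std::vector<double>>> r(S,
            std::vector<std::vector<double>>(A, std::vector<double>(S, 1.0)));

        const double discount = 0.9;
        AIToolbox::MDP::Model model(S, A, t, r, discount);

        std::size_t s = 0;
        double ret = 0.0, g = 1.0;
        for (int step = 0; step < 10; ++step) {
            auto [s1, rew] = model.sampleSR(s, 0); // sample next state and reward
            ret += g * rew;                        // accumulate the discounted return
            g *= discount;
            s = s1;
        }
        // Since every reward is 1, this prints roughly 6.51 (the sum of 0.9^k for k < 10).
        std::cout << "discounted return over 10 steps: " << ret << '\n';
    }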
void AIToolbox::MDP::Model::setDiscount(double d)
This function sets a new discount factor for the Model.
Parameters:
    d: The new discount factor for the Model.
template <IsNaive3DMatrix R>
void AIToolbox::MDP::Model::setRewardFunction(const R &r)
This function replaces the Model reward function with the one provided.
The container needs to support data access through operator[]. In addition, the dimensions of the containers must match the ones used during construction (for three dimensions: S, A, S).
This is important, as this function DOES NOT perform any size checks on the external containers.
Internal values of the container will be converted to double, so these conversions must be possible.
Template Parameters:
    R: The external rewards container type.

Parameters:
    r: The external rewards container.
void AIToolbox::MDP::Model::setRewardFunction(const RewardMatrix &r)
This function replaces the reward function with the one provided.
The dimensions of the container must match the ones used during construction (for two dimensions: S, A). BE CAREFUL.
This function DOES NOT perform any size checks on the input.
Parameters:
    r: The external rewards container.
template <IsNaive3DMatrix T>
void AIToolbox::MDP::Model::setTransitionFunction(const T &t)
This function replaces the Model transition function with the one provided.
This function will throw a std::invalid_argument if the matrix provided does not contain valid probabilities.
The container needs to support data access through operator[]. In addition, the dimensions of the container must match the ones used during construction (for three dimensions: S,A,S).
This is important, as this function DOES NOT perform any size checks on the external container.
Internal values of the container will be converted to double, so these conversions must be possible.
Template Parameters:
    T: The external transition container type.

Parameters:
    t: The external transitions container.
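A small illustrative sketch (not from the original documentation) of replacing the transition function with a nested std::vector, and of the validity check mentioned above; the numbers are arbitrary.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 1;
        AIToolbox::MDP::Model model(S, A);

        // Valid replacement: each t[s][a] row sums to 1.
        std::vector<std::vector<std::vector<double>>> t(S,
            std::vector<std::vector<double>>(A, std::vector<double>{0.5, 0.5}));
        model.setTransitionFunction(t);

        // Invalid replacement: a row no longer sums to 1, so the call throws.
        t[0][0] = {0.5, 0.9};
        try {
            model.setTransitionFunction(t);
        } catch (const std::invalid_argument &) {
            std::cout << "rejected: not a valid probability distribution\n";
        }
    }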
void AIToolbox::MDP::Model::setTransitionFunction(const TransitionMatrix &t)
This function sets the transition function using an Eigen dense matrix.
This function will throw an std::invalid_argument if the matrix provided does not contain valid probabilities.
The dimensions of the container must match the ones used during construction (for three dimensions: A, S, S). BE CAREFUL. The matrices MUST be SxS, while the std::vector containing them MUST be of size A.
This function DOES NOT perform any size checks on the input.
Parameters:
    t: The external transitions container.
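To close, a sketch (not from the original documentation) of using this overload together with getTransitionFunction(); it assumes, per the description above, that the TransitionMatrix is indexed by action with operator[] and that each per-action matrix uses Eigen's (row, column) element access.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>

    int main() {
        const std::size_t S = 3, A = 2;
        AIToolbox::MDP::Model model(S, A); // default: every action is a self-loop

        // Copy the internal Eigen representation, tweak action 0 so that
        // state 0 now moves to state 1 deterministically, and install it back.
        auto t = model.getTransitionFunction();   // one S x S matrix per action
        t[0](0, 0) = 0.0;
        t[0](0, 1) = 1.0;                         // the row still sums to 1
        model.setTransitionFunction(t);
    }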