AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP::Model Class Reference

This class represents a Markov Decision Process. More...

#include <AIToolbox/MDP/Model.hpp>

Public Types

using TransitionMatrix = Matrix3D
 
using RewardMatrix = Matrix2D
 

Public Member Functions

 Model (size_t s, size_t a, double discount=1.0)
 Basic constructor. More...
 
template<IsNaive3DMatrix T, IsNaive3DMatrix R>
 Model (size_t s, size_t a, const T &t, const R &r, double d=1.0)
 Basic constructor. More...
 
template<IsModel M>
 Model (const M &model)
 Copy constructor from any valid MDP model. More...
 
 Model (NoCheck, size_t s, size_t a, TransitionMatrix &&t, RewardMatrix &&r, double d)
 Unchecked constructor. More...
 
template<IsNaive3DMatrix T>
void setTransitionFunction (const T &t)
 This function replaces the Model transition function with the one provided. More...
 
void setTransitionFunction (const TransitionMatrix &t)
 This function sets the transition function using an Eigen dense matrix. More...
 
template<IsNaive3DMatrix R>
void setRewardFunction (const R &r)
 This function replaces the Model reward function with the one provided. More...
 
void setRewardFunction (const RewardMatrix &r)
 This function replaces the reward function with the one provided. More...
 
void setDiscount (double d)
 This function sets a new discount factor for the Model. More...
 
std::tuple< size_t, double > sampleSR (size_t s, size_t a) const
 This function samples the MDP with the specified state action pair. More...
 
size_t getS () const
 This function returns the number of states of the world. More...
 
size_t getA () const
 This function returns the number of available actions to the agent. More...
 
double getDiscount () const
 This function returns the currently set discount factor. More...
 
double getTransitionProbability (size_t s, size_t a, size_t s1) const
 This function returns the stored transition probability for the specified transition. More...
 
double getExpectedReward (size_t s, size_t a, size_t s1) const
 This function returns the stored expected reward for the specified transition. More...
 
const TransitionMatrix & getTransitionFunction () const
 This function returns the transition matrix for inspection. More...
 
const Matrix2D & getTransitionFunction (size_t a) const
 This function returns the transition function for a given action. More...
 
const RewardMatrix & getRewardFunction () const
 This function returns the rewards matrix for inspection. More...
 
bool isTerminal (size_t s) const
 This function returns whether a given state is a terminal. More...
 

Detailed Description

This class represents a Markov Decision Process.

A Markov Decision Process (MDP) is a way to model decision making. The idea is that there is an agent situated in a stochastic environment which changes in discrete "timesteps". The agent can influence the way the environment changes via "actions". For each action the agent can perform, the environment will transition from a state "s" to a state "s1" following a certain transition function. The transition function specifies, for each (s, a, s') triple, the probability that such a transition will happen.

In addition, transitions are associated with rewards: if the agent does well, it will obtain a higher reward than if it performs badly. The rewards obtained by the agent are also subject to a "discount" factor: at every timestep, the reward that the agent can collect is multiplied by this factor, a number between 0 and 1. The discount factor models the fact that it is often preferable to obtain something sooner rather than later.

Since all of this is governed by probabilities, it is possible to solve an MDP model in order to obtain an "optimal policy": a way to select an action from a state which maximizes the expected reward that the agent is going to collect during its life. The expected reward is computed as the sum of every reward the agent collects at every timestep, keeping in mind that rewards collected at later timesteps are discounted more and more.
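
In formula form (a standard statement of the above, not quoted from the library documentation), the quantity that the optimal policy maximizes is the expected discounted return:

    V^{*}(s) = \max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0} = s,\; \pi \right]

where \gamma \in [0, 1] is the discount factor and r_t is the reward collected at timestep t.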

Solving an MDP in such a way is called "planning". Planning solutions often include a "horizon", which is the number of timesteps included in an episode; it can be finite or infinite. The optimal policy changes with respect to the horizon, since a higher horizon may offer access to reward-gaining opportunities farther in the future.

An MDP policy (be it the optimal one or another) is associated with two functions: a ValueFunction and a QFunction. The ValueFunction represents the expected return for the agent from any initial state, given that actions are selected according to the policy. The QFunction is similar: it gives the expected return for a specific state-action pair, given that after the specified action the agent acts according to the policy.

Given that we are usually interested in the optimal policy, a couple of properties are associated with the optimal policy's functions. First, the optimal policy can be derived from the optimal QFunction: in a given state "s", it simply selects the action that maximizes the value of the QFunction. In the same way, the optimal ValueFunction can be computed from the optimal QFunction by taking the max with respect to the action.

Since so much information can be extracted from the QFunction, many methods (mostly in Reinforcement Learning) try to learn it.
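
As described above, the optimal action in a state can be read directly off the optimal QFunction by maximizing over actions. A minimal sketch of this idea, assuming the QFunction is stored as a plain S x A table of doubles (the library has its own QFunction type; this is only an illustration):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Greedy extraction: returns the best action in state s and its value,
    // i.e. argmax_a Q(s,a) and max_a Q(s,a).
    std::pair<std::size_t, double> greedy(const std::vector<std::vector<double>> & q, std::size_t s) {
        std::size_t bestA = 0;
        double bestV = q[s][0];
        for (std::size_t a = 1; a < q[s].size(); ++a) {
            if (q[s][a] > bestV) {
                bestV = q[s][a];
                bestA = a;
            }
        }
        return {bestA, bestV}; // bestV is also the optimal ValueFunction at s
    }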

Member Typedef Documentation

◆ RewardMatrix

using AIToolbox::MDP::Model::RewardMatrix = Matrix2D

◆ TransitionMatrix

using AIToolbox::MDP::Model::TransitionMatrix = Matrix3D

Constructor & Destructor Documentation

◆ Model() [1/4]

AIToolbox::MDP::Model::Model ( size_t  s,
size_t  a,
double  discount = 1.0 
)

Basic constructor.

This constructor initializes the Model so that all transitions have probability 0, except for transitions that bring the agent back to the same state, which have probability 1 regardless of the action.

All rewards are set to 0. The discount parameter defaults to 1.

Parameters
s - The number of states of the world.
a - The number of actions available to the agent.
discount - The discount factor for the MDP.
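
For example, the following sketch (the state and action counts are purely illustrative) builds an empty 10-state, 4-action model with discount 0.9:

    #include <AIToolbox/MDP/Model.hpp>

    int main() {
        // 10 states, 4 actions, discount 0.9. Every state initially loops
        // back onto itself with probability 1, and all rewards are 0.
        AIToolbox::MDP::Model model(10, 4, 0.9);
        return 0;
    }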

◆ Model() [2/4]

template<IsNaive3DMatrix T, IsNaive3DMatrix R>
AIToolbox::MDP::Model::Model ( size_t  s,
size_t  a,
const T &  t,
const R &  r,
double  d = 1.0 
)

Basic constructor.

This constructor takes two arbitrary three dimensional containers and tries to copy their contents into the transitions and rewards matrices respectively.

The containers need to support data access through operator[]. In addition, the dimensions of the containers must match the ones provided as arguments (for three dimensions: S, A, S).

This is important, as this constructor DOES NOT perform any size checks on the external containers.

Internal values of the containers will be converted to double, so these conversions must be possible.

In addition, the transition container must contain a valid transition function.

The discount parameter must be between 0 and 1 inclusive, otherwise the constructor will throw an std::invalid_argument.

Template Parameters
T - The external transition container type.
R - The external rewards container type.
Parameters
s - The number of states of the world.
a - The number of actions available to the agent.
t - The external transitions container.
r - The external rewards container.
d - The discount factor for the MDP.
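
A minimal sketch of this constructor, using nested std::vectors as the external containers (the toy two-state problem below is purely illustrative):

    #include <AIToolbox/MDP/Model.hpp>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 1;
        using Grid3D = std::vector<std::vector<std::vector<double>>>;

        // Nested std::vectors satisfy the operator[] access requirement.
        Grid3D t(S, std::vector<std::vector<double>>(A, std::vector<double>(S, 0.0)));
        Grid3D r = t; // same shape, all rewards start at 0

        t[0][0][1] = 1.0;  // from state 0, the only action moves to state 1
        t[1][0][1] = 1.0;  // from state 1, the only action stays in state 1
        r[0][0][1] = 10.0; // reward for reaching state 1 from state 0

        AIToolbox::MDP::Model model(S, A, t, r, 0.95);
        return 0;
    }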

◆ Model() [3/4]

template<IsModel M>
AIToolbox::MDP::Model::Model ( const M &  model)

Copy constructor from any valid MDP model.

This allows copying from any other model. A nice use for this is to convert any model which computes probabilities on the fly into an MDP::Model, where all probabilities are stored for fast access. Of course, such a conversion is only feasible when the number of states and actions is not too big.

Template Parameters
M - The type of the other model.
Parameters
model - The model that needs to be copied.
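
For instance, assuming another class satisfying the IsModel interface is available (AIToolbox::MDP::SparseModel and its constructor signature are assumed here purely for illustration), the conversion could look like this:

    #include <AIToolbox/MDP/Model.hpp>
    #include <AIToolbox/MDP/SparseModel.hpp> // assumed header; any IsModel type works

    int main() {
        // Build some other kind of model...
        AIToolbox::MDP::SparseModel sparse(10, 4, 0.9);
        // ...and copy it into a dense Model, which stores all transition
        // probabilities explicitly for fast access.
        AIToolbox::MDP::Model dense(sparse);
        return 0;
    }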

◆ Model() [4/4]

AIToolbox::MDP::Model::Model ( NoCheck  ,
size_t  s,
size_t  a,
TransitionMatrix &&  t,
RewardMatrix &&  r,
double  d 
)

Unchecked constructor.

This constructor takes ownership of the data passed to it, avoiding copies and additional work (sanity checks) in order to speed up the construction of a new Model as much as possible.

Note that to use it you have to explicitly use the NO_CHECK tag parameter first.

Parameters
s - The state space of the Model.
a - The action space of the Model.
t - The transition function to be used in the Model.
r - The reward function to be used in the Model.
d - The discount factor for the Model.
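
A sketch of the unchecked constructor, assuming that Matrix2D and the NO_CHECK tag live in the AIToolbox namespace (verify against your version of the library); the identity transition matrices below are chosen so that the caller-supplied data really is valid:

    #include <AIToolbox/MDP/Model.hpp>
    #include <utility>

    int main() {
        constexpr std::size_t S = 3, A = 2;

        // TransitionMatrix is a Matrix3D: one SxS matrix per action.
        AIToolbox::MDP::Model::TransitionMatrix t(A, AIToolbox::Matrix2D::Identity(S, S));
        // RewardMatrix is a Matrix2D of size SxA.
        AIToolbox::MDP::Model::RewardMatrix r = AIToolbox::Matrix2D::Zero(S, A);

        // NO_CHECK skips all sanity checks; the caller guarantees validity.
        AIToolbox::MDP::Model model(AIToolbox::NO_CHECK, S, A, std::move(t), std::move(r), 0.9);
        return 0;
    }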

Member Function Documentation

◆ getA()

size_t AIToolbox::MDP::Model::getA ( ) const

This function returns the number of available actions to the agent.

Returns
The total number of actions.

◆ getDiscount()

double AIToolbox::MDP::Model::getDiscount ( ) const

This function returns the currently set discount factor.

Returns
The currently set discount factor.

◆ getExpectedReward()

double AIToolbox::MDP::Model::getExpectedReward ( size_t  s,
size_t  a,
size_t  s1 
) const

This function returns the stored expected reward for the specified transition.

Parameters
s - The initial state of the transition.
a - The action performed in the transition.
s1 - The final state of the transition.
Returns
The expected reward of the specified transition.

◆ getRewardFunction()

const RewardMatrix& AIToolbox::MDP::Model::getRewardFunction ( ) const

This function returns the rewards matrix for inspection.

Returns
The rewards matrix.

◆ getS()

size_t AIToolbox::MDP::Model::getS ( ) const

This function returns the number of states of the world.

Returns
The total number of states.

◆ getTransitionFunction() [1/2]

const TransitionMatrix& AIToolbox::MDP::Model::getTransitionFunction ( ) const

This function returns the transition matrix for inspection.

Returns
The transition matrix.

◆ getTransitionFunction() [2/2]

const Matrix2D& AIToolbox::MDP::Model::getTransitionFunction ( size_t  a) const

This function returns the transition function for a given action.

Parameters
a - The action requested.
Returns
The transition function for the input action.

◆ getTransitionProbability()

double AIToolbox::MDP::Model::getTransitionProbability ( size_t  s,
size_t  a,
size_t  s1 
) const

This function returns the stored transition probability for the specified transition.

Parameters
s - The initial state of the transition.
a - The action performed in the transition.
s1 - The final state of the transition.
Returns
The probability of the specified transition.

◆ isTerminal()

bool AIToolbox::MDP::Model::isTerminal ( size_t  s) const

This function returns whether a given state is a terminal.

Parameters
s - The state examined.
Returns
True if the input state is a terminal, false otherwise.

◆ sampleSR()

std::tuple<size_t, double> AIToolbox::MDP::Model::sampleSR ( size_t  s,
size_t  a 
) const

This function samples the MDP with the specified state action pair.

This function samples the model to produce simulated experience. The transition and reward functions are used to generate, from the state-action pair passed as arguments, a possible new state together with its reward. The new state is picked among all states the MDP allows transitioning to, each with probability equal to that transition's probability in the model. Once the new state is picked, the returned reward is the corresponding entry of the reward function.

Parameters
s - The state that needs to be sampled.
a - The action that needs to be sampled.
Returns
A tuple containing a new state and a reward.
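
A small usage sketch, simulating a short trajectory by repeatedly sampling the model (the fixed action and number of steps are arbitrary choices):

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>

    int main() {
        AIToolbox::MDP::Model model(10, 4, 0.9);

        std::size_t s = 0;
        double ret = 0.0, discount = 1.0;
        // Simulate 5 timesteps, always taking action 0.
        for (int step = 0; step < 5; ++step) {
            auto [s1, r] = model.sampleSR(s, 0);
            ret += discount * r;          // accumulate the discounted return
            discount *= model.getDiscount();
            s = s1;
        }
        std::cout << "Discounted return: " << ret << '\n';
        return 0;
    }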

◆ setDiscount()

void AIToolbox::MDP::Model::setDiscount ( double  d)

This function sets a new discount factor for the Model.

Parameters
d - The new discount factor for the Model.

◆ setRewardFunction() [1/2]

template<IsNaive3DMatrix R>
void AIToolbox::MDP::Model::setRewardFunction ( const R &  r)

This function replaces the Model reward function with the one provided.

The container needs to support data access through operator[]. In addition, the dimensions of the container must match the ones used during construction (for three dimensions: S, A, S).

This is important, as this function DOES NOT perform any size checks on the external container.

Internal values of the container will be converted to double, so these conversions must be possible.

Template Parameters
R - The external rewards container type.
Parameters
r - The external rewards container.

◆ setRewardFunction() [2/2]

void AIToolbox::MDP::Model::setRewardFunction ( const RewardMatrix &  r)

This function replaces the reward function with the one provided.

The dimensions of the container must match the ones used during construction (for two dimensions: S, A). BE CAREFUL.

This function DOES NOT perform any size checks on the input.

Parameters
r - The external rewards container.

◆ setTransitionFunction() [1/2]

template<IsNaive3DMatrix T>
void AIToolbox::MDP::Model::setTransitionFunction ( const T &  t)

This function replaces the Model transition function with the one provided.

This function will throw a std::invalid_argument if the matrix provided does not contain valid probabilities.

The container needs to support data access through operator[]. In addition, the dimensions of the container must match the ones used during construction (for three dimensions: S, A, S).

This is important, as this function DOES NOT perform any size checks on the external container.

Internal values of the container will be converted to double, so these conversions must be possible.

Template Parameters
T - The external transition container type.
Parameters
t - The external transitions container.
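
A sketch of replacing the transition function via nested std::vectors (the two-state, two-action layout is purely illustrative):

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 2;
        AIToolbox::MDP::Model model(S, A);

        // Nested std::vectors provide the required operator[] access.
        std::vector<std::vector<std::vector<double>>> t(
            S, std::vector<std::vector<double>>(A, std::vector<double>(S, 0.0)));
        t[0][0][0] = 1.0; t[0][1][1] = 1.0; // from state 0: action 0 stays, action 1 moves to 1
        t[1][0][0] = 1.0; t[1][1][1] = 1.0; // from state 1: action 0 moves to 0, action 1 stays

        // Throws std::invalid_argument if any (s, a) row is not a valid distribution.
        model.setTransitionFunction(t);
        return 0;
    }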

◆ setTransitionFunction() [2/2]

void AIToolbox::MDP::Model::setTransitionFunction ( const TransitionMatrix &  t)

This function sets the transition function using an Eigen dense matrix.

This function will throw an std::invalid_argument if the matrix provided does not contain valid probabilities.

The dimensions of the container must match the ones used during construction (for three dimensions: A, S, S). BE CAREFUL. The matrices MUST be SxS, while the std::vector containing them MUST be of size A.

This function DOES NOT perform any size checks on the input.

Parameters
t - The external transitions container.
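
A sketch of this overload, building the Matrix3D directly (Matrix2D is assumed to live in the AIToolbox namespace; check your version of the library):

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>

    int main() {
        constexpr std::size_t S = 3, A = 2;
        AIToolbox::MDP::Model model(S, A);

        // One SxS matrix per action, stored in a std::vector of size A.
        AIToolbox::MDP::Model::TransitionMatrix t(A, AIToolbox::Matrix2D::Zero(S, S));
        for (std::size_t a = 0; a < A; ++a)
            for (std::size_t s = 0; s < S; ++s)
                t[a](s, (s + a + 1) % S) = 1.0; // deterministic cyclic transitions

        // Throws std::invalid_argument if any row is not a valid distribution.
        model.setTransitionFunction(t);
        return 0;
    }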

The documentation for this class was generated from the following file:
AIToolbox/MDP/Model.hpp