AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP Namespace Reference

Namespaces

 GridWorldUtils
 This namespace exists in order to allow referencing the Direction values directly.
 

Classes

class  BanditPolicyAdaptor
 This class extends a Bandit policy so that it can be called from MDP code. More...
 
class  DoubleQLearning
 This class represents the double QLearning algorithm. More...
 
class  Dyna2
 This class represents the Dyna2 algorithm. More...
 
class  DynaQ
 This class represents the DynaQ algorithm. More...
 
class  EpsilonPolicy
 
class  ExpectedSARSA
 This class represents the ExpectedSARSA algorithm. More...
 
class  Experience
 This class keeps track of registered events and rewards. More...
 
class  GenerativeModelPython
 This class allows importing generative models from Python. More...
 
class  GridWorld
 This class represents a simple rectangular gridworld. More...
 
class  HystereticQLearning
 This class represents the Hysteretic QLearning algorithm. More...
 
class  ImportanceSampling
 This class implements off-policy control via importance sampling. More...
 
class  ImportanceSamplingEvaluation
 This class implements off-policy evaluation via importance sampling. More...
 
class  LinearProgramming
 This class solves an MDP using Linear Programming. More...
 
class  MaximumLikelihoodModel
 This class models Experience as a Markov Decision Process using Maximum Likelihood. More...
 
class  MCTS
 This class represents the MCTS online planner using UCB1. More...
 
class  Model
 This class represents a Markov Decision Process. More...
 
class  OffPolicyBase
 This class contains all the boilerplates for off-policy methods. More...
 
class  OffPolicyControl
 This class is a general version of off-policy control. More...
 
class  OffPolicyEvaluation
 This class is a general version of off-policy evaluation. More...
 
class  PGAAPPPolicy
 This class implements the PGA-APP learning algorithm. More...
 
class  Policy
 This class represents an MDP Policy. More...
 
class  PolicyEvaluation
 This class applies the policy evaluation algorithm on a policy. More...
 
class  PolicyInterface
 Simple typedef for most of MDP's policy needs. More...
 
class  PolicyIteration
 This class represents the Policy Iteration algorithm. More...
 
class  PolicyWrapper
 This class provides an MDP Policy interface around a Matrix2D. More...
 
class  PrioritizedSweeping
 This class represents the PrioritizedSweeping algorithm. More...
 
class  QGreedyPolicy
 This class implements a greedy policy through a QFunction. More...
 
class  QL
 This class implements off-policy control via Q(lambda). More...
 
class  QLearning
 This class represents the QLearning algorithm. More...
 
class  QLEvaluation
 This class implements off-policy evaluation via Q(lambda). More...
 
class  QPolicyInterface
 This class is an interface to specify a policy through a QFunction. More...
 
class  QSoftmaxPolicy
 This class implements a softmax policy through a QFunction. More...
 
class  RetraceL
 This class implements off-policy control via Retrace(lambda). More...
 
class  RetraceLEvaluation
 This class implements off-policy evaluation via Retrace(lambda). More...
 
class  RLearning
 This class represents the RLearning algorithm. More...
 
class  SARSA
 This class represents the SARSA algorithm. More...
 
class  SARSAL
 This class represents the SARSAL algorithm. More...
 
class  SparseExperience
 This class keeps track of registered events and rewards. More...
 
class  SparseMaximumLikelihoodModel
 This class models Experience as a Markov Decision Process using Maximum Likelihood. More...
 
class  SparseModel
 This class represents a Markov Decision Process. More...
 
class  ThompsonModel
 This class models Experience as a Markov Decision Process using Thompson Sampling. More...
 
class  TreeBackupL
 This class implements off-policy control via Tree Backup(lambda). More...
 
class  TreeBackupLEvaluation
 This class implements off-policy evaluation via Tree Backup(lambda). More...
 
struct  ValueFunction
 
class  ValueIteration
 This class applies the value iteration algorithm on a Model. More...
 
class  WoLFPolicy
 This class implements the WoLF learning algorithm. More...
 

Typedefs

using RandomPolicy = BanditPolicyAdaptor< Bandit::RandomPolicy >
 This class represents a random policy. More...
 

Functions

template<typename M, typename Gen>
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double rollout (const M & m, std::remove_cvref_t<decltype(std::declval<M>().getS())> s, const unsigned maxDepth, Gen & rnd)
 This function performs a rollout from the input state. More...
 
AIToolbox::MDP::SparseModel makeCliffProblem (const GridWorld &grid)
 This function sets up the cliff problem in a SparseModel. More...
 
AIToolbox::MDP::Model makeCornerProblem (const GridWorld &grid, double stepUncertainty=0.8)
 This function sets up the corner problem in a Model. More...
 
Model parseCassandra (std::istream &input)
 This function parses an MDP from a Cassandra formatted stream. More...
 
QFunction makeQFunction (size_t S, size_t A)
 This function creates and zeroes a QFunction. More...
 
ValueFunction makeValueFunction (size_t S)
 This function creates and zeroes a ValueFunction. More...
 
ValueFunction bellmanOperator (const QFunction &q)
 This function converts a QFunction into the equivalent optimal ValueFunction. More...
 
void bellmanOperatorInplace (const QFunction &q, ValueFunction *v)
 This function converts a QFunction into the equivalent optimal ValueFunction. More...
 
template<IsModel M>
Matrix2D computeImmediateRewards (const M &model)
 This function computes all immediate rewards (state and action) of the MDP once for improved speed. More...
 
template<IsModel M>
QFunction computeQFunction (const M &model, const Values &v, QFunction ir)
 This function computes the Model's QFunction from the values of a ValueFunction. More...
 

Variables

template<typename M >
concept IsGenerativeModel
 This concept represents the required interface for a generative MDP. More...
 
template<typename M >
concept IsModel
 This concept represents the required interface for a full MDP model. More...
 
template<typename M >
concept IsModelEigen
 This concept represents the required interface that allows MDP algorithms to leverage Eigen. More...
 
template<typename E >
concept IsExperience
 This concept represents the required interface for an experience recorder. More...
 
template<typename E >
concept IsExperienceEigen
 This concept represents the required Experience interface that allows leveraging Eigen. More...
 

Input stream utilities

These utilities read back data outputted with their respective operator<<() function.

Note that the inputs must already be constructed with the correct size (state-action spaces), as operator<<() does not save this information.

These functions do not modify the input if the parsing fails.

std::istream & operator>> (std::istream &is, Model &m)
 
std::istream & operator>> (std::istream &is, SparseModel &m)
 
std::istream & operator>> (std::istream &is, Experience &e)
 
std::istream & operator>> (std::istream &is, SparseExperience &e)
 
std::istream & operator>> (std::istream &is, Policy &p)
 This function reads a policy from a file. More...
 

Typedef Documentation

◆ Actions

using AIToolbox::MDP::Actions = std::vector<size_t>

◆ QFunction

using AIToolbox::MDP::QFunction = Matrix2D

◆ RandomPolicy

This class represents a random policy.

This class simply returns a random action every time it is polled.

◆ Values

using AIToolbox::MDP::Values = Vector

Function Documentation

◆ bellmanOperator()

ValueFunction AIToolbox::MDP::bellmanOperator (const QFunction & q)

This function converts a QFunction into the equivalent optimal ValueFunction.

The ValueFunction will contain, for each state, the best action and corresponding value as extracted from the input QFunction.

Parameters
    q: The QFunction to convert.
Returns
    The equivalent optimal ValueFunction.
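
For instance, a minimal usage sketch (the include path is an assumption, and ValueFunction is assumed to expose its values and actions members):

    #include <AIToolbox/MDP/Utils.hpp> // assumed location of these utilities

    int main() {
        using namespace AIToolbox::MDP;

        auto q = makeQFunction(3, 2); // 3 states, 2 actions, all zeroes
        q(0, 1) = 5.0;                // make action 1 the best in state 0

        ValueFunction vf = bellmanOperator(q);
        // vf.values[0] is now 5.0 and vf.actions[0] is 1: the best value and
        // the greedy action extracted for state 0.
    }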

◆ bellmanOperatorInplace()

void AIToolbox::MDP::bellmanOperatorInplace (const QFunction & q, ValueFunction * v)

This function converts a QFunction into the equivalent optimal ValueFunction.

This function is the same as bellmanOperator(), but performs its operations inplace. The input ValueFunction MUST already be sized appropriately for the input QFunction.

NOTE: This function DOES NOT perform any checks whatsoever on both the validity of the input pointer and on the size of the input ValueFunction. It assumes everything is already correct.

Parameters
    q: The QFunction to convert.
    v: A pre-allocated ValueFunction to populate with the optimal values.
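
A short sketch of the caller's responsibilities described above (include path assumed):

    #include <AIToolbox/MDP/Utils.hpp> // assumed location of these utilities

    int main() {
        auto q  = AIToolbox::MDP::makeQFunction(3, 2);
        auto vf = AIToolbox::MDP::makeValueFunction(3); // pre-sized to match q
        AIToolbox::MDP::bellmanOperatorInplace(q, &vf); // no validity checks happen here
    }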

◆ computeImmediateRewards()

template<IsModel M>
Matrix2D AIToolbox::MDP::computeImmediateRewards (const M & model)

This function computes all immediate rewards (state and action) of the MDP once for improved speed.

This function creates the R(s,a) function for the input model. Normally we store the reward function as R(s,a,s'), but it can be "compressed" into R(s,a) with no loss of information meaningful to the planning process.

Note that this function is more efficient with eigen models.

Parameters
    model: The MDP that needs to be solved.
Returns
    The Model's immediate rewards.

◆ computeQFunction()

template<IsModel M>
QFunction AIToolbox::MDP::computeQFunction (const M & model, const Values & v, QFunction ir)

This function computes the Model's QFunction from the values of a ValueFunction.

Note that this function is more efficient with eigen models.

Parameters
    model: The MDP that needs to be solved.
    v: The values of the ValueFunction for the future of the QFunction.
    ir: The immediate rewards of the model, as created by computeImmediateRewards().
Returns
    A new QFunction.
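
Together with bellmanOperator(), these two functions are enough to sketch a single sweep of value iteration by hand. This is only an illustration of how the pieces fit together, not the library's actual ValueIteration implementation; the include path is an assumption:

    #include <utility>
    #include <AIToolbox/MDP/Utils.hpp> // assumed location of these utilities

    template <AIToolbox::MDP::IsModel M>
    AIToolbox::MDP::ValueFunction sweep(const M & model, const AIToolbox::MDP::ValueFunction & vf) {
        using namespace AIToolbox::MDP;
        auto ir = computeImmediateRewards(model);                    // R(s,a), computed once
        auto q  = computeQFunction(model, vf.values, std::move(ir)); // Q(s,a) from future values
        return bellmanOperator(q);                                   // greedy backup into the new V
    }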

◆ makeCliffProblem()

AIToolbox::MDP::SparseModel AIToolbox::MDP::makeCliffProblem (const GridWorld & grid)
inline

This function sets up the cliff problem in a SparseModel.

The gist of this problem is a small grid where the agent is supposed to walk from one state to another. The only problem is that a cliff stands between the two points: walking off it results in a huge negative reward, and in the agent being reset to the start of the walk. Reaching the end results in a positive reward, while every step results in a small negative reward.

Movement here is fully deterministic.

    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |    0   |    1   |  ....  |   X-2  |   X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |    X   |   X+1  |  ....  |  2X-2  |  2X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   2X   |  2X+1  |  ....  |  3X-2  |  3X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |  ....  |  ....  |  ....  |  ....  |  ....  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    | (Y-1)X |(Y-1)X+1|  ....  |  YX-2  |  YX-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |(START) |        |        |        | (GOAL) |
    |   YX   |  ~~~~  |  ....  |  ~~~~  |  YX+1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
              \_________________________/
                           |
                           V
                       The Cliff

To do this we use a grid above the cliff, and we attach two states under it.

Parameters
    grid: The grid to use for the problem.
Returns
    The SparseModel representing the problem.
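
For instance (a minimal sketch; the header path and the GridWorld width/height constructor are assumptions):

    #include <AIToolbox/MDP/Environments/CliffProblem.hpp> // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        GridWorld grid(12, 3);               // the classic 12x3 grid above the cliff
        auto model = makeCliffProblem(grid); // a SparseModel over the grid
        // The state space is the grid itself plus the two start/goal states
        // attached under it.
    }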

◆ makeCornerProblem()

AIToolbox::MDP::Model AIToolbox::MDP::makeCornerProblem (const GridWorld & grid, double stepUncertainty = 0.8)
inline

This function sets up the corner problem in a Model.

The gist of this problem is a small grid where the upper-left corner and the bottom-right corner are self-absorbing states. The agent can move up, left, down, and right, and each transition that is not self-absorbing results in a reward penalty of -1. In addition, movement is not guaranteed: by default the agent succeeds only 80% of the time.

Thus the agent needs to be able to find the shortest path to one of the self-absorbing states from every other state.

The grid cells are numbered as follows:

    +--------+--------+--------+--------+--------+
    | (GOAL) |        |        |        |        |
    |    0   |    1   |  ....  |   X-2  |   X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |    X   |   X+1  |  ....  |  2X-2  |  2X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   2X   |  2X+1  |  ....  |  3X-2  |  3X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |  ....  |  ....  |  ....  |  ....  |  ....  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        | (GOAL) |
    | (Y-1)X |(Y-1)X+1|  ....  |  YX-2  |  YX-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+

Parameters
    grid: The grid to use for the problem.
    stepUncertainty: The probability that a movement action succeeds.
Returns
    The Model representing the problem.
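
A sketch that builds and solves a small corner problem (header paths are assumptions, and ValueIteration is assumed to return its variation bound, the ValueFunction and the QFunction as a tuple):

    #include <AIToolbox/MDP/Environments/CornerProblem.hpp> // assumed header
    #include <AIToolbox/MDP/Algorithms/ValueIteration.hpp>  // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        GridWorld grid(4, 4);
        auto model = makeCornerProblem(grid, 0.8); // moves succeed 80% of the time

        ValueIteration solver(1000);               // horizon of 1000 iterations
        auto [bound, vf, q] = solver(model);       // solve the model
    }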

◆ makeQFunction()

QFunction AIToolbox::MDP::makeQFunction (size_t S, size_t A)

This function creates and zeroes a QFunction.

This function exists mostly to avoid remembering how to initialize Eigen matrices, and does nothing special.

Parameters
    S: The state space of the QFunction.
    A: The action space of the QFunction.
Returns
    A newly built QFunction.

◆ makeValueFunction()

ValueFunction AIToolbox::MDP::makeValueFunction (size_t S)

This function creates and zeroes a ValueFunction.

This function exists mostly to avoid remembering how to initialize Eigen vectors, and does nothing special.

Parameters
    S: The state space of the ValueFunction.
Returns
    A newly built ValueFunction.

◆ operator<<() [1/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const Experience & exp)

◆ operator<<() [2/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const Model & model)

◆ operator<<() [3/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const PolicyInterface & p)

◆ operator<<() [4/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const SparseExperience & exp)

◆ operator<<() [5/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const SparseModel & model)

◆ operator>>() [1/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, Experience & e)

◆ operator>>() [2/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, Model & m)

◆ operator>>() [3/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, Policy & p)

This function reads a policy from a file.

This function reads files that have been written through operator<<(). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.

State and actions are also verified, and this function does not accept a randomly shuffled policy file. The file must be sorted by state, and each state must be sorted by action.

As an additional layer of precaution, the function normalizes the policy once it has been read, to ensure the internal policy represents a true probability distribution.

Parameters
    is: The stream the policy is being read from.
    p: The policy that is being assigned.
Returns
    The input stream.
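
A round-trip sketch (the file name is hypothetical; the header paths and the Policy size constructor are assumptions):

    #include <fstream>
    #include <AIToolbox/MDP/IO.hpp>              // assumed header for the stream operators
    #include <AIToolbox/MDP/Policies/Policy.hpp> // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        Policy p(10, 4);                  // sizes must match the saved policy
        std::ifstream file("policy.txt"); // hypothetical file written via operator<<()

        if (!(file >> p)) {
            // Parsing failed; p has been left unmodified.
        }
    }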

◆ operator>>() [4/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, SparseExperience & e)

◆ operator>>() [5/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, SparseModel & m)

◆ parseCassandra()

Model AIToolbox::MDP::parseCassandra (std::istream & input)

This function parses an MDP from a Cassandra formatted stream.

This function may throw std::runtime_error if the input is not correctly formed.

Parameters
    input: The input stream.
Returns
    The parsed model.
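
For example (a sketch; the file name is hypothetical and the header path is an assumption):

    #include <fstream>
    #include <stdexcept>
    #include <AIToolbox/MDP/IO.hpp> // assumed location of parseCassandra

    int main() {
        std::ifstream input("problem.MDP"); // hypothetical Cassandra-format file

        try {
            auto model = AIToolbox::MDP::parseCassandra(input);
        } catch (const std::runtime_error &) {
            // The input was not correctly formed.
        }
    }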

◆ rollout()

template<typename M, typename Gen>
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double AIToolbox::MDP::rollout (const M & m, std::remove_cvref_t<decltype(std::declval<M>().getS())> s, const unsigned maxDepth, Gen & rnd)

This function performs a rollout from the input state.

This function performs a rollout until the agent either reaches the desired depth, or reaches a terminal state. The overall return, computed from the point of view of the input state and with future rewards discounted appropriately, is then returned.

This function is generally used in Monte-Carlo tree search-like algorithms, like MCTS or POMCP, to speed up discovery of promising actions without necessarily expanding their search tree. This avoids wasting lots of computation and memory on states far from our root that we will probably never see again, while at the same time still getting an estimate for the rest of the simulation.

Parameters
    m: The model to use for the rollout.
    s: The state to start the rollout from.
    maxDepth: The maximum number of timesteps to look into the future.
    rnd: A random number generator.
Returns
    The discounted return from the input state.
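
For example, estimating a state's value by simulation (header paths are assumptions; the corner problem from above serves as the generative model):

    #include <cstddef>
    #include <random>
    #include <AIToolbox/MDP/Utils.hpp>                      // assumed location of rollout
    #include <AIToolbox/MDP/Environments/CornerProblem.hpp> // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        GridWorld grid(4, 4);
        auto model = makeCornerProblem(grid);

        std::mt19937 rnd(42); // any standard random engine should work here
        // Estimate the discounted return from state 5, simulating at most 20 steps.
        double ret = rollout(model, std::size_t{5}, 20, rnd);
        (void)ret;
    }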

Variable Documentation

◆ IsExperience

template<typename E >
concept AIToolbox::MDP::IsExperience
Initial value:
= requires (const E e, size_t s, size_t a) {
      { e.getVisits(s, a, s) } -> std::convertible_to<long unsigned>;
      { e.getVisitsSum(s, a) } -> std::convertible_to<long unsigned>;
      { e.getReward(s, a) } -> std::convertible_to<double>;
      { e.getM2(s, a) } -> std::convertible_to<double>;
  }

This concept represents the required interface for an experience recorder.

This concept tests for the interface of an experience recorder that can be used to create Reinforcement Learning MDP models.

The interface must be implemented and be public in the parameter class. The interface is the following:

  • long unsigned getVisits(size_t, size_t, size_t) const : Returns the number of times a particular transition has been experienced.
  • long unsigned getVisitsSum(size_t, size_t) const : Returns the number of times any transition starting from the input state-action pair has been experienced.
  • double getReward(size_t, size_t) const : Returns the expected rewards obtained from a given state-action pair.
  • double getM2(size_t, size_t) const : Returns the M2 statistics of experienced rewards from the given state-action pair.

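A minimal stub satisfying the concept might look like this (a sketch; the class name and the header path are assumptions):

    #include <cstddef>
    #include <AIToolbox/MDP/TypeTraits.hpp> // assumed location of the concept

    // A do-nothing recorder exposing the four required members.
    struct NullExperience {
        long unsigned getVisits(std::size_t, std::size_t, std::size_t) const { return 0; }
        long unsigned getVisitsSum(std::size_t, std::size_t) const { return 0; }
        double getReward(std::size_t, std::size_t) const { return 0.0; }
        double getM2(std::size_t, std::size_t) const { return 0.0; }
    };
    static_assert(AIToolbox::MDP::IsExperience<NullExperience>);
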
◆ IsExperienceEigen

template<typename E >
concept AIToolbox::MDP::IsExperienceEigen
Initial value:
= IsExperience<E> && requires (const E e, size_t a) {
      e.getVisitsSumTable();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getVisitsSumTable()))>>;
      e.getVisitsTable(a);
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getVisitsTable(a)))>>;
      e.getRewardMatrix();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getRewardMatrix()))>>;
      e.getM2Matrix();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getM2Matrix()))>>;
  }

This concept represents the required Experience interface that allows leveraging Eigen.

This concept tests for the interface of an MDP Experience which uses Eigen matrices internally. This should work for both dense and sparse Experiences.

The interface must be implemented and be public in the parameter class. The interface is the following:

  • V getVisitsSumTable() const : Returns the visit sums for each state-action pair as a matrix SxA, where V is some Eigen matrix type.
  • T getVisitsTable(size_t a) const : Returns the visits table for a given action as a matrix SxS', where T is some Eigen matrix type.
  • R getRewardMatrix() const : Returns the reward matrix as a matrix SxA, where R is some Eigen matrix type.
  • S getM2Matrix() const : Returns the M2 matrix as a matrix SxA, where S is some Eigen matrix type.

In addition the Experience needs to respect the basic Experience interface.

See also
IsExperience

◆ IsGenerativeModel

template<typename M >
concept AIToolbox::MDP::IsGenerativeModel
Initial value:
= AIToolbox::IsGenerativeModel<M> &&
HasIntegralStateSpace<M> &&
HasIntegralActionSpace<M> &&
HasFixedActionSpace<M>

This concept represents the required interface for a generative MDP.

This concept tests for the interface of a generative MDP model. The interface must be implemented and be public in the parameter class. The interface is the following:

  • size_t getS() const : Returns the number of states of the Model.
  • size_t getA() const : Returns the number of actions of the Model.
  • double getDiscount() const : Returns the discount factor of the Model.
  • std::tuple<size_t, double> sampleSR(size_t s, size_t a) const : Returns a sampled state-reward pair from (s,a)
  • bool isTerminal(size_t s) const : Reports whether the input state is a terminal state.

The concept re-uses the "base" concept and simply requires a fixed action space, and integral state and action spaces.

See also
AIToolbox::IsGenerativeModel
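
A minimal stub satisfying the concept might look like this (a sketch; the class name and the header path are assumptions):

    #include <cstddef>
    #include <tuple>
    #include <AIToolbox/MDP/TypeTraits.hpp> // assumed location of the concept

    // A trivial generative model: every action loops back onto the same state.
    struct LoopModel {
        std::size_t getS() const { return 4; }
        std::size_t getA() const { return 2; }
        double getDiscount() const { return 0.9; }
        std::tuple<std::size_t, double> sampleSR(std::size_t s, std::size_t) const { return {s, 0.0}; }
        bool isTerminal(std::size_t) const { return false; }
    };
    static_assert(AIToolbox::MDP::IsGenerativeModel<LoopModel>);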

◆ IsModel

template<typename M >
concept AIToolbox::MDP::IsModel
Initial value:
= IsGenerativeModel<M> && requires (const M m, size_t s, size_t a) {
      { m.getTransitionProbability(s, a, s) } -> std::convertible_to<double>;
      { m.getExpectedReward(s, a, s) } -> std::convertible_to<double>;
  }

This concept represents the required interface for a full MDP model.

This concept tests for the interface of an MDP model. The interface must be implemented and be public in the parameter class. The interface is the following:

  • double getTransitionProbability(size_t s, size_t a, size_t s1) const : Returns the transition probability from (s,a) to s1.
  • double getExpectedReward(size_t s, size_t a, size_t s1) const : Returns the expected reward for the transition from (s,a) to s1.

In addition the MDP needs to respect the interface for the MDP generative model.

See also
IsGenerativeModel

◆ IsModelEigen

template<typename M >
concept AIToolbox::MDP::IsModelEigen
Initial value:
= IsModel<M> && requires (const M m, size_t a) {
      m.getTransitionFunction(a);
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((m.getTransitionFunction(a)))>>;
      m.getRewardFunction();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((m.getRewardFunction()))>>;
  }

This concept represents the required interface that allows MDP algorithms to leverage Eigen.

This concept tests for the interface of an MDP model which uses Eigen matrices internally. This should work for both dense and sparse models.

The interface must be implemented and be public in the parameter class. The interface is the following:

  • T getTransitionFunction(size_t a) const : Returns the transition function for a given action as a matrix SxS', where T is some Eigen matrix type.
  • R getRewardFunction() const : Returns the reward function as a matrix SxA, where R is some Eigen matrix type.

In addition the MDP needs to respect the interface for the MDP model.

See also
IsModel
AIToolbox::IsDerivedFromEigen