AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP Namespace Reference

Namespaces

 GridWorldUtils
 This namespace exists in order to allow referencing the Direction values directly.
 

Classes

class  BanditPolicyAdaptor
 This class extends a Bandit policy so that it can be called from MDP code. More...
 
class  DoubleQLearning
 This class represents the double QLearning algorithm. More...
 
class  Dyna2
 This class represents the Dyna2 algorithm. More...
 
class  DynaQ
 This class represents the DynaQ algorithm. More...
 
class  EpsilonPolicy
 
class  ExpectedSARSA
 This class represents the ExpectedSARSA algorithm. More...
 
class  Experience
 This class keeps track of registered events and rewards. More...
 
class  GenerativeModelPython
 This class allows importing generative models from Python. More...
 
class  GridWorld
 This class represents a simple rectangular gridworld. More...
 
class  HystereticQLearning
 This class represents the Hysteretic QLearning algorithm. More...
 
class  ImportanceSampling
 This class implements off-policy control via importance sampling. More...
 
class  ImportanceSamplingEvaluation
 This class implements off-policy evaluation via importance sampling. More...
 
class  LinearProgramming
 This class solves an MDP using Linear Programming. More...
 
class  MaximumLikelihoodModel
 This class models Experience as a Markov Decision Process using Maximum Likelihood. More...
 
class  MCTS
 This class represents the MCTS online planner using UCB1. More...
 
class  Model
 This class represents a Markov Decision Process. More...
 
class  OffPolicyBase
 This class contains all the boilerplates for off-policy methods. More...
 
class  OffPolicyControl
 This class is a general version of off-policy control. More...
 
class  OffPolicyEvaluation
 This class is a general version of off-policy evaluation. More...
 
class  PGAAPPPolicy
 This class implements the PGA-APP learning algorithm. More...
 
class  Policy
 This class represents an MDP Policy. More...
 
class  PolicyEvaluation
 This class applies the policy evaluation algorithm on a policy. More...
 
class  PolicyInterface
 Simple typedef for most of MDP's policy needs. More...
 
class  PolicyIteration
 This class represents the Policy Iteration algorithm. More...
 
class  PolicyWrapper
 This class provides an MDP Policy interface around a Matrix2D. More...
 
class  PrioritizedSweeping
 This class represents the PrioritizedSweeping algorithm. More...
 
class  QGreedyPolicy
 This class implements a greedy policy through a QFunction. More...
 
class  QL
 This class implements off-policy control via Q(lambda). More...
 
class  QLearning
 This class represents the QLearning algorithm. More...
 
class  QLEvaluation
 This class implements off-policy evaluation via Q(lambda). More...
 
class  QPolicyInterface
 This class is an interface to specify a policy through a QFunction. More...
 
class  QSoftmaxPolicy
 This class implements a softmax policy through a QFunction. More...
 
class  RetraceL
 This class implements off-policy control via Retrace(lambda). More...
 
class  RetraceLEvaluation
 This class implements off-policy evaluation via Retrace(lambda). More...
 
class  RLearning
 This class represents the RLearning algorithm. More...
 
class  SARSA
 This class represents the SARSA algorithm. More...
 
class  SARSAL
 This class represents the SARSAL algorithm. More...
 
class  SparseExperience
 This class keeps track of registered events and rewards. More...
 
class  SparseMaximumLikelihoodModel
 This class models Experience as a Markov Decision Process using Maximum Likelihood. More...
 
class  SparseModel
 This class represents a Markov Decision Process. More...
 
class  ThompsonModel
 This class models Experience as a Markov Decision Process using Thompson Sampling. More...
 
class  TreeBackupL
 This class implements off-policy control via Tree Backup(lambda). More...
 
class  TreeBackupLEvaluation
 This class implements off-policy evaluation via Tree Backup(lambda). More...
 
struct  ValueFunction
 
class  ValueIteration
 This class applies the value iteration algorithm on a Model. More...
 
class  WoLFPolicy
 This class implements the WoLF learning algorithm. More...
 

Typedefs

using RandomPolicy = BanditPolicyAdaptor< Bandit::RandomPolicy >
 This class represents a random policy. More...
 

Functions

template<typename M, typename Gen>
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double rollout (const M & m, std::remove_cvref_t<decltype(std::declval<M>().getS())> s, const unsigned maxDepth, Gen & rnd)
 This function performs a rollout from the input state. More...
 
AIToolbox::MDP::SparseModel makeCliffProblem (const GridWorld &grid)
 This function sets up the cliff problem in a SparseModel. More...
 
AIToolbox::MDP::Model makeCornerProblem (const GridWorld &grid, double stepUncertainty=0.8)
 This function sets up the corner problem in a Model. More...
 
Model parseCassandra (std::istream &input)
 This function parses an MDP from a Cassandra formatted stream. More...
 
QFunction makeQFunction (size_t S, size_t A)
 This function creates and zeroes a QFunction. More...
 
ValueFunction makeValueFunction (size_t S)
 This function creates and zeroes a ValueFunction. More...
 
ValueFunction bellmanOperator (const QFunction &q)
 This function converts a QFunction into the equivalent optimal ValueFunction. More...
 
void bellmanOperatorInplace (const QFunction &q, ValueFunction *v)
 This function converts a QFunction into the equivalent optimal ValueFunction. More...
 
template<IsModel M>
Matrix2D computeImmediateRewards (const M &model)
 This function computes all immediate rewards (state and action) of the MDP once for improved speed. More...
 
template<IsModel M>
QFunction computeQFunction (const M &model, const Values &v, QFunction ir)
 This function computes the Model's QFunction from the values of a ValueFunction. More...
 

Variables

template<typename M >
concept IsGenerativeModel
 This concept represents the required interface for a generative MDP. More...
 
template<typename M >
concept IsModel
 This concept represents the required interface for a full MDP model. More...
 
template<typename M >
concept IsModelEigen
 This concept represents the required interface that allows MDP algorithms to leverage Eigen. More...
 
template<typename E >
concept IsExperience
 This concept represents the required interface for an experience recorder. More...
 
template<typename E >
concept IsExperienceEigen
 This concept represents the required Experience interface that allows leveraging Eigen. More...
 

Input stream utilities

These utilities read back data outputted with their respective operator<<() function.

Note that the inputs must already be constructed with the correct size (state-action spaces), as operator<<() does not save this information.

These functions do not modify the input if the parsing fails.

std::istream & operator>> (std::istream &is, Model &m)
 
std::istream & operator>> (std::istream &is, SparseModel &m)
 
std::istream & operator>> (std::istream &is, Experience &e)
 
std::istream & operator>> (std::istream &is, SparseExperience &e)
 
std::istream & operator>> (std::istream &is, Policy &p)
 This function reads a policy from a file. More...
 

Typedef Documentation

◆ Actions

using AIToolbox::MDP::Actions = std::vector<size_t>

◆ QFunction

using AIToolbox::MDP::QFunction = Matrix2D

◆ RandomPolicy

This class represents a random policy.

This class simply returns a random action every time it is polled.

◆ Values

using AIToolbox::MDP::Values = Vector

Function Documentation

◆ bellmanOperator()

ValueFunction AIToolbox::MDP::bellmanOperator (const QFunction & q)

This function converts a QFunction into the equivalent optimal ValueFunction.

The ValueFunction will contain, for each state, the best action and corresponding value as extracted from the input QFunction.

Parameters
    q: The QFunction to convert.
Returns
    The equivalent optimal ValueFunction.
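
For instance, a minimal usage sketch (the include path is an assumption, and ValueFunction is assumed to expose its values and actions members):

    #include <AIToolbox/MDP/Utils.hpp> // assumed location of these utilities

    int main() {
        using namespace AIToolbox::MDP;

        auto q = makeQFunction(3, 2); // 3 states, 2 actions, all zeroes
        q(0, 1) = 5.0;                // make action 1 the best in state 0

        ValueFunction vf = bellmanOperator(q);
        // vf.values[0] is now 5.0 and vf.actions[0] is 1: the best value and
        // the greedy action extracted for state 0.
    }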

◆ bellmanOperatorInplace()

void AIToolbox::MDP::bellmanOperatorInplace (const QFunction & q, ValueFunction * v)

This function converts a QFunction into the equivalent optimal ValueFunction.

This function is the same as bellmanOperator(), but performs its operations inplace. The input ValueFunction MUST already be sized appropriately for the input QFunction.

NOTE: This function DOES NOT perform any checks whatsoever on both the validity of the input pointer and on the size of the input ValueFunction. It assumes everything is already correct.

Parameters
    q: The QFunction to convert.
    v: A pre-allocated ValueFunction to populate with the optimal values.
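
A short sketch of the caller's responsibilities described above (include path assumed):

    #include <AIToolbox/MDP/Utils.hpp> // assumed location of these utilities

    int main() {
        auto q  = AIToolbox::MDP::makeQFunction(3, 2);
        auto vf = AIToolbox::MDP::makeValueFunction(3); // pre-sized to match q
        AIToolbox::MDP::bellmanOperatorInplace(q, &vf); // no validity checks happen here
    }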

◆ computeImmediateRewards()

template<IsModel M>
Matrix2D AIToolbox::MDP::computeImmediateRewards (const M & model)

This function computes all immediate rewards (state and action) of the MDP once for improved speed.

This function creates the R(s,a) function for the input model. Normally we store the reward function as R(s,a,s'), but it can be "compressed" into R(s,a) with no loss of information meaningful to the planning process.

Note that this function is more efficient with eigen models.

Parameters
    model: The MDP that needs to be solved.
Returns
    The Model's immediate rewards.

◆ computeQFunction()

template<IsModel M>
QFunction AIToolbox::MDP::computeQFunction (const M & model, const Values & v, QFunction ir)

This function computes the Model's QFunction from the values of a ValueFunction.

Note that this function is more efficient with eigen models.

Parameters
    model: The MDP that needs to be solved.
    v: The values of the ValueFunction for the future of the QFunction.
    ir: The immediate rewards of the model, as created by computeImmediateRewards().
Returns
    A new QFunction.
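
Together with bellmanOperator(), these two functions are enough to sketch a single sweep of value iteration by hand. This is only an illustration of how the pieces fit together, not the library's actual ValueIteration implementation; the include path is an assumption:

    #include <utility>
    #include <AIToolbox/MDP/Utils.hpp> // assumed location of these utilities

    template <AIToolbox::MDP::IsModel M>
    AIToolbox::MDP::ValueFunction sweep(const M & model, const AIToolbox::MDP::ValueFunction & vf) {
        using namespace AIToolbox::MDP;
        auto ir = computeImmediateRewards(model);                    // R(s,a), computed once
        auto q  = computeQFunction(model, vf.values, std::move(ir)); // Q(s,a) from future values
        return bellmanOperator(q);                                   // greedy backup into the new V
    }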

◆ makeCliffProblem()

AIToolbox::MDP::SparseModel AIToolbox::MDP::makeCliffProblem (const GridWorld & grid)
inline

This function sets up the cliff problem in a SparseModel.

The gist of this problem is a small grid where the agent is supposed to walk from one state to another. The only problem is that a cliff stands between the two points: walking off it results in a huge negative reward, and in the agent being reset to the start of the walk. Reaching the end results in a positive reward, while every step results in a small negative reward.

Movement here is fully deterministic.

    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |    0   |    1   |  ....  |   X-2  |   X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |    X   |   X+1  |  ....  |  2X-2  |  2X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   2X   |  2X+1  |  ....  |  3X-2  |  3X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |  ....  |  ....  |  ....  |  ....  |  ....  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    | (Y-1)X |(Y-1)X+1|  ....  |  YX-2  |  YX-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |(START) |        |        |        | (GOAL) |
    |   YX   |  ~~~~  |  ....  |  ~~~~  |  YX+1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
              \_________________________/
                           |
                           V
                       The Cliff

To do this we use a grid above the cliff, and we attach two states under it.

Parameters
    grid: The grid to use for the problem.
Returns
    The SparseModel representing the problem.
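
For instance (a minimal sketch; the header path and the GridWorld width/height constructor are assumptions):

    #include <AIToolbox/MDP/Environments/CliffProblem.hpp> // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        GridWorld grid(12, 3);               // the classic 12x3 grid above the cliff
        auto model = makeCliffProblem(grid); // a SparseModel over the grid
        // The state space is the grid itself plus the two start/goal states
        // attached under it.
    }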

◆ makeCornerProblem()

AIToolbox::MDP::Model AIToolbox::MDP::makeCornerProblem (const GridWorld & grid, double stepUncertainty = 0.8)
inline

This function sets up the corner problem in a Model.

The gist of this problem is a small grid where the upper-left corner and the bottom-right corner are self-absorbing states. The agent can move up, left, down, and right, and each transition that is not self-absorbing results in a reward penalty of -1. In addition, movement is not guaranteed: by default the agent succeeds only 80% of the time.

Thus the agent needs to be able to find the shortest path to one of the self-absorbing states from every other state.

The grid cells are numbered as follows:

    +--------+--------+--------+--------+--------+
    | (GOAL) |        |        |        |        |
    |    0   |    1   |  ....  |   X-2  |   X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |    X   |   X+1  |  ....  |  2X-2  |  2X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   2X   |  2X+1  |  ....  |  3X-2  |  3X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |  ....  |  ....  |  ....  |  ....  |  ....  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        | (GOAL) |
    | (Y-1)X |(Y-1)X+1|  ....  |  YX-2  |  YX-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+

Parameters
    grid: The grid to use for the problem.
    stepUncertainty: The probability that a movement action succeeds.
Returns
    The Model representing the problem.
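
A sketch that builds and solves a small corner problem (header paths are assumptions, and ValueIteration is assumed to return its variation bound, the ValueFunction and the QFunction as a tuple):

    #include <AIToolbox/MDP/Environments/CornerProblem.hpp> // assumed header
    #include <AIToolbox/MDP/Algorithms/ValueIteration.hpp>  // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        GridWorld grid(4, 4);
        auto model = makeCornerProblem(grid, 0.8); // moves succeed 80% of the time

        ValueIteration solver(1000);               // horizon of 1000 iterations
        auto [bound, vf, q] = solver(model);       // solve the model
    }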

◆ makeQFunction()

QFunction AIToolbox::MDP::makeQFunction (size_t S, size_t A)

This function creates and zeroes a QFunction.

This function exists mostly to avoid remembering how to initialize Eigen matrices, and does nothing special.

Parameters
    S: The state space of the QFunction.
    A: The action space of the QFunction.
Returns
    A newly built QFunction.

◆ makeValueFunction()

ValueFunction AIToolbox::MDP::makeValueFunction (size_t S)

This function creates and zeroes a ValueFunction.

This function exists mostly to avoid remembering how to initialize Eigen vectors, and does nothing special.

Parameters
    S: The state space of the ValueFunction.
Returns
    A newly built ValueFunction.

◆ operator<<() [1/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const Experience & exp)

◆ operator<<() [2/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const Model & model)

◆ operator<<() [3/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const PolicyInterface & p)

◆ operator<<() [4/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const SparseExperience & exp)

◆ operator<<() [5/5]

std::ostream & AIToolbox::MDP::operator<< (std::ostream & os, const SparseModel & model)

◆ operator>>() [1/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, Experience & e)

◆ operator>>() [2/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, Model & m)

◆ operator>>() [3/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, Policy & p)

This function reads a policy from a file.

This function reads files that have been written through operator<<(). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.

State and actions are also verified, and this function does not accept a randomly shuffled policy file. The file must be sorted by state, and each state must be sorted by action.

As an additional layer of precaution, the function normalizes the policy once it has been read, to ensure the internal policy represents a true probability distribution.

Parameters
    is: The stream the policy is being read from.
    p: The policy that is being assigned.
Returns
    The input stream.
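
A round-trip sketch (the file name is hypothetical; the header paths and the Policy size constructor are assumptions):

    #include <fstream>
    #include <AIToolbox/MDP/IO.hpp>              // assumed header for the stream operators
    #include <AIToolbox/MDP/Policies/Policy.hpp> // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        Policy p(10, 4);                  // sizes must match the saved policy
        std::ifstream file("policy.txt"); // hypothetical file written via operator<<()

        if (!(file >> p)) {
            // Parsing failed; p has been left unmodified.
        }
    }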

◆ operator>>() [4/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, SparseExperience & e)

◆ operator>>() [5/5]

std::istream & AIToolbox::MDP::operator>> (std::istream & is, SparseModel & m)

◆ parseCassandra()

Model AIToolbox::MDP::parseCassandra (std::istream & input)

This function parses an MDP from a Cassandra formatted stream.

This function may throw std::runtime_error if the input is not correctly formed.

Parameters
    input: The input stream.
Returns
    The parsed model.
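
For example (a sketch; the file name is hypothetical and the header path is an assumption):

    #include <fstream>
    #include <stdexcept>
    #include <AIToolbox/MDP/IO.hpp> // assumed location of parseCassandra

    int main() {
        std::ifstream input("problem.MDP"); // hypothetical Cassandra-format file

        try {
            auto model = AIToolbox::MDP::parseCassandra(input);
        } catch (const std::runtime_error &) {
            // The input was not correctly formed.
        }
    }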

◆ rollout()

template<typename M, typename Gen>
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double AIToolbox::MDP::rollout (const M & m, std::remove_cvref_t<decltype(std::declval<M>().getS())> s, const unsigned maxDepth, Gen & rnd)

This function performs a rollout from the input state.

This function performs a rollout until the agent either reaches the desired depth, or reaches a terminal state. The overall return, computed from the point of view of the input state and with future rewards discounted appropriately, is then returned.

This function is generally used in Monte-Carlo tree search-like algorithms, like MCTS or POMCP, to speed up discovery of promising actions without necessarily expanding their search tree. This avoids wasting lots of computation and memory on states far from our root that we will probably never see again, while at the same time still getting an estimate for the rest of the simulation.

Parameters
    m: The model to use for the rollout.
    s: The state to start the rollout from.
    maxDepth: The maximum number of timesteps to look into the future.
    rnd: A random number generator.
Returns
    The discounted return from the input state.
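
For example, estimating a state's value by simulation (header paths are assumptions; the corner problem from above serves as the generative model):

    #include <cstddef>
    #include <random>
    #include <AIToolbox/MDP/Utils.hpp>                      // assumed location of rollout
    #include <AIToolbox/MDP/Environments/CornerProblem.hpp> // assumed header

    int main() {
        using namespace AIToolbox::MDP;

        GridWorld grid(4, 4);
        auto model = makeCornerProblem(grid);

        std::mt19937 rnd(42); // any standard random engine should work here
        // Estimate the discounted return from state 5, simulating at most 20 steps.
        double ret = rollout(model, std::size_t{5}, 20, rnd);
        (void)ret;
    }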

Variable Documentation

◆ IsExperience

template<typename E >
concept AIToolbox::MDP::IsExperience
Initial value:
= requires (const E e, size_t s, size_t a) {
      { e.getVisits(s, a, s) } -> std::convertible_to<long unsigned>;
      { e.getVisitsSum(s, a) } -> std::convertible_to<long unsigned>;
      { e.getReward(s, a) } -> std::convertible_to<double>;
      { e.getM2(s, a) } -> std::convertible_to<double>;
  }

This concept represents the required interface for an experience recorder.

This concept tests for the interface of an experience recorder that can be used to create Reinforcement Learning MDP models.

The interface must be implemented and be public in the parameter class. The interface is the following:

  • long unsigned getVisits(size_t, size_t, size_t) const : Returns the number of times a particular transition has been experienced.
  • long unsigned getVisitsSum(size_t, size_t) const : Returns the number of times any transition starting from the input state-action pair has been experienced.
  • double getReward(size_t, size_t) const : Returns the expected rewards obtained from a given state-action pair.
  • double getM2(size_t, size_t) const : Returns the M2 statistics of experienced rewards from the given state-action pair.

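A minimal stub satisfying the concept might look like this (a sketch; the class name and the header path are assumptions):

    #include <cstddef>
    #include <AIToolbox/MDP/TypeTraits.hpp> // assumed location of the concept

    // A do-nothing recorder exposing the four required members.
    struct NullExperience {
        long unsigned getVisits(std::size_t, std::size_t, std::size_t) const { return 0; }
        long unsigned getVisitsSum(std::size_t, std::size_t) const { return 0; }
        double getReward(std::size_t, std::size_t) const { return 0.0; }
        double getM2(std::size_t, std::size_t) const { return 0.0; }
    };
    static_assert(AIToolbox::MDP::IsExperience<NullExperience>);
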
◆ IsExperienceEigen

template<typename E >
concept AIToolbox::MDP::IsExperienceEigen
Initial value:
= IsExperience<E> && requires (const E e, size_t a) {
      e.getVisitsSumTable();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getVisitsSumTable()))>>;
      e.getVisitsTable(a);
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getVisitsTable(a)))>>;
      e.getRewardMatrix();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getRewardMatrix()))>>;
      e.getM2Matrix();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((e.getM2Matrix()))>>;
  }

This concept represents the required Experience interface that allows leveraging Eigen.

This concept tests for the interface of an MDP Experience which uses Eigen matrices internally. This should work for both dense and sparse Experiences.

The interface must be implemented and be public in the parameter class. The interface is the following:

  • V getVisitsSumTable() const : Returns the visit sums for each state-action pair as a matrix SxA, where V is some Eigen matrix type.
  • T getVisitsTable(size_t a) const : Returns the visits table for a given action as a matrix SxS', where T is some Eigen matrix type.
  • R getRewardMatrix() const : Returns the reward matrix as a matrix SxA, where R is some Eigen matrix type.
  • S getM2Matrix() const : Returns the M2 matrix as a matrix SxA, where S is some Eigen matrix type.

In addition the Experience needs to respect the basic Experience interface.

See also
IsExperience

◆ IsGenerativeModel

template<typename M >
concept AIToolbox::MDP::IsGenerativeModel
Initial value:
= AIToolbox::IsGenerativeModel<M> &&
HasIntegralStateSpace<M> &&
HasIntegralActionSpace<M> &&
HasFixedActionSpace<M>

This concept represents the required interface for a generative MDP.

This concept tests for the interface of a generative MDP model. The interface must be implemented and be public in the parameter class. The interface is the following:

  • size_t getS() const : Returns the number of states of the Model.
  • size_t getA() const : Returns the number of actions of the Model.
  • double getDiscount() const : Returns the discount factor of the Model.
  • std::tuple<size_t, double> sampleSR(size_t s, size_t a) const : Returns a sampled state-reward pair from (s,a)
  • bool isTerminal(size_t s) const : Reports whether the input state is a terminal state.

The concept re-uses the "base" concept and simply requires a fixed action space, and integral state and action spaces.

See also
AIToolbox::IsGenerativeModel
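
A minimal stub satisfying the concept might look like this (a sketch; the class name and the header path are assumptions):

    #include <cstddef>
    #include <tuple>
    #include <AIToolbox/MDP/TypeTraits.hpp> // assumed location of the concept

    // A trivial generative model: every action loops back onto the same state.
    struct LoopModel {
        std::size_t getS() const { return 4; }
        std::size_t getA() const { return 2; }
        double getDiscount() const { return 0.9; }
        std::tuple<std::size_t, double> sampleSR(std::size_t s, std::size_t) const { return {s, 0.0}; }
        bool isTerminal(std::size_t) const { return false; }
    };
    static_assert(AIToolbox::MDP::IsGenerativeModel<LoopModel>);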

◆ IsModel

template<typename M >
concept AIToolbox::MDP::IsModel
Initial value:
= IsGenerativeModel<M> && requires (const M m, size_t s, size_t a) {
      { m.getTransitionProbability(s, a, s) } -> std::convertible_to<double>;
      { m.getExpectedReward(s, a, s) } -> std::convertible_to<double>;
  }

This concept represents the required interface for a full MDP model.

This concept tests for the interface of an MDP model. The interface must be implemented and be public in the parameter class. The interface is the following:

  • double getTransitionProbability(size_t s, size_t a, size_t s1) const : Returns the transition probability from (s,a) to s1.
  • double getExpectedReward(size_t s, size_t a, size_t s1) const : Returns the expected reward for the transition from (s,a) to s1.

In addition the MDP needs to respect the interface for the MDP generative model.

See also
IsGenerativeModel

◆ IsModelEigen

template<typename M >
concept AIToolbox::MDP::IsModelEigen
Initial value:
= IsModel<M> && requires (const M m, size_t a) {
      m.getTransitionFunction(a);
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((m.getTransitionFunction(a)))>>;
      m.getRewardFunction();
      requires IsDerivedFromEigen<std::remove_cvref_t<decltype((m.getRewardFunction()))>>;
  }

This concept represents the required interface that allows MDP algorithms to leverage Eigen.

This concept tests for the interface of an MDP model which uses Eigen matrices internally. This should work for both dense and sparse models.

The interface must be implemented and be public in the parameter class. The interface is the following:

  • T getTransitionFunction(size_t a) const : Returns the transition function for a given action as a matrix SxS', where T is some Eigen matrix type.
  • R getRewardFunction() const : Returns the reward function as a matrix SxA, where R is some Eigen matrix type.

In addition the MDP needs to respect the interface for the MDP model.

See also
IsModel
AIToolbox::IsDerivedFromEigen