This class models Experience as a Markov Decision Process using Maximum Likelihood.

#include <AIToolbox/MDP/SparseMaximumLikelihoodModel.hpp>

Public Types
using	TransitionMatrix = SparseMatrix3D

using	RewardMatrix = SparseMatrix2D

Public Member Functions
	SparseMaximumLikelihoodModel (const E &exp, double discount=1.0, bool sync=false)
	Constructor using previous Experience.

void	setDiscount (double d)
	This function sets a new discount factor for the Model.

void	sync ()
	This function syncs the whole SparseMaximumLikelihoodModel to the underlying Experience.

void	sync (size_t s, size_t a)
	This function syncs a state action pair in the SparseMaximumLikelihoodModel to the underlying Experience.

void	sync (size_t s, size_t a, size_t s1)
	This function syncs a state action pair in the SparseMaximumLikelihoodModel to the underlying Experience in the fastest possible way.

std::tuple< size_t, double >	sampleSR (size_t s, size_t a) const
	This function samples the MDP for the specified state action pair.

size_t	getS () const
	This function returns the number of states of the world.

size_t	getA () const
	This function returns the number of available actions to the agent.

double	getDiscount () const
	This function returns the currently set discount factor.

const E &	getExperience () const
	This function enables inspection of the underlying Experience of the SparseMaximumLikelihoodModel.

double	getTransitionProbability (size_t s, size_t a, size_t s1) const
	This function returns the stored transition probability for the specified transition.

double	getExpectedReward (size_t s, size_t a, size_t s1) const
	This function returns the stored expected reward for the specified transition.

const TransitionMatrix &	getTransitionFunction () const
	This function returns the transition matrix for inspection.

const SparseMatrix2D &	getTransitionFunction (size_t a) const
	This function returns the transition function for a given action.

const RewardMatrix &	getRewardFunction () const
	This function returns the rewards matrix for inspection.

bool	isTerminal (size_t s) const
	This function returns whether a given state is terminal.

Detailed Description

template<IsExperience E>
class AIToolbox::MDP::SparseMaximumLikelihoodModel< E >

This class models Experience as a Markov Decision Process using Maximum Likelihood.

Often an MDP is not known in advance: we know the set of states it can be in and the set of actions available to the agent, but little more. In these cases the goal is not only to find the best policy for the MDP, but also, at the same time, to learn its actual transition and reward functions. This task is called "reinforcement learning".

This class helps with this. A naive approach in reinforcement learning is to record the outcome of every action taken, and deduce transition probabilities and rewards from the data collected in this way. This class does just that, using Maximum Likelihood Estimates to determine the transition probabilities and rewards.

This class maps an Experience object to the most likely transition and reward functions that produced it. The transition function is guaranteed to be a valid probability distribution: the probabilities of all transitions from a given state under a given action always sum to 1. An instance is not automatically kept in sync with the supplied Experience object. This avoids possible overheads, since the user can optimize syncing better depending on their use case. See sync().
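To make the Maximum Likelihood mapping concrete, the following is a minimal sketch of how one row of the transition function, and the reward for a state action pair, could be estimated from raw counts. The count containers and the functions `mleTransitionRow` and `mleReward` are hypothetical illustrations, not AIToolbox API. The sketch assumes rewards are averaged per state action pair (consistent with `RewardMatrix` being two-dimensional) and reuses the self-loop default described for the constructor when no data is available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense count storage, for illustration only:
// counts[s][a][s1] = number of times taking a in s led to s1,
// rewardSum[s][a]  = sum of all rewards observed after taking a in s.
using Counts3D = std::vector<std::vector<std::vector<unsigned long>>>;
using Table2D  = std::vector<std::vector<double>>;

// Maximum Likelihood estimate of T(s, a, .): T(s, a, s1) = N(s, a, s1) / N(s, a).
// With no data the row falls back to a self-loop with probability 1.
std::vector<double> mleTransitionRow(const Counts3D & counts, std::size_t s, std::size_t a) {
    const std::size_t S = counts.size();
    assert(s < S && a < counts[s].size());
    std::vector<double> row(S, 0.0);
    unsigned long total = 0;
    for (std::size_t s1 = 0; s1 < S; ++s1) total += counts[s][a][s1];
    if (total == 0) {
        row[s] = 1.0;  // no data yet: assume the agent stays where it is
        return row;
    }
    for (std::size_t s1 = 0; s1 < S; ++s1)
        row[s1] = static_cast<double>(counts[s][a][s1]) / static_cast<double>(total);
    return row;  // sums to 1 by construction
}

// Maximum Likelihood estimate of the reward: the sample mean (0 with no data).
double mleReward(const Counts3D & counts, const Table2D & rewardSum, std::size_t s, std::size_t a) {
    assert(s < counts.size() && a < counts[s].size());
    unsigned long total = 0;
    for (auto c : counts[s][a]) total += c;
    return total == 0 ? 0.0 : rewardSum[s][a] / static_cast<double>(total);
}
```

For example, if action 1 in state 0 led to state 2 three times and to state 1 once, the estimated row is 0.75 for state 2 and 0.25 for state 1.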

When little data is available, the deduced transition and reward functions may be significantly affected by noise. One way to mitigate this is to artificially bias the data so as to skew it towards certain distributions. This can be done if some (even approximate) knowledge of the model is available, in order to speed up the learning process. Another way is to assume that all transitions are possible, add data to support that claim, and simply wait until the averages converge to the true values. Additionally, each fake datapoint can be associated with a high reward: this skews the agent towards trying out new actions, expecting to obtain the high rewards. This automatically yields a good degree of exploration in the early stages of an episode. Such a technique is called "optimistic initialization".
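A minimal sketch of optimistic initialization, assuming a hypothetical `CountTables` structure that stands in for the counts an Experience would hold (it is not the AIToolbox Experience API). Every transition is credited with `fakeVisits` fake observations, each yielding `optimisticReward`; real observations then gradually dilute these fake points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data an Experience collects.
struct CountTables {
    std::vector<std::vector<std::vector<unsigned long>>> visits;  // visits[s][a][s1]
    std::vector<std::vector<double>> rewardSum;                   // rewardSum[s][a]
};

// Optimistic initialization: pretend every (s, a, s1) was observed `fakeVisits`
// times, each time yielding `optimisticReward`.
void addOptimisticPrior(CountTables & t, unsigned long fakeVisits, double optimisticReward) {
    for (std::size_t s = 0; s < t.visits.size(); ++s)
        for (std::size_t a = 0; a < t.visits[s].size(); ++a)
            for (std::size_t s1 = 0; s1 < t.visits[s][a].size(); ++s1) {
                t.visits[s][a][s1] += fakeVisits;
                t.rewardSum[s][a] += static_cast<double>(fakeVisits) * optimisticReward;
            }
}

// Maximum Likelihood reward estimate for (s, a): the mean of all observed rewards.
double averageReward(const CountTables & t, std::size_t s, std::size_t a) {
    unsigned long n = 0;
    for (auto c : t.visits[s][a]) n += c;
    assert(n > 0);
    return t.rewardSum[s][a] / static_cast<double>(n);
}
```

With two states, one action, one fake visit per transition and an optimistic reward of 10, the estimate for (0, 0) starts at 10; after a single real observation with reward 1 it drops to (20 + 1) / 3 = 7, so untried pairs keep looking more attractive than tried ones.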

Whether any of these techniques works depends on the model you are trying to approximate, so experimenting is encouraged.

The difference between this class and the MDP::MaximumLikelihoodModel class is that this class stores transitions and rewards in sparse matrices. This may make access to individual probabilities and rewards slower, but it can greatly speed up computation with some classes of planning algorithms when the number of transitions with nonzero probability is very small compared to the full SxAxS state-action-state space. In such cases it also greatly reduces memory consumption, which may in turn improve speed through better caching.
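The sketch below illustrates why sparse storage helps planning. It assumes a hypothetical layout (not the actual `SparseMatrix3D` type) in which each row keeps only its nonzero (successor, probability) pairs, and performs one Bellman backup, the core step of value iteration. The inner loop's cost is proportional to the number of stored transitions rather than to SxAxS.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse layout: for each action a and state s, only successors
// with nonzero probability are stored, as (s1, probability) pairs.
using SparseRow = std::vector<std::pair<std::size_t, double>>;
using SparseTransitions = std::vector<std::vector<SparseRow>>;  // indexed [a][s]

// One Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s1 T(s, a, s1) * V(s1).
// Only stored (nonzero) entries are visited in the innermost loop.
std::vector<std::vector<double>> bellmanBackup(const SparseTransitions & T,
                                               const std::vector<std::vector<double>> & R,  // [s][a]
                                               const std::vector<double> & V,
                                               double gamma) {
    const std::size_t A = T.size();
    const std::size_t S = V.size();
    std::vector<std::vector<double>> Q(S, std::vector<double>(A, 0.0));
    for (std::size_t a = 0; a < A; ++a) {
        assert(T[a].size() == S);
        for (std::size_t s = 0; s < S; ++s) {
            double expected = 0.0;
            for (const auto & [s1, p] : T[a][s]) expected += p * V[s1];
            Q[s][a] = R[s][a] + gamma * expected;
        }
    }
    return Q;
}
```

If each state has only a handful of reachable successors, a full backup costs roughly S x A x (successors per row) operations instead of S x A x S.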

Member Typedef Documentation

◆ RewardMatrix

template<IsExperience E>

using AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::RewardMatrix = SparseMatrix2D

◆ TransitionMatrix

template<IsExperience E>

using AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::TransitionMatrix = SparseMatrix3D

Constructor & Destructor Documentation

◆ SparseMaximumLikelihoodModel()

template<IsExperience E>

AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::SparseMaximumLikelihoodModel	(	const E &	exp,
		double	discount = 1.0,
		bool	sync = false
	)

Constructor using previous Experience.

This constructor selects the Experience that will be used to learn an MDP Model from the data, and initializes internal Model data.

The user can choose whether to sync the SparseMaximumLikelihoodModel with the underlying Experience immediately, or to delay it until later.

In the latter case the default transition function defines a transition of probability 1 for each state to itself, no matter the action.

In general it is better to add some amount of bias to the Experience, so that when a new state-action pair is tried the SparseMaximumLikelihoodModel does not immediately compute a 100% probability of transitioning to the observed resulting state, but instead approaches it gradually as more data is collected. Whether this is appropriate depends on your problem.
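As a worked illustration of this bias, the hypothetical helper below adds `alpha` fake visits to every successor state (a uniform pseudo-count prior, one possible choice of bias, assumed here) before normalizing. With `alpha = 0` it reduces to the plain Maximum Likelihood estimate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// T(s, a, s1) = (N(s, a, s1) + alpha) / (N(s, a) + alpha * S),
// where `counts` holds N(s, a, .) for a single state-action pair.
std::vector<double> smoothedRow(const std::vector<unsigned long> & counts, double alpha) {
    assert(!counts.empty() && alpha >= 0.0);
    double total = alpha * static_cast<double>(counts.size());
    for (auto c : counts) total += static_cast<double>(c);
    assert(total > 0.0);  // needs either real data or a positive prior
    std::vector<double> row;
    row.reserve(counts.size());
    for (auto c : counts) row.push_back((static_cast<double>(c) + alpha) / total);
    return row;
}
```

After a single observation in a three-state model, the unbiased estimate assigns probability 1 to the observed successor, while `alpha = 1` gives (1 + 1) / (1 + 3) = 0.5 to it and 0.25 to each of the other two states.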

The default reward function is 0.

Parameters

exp	The base Experience of the model.
discount	The discount used in solving methods.
sync	Whether to sync with the Experience immediately or delay it.

Member Function Documentation

◆ getA()

template<IsExperience E>

size_t AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getA () const

This function returns the number of available actions to the agent.

Returns: The total number of actions.

◆ getDiscount()

template<IsExperience E>

double AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getDiscount () const

This function returns the currently set discount factor.

Returns: The currently set discount factor.

◆ getExpectedReward()

template<IsExperience E>

double AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getExpectedReward	(	size_t	s,
		size_t	a,
		size_t	s1
	)		const

This function returns the stored expected reward for the specified transition.

Parameters

s	The initial state of the transition.
a	The action performed in the transition.
s1	The final state of the transition.

Returns: The expected reward of the specified transition.

◆ getExperience()

template<IsExperience E>

const E & AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getExperience () const

This function enables inspection of the underlying Experience of the SparseMaximumLikelihoodModel.

Returns: The underlying Experience of the SparseMaximumLikelihoodModel.

◆ getRewardFunction()

template<IsExperience E>

const SparseMaximumLikelihoodModel< E >::RewardMatrix & AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getRewardFunction () const

This function returns the rewards matrix for inspection.

Returns: The rewards matrix.

◆ getS()

template<IsExperience E>

size_t AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getS () const

This function returns the number of states of the world.

Returns: The total number of states.

◆ getTransitionFunction() [1/2]

template<IsExperience E>

const SparseMaximumLikelihoodModel< E >::TransitionMatrix & AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getTransitionFunction () const

This function returns the transition matrix for inspection.

Returns: The transition matrix.

◆ getTransitionFunction() [2/2]

template<IsExperience E>

const SparseMatrix2D & AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getTransitionFunction ( size_t a ) const

This function returns the transition function for a given action.

Parameters

a	The action requested.

Returns: The transition function for the input action.

◆ getTransitionProbability()

template<IsExperience E>

double AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::getTransitionProbability	(	size_t	s,
		size_t	a,
		size_t	s1
	)		const

This function returns the stored transition probability for the specified transition.

Parameters

s	The initial state of the transition.
a	The action performed in the transition.
s1	The final state of the transition.

Returns: The probability of the specified transition.

◆ isTerminal()

template<IsExperience E>

bool AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::isTerminal ( size_t s ) const

This function returns whether a given state is terminal.

Parameters

s	The state examined.

Returns: True if the input state is a terminal, false otherwise.
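This page does not define "terminal". The sketch below assumes one common definition: a state is terminal (absorbing) when every action keeps the agent in that state with probability 1. This is an assumption, not a statement of what the library checks, and the dense layout `Transitions` and the function `isAbsorbing` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense layout T[a][s][s1], used only to keep the sketch short.
using Transitions = std::vector<std::vector<std::vector<double>>>;

// ASSUMED definition: s is terminal if every action self-loops on s with
// probability 1. A tolerance is used because probabilities are floating point.
bool isAbsorbing(const Transitions & T, std::size_t s, double tol = 1e-9) {
    for (const auto & Ta : T) {
        assert(s < Ta.size());
        if (std::fabs(Ta[s][s] - 1.0) > tol) return false;
    }
    return true;
}
```

Under this assumed definition, a model that has never been synced would report every state as terminal, because its default transition function is a self-loop for every state and action.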

◆ sampleSR()

template<IsExperience E>

std::tuple< size_t, double > AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::sampleSR	(	size_t	s,
		size_t	a
	)		const

This function samples the MDP for the specified state action pair.

This function samples the model to simulate experience. The transition and reward functions are used to produce, from the state action pair passed as arguments, a possible new state with its respective reward. The new state is picked among all the states the MDP allows transitioning to, each with probability equal to the corresponding transition probability in the model. After a new state is picked, the reward is the corresponding reward contained in the reward function.

Parameters

s	The state that needs to be sampled.
a	The action that needs to be sampled.

Returns: A tuple containing a new state and a reward.
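A minimal sketch of the sampling step described above, written as a hypothetical free function `sampleFromRow` rather than the class method. It assumes the row T(s, a, .) is given as its nonzero (successor, probability) entries and that the reward is the single value stored for (s, a); `std::discrete_distribution` picks each entry with probability proportional to its weight.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <tuple>
#include <utility>
#include <vector>

using SparseRow = std::vector<std::pair<std::size_t, double>>;  // (s1, probability), nonzeros only

// Samples a successor state from `row` and pairs it with `reward`.
std::tuple<std::size_t, double> sampleFromRow(const SparseRow & row, double reward, std::mt19937 & rng) {
    assert(!row.empty());
    std::vector<double> weights;
    weights.reserve(row.size());
    for (const auto & entry : row) weights.push_back(entry.second);
    // Picks index i with probability weights[i] / sum(weights).
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    const std::size_t s1 = row[static_cast<std::size_t>(pick(rng))].first;
    return {s1, reward};
}
```

Because only nonzero entries are passed in, states the model considers unreachable can never be sampled.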

◆ setDiscount()

template<IsExperience E>

void AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::setDiscount ( double d )

This function sets a new discount factor for the Model.

Parameters

d	The new discount factor for the Model.

◆ sync() [1/3]

template<IsExperience E>

void AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::sync ()

This function syncs the whole SparseMaximumLikelihoodModel to the underlying Experience.

Since use cases in AI are very varied, one may not want to update the SparseMaximumLikelihoodModel after every single transition experienced by the agent. To avoid this, the task of syncing the SparseMaximumLikelihoodModel with the underlying Experience is left to the user, to be done as they see fit.

After this function is run the transition and reward functions will accurately reflect the state of the underlying Experience.

◆ sync() [2/3]

template<IsExperience E>

void AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::sync	(	size_t	s,
		size_t	a
	)

This function syncs a state action pair in the SparseMaximumLikelihoodModel to the underlying Experience.

Since use cases in AI are very varied, one may not want to update the SparseMaximumLikelihoodModel after every single transition experienced by the agent. To avoid this, the task of syncing the SparseMaximumLikelihoodModel with the underlying Experience is left to the user, to be done as they see fit.

This function updates a single state action pair with the underlying Experience. It is offered to avoid recomputing the whole SparseMaximumLikelihoodModel when the user knows that only a few transitions have been experienced by the agent.

After this function is run the transition and reward functions will accurately reflect the state of the underlying Experience for the specified state action pair.

Parameters

s	The state that needs to be synced.
a	The action that needs to be synced.

◆ sync() [3/3]

template<IsExperience E>

void AIToolbox::MDP::SparseMaximumLikelihoodModel< E >::sync	(	size_t	s,
		size_t	a,
		size_t	s1
	)

This function syncs a state action pair in the SparseMaximumLikelihoodModel to the underlying Experience in the fastest possible way.

This function updates a state action pair assuming that the last transition added to the underlying Experience is the triplet s, a, s1, and that this single new data point is the only change since the last sync. If more has changed since the last sync, use sync(s, a) instead. The performance gain of this function grows with the number of states in the model.

Parameters

s	The state that needs to be synced.
a	The action that needs to be synced.
s1	The final state of the transition that got updated in the Experience.
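The incremental update can be derived from the Maximum Likelihood estimate T(s, a, x) = N(s, a, x) / N(s, a): when N(s, a) grows from n - 1 to n through one new (s, a, s1) observation, every old probability is rescaled by (n - 1) / n and s1 gains an extra 1 / n. The sketch below applies this to a single row; the function `incrementalSync` and the dense `row` vector are hypothetical (with sparse storage, only the stored nonzero entries would need rescaling), and rewards are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Incremental update of one row T(s, a, .), assuming the ONLY change since the
// last sync is a single new observation of (s, a, s1), so that the visit count
// of (s, a) is now `newTotal`.
void incrementalSync(std::vector<double> & row, std::size_t s, std::size_t s1, unsigned long newTotal) {
    assert(s < row.size() && s1 < row.size() && newTotal > 0);
    if (newTotal == 1) {
        // First observation: discard the default self-loop on s entirely.
        for (auto & p : row) p = 0.0;
        row[s1] = 1.0;
        return;
    }
    // Old estimates were count / (n - 1); new ones are count / n, plus 1 / n for s1.
    const double n = static_cast<double>(newTotal);
    for (auto & p : row) p *= (n - 1.0) / n;
    row[s1] += 1.0 / n;
}
```

Starting from the default self-loop and observing successors 0, 1 and 1 in turn yields 1/3 and 2/3 for states 0 and 1, exactly what recomputing from the counts (1, 2, 0) would give.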

The documentation for this class was generated from the following file:

include/AIToolbox/MDP/SparseMaximumLikelihoodModel.hpp
