AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP::ThompsonModel< E > Class Template Reference

This class models Experience as a Markov Decision Process using Thompson Sampling. More...

#include <AIToolbox/MDP/ThompsonModel.hpp>

Public Types

using TransitionMatrix = Matrix3D
 
using RewardMatrix = Matrix2D
 

Public Member Functions

 ThompsonModel (const E &exp, double discount=1.0)
 Constructor using previous Experience. More...
 
void setDiscount (double d)
 This function sets a new discount factor for the Model. More...
 
void sync ()
 This function syncs the whole ThompsonModel to the underlying Experience. More...
 
void sync (size_t s, size_t a)
 This function syncs a state action pair in the ThompsonModel to the underlying Experience. More...
 
std::tuple< size_t, double > sampleSR (size_t s, size_t a) const
 This function samples the MDP for the specified state action pair. More...
 
size_t getS () const
 This function returns the number of states of the world. More...
 
size_t getA () const
 This function returns the number of available actions to the agent. More...
 
double getDiscount () const
 This function returns the currently set discount factor. More...
 
const E & getExperience () const
 This function enables inspection of the underlying Experience of the ThompsonModel. More...
 
double getTransitionProbability (size_t s, size_t a, size_t s1) const
 This function returns the stored transition probability for the specified transition. More...
 
double getExpectedReward (size_t s, size_t a, size_t s1) const
 This function returns the stored expected reward for the specified transition. More...
 
const TransitionMatrix & getTransitionFunction () const
 This function returns the transition matrix for inspection. More...
 
const Matrix2D & getTransitionFunction (size_t a) const
 This function returns the transition function for a given action. More...
 
const RewardMatrix & getRewardFunction () const
 This function returns the rewards matrix for inspection. More...
 
bool isTerminal (size_t s) const
 This function returns whether a given state is terminal. More...
 

Detailed Description

template<IsExperience E>
class AIToolbox::MDP::ThompsonModel< E >

This class models Experience as a Markov Decision Process using Thompson Sampling.

Often an MDP is not known in advance. It is known that it can assume a certain set of states, and that a certain set of actions are available to the agent, but not much more. Thus, in these cases, the goal is not only to find out the best policy for the MDP we have, but at the same time learn the actual transition and reward functions of such a model. This task is called "reinforcement learning".

This class helps with this. A naive approach in reinforcement learning is to keep track, for each action, of its results, and deduce transition probabilities and rewards based on the data collected in such a way. This class does just this, using Thompson Sampling to decide what the transition probabilities and rewards are.

This class maps an Experience object using a series of Dirichlet (for transitions) and Student-t (for rewards) distributions, one per state-action pair. The user can sample from these distributions to obtain transition and reward functions. As more data is accumulated, the distributions can be resampled so that these functions better reflect the data. The syncing operation MUST be done manually, as it is slightly expensive (it must sample a distribution with S parameters and normalize the result). See sync().

When little data is available, syncing will generally result in transition functions where most transitions are assumed possible. Priors can be given to the Experience as "fictional" experience so as to bias the result. Additionally, this class uses a Jeffreys prior when sampling. For a Dirichlet distribution, this is equivalent to having 0.5 priors on all parameters (which can't be set via the Experience, as they are not integers). For the reward, the posteriors are Student-t distributions. A Jeffreys prior ensures that the sampling is unbiased under any reparameterization of the original parameters.

The strength of ThompsonModel is that it can replace traditional exploration techniques, embedding our beliefs of what transitions and rewards are possible directly in the sampled functions.

Whether any of these techniques work or not can definitely depend on the model you are trying to approximate. Trying out things is good!

Member Typedef Documentation

◆ RewardMatrix

template<IsExperience E>
using AIToolbox::MDP::ThompsonModel< E >::RewardMatrix = Matrix2D

◆ TransitionMatrix

template<IsExperience E>
using AIToolbox::MDP::ThompsonModel< E >::TransitionMatrix = Matrix3D

Constructor & Destructor Documentation

◆ ThompsonModel()

template<IsExperience E>
AIToolbox::MDP::ThompsonModel< E >::ThompsonModel ( const E &  exp,
double  discount = 1.0 
)

Constructor using previous Experience.

This constructor selects the Experience that will be used to learn an MDP Model from the data, and initializes internal Model data.

Unlike MaximumLikelihoodModel, we always sync on construction, since we sample from a Dirichlet distribution whether we have data or not.

All transition parameters read from the Experience will be incremented by 0.5, since we are using Jeffreys prior.

The rewards will be sampled from Student-t distributions.

Parameters
exp        The base Experience of the model.
discount   The discount used in solving methods.

Member Function Documentation

◆ getA()

template<IsExperience E>
size_t AIToolbox::MDP::ThompsonModel< E >::getA

This function returns the number of available actions to the agent.

Returns
The total number of actions.

◆ getDiscount()

template<IsExperience E>
double AIToolbox::MDP::ThompsonModel< E >::getDiscount

This function returns the currently set discount factor.

Returns
The currently set discount factor.

◆ getExpectedReward()

template<IsExperience E>
double AIToolbox::MDP::ThompsonModel< E >::getExpectedReward ( size_t  s,
size_t  a,
size_t  s1 
) const

This function returns the stored expected reward for the specified transition.

Parameters
s    The initial state of the transition.
a    The action performed in the transition.
s1   The final state of the transition.
Returns
The expected reward of the specified transition.

◆ getExperience()

template<IsExperience E>
const E & AIToolbox::MDP::ThompsonModel< E >::getExperience

This function enables inspection of the underlying Experience of the ThompsonModel.

Returns
The underlying Experience of the ThompsonModel.

◆ getRewardFunction()

template<IsExperience E>
const ThompsonModel< E >::RewardMatrix & AIToolbox::MDP::ThompsonModel< E >::getRewardFunction

This function returns the rewards matrix for inspection.

Returns
The rewards matrix.

◆ getS()

template<IsExperience E>
size_t AIToolbox::MDP::ThompsonModel< E >::getS

This function returns the number of states of the world.

Returns
The total number of states.

◆ getTransitionFunction() [1/2]

template<IsExperience E>
const ThompsonModel< E >::TransitionMatrix & AIToolbox::MDP::ThompsonModel< E >::getTransitionFunction

This function returns the transition matrix for inspection.

Returns
The transition matrix.

◆ getTransitionFunction() [2/2]

template<IsExperience E>
const Matrix2D & AIToolbox::MDP::ThompsonModel< E >::getTransitionFunction ( size_t  a) const

This function returns the transition function for a given action.

Parameters
a    The action requested.
Returns
The transition function for the input action.

◆ getTransitionProbability()

template<IsExperience E>
double AIToolbox::MDP::ThompsonModel< E >::getTransitionProbability ( size_t  s,
size_t  a,
size_t  s1 
) const

This function returns the stored transition probability for the specified transition.

Parameters
s    The initial state of the transition.
a    The action performed in the transition.
s1   The final state of the transition.
Returns
The probability of the specified transition.

◆ isTerminal()

template<IsExperience E>
bool AIToolbox::MDP::ThompsonModel< E >::isTerminal ( size_t  s) const

This function returns whether a given state is terminal.

Parameters
s    The state examined.
Returns
True if the input state is terminal, false otherwise.

◆ sampleSR()

template<IsExperience E>
std::tuple< size_t, double > AIToolbox::MDP::ThompsonModel< E >::sampleSR ( size_t  s,
size_t  a 
) const

This function samples the MDP for the specified state action pair.

This function samples the model to simulate experience. The transition and reward functions are used to produce, from the input state-action pair, a possible new state with its respective reward. The new state is picked among all states the model allows transitioning to, each with the probability stored in the model's transition function. Once the new state is picked, the reward is the corresponding entry of the reward function.

Parameters
s    The state that needs to be sampled.
a    The action that needs to be sampled.
Returns
A tuple containing a new state and a reward.

◆ setDiscount()

template<IsExperience E>
void AIToolbox::MDP::ThompsonModel< E >::setDiscount ( double  d)

This function sets a new discount factor for the Model.

Parameters
d    The new discount factor for the Model.

◆ sync() [1/2]

template<IsExperience E>
void AIToolbox::MDP::ThompsonModel< E >::sync

This function syncs the whole ThompsonModel to the underlying Experience.

Since use cases in AI are very varied, one may not want to update their ThompsonModel for each single transition experienced by the agent. We therefore leave to the user the task of syncing the ThompsonModel with the underlying Experience as they see fit.

◆ sync() [2/2]

template<IsExperience E>
void AIToolbox::MDP::ThompsonModel< E >::sync ( size_t  s,
size_t  a 
)

This function syncs a state action pair in the ThompsonModel to the underlying Experience.

Since use cases in AI are very varied, one may not want to update the ThompsonModel for each single transition experienced by the agent. We therefore leave to the user the task of syncing the ThompsonModel with the underlying Experience as they see fit.

This function updates a single state-action pair from the underlying Experience, providing finer-grained control over resampling than sync().

Both transitions and rewards for the specified state-action pair will be resampled.

Parameters
s    The state that needs to be synced.
a    The action that needs to be synced.
