AIToolbox
A library that offers tools for AI problem solving.
This class represents a Markov Decision Process.
#include <AIToolbox/MDP/Model.hpp>
Public Types

using TransitionMatrix = Matrix3D
using RewardMatrix = Matrix2D

Public Member Functions
Model(size_t s, size_t a, double discount = 1.0)
    Basic constructor.
template <IsNaive3DMatrix T, IsNaive3DMatrix R>
Model(size_t s, size_t a, const T &t, const R &r, double d = 1.0)
    Basic constructor.
template <IsModel M>
Model(const M &model)
    Copy constructor from any valid MDP model.
Model(NoCheck, size_t s, size_t a, TransitionMatrix &&t, RewardMatrix &&r, double d)
    Unchecked constructor.
template <IsNaive3DMatrix T>
void setTransitionFunction(const T &t)
    This function replaces the Model transition function with the one provided.
void setTransitionFunction(const TransitionMatrix &t)
    This function sets the transition function using an Eigen dense matrix.
template <IsNaive3DMatrix R>
void setRewardFunction(const R &r)
    This function replaces the Model reward function with the one provided.
void setRewardFunction(const RewardMatrix &r)
    This function replaces the reward function with the one provided.
void setDiscount(double d)
    This function sets a new discount factor for the Model.
std::tuple<size_t, double> sampleSR(size_t s, size_t a) const
    This function samples the MDP with the specified state-action pair.
size_t getS() const
    This function returns the number of states of the world.
size_t getA() const
    This function returns the number of actions available to the agent.
double getDiscount() const
    This function returns the currently set discount factor.
double getTransitionProbability(size_t s, size_t a, size_t s1) const
    This function returns the stored transition probability for the specified transition.
double getExpectedReward(size_t s, size_t a, size_t s1) const
    This function returns the stored expected reward for the specified transition.
const TransitionMatrix &getTransitionFunction() const
    This function returns the transition matrix for inspection.
const Matrix2D &getTransitionFunction(size_t a) const
    This function returns the transition function for a given action.
const RewardMatrix &getRewardFunction() const
    This function returns the rewards matrix for inspection.
bool isTerminal(size_t s) const
    This function returns whether a given state is terminal.
This class represents a Markov Decision Process.
A Markov Decision Process (MDP) is a way to model decision making. The idea is that there is an agent situated in a stochastic environment which changes in discrete "timesteps". The agent can influence the way the environment changes via "actions". For each action the agent can perform, the environment will transition from a state "s" to a state "s1" following a certain transition function. The transition function specifies, for each triple SxAxS' the probability that such a transition will happen.
In addition, the agent can obtain rewards associated with transitions. Thus, if it acts well, the agent will obtain a higher reward than if it had performed badly. The reward obtained by the agent is additionally subject to a "discount" factor: at every timestep, the possible reward that the agent can collect is multiplied by this factor, which is a number between 0 and 1. The discount factor models the fact that it is often preferable to obtain something sooner rather than later.
Since all of this is governed by probabilities, it is possible to solve an MDP model in order to obtain an "optimal policy", which is a way to select an action from a state which will maximize the expected reward that the agent is going to collect during its life. The expected reward is computed as the sum of every reward the agent collects at every timestep, keeping in mind that at every timestep the reward is further and further discounted.
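In symbols, if r_t is the reward collected at timestep t and d is the discount factor, the discounted return is r_0 + d*r_1 + d^2*r_2 + ..., so a reward obtained k timesteps in the future is weighted by d^k.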
Solving an MDP in such a way is called "planning". Planning solutions often include a "horizon", which is the number of timesteps considered in an episode; it can be finite or infinite. The optimal policy changes with respect to the horizon, since a longer horizon may offer access to reward-gaining opportunities farther in the future.
An MDP policy (be it the optimal one or another) is associated with two functions: a ValueFunction and a QFunction. The ValueFunction represents the expected return for the agent from any initial state, given that actions are selected according to the policy. The QFunction is similar: it gives the expected return for a specific state-action pair, given that after the specified action one acts according to the policy.
Given that we are usually interested in the optimal policy, a couple of properties of the optimal policy's functions are worth noting. First, the optimal policy can be derived from the optimal QFunction: in a given state "s", it simply selects the action that maximizes the value of the QFunction. In the same way, the optimal ValueFunction can be computed from the optimal QFunction by taking the maximum with respect to the action.
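In symbols: pi*(s) = argmax_a Q*(s, a), and V*(s) = max_a Q*(s, a).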
Since so much information can be extracted from the QFunction, lots of methods (mostly in Reinforcement Learning) try to learn it.
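To make the above concrete, here is a minimal sketch (not part of the original documentation) that builds a tiny two-state, two-action MDP with the container-based constructor documented below and inspects it; the specific states, probabilities and rewards are made up purely for illustration.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 2;

        // t[s][a][s1] = probability of ending in s1 after doing a in s.
        // Every t[s][a] row must sum to 1 to form a valid transition function.
        std::vector<std::vector<std::vector<double>>> t(S,
            std::vector<std::vector<double>>(A, std::vector<double>(S, 0.0)));
        // r[s][a][s1] = reward associated with the transition (s, a, s1).
        auto r = t;

        // Action 0 keeps the agent where it is; action 1 moves it to the other
        // state with probability 0.8 and fails (stays put) with probability 0.2.
        for (std::size_t s = 0; s < S; ++s) {
            t[s][0][s]     = 1.0;
            t[s][1][1 - s] = 0.8;
            t[s][1][s]     = 0.2;
        }
        r[0][1][1] = 10.0; // reward for successfully reaching state 1 from state 0

        AIToolbox::MDP::Model model(S, A, t, r, 0.9);

        std::cout << model.getTransitionProbability(0, 1, 1) << '\n'; // 0.8
        std::cout << model.getExpectedReward(0, 1, 1)        << '\n'; // expected reward associated with (s=0, a=1)
    }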
AIToolbox::MDP::Model::Model(size_t s, size_t a, double discount = 1.0)
Basic constructor.
This constructor initializes the Model so that, for every action, all transitions have probability 0 except for the transition that brings the agent back to the same state, which has probability 1.
All rewards are set to 0. The discount parameter defaults to 1.
Parameters:
    s: The number of states of the world.
    a: The number of actions available to the agent.
    discount: The discount factor for the MDP.
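A tiny illustrative sketch (not from the original documentation) of what this default initialization implies:

    #include <AIToolbox/MDP/Model.hpp>
    #include <cassert>

    int main() {
        AIToolbox::MDP::Model model(3, 2); // 3 states, 2 actions, discount defaults to 1.0

        // Every action leaves the agent where it is, and yields no reward.
        assert(model.getTransitionProbability(1, 0, 1) == 1.0);
        assert(model.getTransitionProbability(1, 0, 2) == 0.0);
        assert(model.getExpectedReward(1, 0, 1) == 0.0);
        assert(model.getDiscount() == 1.0);
    }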
template <IsNaive3DMatrix T, IsNaive3DMatrix R>
AIToolbox::MDP::Model::Model(size_t s, size_t a, const T &t, const R &r, double d = 1.0)
Basic constructor.
This constructor takes two arbitrary three dimensional containers and tries to copy their contents into the transitions and rewards matrices respectively.
The containers need to support data access through operator[]. In addition, the dimensions of the containers must match the ones provided as arguments (for three dimensions: s,a,s).
This is important, as this constructor DOES NOT perform any size checks on the external containers.
Internal values of the containers will be converted to double, so these conversions must be possible.
In addition, the transition container must contain a valid transition function.
The discount parameter must be between 0 and 1 inclusive, otherwise the constructor will throw an std::invalid_argument.
Template Parameters:
    T: The external transition container type.
    R: The external rewards container type.

Parameters:
    s: The number of states of the world.
    a: The number of actions available to the agent.
    t: The external transitions container.
    r: The external rewards container.
    d: The discount factor for the MDP.
template <IsModel M>
AIToolbox::MDP::Model::Model(const M &model)
Copy constructor from any valid MDP model.
This allows copying from any other model. A nice use for this is to convert any model which computes probabilities on the fly into an MDP::Model, where all probabilities are stored for fast access. Of course, this is only practical when the number of states and actions is not too big.
Template Parameters:
    M: The type of the other model.

Parameters:
    model: The model that needs to be copied.
AIToolbox::MDP::Model::Model(NoCheck, size_t s, size_t a, TransitionMatrix &&t, RewardMatrix &&r, double d)
Unchecked constructor.
This constructor takes ownership of the data passed to it, avoiding copies and additional work (sanity checks), in order to speed up the process of building a new Model as much as possible.
Note that to use it you have to explicitly use the NO_CHECK tag parameter first.
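The sketch below shows one way this constructor might be used. It is illustrative only and rests on a few assumptions not spelled out on this page: that TransitionMatrix (Matrix3D) is an std::vector of A square SxS Eigen matrices (as described for setTransitionFunction below, with RewardMatrix being an alias of the same Matrix2D type), and that the NO_CHECK tag lives in the AIToolbox namespace.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <utility>

    int main() {
        using Model = AIToolbox::MDP::Model;
        const std::size_t S = 4, A = 2;

        // One S x S matrix per action; here every action is a self-loop.
        // Model::RewardMatrix is reused as the element type purely for
        // illustration, assuming it names the same dense type as Matrix2D.
        Model::TransitionMatrix t(A, Model::RewardMatrix::Identity(S, S));
        Model::RewardMatrix r = Model::RewardMatrix::Zero(S, A);

        // No sanity checks are performed: t and r must already be valid and
        // correctly sized. NO_CHECK is assumed to be the tag mentioned above.
        Model model(AIToolbox::NO_CHECK, S, A, std::move(t), std::move(r), 0.95);
    }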
size_t AIToolbox::MDP::Model::getA() const
This function returns the number of available actions to the agent.
double AIToolbox::MDP::Model::getDiscount() const
This function returns the currently set discount factor.
double AIToolbox::MDP::Model::getExpectedReward(size_t s, size_t a, size_t s1) const
This function returns the stored expected reward for the specified transition.
Parameters:
    s: The initial state of the transition.
    a: The action performed in the transition.
    s1: The final state of the transition.
const RewardMatrix &AIToolbox::MDP::Model::getRewardFunction() const
This function returns the rewards matrix for inspection.
size_t AIToolbox::MDP::Model::getS() const
This function returns the number of states of the world.
const TransitionMatrix &AIToolbox::MDP::Model::getTransitionFunction() const
This function returns the transition matrix for inspection.
const Matrix2D &AIToolbox::MDP::Model::getTransitionFunction(size_t a) const
This function returns the transition function for a given action.
Parameters:
    a: The action requested.
double AIToolbox::MDP::Model::getTransitionProbability(size_t s, size_t a, size_t s1) const
This function returns the stored transition probability for the specified transition.
Parameters:
    s: The initial state of the transition.
    a: The action performed in the transition.
    s1: The final state of the transition.
bool AIToolbox::MDP::Model::isTerminal(size_t s) const
This function returns whether a given state is a terminal state.
Parameters:
    s: The state examined.
std::tuple<size_t, double> AIToolbox::MDP::Model::sampleSR(size_t s, size_t a) const
This function samples the MDP with the specified state-action pair.
This function samples the model for simulated experience. The transition and reward functions are used to produce, from the state-action pair passed as arguments, a possible new state with its respective reward. The new state is picked from all states that the MDP allows transitioning to, each with probability equal to that transition's probability in the model. Once the new state is picked, the reward is the corresponding reward contained in the reward function.
Parameters:
    s: The state that needs to be sampled.
    a: The action that needs to be sampled.
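As an illustration (not part of the original documentation), a short episode can be simulated by repeatedly feeding the sampled state back into sampleSR. The tiny model below is made up; the discounting of the accumulated return follows the class description above.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>
    #include <tuple>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 1;

        // Single action: move to either state with probability 0.5,
        // collecting a reward of 1 on every transition.
        std::vector<std::vector<std::vector<double>>> t(S,
            std::vector<std::vector<double>>(A, std::vector<double>(S, 0.5)));
        std::vector<std::vector<std::vector<double>>> r(S,
            std::vector<std::vector<double>>(A, std::vector<double>(S, 1.0)));

        const double discount = 0.9;
        AIToolbox::MDP::Model model(S, A, t, r, discount);

        std::size_t s = 0;
        double ret = 0.0, g = 1.0;
        for (int step = 0; step < 10; ++step) {
            auto [s1, rew] = model.sampleSR(s, 0); // sample next state and reward
            ret += g * rew;                        // accumulate the discounted return
            g *= discount;
            s = s1;
        }
        // Since every reward is 1, this prints roughly 6.51 (the sum of 0.9^k for k < 10).
        std::cout << "discounted return over 10 steps: " << ret << '\n';
    }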
void AIToolbox::MDP::Model::setDiscount(double d)
This function sets a new discount factor for the Model.
Parameters:
    d: The new discount factor for the Model.
template <IsNaive3DMatrix R>
void AIToolbox::MDP::Model::setRewardFunction(const R &r)
This function replaces the Model reward function with the one provided.
The container needs to support data access through operator[]. In addition, the dimensions of the containers must match the ones used during construction (for three dimensions: S, A, S).
This is important, as this function DOES NOT perform any size checks on the external containers.
Internal values of the container will be converted to double, so these conversions must be possible.
Template Parameters:
    R: The external rewards container type.

Parameters:
    r: The external rewards container.
void AIToolbox::MDP::Model::setRewardFunction(const RewardMatrix &r)
This function replaces the reward function with the one provided.
The dimensions of the container must match the ones used during construction (for two dimensions: S, A). BE CAREFUL.
This function DOES NOT perform any size checks on the input.
Parameters:
    r: The external rewards container.
template <IsNaive3DMatrix T>
void AIToolbox::MDP::Model::setTransitionFunction(const T &t)
This function replaces the Model transition function with the one provided.
This function will throw a std::invalid_argument if the matrix provided does not contain valid probabilities.
The container needs to support data access through operator[]. In addition, the dimensions of the container must match the ones used during construction (for three dimensions: S,A,S).
This is important, as this function DOES NOT perform any size checks on the external container.
Internal values of the container will be converted to double, so these conversions must be possible.
Template Parameters:
    T: The external transition container type.

Parameters:
    t: The external transitions container.
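A small illustrative sketch (not from the original documentation) of replacing the transition function with a nested std::vector, and of the validity check mentioned above; the numbers are arbitrary.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    int main() {
        constexpr std::size_t S = 2, A = 1;
        AIToolbox::MDP::Model model(S, A);

        // Valid replacement: each t[s][a] row sums to 1.
        std::vector<std::vector<std::vector<double>>> t(S,
            std::vector<std::vector<double>>(A, std::vector<double>{0.5, 0.5}));
        model.setTransitionFunction(t);

        // Invalid replacement: a row no longer sums to 1, so the call throws.
        t[0][0] = {0.5, 0.9};
        try {
            model.setTransitionFunction(t);
        } catch (const std::invalid_argument &) {
            std::cout << "rejected: not a valid probability distribution\n";
        }
    }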
void AIToolbox::MDP::Model::setTransitionFunction(const TransitionMatrix &t)
This function sets the transition function using an Eigen dense matrix.
This function will throw an std::invalid_argument if the matrix provided does not contain valid probabilities.
The dimensions of the container must match the ones used during construction (for three dimensions: A, S, S). BE CAREFUL. The matrices MUST be SxS, while the std::vector containing them MUST be of size A.
This function DOES NOT perform any size checks on the input.
Parameters:
    t: The external transitions container.
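To close, a sketch (not from the original documentation) of using this overload together with getTransitionFunction(); it assumes, per the description above, that the TransitionMatrix is indexed by action with operator[] and that each per-action matrix uses Eigen's (row, column) element access.

    #include <AIToolbox/MDP/Model.hpp>
    #include <cstddef>

    int main() {
        const std::size_t S = 3, A = 2;
        AIToolbox::MDP::Model model(S, A); // default: every action is a self-loop

        // Copy the internal Eigen representation, tweak action 0 so that
        // state 0 now moves to state 1 deterministically, and install it back.
        auto t = model.getTransitionFunction();   // one S x S matrix per action
        t[0](0, 0) = 0.0;
        t[0](0, 1) = 1.0;                         // the row still sums to 1
        model.setTransitionFunction(t);
    }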