AIToolbox
A library that offers tools for AI problem solving.
Namespaces

GridWorldUtils
    This namespace exists in order to allow referencing the Direction values directly.
Classes

class BanditPolicyAdaptor
    This class extends a Bandit policy so that it can be called from MDP code.
class DoubleQLearning
    This class represents the double QLearning algorithm.
class Dyna2
    This class represents the Dyna2 algorithm.
class DynaQ
    This class represents the DynaQ algorithm.
class EpsilonPolicy
class ExpectedSARSA
    This class represents the ExpectedSARSA algorithm.
class Experience
    This class keeps track of registered events and rewards.
class GenerativeModelPython
    This class allows importing generative models from Python.
class GridWorld
    This class represents a simple rectangular gridworld.
class HystereticQLearning
    This class represents the Hysteretic QLearning algorithm.
class ImportanceSampling
    This class implements off-policy control via importance sampling.
class ImportanceSamplingEvaluation
    This class implements off-policy evaluation via importance sampling.
class LinearProgramming
    This class solves an MDP using Linear Programming.
class MaximumLikelihoodModel
    This class models Experience as a Markov Decision Process using Maximum Likelihood.
class MCTS
    This class represents the MCTS online planner using UCB1.
class Model
    This class represents a Markov Decision Process.
class OffPolicyBase
    This class contains all the boilerplate for off-policy methods.
class OffPolicyControl
    This class is a general version of off-policy control.
class OffPolicyEvaluation
    This class is a general version of off-policy evaluation.
class PGAAPPPolicy
    This class implements the PGA-APP learning algorithm.
class Policy
    This class represents an MDP Policy.
class PolicyEvaluation
    This class applies the policy evaluation algorithm on a policy.
class PolicyInterface
    Simple typedef for most of MDP's policy needs.
class PolicyIteration
    This class represents the Policy Iteration algorithm.
class PolicyWrapper
    This class provides an MDP Policy interface around a Matrix2D.
class PrioritizedSweeping
    This class represents the PrioritizedSweeping algorithm.
class QGreedyPolicy
    This class implements a greedy policy through a QFunction.
class QL
    This class implements off-policy control via Q(lambda).
class QLearning
    This class represents the QLearning algorithm.
class QLEvaluation
    This class implements off-policy evaluation via Q(lambda).
class QPolicyInterface
    This class is an interface to specify a policy through a QFunction.
class QSoftmaxPolicy
    This class implements a softmax policy through a QFunction.
class RetraceL
    This class implements off-policy control via Retrace(lambda).
class RetraceLEvaluation
    This class implements off-policy evaluation via Retrace(lambda).
class RLearning
    This class represents the RLearning algorithm.
class SARSA
    This class represents the SARSA algorithm.
class SARSAL
    This class represents the SARSAL algorithm.
class SparseExperience
    This class keeps track of registered events and rewards.
class SparseMaximumLikelihoodModel
    This class models Experience as a Markov Decision Process using Maximum Likelihood.
class SparseModel
    This class represents a Markov Decision Process.
class ThompsonModel
    This class models Experience as a Markov Decision Process using Thompson Sampling.
class TreeBackupL
    This class implements off-policy control via Tree Backup(lambda).
class TreeBackupLEvaluation
    This class implements off-policy evaluation via Tree Backup(lambda).
struct ValueFunction
class ValueIteration
    This class applies the value iteration algorithm on a Model.
class WoLFPolicy
    This class implements the WoLF learning algorithm.
Typedefs

using RandomPolicy = BanditPolicyAdaptor<Bandit::RandomPolicy>
    This class represents a random policy.
Functions

template<typename M, typename Gen>
    requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double rollout(const M &m, std::remove_cvref_t<decltype(std::declval<M>().getS())> s, const unsigned maxDepth, Gen &rnd)
    This function performs a rollout from the input state.

AIToolbox::MDP::SparseModel makeCliffProblem(const GridWorld &grid)
    This function sets up the cliff problem in a SparseModel.

AIToolbox::MDP::Model makeCornerProblem(const GridWorld &grid, double stepUncertainty = 0.8)
    This function sets up the corner problem in a Model.

Model parseCassandra(std::istream &input)
    This function parses an MDP from a Cassandra formatted stream.

QFunction makeQFunction(size_t S, size_t A)
    This function creates and zeroes a QFunction.

ValueFunction makeValueFunction(size_t S)
    This function creates and zeroes a ValueFunction.

ValueFunction bellmanOperator(const QFunction &q)
    This function converts a QFunction into the equivalent optimal ValueFunction.

void bellmanOperatorInplace(const QFunction &q, ValueFunction *v)
    This function converts a QFunction into the equivalent optimal ValueFunction.

template<IsModel M>
Matrix2D computeImmediateRewards(const M &model)
    This function computes all immediate rewards (state and action) of the MDP once for improved speed.

template<IsModel M>
QFunction computeQFunction(const M &model, const Values &v, QFunction ir)
    This function computes the Model's QFunction from the values of a ValueFunction.

Concepts

template<typename M>
concept IsGenerativeModel
    This concept represents the required interface for a generative MDP.

template<typename M>
concept IsModel
    This concept represents the required interface for a full MDP model.

template<typename M>
concept IsModelEigen
    This concept represents the required interface that allows MDP algorithms to leverage Eigen.

template<typename E>
concept IsExperience
    This concept represents the required interface for an experience recorder.

template<typename E>
concept IsExperienceEigen
    This concept represents the required Experience interface that allows leveraging Eigen.
Input stream utilities

These utilities read back data written with their respective operator<<() function. Note that the inputs must already be constructed with the correct size (state-action spaces), as operator<<() does not save this information. These functions do not modify the input if parsing fails.

std::istream & operator>>(std::istream &is, Model &m)
std::istream & operator>>(std::istream &is, SparseModel &m)
std::istream & operator>>(std::istream &is, Experience &e)
std::istream & operator>>(std::istream &is, SparseExperience &e)
std::istream & operator>>(std::istream &is, Policy &p)
    This function reads a policy from a file.
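As a brief illustration of this contract, the sketch below round-trips a Model through a string stream. The Model(S, A) constructor and the AIToolbox/MDP/IO.hpp header path are assumptions not documented on this page.

    #include <AIToolbox/MDP/IO.hpp>      // assumed location of the stream operators
    #include <AIToolbox/MDP/Model.hpp>
    #include <sstream>

    void roundTripModel(const AIToolbox::MDP::Model & original) {
        std::stringstream stream;
        stream << original;              // serialize with operator<<()

        // The target must already have the same state-action space, since
        // operator<<() does not store S and A.
        AIToolbox::MDP::Model copy(original.getS(), original.getA());  // assumed constructor
        if (!(stream >> copy)) {
            // Parsing failed: `copy` is left unmodified, as documented above.
        }
    }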
using AIToolbox::MDP::Actions = std::vector<size_t>

using AIToolbox::MDP::QFunction = Matrix2D

using AIToolbox::MDP::RandomPolicy = BanditPolicyAdaptor<Bandit::RandomPolicy>
    This class represents a random policy.
    This class simply returns a random action every time it is polled.

using AIToolbox::MDP::Values = Vector
ValueFunction AIToolbox::MDP::bellmanOperator(const QFunction & q)

This function converts a QFunction into the equivalent optimal ValueFunction.

The ValueFunction will contain, for each state, the best action and corresponding value as extracted from the input QFunction.

Parameters:
    q - The QFunction to convert.
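A brief sketch of the extraction step; it assumes ValueFunction exposes `values` and `actions` members, which this page does not document, and assumes the header path.

    #include <AIToolbox/MDP/Utils.hpp>   // assumed location of makeQFunction/bellmanOperator

    void greedyExtractionSketch() {
        const size_t S = 4, A = 2;

        auto q = AIToolbox::MDP::makeQFunction(S, A);  // zeroed S x A Matrix2D
        q(0, 1) = 1.0;                                 // pretend action 1 is best in state 0

        // For every state, keep the best action and its value.
        const auto v = AIToolbox::MDP::bellmanOperator(q);
        // Under the assumed member names: v.values[0] == 1.0, v.actions[0] == 1.
    }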
void AIToolbox::MDP::bellmanOperatorInplace(const QFunction & q, ValueFunction * v)

This function converts a QFunction into the equivalent optimal ValueFunction.

This function is the same as bellmanOperator(), but performs its operations in place. The input ValueFunction MUST already be sized appropriately for the input QFunction.

NOTE: This function DOES NOT perform any checks whatsoever, either on the validity of the input pointer or on the size of the input ValueFunction. It assumes everything is already correct.

Parameters:
    q - The QFunction to convert.
    v - A pre-allocated ValueFunction to populate with the optimal values.
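The corresponding in-place sketch; note the pre-sizing requirement stated above (header path assumed).

    #include <AIToolbox/MDP/Utils.hpp>   // assumed header

    void inplaceSketch(const AIToolbox::MDP::QFunction & q) {
        // The output MUST already be sized for q; makeValueFunction() takes care
        // of that, since q has one row per state.
        auto v = AIToolbox::MDP::makeValueFunction(q.rows());

        // No pointer or size checks happen inside this call.
        AIToolbox::MDP::bellmanOperatorInplace(q, &v);
    }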
template<IsModel M>
Matrix2D AIToolbox::MDP::computeImmediateRewards(const M & model)

This function computes all immediate rewards (state and action) of the MDP once for improved speed.

This function essentially creates the R(s,a) function for the input model. Normally we store the reward function as R(s,a,s'), but this matrix can be "compressed" into R(s,a) with no loss of meaningful information with respect to the planning process.

Note that this function is more efficient with Eigen models.

Parameters:
    model - The MDP that needs to be solved.
template<IsModel M>
QFunction AIToolbox::MDP::computeQFunction(const M & model, const Values & v, QFunction ir)

This function computes the Model's QFunction from the values of a ValueFunction.

Note that this function is more efficient with Eigen models.

Parameters:
    model - The MDP that needs to be solved.
    v     - The values of the ValueFunction for the future of the QFunction.
    ir    - The immediate rewards of the model, as created by computeImmediateRewards().
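A sketch of how the two helpers above combine during planning; the `.values` member of ValueFunction is an assumption, as is the header path.

    #include <AIToolbox/MDP/Utils.hpp>   // assumed header

    template <AIToolbox::MDP::IsModel M>
    AIToolbox::MDP::QFunction backupSketch(const M & model,
                                           const AIToolbox::MDP::ValueFunction & v) {
        // Compute R(s,a) once; it can be reused across many sweeps.
        auto ir = AIToolbox::MDP::computeImmediateRewards(model);

        // Back up the current state values into a fresh QFunction.
        return AIToolbox::MDP::computeQFunction(model, v.values, std::move(ir));
    }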
SparseModel AIToolbox::MDP::makeCliffProblem(const GridWorld & grid)  [inline]

This function sets up the cliff problem in a SparseModel.

The gist of this problem is a small grid where the agent is supposed to walk from one state to another. The only problem is that a cliff stands between the two points: walking off the cliff results in a huge negative reward and in the agent being reset at the start of the walk. Reaching the end results in a positive reward, while every step results in a small negative reward.

Movement here is fully deterministic.

    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   0    |   1    |  ....  |  X-2   |  X-1   |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   X    |  X+1   |  ....  |  2X-2  |  2X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   2X   |  2X+1  |  ....  |  3X-2  |  3X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |  ....  |  ....  |  ....  |  ....  |  ....  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    | (Y-1)X |(Y-1)X+1|  ....  |  YX-2  |  YX-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    | (START)|        |        |        | (GOAL) |
    |   YX   |  ~~~~  |  ....  |  ~~~~  |  YX+1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
              \________________________/
                          V
                      The Cliff

To do this we use a grid above the cliff, and we attach two states under it.

Parameters:
    grid - The grid to use for the problem.
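A learning sketch on this problem. The GridWorld(width, height) constructor and the QLearning, QGreedyPolicy and EpsilonPolicy signatures used below are assumptions taken from their respective class pages, not from this page; header paths are assumed as well.

    #include <AIToolbox/MDP/Environments/CliffProblem.hpp>   // assumed header path
    #include <AIToolbox/MDP/Algorithms/QLearning.hpp>
    #include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>
    #include <AIToolbox/MDP/Policies/EpsilonPolicy.hpp>

    void cliffSketch() {
        using namespace AIToolbox::MDP;

        GridWorld grid(12, 3);                     // assumed (width, height) constructor
        auto model = makeCliffProblem(grid);       // SparseModel with the two extra states

        QLearning solver(model.getS(), model.getA(), model.getDiscount(), 0.5);  // assumed ctor
        QGreedyPolicy greedy(solver.getQFunction());
        EpsilonPolicy policy(greedy, 0.1);         // assumed (policy, epsilon) ctor

        const size_t start = 12 * 3;               // START is state YX in the diagram above
        for (unsigned episode = 0; episode < 100; ++episode) {
            size_t s = start;
            for (unsigned t = 0; t < 200 && !model.isTerminal(s); ++t) {
                const auto a = policy.sampleAction(s);
                const auto [s1, r] = model.sampleSR(s, a);
                solver.stepUpdateQ(s, a, s1, r);
                s = s1;
            }
        }
    }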
Model AIToolbox::MDP::makeCornerProblem(const GridWorld & grid, double stepUncertainty = 0.8)  [inline]

This function sets up the corner problem in a Model.

The gist of this problem is a small grid where the upper-left corner and the bottom-right corner are self-absorbing states. The agent can move up, left, down, or right, and each transition that is not self-absorbing results in a reward penalty of -1. In addition, movements are not guaranteed: the agent succeeds only 80% of the time.

Thus the agent needs to be able to find the shortest path to one of the self-absorbing states from every other state.

The grid cells are numbered as follows:

    +--------+--------+--------+--------+--------+
    | (GOAL) |        |        |        |        |
    |   0    |   1    |  ....  |  X-2   |  X-1   |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   X    |  X+1   |  ....  |  2X-2  |  2X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |   2X   |  2X+1  |  ....  |  3X-2  |  3X-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        |        |
    |  ....  |  ....  |  ....  |  ....  |  ....  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
    |        |        |        |        | (GOAL) |
    | (Y-1)X |(Y-1)X+1|  ....  |  YX-2  |  YX-1  |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+

Parameters:
    grid            - The grid to use for the problem.
    stepUncertainty - The probability that a movement action succeeds.
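A planning sketch for this problem; the ValueIteration constructor, the tuple it returns, and the header paths are assumptions based on its class page.

    #include <AIToolbox/MDP/Environments/CornerProblem.hpp>  // assumed header path
    #include <AIToolbox/MDP/Algorithms/ValueIteration.hpp>
    #include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>

    void cornerSketch() {
        using namespace AIToolbox::MDP;

        GridWorld grid(4, 4);
        auto model = makeCornerProblem(grid);      // default 0.8 step success probability

        // Assumed interface: (horizon, tolerance) constructor, returning a
        // (variation bound, ValueFunction, QFunction) tuple.
        ValueIteration solver(1000000, 0.001);
        const auto [bound, vf, qf] = solver(model);

        QGreedyPolicy policy(qf);                  // act greedily with respect to the solution
    }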
QFunction AIToolbox::MDP::makeQFunction(size_t S, size_t A)

This function creates and zeroes a QFunction.

This function exists mostly to avoid remembering how to initialize Eigen matrices, and does nothing special.

Parameters:
    S - The state space of the QFunction.
    A - The action space of the QFunction.
ValueFunction AIToolbox::MDP::makeValueFunction(size_t S)

This function creates and zeroes a ValueFunction.

This function exists mostly to avoid remembering how to initialize Eigen vectors, and does nothing special.

Parameters:
    S - The state space of the ValueFunction.
std::ostream & AIToolbox::MDP::operator<<(std::ostream & os, const Experience & exp)

std::ostream & AIToolbox::MDP::operator<<(std::ostream & os, const Model & model)

std::ostream & AIToolbox::MDP::operator<<(std::ostream & os, const PolicyInterface & p)

std::ostream & AIToolbox::MDP::operator<<(std::ostream & os, const SparseExperience & exp)

std::ostream & AIToolbox::MDP::operator<<(std::ostream & os, const SparseModel & model)

std::istream & AIToolbox::MDP::operator>>(std::istream & is, Experience & e)

std::istream & AIToolbox::MDP::operator>>(std::istream & is, Model & m)
std::istream & AIToolbox::MDP::operator>>(std::istream & is, Policy & p)

This function reads a policy from a file.

This function reads files that have been written through operator<<(). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.

States and actions are also verified, and this function does not accept a randomly shuffled policy file. The file must be sorted by state, and each state must be sorted by action.

As an additional layer of precaution, the function normalizes the policy once it has been read, to ensure a true probability distribution on the internal policy.

Parameters:
    is - The stream the policy is being read from.
    p  - The policy that is being assigned.
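A save/load sketch; the Policy(S, A) constructor is an assumption (this page only states that the target must already have the correct size), as are the header path and the filename.

    #include <AIToolbox/MDP/Policies/Policy.hpp>
    #include <AIToolbox/MDP/IO.hpp>                // assumed header for the stream operators
    #include <fstream>

    void savePolicySketch(const AIToolbox::MDP::Policy & p) {
        std::ofstream file("policy.txt");          // hypothetical filename
        file << p;                                 // written via operator<<()
    }

    void loadPolicySketch(size_t S, size_t A) {
        AIToolbox::MDP::Policy p(S, A);            // assumed constructor; sizes must match the file

        std::ifstream file("policy.txt");
        if (file >> p) {
            // Values were range-checked and re-normalized; the file had to be
            // sorted by state and, within each state, by action.
        } else {
            // Not enough values could be extracted: p is left unmodified.
        }
    }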
std::istream & AIToolbox::MDP::operator>>(std::istream & is, SparseExperience & e)

std::istream & AIToolbox::MDP::operator>>(std::istream & is, SparseModel & m)
Model AIToolbox::MDP::parseCassandra(std::istream & input)

This function parses an MDP from a Cassandra formatted stream.

This function may throw std::runtime_error exceptions if the input is not correctly formed.

Parameters:
    input - The input stream.
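A loading sketch; the filename is hypothetical and the IO header path is assumed.

    #include <AIToolbox/MDP/IO.hpp>                // assumed header for parseCassandra
    #include <fstream>
    #include <iostream>

    void cassandraSketch() {
        std::ifstream input("gridworld.MDP");      // hypothetical Cassandra-format file
        try {
            auto model = AIToolbox::MDP::parseCassandra(input);
            std::cout << "Loaded MDP with " << model.getS() << " states and "
                      << model.getA() << " actions.\n";
        } catch (const std::runtime_error & e) {
            std::cerr << "Malformed input: " << e.what() << '\n';
        }
    }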
template<typename M, typename Gen>
    requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double AIToolbox::MDP::rollout(const M & m, std::remove_cvref_t<decltype(std::declval<M>().getS())> s, const unsigned maxDepth, Gen & rnd)

This function performs a rollout from the input state.

This function performs a rollout until the agent either reaches the desired depth or reaches a terminal state. The overall return, computed from the point of view of the input state and with future rewards discounted appropriately, is then returned.

This function is generally used in Monte Carlo tree search-like algorithms, such as MCTS or POMCP, to speed up the discovery of promising actions without necessarily expanding their search tree. This avoids wasting lots of computation and memory on states far from our root that we will probably never see again, while still obtaining an estimate for the rest of the simulation.

Parameters:
    m        - The model to use for the rollout.
    s        - The state to start the rollout from.
    maxDepth - The maximum number of timesteps to look into the future.
    rnd      - A random number generator.
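A usage sketch; any model satisfying the constraints works, so the corner problem model from above is reused (header paths are assumptions).

    #include <AIToolbox/MDP/Environments/CornerProblem.hpp>  // assumed header path
    #include <AIToolbox/MDP/Utils.hpp>                       // assumed location of rollout
    #include <random>

    double rolloutSketch() {
        using namespace AIToolbox::MDP;

        GridWorld grid(4, 4);
        auto model = makeCornerProblem(grid);

        std::mt19937 rnd(42);
        const size_t start = 5;                    // an arbitrary non-goal cell

        // Simulate at most 20 timesteps from `start`, stopping early at a
        // terminal (self-absorbing) state, and return the discounted return.
        return rollout(model, start, 20, rnd);
    }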
concept AIToolbox::MDP::IsExperience |
This concept represents the required interface for an experience recorder.
This concept tests for the interface of an experience recorder that can be used to create Reinforcement Learning MDP models.
The interface must be implemented and be public in the parameter class. The interface is the following:
concept AIToolbox::MDP::IsExperienceEigen |
This concept represents the required Experience interface that allows leveraging Eigen.
This concept tests for the interface of an MDP Experience which uses Eigen matrices internally. This should work for both dense and sparse Experiences.
The interface must be implemented and be public in the parameter class. The interface is the following:
In addition the Experience needs to respect the basic Experience interface.
concept AIToolbox::MDP::IsGenerativeModel |
This concept represents the required interface for a generative MDP.
This concept tests for the interface of a generative MDP model. The interface must be implemented and be public in the parameter class. The interface is the following:
The concept re-uses the "base" concept and simply requires a fixed action space, and integral state and action spaces.
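As a sketch only: the exact member functions the concept requires are not listed on this page, so the getS/getA/getDiscount/sampleSR/isTerminal set below is an assumption, and the static_assert is left commented out for that reason; the header path is assumed as well.

    #include <AIToolbox/MDP/TypeTraits.hpp>   // assumed header declaring the concept
    #include <cstddef>
    #include <random>
    #include <tuple>

    // Hypothetical two-state, two-action generative model.
    class CoinFlipModel {
        public:
            std::size_t getS() const { return 2; }
            std::size_t getA() const { return 2; }
            double getDiscount() const { return 0.9; }
            bool isTerminal(std::size_t) const { return false; }

            // Sample a next state and a reward for the (s, a) pair.
            std::tuple<std::size_t, double> sampleSR(std::size_t s, std::size_t a) const {
                std::bernoulli_distribution flip(0.5);
                const std::size_t s1 = flip(rand_) ? 1 - s : s;
                return {s1, a == s1 ? 1.0 : 0.0};
            }

        private:
            mutable std::mt19937 rand_{0};
    };

    // If the assumed interface matches the real concept, this would compile:
    // static_assert(AIToolbox::MDP::IsGenerativeModel<CoinFlipModel>);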
concept AIToolbox::MDP::IsModel |
This concept represents the required interface for a full MDP model.
This concept tests for the interface of an MDP model. The interface must be implemented and be public in the parameter class. The interface is the following:
In addition the MDP needs to respect the interface for the MDP generative model.
concept AIToolbox::MDP::IsModelEigen |
This concept represents the required interface that allows MDP algorithms to leverage Eigen.
This concept tests for the interface of an MDP model which uses Eigen matrices internally. This should work for both dense and sparse models.
The interface must be implemented and be public in the parameter class. The interface is the following:
In addition the MDP needs to respect the interface for the MDP model.