AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP::OffPolicyEvaluation< Derived > Class Template Reference

This class is a general version of off-policy evaluation.
#include <AIToolbox/MDP/Algorithms/Utils/OffPolicyTemplate.hpp>
Public Types

    using Parent = OffPolicyBase

Public Types inherited from AIToolbox::MDP::OffPolicyBase

    using Trace = std::tuple< size_t, size_t, double >
    using Traces = std::vector< Trace >

Public Member Functions

    OffPolicyEvaluation (const PolicyInterface &target, double discount=1.0, double alpha=0.1, double tolerance=0.001)
        Basic constructor.
    void stepUpdateQ (const size_t s, const size_t a, const size_t s1, const double rew)
        This function updates the internal QFunction using the discount set during construction.
Public Member Functions inherited from AIToolbox::MDP::OffPolicyBase

    OffPolicyBase (size_t s, size_t a, double discount=1.0, double alpha=0.1, double tolerance=0.001)
        Basic constructor.
    void setLearningRate (double a)
        This function sets the learning rate parameter.
    double getLearningRate () const
        This function returns the currently set learning rate parameter.
    void setDiscount (double d)
        This function sets the new discount parameter.
    double getDiscount () const
        This function returns the currently set discount parameter.
    void setTolerance (double t)
        This function sets the trace cutoff parameter.
    double getTolerance () const
        This function returns the currently set trace cutoff parameter.
    void clearTraces ()
        This function clears the currently set traces.
    const Traces & getTraces () const
        This function returns the currently set traces.
    void setTraces (const Traces &t)
        This function sets the traces directly.
    size_t getS () const
        This function returns the number of states on which the algorithm is working.
    size_t getA () const
        This function returns the number of actions on which the algorithm is working.
    const QFunction & getQFunction () const
        This function returns a reference to the internal QFunction.
    void setQFunction (const QFunction &qfun)
        This function sets the internal QFunction directly.
Protected Attributes

    const PolicyInterface & target_

Protected Attributes inherited from AIToolbox::MDP::OffPolicyBase

    size_t S
    size_t A
    double discount_
    double alpha_
    double tolerance_
    QFunction q_
    Traces traces_

Additional Inherited Members

Protected Member Functions inherited from AIToolbox::MDP::OffPolicyBase

    void updateTraces (size_t s, size_t a, double error, double traceDiscount)
        This function updates the traces using the input data.
Detailed Description

This class is a general version of off-policy evaluation.
This class is used to compute the QFunction of a given target policy while you are actually acting and gathering data by following another policy (which is why it is called off-policy).

Keep in mind that these kinds of methods are not very efficient when either the target or the behaviour policy is highly deterministic. This is because greedy policies (at least with methods that use some form of importance sampling) tend to cut traces short, which is essentially equivalent to discarding data (although this must be done to ensure correctness).

Note that this class does not necessarily encompass all off-policy evaluation methods; it only covers those that use eligibility traces in a certain form, such as ImportanceSampling, RetraceLambda, etc.
This class is meant to be used as a CRTP parent: the child must derive from it, passing itself as the template parameter, as in the sketch below.
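A minimal sketch of the pattern (MyEvaluation is a hypothetical name used only for illustration):

    // Hypothetical child class deriving via CRTP, as described above.
    class MyEvaluation : public OffPolicyEvaluation<MyEvaluation> {
        // ...
    };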
In addition, the child must define a member function that computes the per-step trace discount. Its exact name and signature are fixed by OffPolicyTemplate.hpp; as a sketch of its shape (getTraceDiscount is used here as an illustrative name):
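    // Illustrative sketch; check OffPolicyTemplate.hpp for the exact signature.
    // Given the last transition, returns the factor by which the existing
    // traces should be scaled before the next update.
    double getTraceDiscount(size_t s, size_t a, size_t s1, double rew) const;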
This function is then automatically called by this class to compute the amount by which to decrease the traces during stepUpdateQ. For example, in ImportanceSampling the function would return the likelihood ratio between the two policies, scaled by the lambda trace parameter, as sketched below.
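A hedged sketch of the ImportanceSampling variant (lambda_ and behaviour_ are assumed members here; target_ is the protected member documented below, and getActionProbability is part of PolicyInterface):

    // Sketch of the ImportanceSampling trace-discount function.
    double getTraceDiscount(size_t s, size_t a, size_t, double) const {
        // Likelihood ratio pi_target(a|s) / pi_behaviour(a|s), scaled by lambda.
        return lambda_ * target_.getActionProbability(s, a)
                       / behaviour_.getActionProbability(s, a);
    }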
Member Typedef Documentation

using AIToolbox::MDP::OffPolicyEvaluation< Derived >::Parent = OffPolicyBase
Constructor & Destructor Documentation

AIToolbox::MDP::OffPolicyEvaluation< Derived >::OffPolicyEvaluation ( const PolicyInterface & target,
                                                                      double discount = 1.0,
                                                                      double alpha = 0.1,
                                                                      double tolerance = 0.001 )
Basic constructor.
Parameters
    target     The policy to be evaluated.
    discount   The discount of the environment.
    alpha      The learning rate parameter.
    tolerance  The trace cutoff parameter.
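Since this class is a CRTP parent, its constructor is normally invoked from the child's own constructor. A minimal sketch, reusing the hypothetical MyEvaluation from above:

    // Hypothetical child constructor forwarding its arguments to the base.
    MyEvaluation::MyEvaluation(const PolicyInterface & target, double discount,
                               double alpha, double tolerance)
            : OffPolicyEvaluation<MyEvaluation>(target, discount, alpha, tolerance) {}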
Member Function Documentation

void AIToolbox::MDP::OffPolicyEvaluation< Derived >::stepUpdateQ ( const size_t s,
                                                                   const size_t a,
                                                                   const size_t s1,
                                                                   const double rew )
This function updates the internal QFunction using the discount set during construction.
This function takes a single experience point and uses it to update the QFunction. This is a very efficient method to keep the QFunction up to date with the latest experience.
Parameters
    s    The previous state.
    a    The action performed.
    s1   The new state.
    rew  The reward obtained.
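As a usage sketch, the evaluator is fed one transition at a time while acting with the behaviour policy. Here MyEvaluation, behaviour and model are hypothetical stand-ins (any PolicyInterface and any MDP model providing sampleSR would play these roles):

    // Hedged example: evaluate `target` while acting according to `behaviour`.
    MyEvaluation eval(target, 0.9, 0.1, 0.001);
    size_t s = 0;
    for (unsigned t = 0; t < 10000; ++t) {
        const size_t a = behaviour.sampleAction(s);       // act off-policy
        const auto [s1, rew] = model.sampleSR(s, a);      // observe transition
        eval.stepUpdateQ(s, a, s1, rew);                  // update Q for `target`
        s = s1;
    }
    const auto & q = eval.getQFunction();                 // estimated QFunction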
Member Data Documentation

const PolicyInterface & AIToolbox::MDP::OffPolicyEvaluation< Derived >::target_ [protected]