AIToolbox
A library that offers tools for AI problem solving.
This class implements off-policy evaluation via Q(lambda).
#include <AIToolbox/MDP/Algorithms/QL.hpp>
Public Types | |
using | Parent = OffPolicyEvaluation< QLEvaluation > |
Public Types inherited from AIToolbox::MDP::OffPolicyEvaluation< QLEvaluation > | |
using | Parent = OffPolicyBase |
Public Types inherited from AIToolbox::MDP::OffPolicyBase | |
using | Trace = std::tuple< size_t, size_t, double > |
using | Traces = std::vector< Trace > |
Public Member Functions | |
QLEvaluation (const PolicyInterface &target, const double discount, const double alpha, const double lambda, const double tolerance) | |
Basic constructor. |
void | setLambda (double l) |
This function sets the new lambda parameter. |
double | getLambda () const |
This function returns the currently set lambda parameter. |
Public Member Functions inherited from AIToolbox::MDP::OffPolicyEvaluation< QLEvaluation > | |
OffPolicyEvaluation (const PolicyInterface &target, double discount=1.0, double alpha=0.1, double tolerance=0.001) | |
Basic constructor. |
void | stepUpdateQ (const size_t s, const size_t a, const size_t s1, const double rew) |
This function updates the internal QFunction using the discount set during construction. |
Public Member Functions inherited from AIToolbox::MDP::OffPolicyBase | |
OffPolicyBase (size_t s, size_t a, double discount=1.0, double alpha=0.1, double tolerance=0.001) | |
Basic constructor. |
void | setLearningRate (double a) |
This function sets the learning rate parameter. |
double | getLearningRate () const |
This function returns the currently set learning rate parameter. |
void | setDiscount (double d) |
This function sets the new discount parameter. |
double | getDiscount () const |
This function returns the currently set discount parameter. |
void | setTolerance (double t) |
This function sets the trace cutoff parameter. |
double | getTolerance () const |
This function returns the currently set trace cutoff parameter. |
void | clearTraces () |
This function clears the currently set traces. |
const Traces & | getTraces () const |
This function returns the currently set traces. |
void | setTraces (const Traces &t) |
This function replaces the currently set traces. |
size_t | getS () const |
This function returns the number of states on which the algorithm is working. |
size_t | getA () const |
This function returns the number of actions on which the algorithm is working. |
const QFunction & | getQFunction () const |
This function returns a reference to the internal QFunction. |
void | setQFunction (const QFunction &qfun) |
This function directly sets the internal QFunction. |
Additional Inherited Members | |
Protected Member Functions inherited from AIToolbox::MDP::OffPolicyBase | |
void | updateTraces (size_t s, size_t a, double error, double traceDiscount) |
This function updates the traces using the input data. |
Protected Attributes inherited from AIToolbox::MDP::OffPolicyEvaluation< QLEvaluation > | |
const PolicyInterface & | target_ |
Protected Attributes inherited from AIToolbox::MDP::OffPolicyBase | |
size_t | S |
size_t | A |
double | discount_ |
double | alpha_ |
double | tolerance_ |
QFunction | q_ |
Traces | traces_ |
This class implements off-policy evaluation via Q(lambda).
This algorithm is the off-policy equivalent of SARSAL. It scales traces using the lambda parameter, but is able to work in an off-policy manner.
Because it does not correct for the discrepancy between the behaviour and target policies, it tends to work only if the two policies are similar.
Note that although the trace discount ignores the target policy, the error update is still computed using the target policy. That is why the method works and does not simply compute the value of the current behaviour policy.
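The description above can be illustrated with a minimal, self-contained sketch. It is not the AIToolbox implementation: `QLambdaEvalSketch`, its `step` method, the table-based target policy, and the replacing-trace bookkeeping are all assumptions made for illustration. What it tries to show is the split described above: the error term uses the target policy, while the trace decay uses only the discount and lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the AIToolbox types.
using Trace  = std::tuple<std::size_t, std::size_t, double>; // (state, action, eligibility)
using Traces = std::vector<Trace>;
using Table  = std::vector<std::vector<double>>;             // indexed as [state][action]

struct QLambdaEvalSketch {
    Table target;   // target policy probabilities pi(a | s)
    Table q;        // current estimate of Q(s, a), initialised to zero
    Traces traces;  // eligibility traces of recently visited pairs
    double discount, alpha, lambda, tolerance;

    QLambdaEvalSketch(Table targetPolicy, double discount_, double alpha_,
                      double lambda_, double tolerance_)
        : target(std::move(targetPolicy)),
          q(target.size(), std::vector<double>(target.empty() ? 0 : target[0].size(), 0.0)),
          discount(discount_), alpha(alpha_), lambda(lambda_), tolerance(tolerance_) {}

    // One update from an observed transition (s, a) -> s1 with reward rew.
    void step(std::size_t s, std::size_t a, std::size_t s1, double rew) {
        assert(s < q.size() && s1 < q.size() && a < q[s].size());

        // Expected value of the next state under the TARGET policy.
        double expectedQ = 0.0;
        for (std::size_t a1 = 0; a1 < q[s1].size(); ++a1)
            expectedQ += target[s1][a1] * q[s1][a1];

        const double error = alpha * (rew + discount * expectedQ - q[s][a]);

        // The trace decay ignores both policies (no importance weighting).
        const double traceDiscount = discount * lambda;

        // Decay old traces, reset the current pair to 1.0, and drop
        // anything that has fallen below the cutoff (assumed behaviour).
        Traces kept;
        bool seen = false;
        for (const Trace & t : traces) {
            const std::size_t ts = std::get<0>(t), ta = std::get<1>(t);
            double el = std::get<2>(t) * traceDiscount;
            if (ts == s && ta == a) { el = 1.0; seen = true; }
            if (el >= tolerance) kept.emplace_back(ts, ta, el);
        }
        if (!seen) kept.emplace_back(s, a, 1.0);
        traces = std::move(kept);

        // Propagate the error to every pair that still has a live trace.
        for (const Trace & t : traces)
            q[std::get<0>(t)][std::get<1>(t)] += error * std::get<2>(t);
    }
};
```

In this sketch the behaviour policy never appears: the caller supplies whatever transitions were observed, which is why the estimate is only reliable when those transitions resemble what the target policy would have produced.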
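QLEvaluation (const PolicyInterface &target, const double discount, const double alpha, const double lambda, const double tolerance) | inline |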
Basic constructor.
target | Target policy. |
discount | Discount for the problem. |
alpha | Learning rate parameter. |
lambda | Lambda trace parameter. |
tolerance | Trace cutoff parameter. |
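To give a feel for how the discount, lambda, and tolerance parameters interact, the sketch below counts how many updates a trace survives. `stepsUntilCutoff` is a hypothetical helper, not part of AIToolbox, and it assumes that a trace starts at 1.0, is multiplied by `discount * lambda` on every update, and is dropped once it falls below `tolerance`.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: number of updates during which a trace that starts
// at 1.0 stays at or above the cutoff, assuming it is multiplied by
// discount * lambda after each update.
inline std::size_t stepsUntilCutoff(double discount, double lambda, double tolerance) {
    const double factor = discount * lambda;

    // With no decay (factor >= 1) or no cutoff (tolerance <= 0),
    // the trace never expires under these assumptions.
    if (factor >= 1.0 || tolerance <= 0.0)
        return std::numeric_limits<std::size_t>::max();

    std::size_t steps = 0;
    for (double el = 1.0; el >= tolerance; el *= factor)
        ++steps;
    return steps;
}
```

Under these assumptions, a discount of 0.9 with a lambda of 0.8 and a tolerance of 0.001 keeps a trace alive for 22 updates, so a larger lambda or a smaller tolerance means more state-action pairs are touched per step.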
double | getLambda () const | inline |
This function returns the currently set lambda parameter.
void | setLambda (double l) | inline |
This function sets the new lambda parameter.
The lambda parameter must be in the range [0.0, 1.0]; otherwise the function throws std::invalid_argument.
l | The new lambda parameter. |
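The range check described above could look roughly like the following. `LambdaHolder` is a hypothetical class written for illustration, and the exception message is an assumption; only the accepted range and the exception type come from the documentation above.

```cpp
#include <stdexcept>

// Hypothetical holder, for illustration only; not the AIToolbox class.
class LambdaHolder {
    public:
        // Accepts values in [0.0, 1.0]; anything else is rejected and
        // the previously stored value is left untouched.
        void setLambda(double l) {
            if (l < 0.0 || l > 1.0)
                throw std::invalid_argument("Lambda parameter must be in [0,1]");
            lambda_ = l;
        }

        double getLambda() const { return lambda_; }

    private:
        double lambda_ = 0.0;
};
```

A caller that takes lambda from user input or a configuration file would typically wrap the call in a `try` block and catch `std::invalid_argument`.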