|
| RetraceLEvaluation (const PolicyInterface &target, const PolicyInterface &behaviour, const double discount, const double alpha, const double lambda, const double tolerance) |
| Basic constructor.
|
|
void | setLambda (double l) |
| This function sets the new lambda parameter.
|
|
double | getLambda () const |
| This function returns the currently set lambda parameter.
|
|
| OffPolicyEvaluation (const PolicyInterface &target, double discount=1.0, double alpha=0.1, double tolerance=0.001) |
| Basic constructor.
|
|
void | stepUpdateQ (const size_t s, const size_t a, const size_t s1, const double rew) |
| This function updates the internal QFunction using the discount set during construction.
|
|
| OffPolicyBase (size_t s, size_t a, double discount=1.0, double alpha=0.1, double tolerance=0.001) |
| Basic constructor.
|
|
void | setLearningRate (double a) |
| This function sets the learning rate parameter.
|
|
double | getLearningRate () const |
| This function returns the currently set learning rate parameter.
|
|
void | setDiscount (double d) |
| This function sets the new discount parameter.
|
|
double | getDiscount () const |
| This function returns the currently set discount parameter.
|
|
void | setTolerance (double t) |
| This function sets the trace cutoff parameter.
|
|
double | getTolerance () const |
| This function returns the currently set trace cutoff parameter.
|
|
void | clearTraces () |
| This function clears the already set traces.
|
|
const Traces & | getTraces () const |
| This function returns the currently set traces.
|
|
void | setTraces (const Traces &t) |
| This function replaces the current traces with the given ones.
|
|
size_t | getS () const |
| This function returns the number of states on which the method is working.
|
|
size_t | getA () const |
| This function returns the number of actions on which the method is working.
|
|
const QFunction & | getQFunction () const |
| This function returns a reference to the internal QFunction.
|
|
void | setQFunction (const QFunction &qfun) |
| This function allows directly setting the internal QFunction.
|
|
This class implements off-policy evaluation via Retrace(lambda).
This algorithm tries to combine the advantages of ImportanceSampling, QL and TreeBackupL. The idea is to use the lambda parameter to tune the traces, while at the same time using the ratio between the target and behaviour policies to make the most of the available data.
To avoid the variance problem of ImportanceSampling, though, it imposes a ceiling on the ratio: if the ratio exceeds 1, it is pinned to 1. This still leverages the data, but makes variance much less of a problem, since the traces are now bound to decrease over time.