|
| TreeBackupLEvaluation (const PolicyInterface &target, const double discount, const double alpha, const double lambda, const double tolerance) |
| Basic constructor.
|
|
void | setLambda (double l) |
| This function sets the new lambda parameter.
|
|
double | getLambda () const |
| This function returns the currently set lambda parameter.
|
|
| OffPolicyEvaluation (const PolicyInterface &target, double discount=1.0, double alpha=0.1, double tolerance=0.001) |
| Basic constructor.
|
|
void | stepUpdateQ (const size_t s, const size_t a, const size_t s1, const double rew) |
| This function updates the internal QFunction using the discount set during construction.
|
|
| OffPolicyBase (size_t s, size_t a, double discount=1.0, double alpha=0.1, double tolerance=0.001) |
| Basic constructor.
|
|
void | setLearningRate (double a) |
| This function sets the learning rate parameter.
|
|
double | getLearningRate () const |
| This function returns the currently set learning rate parameter.
|
|
void | setDiscount (double d) |
| This function sets the new discount parameter.
|
|
double | getDiscount () const |
| This function returns the currently set discount parameter.
|
|
void | setTolerance (double t) |
| This function sets the trace cutoff parameter.
|
|
double | getTolerance () const |
| This function returns the currently set trace cutoff parameter.
|
|
void | clearTraces () |
| This function clears the currently stored traces.
|
|
const Traces & | getTraces () const |
| This function returns the currently set traces.
|
|
void | setTraces (const Traces &t) |
| This function replaces the current traces with the given ones.
|
|
size_t | getS () const |
| This function returns the number of states on which the algorithm is working.
|
|
size_t | getA () const |
| This function returns the number of actions on which the algorithm is working.
|
|
const QFunction & | getQFunction () const |
| This function returns a reference to the internal QFunction.
|
|
void | setQFunction (const QFunction &qfun) |
| This function allows directly setting the internal QFunction.
|
|
This class implements off-policy evaluation via Tree Backup(lambda).
This algorithm tries to avoid the infinite variance problem of ImportanceSampling by multiplying the traces by only the target policy's probability of the action taken. It additionally uses the lambda parameter to further tune their length.
While it succeeds in this, it tends to cut traces short. Every action has a probability <= 1 of being picked by the target policy, so each multiplication can only shorten the trace. This is not a problem in general, but it is inefficient when the behaviour and target policies are very similar.
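The effect of the target policy probability on trace length can be illustrated with a small standalone sketch. It is not taken from the library: `traceLength` is a hypothetical helper, and it assumes that a trace starts at 1.0, is multiplied at every step by `discount * lambda * targetProb`, and is dropped once it falls below `tolerance` (the trace cutoff parameter).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the library.
// Counts how many updates a trace starting at 1.0 receives before it
// falls below `tolerance`, assuming it is multiplied at every step by
// discount * lambda * targetProb (the assumed Tree Backup(lambda) decay).
inline std::size_t traceLength(const double discount, const double lambda,
                               const double targetProb, const double tolerance) {
    assert(tolerance > 0.0);

    const double factor = discount * lambda * targetProb;
    // The factor must shrink the trace, otherwise it would never be cut.
    assert(factor >= 0.0 && factor < 1.0);

    double trace = 1.0;
    std::size_t steps = 0;
    while (trace >= tolerance) {
        trace *= factor;
        ++steps;
    }
    return steps;
}
```

Under these assumptions, with discount and lambda both set to 1.0 and a tolerance of 0.001, a constant target probability of 0.5 keeps a trace alive for 10 steps, while 0.25 keeps it for only 5. The less likely the target policy is to pick the observed actions, the sooner the trace is cut, even when the behaviour policy would have picked them with the same probability.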