AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP::OffPolicyEvaluation< Derived > Class Template Reference

This class is a general version of off-policy evaluation. More...

#include <AIToolbox/MDP/Algorithms/Utils/OffPolicyTemplate.hpp>

Inheritance diagram for AIToolbox::MDP::OffPolicyEvaluation< Derived >:
AIToolbox::MDP::OffPolicyBase

Public Types

using Parent = OffPolicyBase
 
- Public Types inherited from AIToolbox::MDP::OffPolicyBase
using Trace = std::tuple< size_t, size_t, double >
 
using Traces = std::vector< Trace >
 

Public Member Functions

 OffPolicyEvaluation (const PolicyInterface &target, double discount=1.0, double alpha=0.1, double tolerance=0.001)
 Basic constructor. More...
 
void stepUpdateQ (const size_t s, const size_t a, const size_t s1, const double rew)
 This function updates the internal QFunction using the discount set during construction. More...
 
- Public Member Functions inherited from AIToolbox::MDP::OffPolicyBase
 OffPolicyBase (size_t s, size_t a, double discount=1.0, double alpha=0.1, double tolerance=0.001)
Basic constructor. More...
 
void setLearningRate (double a)
 This function sets the learning rate parameter. More...
 
double getLearningRate () const
This function returns the currently set learning rate parameter. More...
 
void setDiscount (double d)
 This function sets the new discount parameter. More...
 
double getDiscount () const
 This function returns the currently set discount parameter. More...
 
void setTolerance (double t)
 This function sets the trace cutoff parameter. More...
 
double getTolerance () const
 This function returns the currently set trace cutoff parameter. More...
 
void clearTraces ()
This function clears the currently set traces. More...
 
const Traces & getTraces () const
 This function returns the currently set traces. More...
 
void setTraces (const Traces &t)
This function overwrites the currently set traces. More...
 
size_t getS () const
This function returns the number of states on which the algorithm is working. More...
 
size_t getA () const
This function returns the number of actions on which the algorithm is working. More...
 
const QFunction & getQFunction () const
 This function returns a reference to the internal QFunction. More...
 
void setQFunction (const QFunction &qfun)
This function allows setting the internal QFunction directly. More...
 

Protected Attributes

const PolicyInterface & target_
 
- Protected Attributes inherited from AIToolbox::MDP::OffPolicyBase
size_t S
 
size_t A
 
double discount_
 
double alpha_
 
double tolerance_
 
QFunction q_
 
Traces traces_
 

Additional Inherited Members

- Protected Member Functions inherited from AIToolbox::MDP::OffPolicyBase
void updateTraces (size_t s, size_t a, double error, double traceDiscount)
 This function updates the traces using the input data. More...
 

Detailed Description

template<typename Derived>
class AIToolbox::MDP::OffPolicyEvaluation< Derived >

This class is a general version of off-policy evaluation.

This class is used to compute the QFunction of a given policy, when you are actually acting and gathering data following another policy (which is why it's called off-policy).

Keep in mind that these kinds of methods are not very efficient when either the target or the behaviour policy is very deterministic. This is because greedy policies (at least with methods that use some kind of importance sampling) tend to cut traces short, which is essentially equivalent to discarding data (although this must be done to ensure correctness).

Note that this class does not encompass all off-policy evaluation methods; it only covers those that use eligibility traces in a certain form, such as ImportanceSampling, RetraceLambda, etc.
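As a rough sketch of the structure these methods share (the library's exact bookkeeping may differ in details), a single experience step \((s_t, a_t, s_{t+1}, r_t)\) produces an update of the form

\[ \delta_t = r_t + \gamma \sum_{a'} \pi(a' \mid s_{t+1})\, Q(s_{t+1}, a') - Q(s_t, a_t) \]
\[ e_t(s, a) = \gamma\, c_t\, e_{t-1}(s, a) + \mathbf{1}\{s = s_t,\ a = a_t\} \]
\[ Q(s, a) \leftarrow Q(s, a) + \alpha\, \delta_t\, e_t(s, a) \]

where \(\pi\) is the target policy, \(\alpha\) is the learning rate, and \(c_t\) is a method-specific trace discount (see getTraceDiscount below). Traces that fall below the tolerance parameter are cut, which is how a small or zero importance ratio ends up discarding the rest of a trajectory.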

This class is supposed to be used as a CRTP parent. The child must derive from it as:

class Child : public OffPolicyEvaluation<Child> {};

In addition, the child must define the function

double getTraceDiscount(size_t s, size_t a, size_t s1, double rew) const;

This function is then called automatically to compute the amount by which to decay the traces during each stepUpdateQ call. For example, in ImportanceSampling the function would return:

return target_.getActionProbability(s, a) / behaviour_.getActionProbability(s, a);
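To make this concrete, here is a minimal hedged sketch of such a child. The class name ISChild, its extra behaviour parameter, and the behaviour_ member are illustrative assumptions for this page, not names taken from the library:

class ISChild : public OffPolicyEvaluation<ISChild> {
    public:
        // 'target' is the policy being evaluated; 'behaviour' is the
        // policy actually generating the experience (illustrative).
        ISChild(const PolicyInterface & target, const PolicyInterface & behaviour,
                double discount = 1.0, double alpha = 0.1, double tolerance = 0.001)
                : OffPolicyEvaluation<ISChild>(target, discount, alpha, tolerance),
                  behaviour_(behaviour) {}

        // Called automatically by the parent's stepUpdateQ; this variant
        // implements the plain importance sampling ratio shown above.
        double getTraceDiscount(size_t s, size_t a, size_t s1, double rew) const {
            (void)s1; (void)rew; // unused by plain importance sampling
            return target_.getActionProbability(s, a) / behaviour_.getActionProbability(s, a);
        }

    private:
        const PolicyInterface & behaviour_;
};

The CRTP design lets the parent call getTraceDiscount directly, with no virtual dispatch in the inner update loop.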

Member Typedef Documentation

◆ Parent

template<typename Derived >
using AIToolbox::MDP::OffPolicyEvaluation< Derived >::Parent = OffPolicyBase

Constructor & Destructor Documentation

◆ OffPolicyEvaluation()

template<typename Derived >
AIToolbox::MDP::OffPolicyEvaluation< Derived >::OffPolicyEvaluation ( const PolicyInterface &  target,
double  discount = 1.0,
double  alpha = 0.1,
double  tolerance = 0.001 
)

Basic constructor.

Parameters
target      The policy to be evaluated.
discount    The discount of the environment.
alpha       The learning rate parameter.
tolerance   The trace cutoff parameter.
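For example (a hedged sketch; ISChild is the illustrative child class from the Detailed Description, and target and behaviour stand for concrete PolicyInterface implementations you already have):

// Evaluate 'target' with discount 0.9, learning rate 0.1 and the
// default trace cutoff, while data is gathered by 'behaviour'.
ISChild eval(target, behaviour, 0.9, 0.1, 0.001);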

Member Function Documentation

◆ stepUpdateQ()

template<typename Derived >
void AIToolbox::MDP::OffPolicyEvaluation< Derived >::stepUpdateQ ( const size_t  s,
const size_t  a,
const size_t  s1,
const double  rew 
)

This function updates the internal QFunction using the discount set during construction.

This function takes a single experience point and uses it to update the QFunction. This is a very efficient method to keep the QFunction up to date with the latest experience.

Parameters
s      The previous state.
a      The action performed.
s1     The new state.
rew    The reward obtained.
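Continuing the hedged sketch from the constructor above (model stands for any generative model offering a sampleSR(s, a) member that returns the next state and reward; episode handling is simplified), a typical learning loop feeds each sampled transition to stepUpdateQ:

size_t s = 0; // some starting state
for (unsigned t = 0; t < 10000; ++t) {
    const size_t a = behaviour.sampleAction(s);   // act with the behaviour policy
    const auto [s1, rew] = model.sampleSR(s, a);  // observe the transition
    eval.stepUpdateQ(s, a, s1, rew);              // update Q towards the target policy
    s = s1;
}
// eval.getQFunction() now contains the estimated QFunction of 'target'.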

Member Data Documentation

◆ target_

template<typename Derived >
const PolicyInterface& AIToolbox::MDP::OffPolicyEvaluation< Derived >::target_
protected

The documentation for this class was generated from the following file:
AIToolbox/MDP/Algorithms/Utils/OffPolicyTemplate.hpp