This class implements off-policy evaluation via Retrace(lambda).

#include <AIToolbox/MDP/Algorithms/RetraceL.hpp>

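Inheritance diagram for AIToolbox::MDP::RetraceLEvaluation: RetraceLEvaluation inherits from OffPolicyEvaluation< RetraceLEvaluation >, which inherits from OffPolicyBase.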

Public Types
using	Parent = OffPolicyEvaluation< RetraceLEvaluation >

Public Types inherited from AIToolbox::MDP::OffPolicyEvaluation< RetraceLEvaluation >
using	Parent = OffPolicyBase

Public Types inherited from AIToolbox::MDP::OffPolicyBase
using	Trace = std::tuple< size_t, size_t, double >

using	Traces = std::vector< Trace >

Public Member Functions
	RetraceLEvaluation (const PolicyInterface &target, const PolicyInterface &behaviour, const double discount, const double alpha, const double lambda, const double tolerance)
	Basic constructor.

void	setLambda (double l)
	This function sets the new lambda parameter.

double	getLambda () const
	This function returns the currently set lambda parameter.

Public Member Functions inherited from AIToolbox::MDP::OffPolicyEvaluation< RetraceLEvaluation >
	OffPolicyEvaluation (const PolicyInterface &target, double discount=1.0, double alpha=0.1, double tolerance=0.001)
	Basic constructor.

void	stepUpdateQ (const size_t s, const size_t a, const size_t s1, const double rew)
	This function updates the internal QFunction using the discount set during construction.

Public Member Functions inherited from AIToolbox::MDP::OffPolicyBase
	OffPolicyBase (size_t s, size_t a, double discount=1.0, double alpha=0.1, double tolerance=0.001)
	Basic constructor.

void	setLearningRate (double a)
	This function sets the learning rate parameter.

double	getLearningRate () const
	This function returns the currently set learning rate parameter.

void	setDiscount (double d)
	This function sets the new discount parameter.

double	getDiscount () const
	This function returns the currently set discount parameter.

void	setTolerance (double t)
	This function sets the trace cutoff parameter.

double	getTolerance () const
	This function returns the currently set trace cutoff parameter.

void	clearTraces ()
	This function clears the already set traces.

const Traces &	getTraces () const
	This function returns the currently set traces.

void	setTraces (const Traces &t)
	This function replaces the currently set traces.

size_t	getS () const
	This function returns the number of states on which the algorithm is working.

size_t	getA () const
	This function returns the number of actions on which the algorithm is working.

const QFunction &	getQFunction () const
	This function returns a reference to the internal QFunction.

void	setQFunction (const QFunction &qfun)
	This function directly sets the internal QFunction.

Additional Inherited Members
Protected Member Functions inherited from AIToolbox::MDP::OffPolicyBase
void	updateTraces (size_t s, size_t a, double error, double traceDiscount)
	This function updates the traces using the input data.

Protected Attributes inherited from AIToolbox::MDP::OffPolicyEvaluation< RetraceLEvaluation >
const PolicyInterface &	target_

Protected Attributes inherited from AIToolbox::MDP::OffPolicyBase
size_t	S

size_t	A

double	discount_

double	alpha_

double	tolerance_

QFunction	q_

Traces	traces_

Detailed Description

This class implements off-policy evaluation via Retrace(lambda).

This algorithm tries to combine the advantages of ImportanceSampling, QL and TreeBackupL. The idea is to use the lambda parameter to tune the traces, while also using the ratio between the target and behaviour policies to make the most of the available data.

To avoid the variance problem of ImportanceSampling, though, it imposes a ceiling on the ratio: whenever the ratio exceeds 1, it is clipped to 1. This still leverages the data but makes variance much less of a problem, since traces are now bound to decrease over time.
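
The ceiling on the ratio can be illustrated with a minimal sketch. It assumes the per-step trace coefficient takes the usual Retrace(lambda) form, lambda * min(1, pi(a|s) / mu(a|s)), where pi is the target policy and mu the behaviour policy. The helpers retraceCoefficient and decayTrace are hypothetical names used only for illustration; they are not part of AIToolbox, and the library's internal computation may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of AIToolbox: per-step trace coefficient
// in the style of Retrace(lambda), i.e. lambda * min(1, pi(a|s) / mu(a|s)).
inline double retraceCoefficient(double lambda, double targetProb, double behaviourProb) {
    // The behaviour policy must have been able to pick the action.
    assert(behaviourProb > 0.0);
    const double ratio = targetProb / behaviourProb;
    // Ratios above 1 are capped at 1, so the coefficient never exceeds lambda.
    return lambda * std::min(1.0, ratio);
}

// Hypothetical helper: shrink an existing trace by discount * coefficient.
inline double decayTrace(double trace, double discount, double coefficient) {
    return trace * discount * coefficient;
}
```

Since the coefficient is at most lambda, and lambda is at most 1, each step multiplies a trace by a factor no larger than the discount, which is why the traces cannot grow the way they can with an unclipped importance sampling ratio.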

Member Typedef Documentation

◆ Parent

using AIToolbox::MDP::RetraceLEvaluation::Parent = OffPolicyEvaluation<RetraceLEvaluation>

Constructor & Destructor Documentation

◆ RetraceLEvaluation()

AIToolbox::MDP::RetraceLEvaluation::RetraceLEvaluation	(	const PolicyInterface &	target,
		const PolicyInterface &	behaviour,
		const double	discount,
		const double	alpha,
		const double	lambda,
		const double	tolerance
	)

inline

Basic constructor.

Parameters

target	Target policy.
behaviour	Behaviour policy.
discount	Discount for the problem.
alpha	Learning rate parameter.
lambda	Lambda trace parameter.
tolerance	Trace cutoff parameter.

Member Function Documentation

◆ getLambda()

double AIToolbox::MDP::RetraceLEvaluation::getLambda ( ) const

inline

This function returns the currently set lambda parameter.

◆ setLambda()

void AIToolbox::MDP::RetraceLEvaluation::setLambda ( double l )

inline

This function sets the new lambda parameter.

The lambda parameter must be >= 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

Parameters

l	The new lambda parameter.
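
The range check described above can be sketched with a small stand-in class. LambdaHolder is a hypothetical name, and the exception message is an assumption; only the [0.0, 1.0] range and the std::invalid_argument type come from the documentation, so this is not the AIToolbox implementation.

```cpp
#include <stdexcept>

// Hypothetical stand-in, not the AIToolbox implementation: it mirrors only
// the documented contract of setLambda() and getLambda().
class LambdaHolder {
    public:
        explicit LambdaHolder(double l) { setLambda(l); }

        // Rejects values outside [0.0, 1.0]; the stored value is left untouched on failure.
        void setLambda(double l) {
            if (l < 0.0 || l > 1.0)
                throw std::invalid_argument("Lambda parameter must be in [0,1]");
            lambda_ = l;
        }

        double getLambda() const { return lambda_; }

    private:
        double lambda_;
};
```

A caller that cannot guarantee a valid value would wrap the call to setLambda in a try block and catch std::invalid_argument; after a failed call the previously set lambda is still returned by getLambda in this sketch.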

The documentation for this class was generated from the following file:

include/AIToolbox/MDP/Algorithms/RetraceL.hpp
