This class contains all the boilerplate for off-policy methods.
#include <AIToolbox/MDP/Algorithms/Utils/OffPolicyTemplate.hpp>
using Trace = std::tuple<size_t, size_t, double>

using Traces = std::vector<Trace>

void updateTraces (size_t s, size_t a, double error, double traceDiscount)
    This function updates the traces using the input data.
◆ Trace
using AIToolbox::MDP::OffPolicyBase::Trace = std::tuple<size_t, size_t, double>

◆ Traces
using AIToolbox::MDP::OffPolicyBase::Traces = std::vector<Trace>
◆ OffPolicyBase()
AIToolbox::MDP::OffPolicyBase::OffPolicyBase ( size_t s, size_t a, double discount = 1.0, double alpha = 0.1, double tolerance = 0.001 )

Basic constructor.

- Parameters
    s           The size of the state space.
    a           The size of the action space.
    discount    The discount of the environment.
    alpha       The learning rate.
    tolerance   The cutoff point for eligibility traces.
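As a rough usage sketch, the constructor parameters map directly onto the setters documented below. The derived class name here is hypothetical; OffPolicyBase is normally used as a base for concrete off-policy algorithms rather than on its own.

#include <AIToolbox/MDP/Algorithms/Utils/OffPolicyTemplate.hpp>

// Hypothetical learner built on top of OffPolicyBase; the name and the idea
// of simply inheriting the base constructor are illustrative only.
class MyOffPolicyLearner : public AIToolbox::MDP::OffPolicyBase {
    public:
        using OffPolicyBase::OffPolicyBase;
};

int main() {
    // 10 states, 4 actions, discount 0.9, learning rate 0.1,
    // eligibility traces dropped once they fall below 0.001.
    MyOffPolicyLearner learner(10, 4, 0.9, 0.1, 0.001);
    return 0;
}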
◆ clearTraces()
void AIToolbox::MDP::OffPolicyBase::clearTraces ( )
This function clears the already set traces.
◆ getA()
size_t AIToolbox::MDP::OffPolicyBase::getA ( ) const

This function returns the number of actions on which the algorithm is working.
- Returns
- The number of actions.
◆ getDiscount()
double AIToolbox::MDP::OffPolicyBase::getDiscount ( ) const
This function returns the currently set discount parameter.
- Returns
- The currently set discount parameter.
◆ getLearningRate()
double AIToolbox::MDP::OffPolicyBase::getLearningRate ( ) const

This function returns the currently set learning rate parameter.
- Returns
- The currently set learning rate parameter.
◆ getQFunction()
const QFunction& AIToolbox::MDP::OffPolicyBase::getQFunction ( ) const
This function returns a reference to the internal QFunction.
The returned reference can be used to build Policies, for example MDP::QGreedyPolicy.
- Returns
- The internal QFunction.
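The snippet below sketches that use. It assumes a learner object exposing getQFunction(), that MDP::QGreedyPolicy is constructed from a const QFunction reference, and that the header path shown is correct; the helper name is illustrative.

#include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>

// Sketch: build a greedy policy over the learner's current value estimates.
// `Learner` is a hypothetical placeholder for any type with getQFunction().
template <typename Learner>
AIToolbox::MDP::QGreedyPolicy makeGreedyPolicy(const Learner & learner) {
    // Actions are then picked greedily with respect to the learner's
    // current QFunction estimates.
    return AIToolbox::MDP::QGreedyPolicy(learner.getQFunction());
}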
◆ getS()
size_t AIToolbox::MDP::OffPolicyBase::getS ( ) const

This function returns the number of states on which the algorithm is working.
- Returns
- The number of states.
◆ getTolerance()
double AIToolbox::MDP::OffPolicyBase::getTolerance ( ) const
This function returns the currently set trace cutoff parameter.
- Returns
- The currently set trace cutoff parameter.
◆ getTraces()
const Traces& AIToolbox::MDP::OffPolicyBase::getTraces ( ) const
This function returns the currently set traces.
- Returns
- The currently set traces.
◆ setDiscount()
void AIToolbox::MDP::OffPolicyBase::setDiscount ( double d )

This function sets the new discount parameter.

The discount parameter controls how much we care about future rewards. If set to 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm optimizes overall reward accretion. When less than 1, rewards obtained in the present are valued more than future rewards.

- Parameters
    d    The new discount factor.
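For intuition, a reward received t steps in the future contributes reward × discount^t to the return. A small standalone illustration (independent of the library):

#include <cmath>
#include <cstdio>

int main() {
    const double discount = 0.9;
    const double reward   = 1.0;
    // Present value of the same reward obtained further and further away.
    for (int t = 0; t < 5; ++t)
        std::printf("reward %d steps away is worth %.3f now\n",
                    t, reward * std::pow(discount, t));
    return 0;
}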
◆ setLearningRate()
void AIToolbox::MDP::OffPolicyBase::setLearningRate ( double a )

This function sets the learning rate parameter.

The learning rate determines how quickly the QFunction is modified in response to new data. In fully deterministic environments (such as an agent moving through a grid, for example), this parameter can be safely set to 1.0 for maximum learning.

In stochastic environments, on the other hand, this parameter should start out high and decrease slowly over time in order for the estimates to converge. Alternatively, it can be kept somewhat high if the environment dynamics change progressively, and the algorithm will adapt accordingly. The final behaviour is very dependent on this parameter.

The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

- Parameters
    a    The new learning rate parameter.
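A common pattern is to start with a relatively high learning rate and decay it across episodes. The schedule below is only one reasonable choice, not something prescribed by the library, and `Learner` is a hypothetical placeholder for any class exposing setLearningRate().

#include <algorithm>

// Sketch: decay the learning rate as episodes progress, keeping it within
// the required (0.0, 1.0] range. The 1/(1 + k*episode) schedule and the
// floor of 0.05 are arbitrary illustrative choices.
template <typename Learner>
void decayLearningRate(Learner & learner, unsigned episode) {
    const double alpha = std::max(0.05, 1.0 / (1.0 + 0.01 * episode));
    learner.setLearningRate(alpha);
}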
◆ setQFunction()
void AIToolbox::MDP::OffPolicyBase::setQFunction ( const QFunction & qfun )

This function allows directly setting the internal QFunction.

This can be useful in order to use a QFunction that has already been computed elsewhere.

- Parameters
    qfun    The new QFunction to set.
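For example, a QFunction computed offline, or simply an optimistic initialization, can be injected before learning starts. This sketch assumes that QFunction is the Eigen-based (states × actions) matrix defined in AIToolbox's MDP types and that the header path is correct; the helper name and the optimistic-initialization idea are illustrative.

#include <AIToolbox/MDP/Types.hpp>

// Sketch: seed any learner exposing setQFunction() with an optimistically
// initialized QFunction. Assumes QFunction behaves like an Eigen matrix.
template <typename Learner>
void seedOptimistically(Learner & learner, size_t S, size_t A, double value) {
    AIToolbox::MDP::QFunction q(S, A);
    q.fill(value);               // high initial values encourage exploration
    learner.setQFunction(q);
}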
◆ setTolerance()
void AIToolbox::MDP::OffPolicyBase::setTolerance ( double t )

This function sets the trace cutoff parameter.

This parameter determines when a trace is removed, as its coefficient has become too small to be worth updating.

- Parameters
    t    The new trace cutoff value.
◆ setTraces()
void AIToolbox::MDP::OffPolicyBase::setTraces ( const Traces & t )

This function sets the internal traces directly.

This method is provided in case you need to tinker with the internal traces. You generally don't, unless you are building on top of this class in order to do something more complicated.

- Parameters
    t    The new traces to set.
◆ updateTraces()
void AIToolbox::MDP::OffPolicyBase::updateTraces ( size_t s, size_t a, double error, double traceDiscount )
protected
This function updates the traces using the input data.
This operation is basically identical to what SARSAL does.
- See also
- SARSAL::stepUpdateQ
- Parameters
    s               The state we were in before.
    a               The action we performed.
    error           The error used to update the QFunction.
    traceDiscount   The discount for all traces in memory.
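The mechanism is the standard eligibility-trace bookkeeping. The sketch below shows one way such an update can look; the exact ordering, the replacing-traces choice and the stand-in Eigen matrix for the QFunction are assumptions, not a transcription of the library's implementation.

#include <algorithm>
#include <tuple>
#include <vector>
#include <Eigen/Core>

using Trace  = std::tuple<size_t, size_t, double>;   // (state, action, eligibility)
using Traces = std::vector<Trace>;

// Hedged sketch of an eligibility-trace update in the spirit of updateTraces().
void updateTracesSketch(Traces & traces, Eigen::MatrixXd & q,
                        size_t s, size_t a, double error,
                        double traceDiscount, double alpha, double tolerance) {
    bool found = false;
    for (auto & [ts, ta, e] : traces) {
        if (ts == s && ta == a) { e = 1.0; found = true; }  // replacing trace
        q(ts, ta) += alpha * error * e;                      // credit proportional to eligibility
        e *= traceDiscount;                                  // decay for the next step
    }
    if (!found) {
        q(s, a) += alpha * error;                            // fresh trace with eligibility 1
        traces.emplace_back(s, a, traceDiscount);            // stored already decayed
    }
    // Drop traces whose eligibility fell below the cutoff (tolerance).
    traces.erase(std::remove_if(traces.begin(), traces.end(),
                     [tolerance](const Trace & t) { return std::get<2>(t) < tolerance; }),
                 traces.end());
}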
◆ A
size_t AIToolbox::MDP::OffPolicyBase::A
protected

◆ alpha_
double AIToolbox::MDP::OffPolicyBase::alpha_
protected

◆ discount_
double AIToolbox::MDP::OffPolicyBase::discount_
protected

◆ q_
QFunction AIToolbox::MDP::OffPolicyBase::q_
protected

◆ S
size_t AIToolbox::MDP::OffPolicyBase::S
protected

◆ tolerance_
double AIToolbox::MDP::OffPolicyBase::tolerance_
protected

◆ traces_
Traces AIToolbox::MDP::OffPolicyBase::traces_
protected
The documentation for this class was generated from the following file:
- AIToolbox/MDP/Algorithms/Utils/OffPolicyTemplate.hpp