AIToolbox
A library that offers tools for AI problem solving.
This class represents the Hysteretic QLearning algorithm.
#include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>
Public Member Functions

HystereticQLearning(size_t S, size_t A, double discount = 1.0, double alpha = 0.1, double beta = 0.01)
    Basic constructor.

template<IsGenerativeModel M>
HystereticQLearning(const M &model, double alpha = 0.1, double beta = 0.01)
    Basic constructor.

void setPositiveLearningRate(double a)
    This function sets the learning rate parameter for positive updates.

double getPositiveLearningRate() const
    This function returns the currently set learning rate parameter for positive updates.

void setNegativeLearningRate(double b)
    This function sets the learning rate parameter for negative updates.

double getNegativeLearningRate() const
    This function returns the currently set learning rate parameter for negative updates.

void setDiscount(double d)
    This function sets the new discount parameter.

double getDiscount() const
    This function returns the currently set discount parameter.

void stepUpdateQ(size_t s, size_t a, size_t s1, double rew)
    This function updates the internal QFunction using the discount set during construction.

size_t getS() const
    This function returns the number of states on which HystereticQLearning is working.

size_t getA() const
    This function returns the number of actions on which HystereticQLearning is working.

const QFunction & getQFunction() const
    This function returns a reference to the internal QFunction.
This class represents the Hysteretic QLearning algorithm.
This algorithm is a very simple but powerful way to learn the optimal QFunction for an MDP model when the transition and reward functions are unknown. It works off-policy, meaning that it can be used even if the policy the agent is currently following is not the optimal one, or differs from the one implied by the QFunction that HystereticQLearning is estimating.
The algorithm works much like standard QLearning, with one small difference: it has an additional learning rate parameter, beta.
One of the learning parameters (alpha) is used when the change to the underlying QFunction is positive. The other (beta), which should be kept lower than alpha, is used when the change is negative.
This is useful when applying QLearning to multi-agent RL where each agent learns independently. A multi-agent environment is non-stationary from the point of view of a single agent, which is disruptive for standard QLearning and generally prevents it from learning to coordinate well with the other agents.
By assigning a higher learning rate to transitions that result in positive feedback, the agent insulates itself from the bad outcomes that occur when the other agents take exploratory actions.
Bad results are still guaranteed to be discovered, as long as the negative learning rate is greater than zero, but the algorithm focuses on the good outcomes rather than the bad ones.
If beta is equal to alpha, this becomes standard QLearning. When beta is zero, the algorithm becomes equivalent to Distributed QLearning.
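The update described above can be summarized in a few lines. The following is a minimal sketch only, not the library's internal code: the flat-table layout and the function name are assumptions made for illustration.

    #include <vector>
    #include <algorithm>
    #include <cstddef>

    // Sketch of the hysteretic update rule: Q is a flat S x A table stored
    // row-major; alpha and beta are the positive and negative learning rates.
    void hystereticUpdate(std::vector<double> & Q, std::size_t A, double discount,
                          double alpha, double beta,
                          std::size_t s, std::size_t a, std::size_t s1, double rew) {
        const double maxQ1 = *std::max_element(Q.begin() + s1 * A, Q.begin() + (s1 + 1) * A);
        const double delta = rew + discount * maxQ1 - Q[s * A + a];
        // alpha when the estimate would increase, beta when it would decrease.
        Q[s * A + a] += (delta >= 0.0 ? alpha : beta) * delta;
    }

Setting beta equal to alpha recovers the standard QLearning update; setting beta to zero ignores negative corrections entirely, matching the Distributed QLearning behavior mentioned above.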
AIToolbox::MDP::HystereticQLearning::HystereticQLearning(size_t S, size_t A, double discount = 1.0, double alpha = 0.1, double beta = 0.01)
Basic constructor.
The alpha learning rate must be > 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument.
The beta learning rate must be >= 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument. It can be zero.
Keep in mind that the beta parameter should be lower than the alpha parameter, although this is not enforced.
Parameters
    S           The size of the state space.
    A           The size of the action space.
    discount    The discount to use when learning.
    alpha       The learning rate for positive updates.
    beta        The learning rate for negative updates.
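For example, a learner for a small problem could be constructed as follows; the sizes and rates below are arbitrary illustrative choices, not recommendations.

    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>

    // 10 states, 4 actions, discount 0.9, alpha 0.3 (positive), beta 0.05 (negative).
    AIToolbox::MDP::HystereticQLearning solver(10, 4, 0.9, 0.3, 0.05);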
template<IsGenerativeModel M>
AIToolbox::MDP::HystereticQLearning::HystereticQLearning(const M & model, double alpha = 0.1, double beta = 0.01)
Basic constructor.
The alpha learning rate must be > 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument.
The beta learning rate must be >= 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument. It can be zero.
Keep in mind that the beta parameter should be lower than the alpha parameter, although this is not enforced.
This constructor copies the S, A and discount parameters from the supplied model. It does not keep a reference to the model, so if the model's discount later changes you will need to update it here manually as well.
Parameters
    model    The MDP model that HystereticQLearning will use as a base.
    alpha    The learning rate for positive updates.
    beta     The learning rate for negative updates.
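As a sketch, assuming an AIToolbox::MDP::Model instance (a class from the same library that satisfies the generative-model requirements) is available, the learner can be constructed directly from it; the sizes and rates here are illustrative.

    #include <AIToolbox/MDP/Model.hpp>
    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>

    // Assumption: AIToolbox::MDP::Model satisfies IsGenerativeModel.
    AIToolbox::MDP::Model model(10, 4, 0.9);   // 10 states, 4 actions, discount 0.9

    // S, A and discount are copied from the model; alpha and beta are arbitrary here.
    AIToolbox::MDP::HystereticQLearning solver(model, 0.3, 0.05);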
size_t AIToolbox::MDP::HystereticQLearning::getA() const
This function returns the number of actions on which HystereticQLearning is working.
double AIToolbox::MDP::HystereticQLearning::getDiscount() const
This function returns the currently set discount parameter.
double AIToolbox::MDP::HystereticQLearning::getNegativeLearningRate() const
This function will return the currently set learning rate parameter for negative updates.
double AIToolbox::MDP::HystereticQLearning::getPositiveLearningRate() const
This function will return the currently set learning rate parameter for positive updates.
const QFunction & AIToolbox::MDP::HystereticQLearning::getQFunction() const
This function returns a reference to the internal QFunction.
The returned reference can be used to build Policies, for example MDP::QGreedyPolicy.
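For instance, a greedy policy can be built on top of the learned values. This is a hedged example: `solver` is assumed to be an already-constructed HystereticQLearning instance.

    #include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>

    // Build a policy that always picks the action with the highest Q-value.
    AIToolbox::MDP::QGreedyPolicy policy(solver.getQFunction());
    const auto action = policy.sampleAction(0);   // greedy action for state 0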
size_t AIToolbox::MDP::HystereticQLearning::getS() const
This function returns the number of states on which HystereticQLearning is working.
void AIToolbox::MDP::HystereticQLearning::setDiscount(double d)
This function sets the new discount parameter.
The discount parameter controls how much future rewards are considered by HystereticQLearning. If it is 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm will optimize the overall accumulation of reward. When it is less than 1, rewards obtained in the present are valued more than future rewards.
Parameters
    d    The new discount factor.
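As an illustrative piece of arithmetic only (the helper below is not part of the library), a reward r obtained k steps in the future is worth d^k * r from the point of view of the current state when the discount is d.

    #include <cmath>

    // e.g. d = 0.9, k = 3, r = 10.0  ->  0.9^3 * 10.0 = 7.29
    double presentValue(double r, double d, unsigned k) {
        return std::pow(d, k) * r;
    }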
void AIToolbox::MDP::HystereticQLearning::setNegativeLearningRate(double b)
This function sets the learning rate parameter for negative updates.
The learning parameter determines the speed at which the QFunction is modified with respect to new data, when updates are negative.
Note that this parameter can be zero.
The learning rate parameter must be >= 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.
Parameters
    b    The new learning rate parameter for negative updates.
void AIToolbox::MDP::HystereticQLearning::setPositiveLearningRate(double a)
This function sets the learning rate parameter for positive updates.
The learning parameter determines the speed at which the QFunction is modified with respect to new data, when updates are positive.
The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.
Parameters
    a    The new learning rate parameter for positive updates.
void AIToolbox::MDP::HystereticQLearning::stepUpdateQ(size_t s, size_t a, size_t s1, double rew)
This function updates the internal QFunction using the discount set during construction.
This function takes a single experience point and uses it to update the QFunction. This is a very efficient method to keep the QFunction up to date with the latest experience.
Parameters
    s      The previous state.
    a      The action performed.
    s1     The new state.
    rew    The reward obtained.
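The sketch below shows the intended call pattern: one stepUpdateQ call per experience tuple. The environment here is a hypothetical stand-in (random transitions and a made-up reward signal), used only so the example is self-contained.

    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>
    #include <cstddef>
    #include <random>

    int main() {
        constexpr std::size_t S = 5, A = 2;
        AIToolbox::MDP::HystereticQLearning solver(S, A, 0.9, 0.3, 0.05);

        // Toy stand-in for a real environment: random transitions, reward for landing in state 0.
        std::mt19937 rng(0);
        std::uniform_int_distribution<std::size_t> nextState(0, S - 1);
        std::uniform_int_distribution<std::size_t> randomAction(0, A - 1);

        std::size_t s = 0;
        for (unsigned t = 0; t < 10000; ++t) {
            const std::size_t a  = randomAction(rng);    // exploratory action
            const std::size_t s1 = nextState(rng);       // hypothetical environment step
            const double rew     = (s1 == 0) ? 1.0 : 0.0; // hypothetical reward signal
            solver.stepUpdateQ(s, a, s1, rew);            // one experience tuple per call
            s = s1;
        }
        // solver.getQFunction() now contains the learned estimates.
    }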