AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP::HystereticQLearning Class Reference

This class represents the Hysteretic QLearning algorithm. More...

#include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>

Public Member Functions

 HystereticQLearning (size_t S, size_t A, double discount=1.0, double alpha=0.1, double beta=0.01)
 Basic constructor. More...
 
template<IsGenerativeModel M>
 HystereticQLearning (const M &model, double alpha=0.1, double beta=0.01)
 Basic constructor. More...
 
void setPositiveLearningRate (double a)
 This function sets the learning rate parameter for positive updates. More...
 
double getPositiveLearningRate () const
 This function will return the currently set learning rate parameter for positive updates. More...
 
void setNegativeLearningRate (double b)
 This function sets the learning rate parameter for negative updates. More...
 
double getNegativeLearningRate () const
 This function will return the currently set learning rate parameter for negative updates. More...
 
void setDiscount (double d)
 This function sets the new discount parameter. More...
 
double getDiscount () const
 This function returns the currently set discount parameter. More...
 
void stepUpdateQ (size_t s, size_t a, size_t s1, double rew)
 This function updates the internal QFunction using the discount set during construction. More...
 
size_t getS () const
 This function returns the number of states on which HystereticQLearning is working. More...
 
size_t getA () const
 This function returns the number of actions on which HystereticQLearning is working. More...
 
const QFunction & getQFunction () const
 This function returns a reference to the internal QFunction. More...
 

Detailed Description

This class represents the Hysteretic QLearning algorithm.

This algorithm is a very simple but powerful way to learn the optimal QFunction for an MDP model where the transition and reward functions are unknown. It is off-policy, meaning that it can be used even if the policy the agent is currently following is not the optimal one, or differs from the one currently implied by the HystereticQLearning QFunction.

See also
QLearning

The algorithm functions quite like the normal QLearning algorithm, with a small difference: it has an additional learning parameter, beta.

One of the learning parameters (alpha) is used when the change to the underlying QFunction is positive. The other (beta), which should be kept lower than alpha, is used when the change is negative.

This is useful when using QLearning for multi-agent RL where each agent learns independently. A multi-agent environment is non-stationary from the point of view of a single agent, which is disruptive for normal QLearning and generally prevents it from learning to coordinate well with the other agents.

By assigning a higher learning parameter to transitions resulting in positive feedback, the agent insulates itself from the bad results that occur when the other agents take exploratory actions.

Bad results are still guaranteed to be discovered, as long as the negative learning parameter is greater than zero, but the algorithm focuses on the good outcomes rather than the bad ones.

If the beta parameter is equal to alpha, this reduces to standard QLearning. When the beta parameter is zero, the algorithm becomes equivalent to Distributed QLearning.
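
The update itself can be summarized with a small sketch. This is a minimal illustration of how the two learning rates are applied to the temporal-difference error, not the library's actual implementation; the flat table layout is an assumption made purely for brevity.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Illustrative sketch only: NOT the library's implementation. Q is assumed
    // to be a flat S x A value table indexed as Q[s * A + a].
    void hystereticUpdate(std::vector<double> & Q, std::size_t A,
                          std::size_t s, std::size_t a, std::size_t s1, double rew,
                          double gamma, double alpha, double beta) {
        // Greedy value of the successor state s1.
        const double maxQ1 = *std::max_element(Q.begin() + s1 * A, Q.begin() + (s1 + 1) * A);
        // Temporal-difference error of this single experience.
        const double delta = rew + gamma * maxQ1 - Q[s * A + a];
        // Positive changes are learned at the higher rate alpha, negative ones at beta.
        Q[s * A + a] += (delta >= 0.0 ? alpha : beta) * delta;
    }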

Constructor & Destructor Documentation

◆ HystereticQLearning() [1/2]

AIToolbox::MDP::HystereticQLearning::HystereticQLearning ( size_t  S,
size_t  A,
double  discount = 1.0,
double  alpha = 0.1,
double  beta = 0.01 
)

Basic constructor.

The alpha learning rate must be > 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument.

The beta learning rate must be >= 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument. It can be zero.

Keep in mind that the beta parameter should be lower than the alpha parameter, although this is not enforced.

Parameters
    S         The size of the state space.
    A         The size of the action space.
    discount  The discount to use when learning.
    alpha     The learning rate for positive updates.
    beta      The learning rate for negative updates.
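
For illustration, constructing the solver directly from the sizes of the state and action spaces might look as follows (the specific numbers are arbitrary):

    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>

    int main() {
        // 10 states, 4 actions, discount 0.9; alpha is kept higher than beta, as recommended.
        AIToolbox::MDP::HystereticQLearning solver(10, 4, 0.9, 0.3, 0.05);
        return 0;
    }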

◆ HystereticQLearning() [2/2]

template<IsGenerativeModel M>
AIToolbox::MDP::HystereticQLearning::HystereticQLearning ( const M &  model,
double  alpha = 0.1,
double  beta = 0.01 
)

Basic constructor.

The alpha learning rate must be > 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument.

The beta learning rate must be >= 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument. It can be zero.

Keep in mind that the beta parameter should be lower than the alpha parameter, although this is not enforced.

This constructor copies the S, A, and discount parameters from the supplied model. It does not keep a reference to the model, so if the model's discount changes later you will need to update it here manually as well.

Parameters
    model  The MDP model that HystereticQLearning will use as a base.
    alpha  The learning rate for positive updates.
    beta   The learning rate for negative updates.
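
A possible usage sketch; MyModel here is a hypothetical stand-in for any type satisfying the IsGenerativeModel concept:

    // 'MyModel' is hypothetical; S, A and the discount are copied from it at construction.
    MyModel model(/* ... */);
    AIToolbox::MDP::HystereticQLearning solver(model, 0.3, 0.05);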

Member Function Documentation

◆ getA()

size_t AIToolbox::MDP::HystereticQLearning::getA ( ) const

This function returns the number of actions on which HystereticQLearning is working.

Returns
The number of actions.

◆ getDiscount()

double AIToolbox::MDP::HystereticQLearning::getDiscount ( ) const

This function returns the currently set discount parameter.

Returns
The currently set discount parameter.

◆ getNegativeLearningRate()

double AIToolbox::MDP::HystereticQLearning::getNegativeLearningRate ( ) const

This function will return the currently set learning rate parameter for negative updates.

Returns
The currently set learning rate parameter for negative updates.

◆ getPositiveLearningRate()

double AIToolbox::MDP::HystereticQLearning::getPositiveLearningRate ( ) const

This function will return the currently set learning rate parameter for positive updates.

Returns
The currently set learning rate parameter for positive updates.

◆ getQFunction()

const QFunction& AIToolbox::MDP::HystereticQLearning::getQFunction ( ) const

This function returns a reference to the internal QFunction.

The returned reference can be used to build Policies, for example MDP::QGreedyPolicy.

Returns
The internal QFunction.
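
For example, the returned QFunction can be used to build a greedy policy. The exact header path and constructor below are assumed from the rest of the library's documentation and may vary between versions:

    #include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>

    // 'solver' is a HystereticQLearning instance; 'currentState' a valid state index.
    AIToolbox::MDP::QGreedyPolicy policy(solver.getQFunction());
    size_t action = policy.sampleAction(currentState);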

◆ getS()

size_t AIToolbox::MDP::HystereticQLearning::getS ( ) const

This function returns the number of states on which HystereticQLearning is working.

Returns
The number of states.

◆ setDiscount()

void AIToolbox::MDP::HystereticQLearning::setDiscount ( double  d)

This function sets the new discount parameter.

The discount parameter controls how much future rewards are weighted by HystereticQLearning. If it is 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm optimizes overall reward accumulation. When it is less than 1, rewards obtained in the present are valued more than future rewards.

Parameters
    d  The new discount factor.
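
For example, with a discount of 0.9 a reward received k steps in the future is weighted by 0.9^k: roughly 0.35 after 10 steps and roughly 0.005 after 50.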

◆ setNegativeLearningRate()

void AIToolbox::MDP::HystereticQLearning::setNegativeLearningRate ( double  b)

This function sets the learning rate parameter for negative updates.

The learning parameter determines the speed at which the QFunction is modified with respect to new data, when updates are negative.

Note that this parameter can be zero.

The learning rate parameter must be >= 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

Parameters
    b  The new learning rate parameter for negative updates.

◆ setPositiveLearningRate()

void AIToolbox::MDP::HystereticQLearning::setPositiveLearningRate ( double  a)

This function sets the learning rate parameter for positive updates.

The learning parameter determines the speed at which the QFunction is modified with respect to new data, when updates are positive.

The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

Parameters
    a  The new learning rate parameter for positive updates.

◆ stepUpdateQ()

void AIToolbox::MDP::HystereticQLearning::stepUpdateQ ( size_t  s,
size_t  a,
size_t  s1,
double  rew 
)

This function updates the internal QFunction using the discount set during construction.

This function takes a single experience point and uses it to update the QFunction. This is a very efficient method to keep the QFunction up to date with the latest experience.

Parameters
    s    The previous state.
    a    The action performed.
    s1   The new state.
    rew  The reward obtained.
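
A hedged sketch of a typical learning loop; the environment object, its step function, and the exploration policy below are hypothetical placeholders, not part of the documented API:

    // 'env', 'env.step' and 'policy' are placeholders for whatever produces experience.
    size_t s = initialState;
    for (unsigned t = 0; t < 10000; ++t) {
        const size_t a = policy.sampleAction(s);   // exploratory action selection
        const auto [s1, rew] = env.step(a);        // hypothetical environment transition
        solver.stepUpdateQ(s, a, s1, rew);         // single-experience Q update
        s = s1;
    }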

The documentation for this class was generated from the following file: AIToolbox/MDP/Algorithms/HystereticQLearning.hpp