AIToolbox
A library that offers tools for AI problem solving.
This class represents the Hysteretic QLearning algorithm.
#include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>
Public Member Functions

HystereticQLearning(size_t S, size_t A, double discount = 1.0, double alpha = 0.1, double beta = 0.01)
    Basic constructor.

template<IsGenerativeModel M>
HystereticQLearning(const M &model, double alpha = 0.1, double beta = 0.01)
    Basic constructor.

void setPositiveLearningRate(double a)
    This function sets the learning rate parameter for positive updates.

double getPositiveLearningRate() const
    This function returns the currently set learning rate parameter for positive updates.

void setNegativeLearningRate(double b)
    This function sets the learning rate parameter for negative updates.

double getNegativeLearningRate() const
    This function returns the currently set learning rate parameter for negative updates.

void setDiscount(double d)
    This function sets the new discount parameter.

double getDiscount() const
    This function returns the currently set discount parameter.

void stepUpdateQ(size_t s, size_t a, size_t s1, double rew)
    This function updates the internal QFunction using the discount set during construction.

size_t getS() const
    This function returns the number of states on which HystereticQLearning is working.

size_t getA() const
    This function returns the number of actions on which HystereticQLearning is working.

const QFunction & getQFunction() const
    This function returns a reference to the internal QFunction.
This class represents the Hysteretic QLearning algorithm.
This algorithm is a very simple but powerful way to learn the optimal QFunction for an MDP model when the transition and reward functions are unknown. It works off-policy, meaning that it can be used even if the policy the agent is currently following is not the optimal one, or differs from the one implied by the QFunction that HystereticQLearning is estimating.
The algorithm works much like standard QLearning, with one small difference: it has an additional learning rate parameter, beta.
One of the learning parameters (alpha) is used when the change to the underlying QFunction is positive. The other (beta), which should be kept lower than alpha, is used when the change is negative.
This is useful when applying QLearning to multi-agent RL where each agent learns independently. A multi-agent environment is non-stationary from the point of view of a single agent, which is disruptive for standard QLearning and generally prevents it from learning to coordinate well with the other agents.
By assigning a higher learning rate to transitions that result in positive feedback, the agent insulates itself from the bad outcomes that occur when the other agents take exploratory actions.
Bad results are still guaranteed to be discovered, as long as the negative learning rate is greater than zero, but the algorithm focuses on the good outcomes rather than the bad ones.
If beta is equal to alpha, this becomes standard QLearning. When beta is zero, the algorithm becomes equivalent to Distributed QLearning.
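The update described above can be summarized in a few lines. The following is a minimal sketch only, not the library's internal code: the flat-table layout and the function name are assumptions made for illustration.

    #include <vector>
    #include <algorithm>
    #include <cstddef>

    // Sketch of the hysteretic update rule: Q is a flat S x A table stored
    // row-major; alpha and beta are the positive and negative learning rates.
    void hystereticUpdate(std::vector<double> & Q, std::size_t A, double discount,
                          double alpha, double beta,
                          std::size_t s, std::size_t a, std::size_t s1, double rew) {
        const double maxQ1 = *std::max_element(Q.begin() + s1 * A, Q.begin() + (s1 + 1) * A);
        const double delta = rew + discount * maxQ1 - Q[s * A + a];
        // alpha when the estimate would increase, beta when it would decrease.
        Q[s * A + a] += (delta >= 0.0 ? alpha : beta) * delta;
    }

Setting beta equal to alpha recovers the standard QLearning update; setting beta to zero ignores negative corrections entirely, matching the Distributed QLearning behavior mentioned above.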
AIToolbox::MDP::HystereticQLearning::HystereticQLearning(size_t S, size_t A, double discount = 1.0, double alpha = 0.1, double beta = 0.01)
Basic constructor.
The alpha learning rate must be > 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument.
The beta learning rate must be >= 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument. It can be zero.
Keep in mind that the beta parameter should be lower than the alpha parameter, although this is not enforced.
Parameters
    S           The size of the state space.
    A           The size of the action space.
    discount    The discount to use when learning.
    alpha       The learning rate for positive updates.
    beta        The learning rate for negative updates.
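For example, a learner for a small problem could be constructed as follows; the sizes and rates below are arbitrary illustrative choices, not recommendations.

    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>

    // 10 states, 4 actions, discount 0.9, alpha 0.3 (positive), beta 0.05 (negative).
    AIToolbox::MDP::HystereticQLearning solver(10, 4, 0.9, 0.3, 0.05);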
template<IsGenerativeModel M>
AIToolbox::MDP::HystereticQLearning::HystereticQLearning(const M & model, double alpha = 0.1, double beta = 0.01)
Basic constructor.
The alpha learning rate must be > 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument.
The beta learning rate must be >= 0.0 and <= 1.0, otherwise the constructor will throw an std::invalid_argument. It can be zero.
Keep in mind that the beta parameter should be lower than the alpha parameter, although this is not enforced.
This constructor copies the S, A and discount parameters from the supplied model. It does not keep a reference to the model, so if the model's discount later changes you will need to update it here manually as well.
Parameters
    model    The MDP model that HystereticQLearning will use as a base.
    alpha    The learning rate for positive updates.
    beta     The learning rate for negative updates.
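As a sketch, assuming an AIToolbox::MDP::Model instance (a class from the same library that satisfies the generative-model requirements) is available, the learner can be constructed directly from it; the sizes and rates here are illustrative.

    #include <AIToolbox/MDP/Model.hpp>
    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>

    // Assumption: AIToolbox::MDP::Model satisfies IsGenerativeModel.
    AIToolbox::MDP::Model model(10, 4, 0.9);   // 10 states, 4 actions, discount 0.9

    // S, A and discount are copied from the model; alpha and beta are arbitrary here.
    AIToolbox::MDP::HystereticQLearning solver(model, 0.3, 0.05);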
size_t AIToolbox::MDP::HystereticQLearning::getA() const
This function returns the number of actions on which HystereticQLearning is working.
double AIToolbox::MDP::HystereticQLearning::getDiscount() const
This function returns the currently set discount parameter.
double AIToolbox::MDP::HystereticQLearning::getNegativeLearningRate() const
This function will return the currently set learning rate parameter for negative updates.
double AIToolbox::MDP::HystereticQLearning::getPositiveLearningRate() const
This function will return the currently set learning rate parameter for positive updates.
const QFunction & AIToolbox::MDP::HystereticQLearning::getQFunction() const
This function returns a reference to the internal QFunction.
The returned reference can be used to build Policies, for example MDP::QGreedyPolicy.
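For instance, a greedy policy can be built on top of the learned values. This is a hedged example: `solver` is assumed to be an already-constructed HystereticQLearning instance.

    #include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>

    // Build a policy that always picks the action with the highest Q-value.
    AIToolbox::MDP::QGreedyPolicy policy(solver.getQFunction());
    const auto action = policy.sampleAction(0);   // greedy action for state 0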
size_t AIToolbox::MDP::HystereticQLearning::getS() const
This function returns the number of states on which HystereticQLearning is working.
void AIToolbox::MDP::HystereticQLearning::setDiscount(double d)
This function sets the new discount parameter.
The discount parameter controls how much future rewards are considered by HystereticQLearning. If it is 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm will optimize the overall accumulation of reward. When it is less than 1, rewards obtained in the present are valued more than future rewards.
Parameters
    d    The new discount factor.
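As an illustrative piece of arithmetic only (the helper below is not part of the library), a reward r obtained k steps in the future is worth d^k * r from the point of view of the current state when the discount is d.

    #include <cmath>

    // e.g. d = 0.9, k = 3, r = 10.0  ->  0.9^3 * 10.0 = 7.29
    double presentValue(double r, double d, unsigned k) {
        return std::pow(d, k) * r;
    }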
void AIToolbox::MDP::HystereticQLearning::setNegativeLearningRate(double b)
This function sets the learning rate parameter for negative updates.
The learning parameter determines the speed at which the QFunction is modified with respect to new data, when updates are negative.
Note that this parameter can be zero.
The learning rate parameter must be >= 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.
Parameters
    b    The new learning rate parameter for negative updates.
void AIToolbox::MDP::HystereticQLearning::setPositiveLearningRate(double a)
This function sets the learning rate parameter for positive updates.
The learning parameter determines the speed at which the QFunction is modified with respect to new data, when updates are positive.
The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.
Parameters
    a    The new learning rate parameter for positive updates.
void AIToolbox::MDP::HystereticQLearning::stepUpdateQ(size_t s, size_t a, size_t s1, double rew)
This function updates the internal QFunction using the discount set during construction.
This function takes a single experience point and uses it to update the QFunction. This is a very efficient method to keep the QFunction up to date with the latest experience.
Parameters
    s      The previous state.
    a      The action performed.
    s1     The new state.
    rew    The reward obtained.
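The sketch below shows the intended call pattern: one stepUpdateQ call per experience tuple. The environment here is a hypothetical stand-in (random transitions and a made-up reward signal), used only so the example is self-contained.

    #include <AIToolbox/MDP/Algorithms/HystereticQLearning.hpp>
    #include <cstddef>
    #include <random>

    int main() {
        constexpr std::size_t S = 5, A = 2;
        AIToolbox::MDP::HystereticQLearning solver(S, A, 0.9, 0.3, 0.05);

        // Toy stand-in for a real environment: random transitions, reward for landing in state 0.
        std::mt19937 rng(0);
        std::uniform_int_distribution<std::size_t> nextState(0, S - 1);
        std::uniform_int_distribution<std::size_t> randomAction(0, A - 1);

        std::size_t s = 0;
        for (unsigned t = 0; t < 10000; ++t) {
            const std::size_t a  = randomAction(rng);    // exploratory action
            const std::size_t s1 = nextState(rng);       // hypothetical environment step
            const double rew     = (s1 == 0) ? 1.0 : 0.0; // hypothetical reward signal
            solver.stepUpdateQ(s, a, s1, rew);            // one experience tuple per call
            s = s1;
        }
        // solver.getQFunction() now contains the learned estimates.
    }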