AIToolbox
A library that offers tools for AI problem solving.
This class represents a single Joint Action Learner agent.
#include <AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp>
Public Member Functions

JointActionLearner (size_t S, Action A, size_t id, double discount = 1.0, double alpha = 0.1)
    Basic constructor.

void stepUpdateQ (size_t s, const Action & a, size_t s1, double rew)
    This function updates the internal joint QFunction.

const AIToolbox::MDP::QFunction & getJointQFunction () const
    This function returns the internal joint QFunction.

const AIToolbox::MDP::QFunction & getSingleQFunction () const
    This function returns the internal single QFunction.

void setLearningRate (double a)
    This function sets the learning rate parameter.

double getLearningRate () const
    This function returns the currently set learning rate parameter.

void setDiscount (double d)
    This function sets the discount parameter.

double getDiscount () const
    This function returns the currently set discount parameter.

size_t getS () const
    This function returns the number of states on which the JointActionLearner is working.

const Action & getA () const
    This function returns the action space on which the JointActionLearner is working.

size_t getId () const
    This function returns the id of the agent represented by this class.
This class represents a single Joint Action Learner agent.
A JAL agent learns a QFunction for its own values while keeping track of the actions performed by the other agents with which it is interacting.
In order to reason about its own QFunction, a JAL keeps a model of the policies of the other agents. This is done by keeping counters for each action that the other agents have performed, and performing a maximum likelihood computation in order to estimate their policies.
While a QFunction over the full joint action space is kept internally, applying the policy models produces a normal MDP::QFunction over this agent's own actions, which can then be used to derive a policy.
The internal learning is done using MDP::QLearning.
This method does not try to handle factorized states. Here we also assume that the joint action space is of reasonable size, as we allocate an MDP::QFunction for it.
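A minimal usage sketch follows (not taken from the library's documentation; the state count, per-agent action counts, agent id, and the sample transition below are illustrative assumptions):

#include <AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp>

int main() {
    using namespace AIToolbox::Factored;

    // Two agents, each with 3 actions, in a 10-state world (illustrative numbers).
    const size_t S = 10;
    const Action A{3, 3};          // joint action space: 3 actions per agent
    const size_t id = 0;           // this learner controls agent 0

    MDP::JointActionLearner learner(S, A, id, 0.9, 0.1);

    // After every environment transition, feed in the full joint action that
    // was executed, together with the observed transition and reward.
    const size_t s = 0, s1 = 1;
    const Action a{1, 2};          // agent 0 chose action 1, agent 1 chose action 2
    const double rew = 1.0;
    learner.stepUpdateQ(s, a, s1, rew);

    // The marginalized single-agent QFunction can then be used with any
    // standard MDP policy class to pick this agent's next action.
    const AIToolbox::MDP::QFunction & q = learner.getSingleQFunction();
    (void)q;
}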
AIToolbox::Factored::MDP::JointActionLearner::JointActionLearner (size_t S, Action A, size_t id, double discount = 1.0, double alpha = 0.1)

Basic constructor.

Parameters
    S         The size of the state space.
    A         The size of the joint action space.
    id        The id of this agent in the joint action space.
    discount  The discount factor for the QLearning process.
    alpha     The learning rate for the QLearning process.
const Action & AIToolbox::Factored::MDP::JointActionLearner::getA () const
This function returns the action space on which JointActionLearner is working.
double AIToolbox::Factored::MDP::JointActionLearner::getDiscount () const
This function returns the currently set discount parameter.
size_t AIToolbox::Factored::MDP::JointActionLearner::getId () const
This function returns the id of the agent represented by this class.
const AIToolbox::MDP::QFunction & AIToolbox::Factored::MDP::JointActionLearner::getJointQFunction () const
This function returns the internal joint QFunction.
double AIToolbox::Factored::MDP::JointActionLearner::getLearningRate () const
This function returns the currently set learning rate parameter.
size_t AIToolbox::Factored::MDP::JointActionLearner::getS () const
This function returns the number of states on which JointActionLearner is working.
const AIToolbox::MDP::QFunction & AIToolbox::Factored::MDP::JointActionLearner::getSingleQFunction () const
This function returns the internal single QFunction.
void AIToolbox::Factored::MDP::JointActionLearner::setDiscount (double d)
This function sets the discount parameter.
The discount parameter controls how much future rewards are taken into account by QLearning. If it is 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm optimizes the overall accumulation of reward. When it is less than 1, rewards obtained in the present are valued more than future rewards.
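As a small worked illustration: with discount d the quantity being maximized is the discounted return r0 + d*r1 + d^2*r2 + ..., so d = 1 weighs every future reward equally, while d < 1 progressively down-weights rewards the further into the future they arrive.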
Parameters
    d    The new discount factor.
void AIToolbox::Factored::MDP::JointActionLearner::setLearningRate (double a)
This function sets the learning rate parameter.
The learning rate parameter determines the speed at which the QFunction is modified with respect to new data. In fully deterministic environments (such as an agent moving through a grid, for example), this parameter can be safely set to 1.0 for maximum learning.
The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.
Parameters
    a    The new learning rate parameter.
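As a brief sketch of that validation (the helper name and fallback value below are illustrative, not part of AIToolbox):

#include <AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp>
#include <stdexcept>

// Illustrative only: apply a user-provided learning rate, relying on the
// validation described above to reject out-of-range values.
void applyLearningRate(AIToolbox::Factored::MDP::JointActionLearner & learner, double a) {
    try {
        learner.setLearningRate(a);    // throws if a <= 0.0 or a > 1.0
    } catch (const std::invalid_argument &) {
        learner.setLearningRate(0.1);  // fall back to a sane default
    }
}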
void AIToolbox::Factored::MDP::JointActionLearner::stepUpdateQ (size_t s, const Action & a, size_t s1, double rew)
This function updates the internal joint QFunction.
This function updates the counts for the actions of the other agents, and the value of the joint QFunction based on the inputs.
Then it updates the single-agent QFunction, but only for the initial state s, using the internal counts to recompute its expected value under the new estimates of the other agents' policies.
Parameters
    s    The previous state.
    a    The action performed.
    s1   The new state.
    rew  The reward obtained.
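An illustrative interaction loop follows; sampleTransition() and selectJointAction() are hypothetical stand-ins for your environment and for how the agents choose their joint action, and are not part of AIToolbox:

#include <AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp>
#include <utility>

// Hypothetical environment hook: executes the joint action a from state s and
// returns the next state and the reward observed by this agent.
std::pair<size_t, double> sampleTransition(size_t s, const AIToolbox::Factored::Action & a);

// Hypothetical action selection: combines this agent's choice with the other
// agents' choices into a single joint action.
AIToolbox::Factored::Action selectJointAction(const AIToolbox::Factored::MDP::JointActionLearner & learner, size_t s);

void runEpisode(AIToolbox::Factored::MDP::JointActionLearner & learner, size_t start, unsigned steps) {
    size_t s = start;
    for (unsigned t = 0; t < steps; ++t) {
        const auto a = selectJointAction(learner, s);
        const auto [s1, rew] = sampleTransition(s, a);
        // Updates the counts, the joint QFunction, and the single QFunction for s.
        learner.stepUpdateQ(s, a, s1, rew);
        s = s1;
    }
}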