AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::MDP::JointActionLearner Class Reference

This class represents a single Joint Action Learner agent. More...

#include <AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp>

Public Member Functions

 JointActionLearner (size_t S, Action A, size_t id, double discount=1.0, double alpha=0.1)
 Basic constructor. More...
 
void stepUpdateQ (size_t s, const Action &a, size_t s1, double rew)
 This function updates the internal joint QFunction. More...
 
const AIToolbox::MDP::QFunction & getJointQFunction () const
 This function returns the internal joint QFunction. More...
 
const AIToolbox::MDP::QFunction & getSingleQFunction () const
 This function returns the internal single QFunction. More...
 
void setLearningRate (double a)
 This function sets the learning rate parameter. More...
 
double getLearningRate () const
 This function returns the currently set learning rate parameter. More...
 
void setDiscount (double d)
 This function sets the new discount parameter. More...
 
double getDiscount () const
 This function returns the currently set discount parameter. More...
 
size_t getS () const
 This function returns the number of states on which JointActionLearner is working. More...
 
const Action & getA () const
 This function returns the action space on which JointActionLearner is working. More...
 
size_t getId () const
 This function returns the id of the agent represented by this class. More...
 

Detailed Description

This class represents a single Joint Action Learner agent.

A JAL agent learns a QFunction for its own values while keeping track of the actions performed by the other agents with which it is interacting.

In order to reason about its own QFunction, a JAL keeps a model of the policies of the other agents. This is done by keeping counters for each action that the other agents have performed, and performing a maximum likelihood computation to estimate their policies.

While a QFunction over the full joint action space is kept internally, applying the policy models produces a normal MDP::QFunction over this agent's own actions, which can then be used to derive a policy.

The internal learning is done using MDP::QLearning.

This method does not try to handle factorized states. Here we also assume that the joint action space is of reasonable size, as we allocate an MDP::QFunction for it.

See also
AIToolbox::MDP::QLearning
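
To make the reduction concrete, below is a minimal standalone sketch (not the library's internal code) of how a joint QFunction can be collapsed to a single-agent one in a two-agent problem, using the empirical frequencies of the other agent's actions as a maximum likelihood policy model. All names here are illustrative.

    #include <vector>

    // Q[s][aSelf][aOther] holds the joint action values; counts[aOther]
    // records how often the other agent has picked each of its actions.
    double singleQ(const std::vector<std::vector<std::vector<double>>> & Q,
                   const std::vector<unsigned> & counts,
                   size_t s, size_t aSelf) {
        double total = 0.0;
        for (auto c : counts) total += c;
        if (total == 0.0) return 0.0; // no data on the other agent yet

        // Maximum likelihood estimate: P(aOther) = counts[aOther] / total.
        double value = 0.0;
        for (size_t aOther = 0; aOther < counts.size(); ++aOther)
            value += Q[s][aSelf][aOther] * (counts[aOther] / total);
        return value;
    }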

Constructor & Destructor Documentation

◆ JointActionLearner()

AIToolbox::Factored::MDP::JointActionLearner::JointActionLearner (size_t S, Action A, size_t id, double discount = 1.0, double alpha = 0.1)

Basic constructor.

Parameters
    S         The size of the state space.
    A         The size of the joint action space.
    id        The id of this agent in the joint action space.
    discount  The discount factor for the QLearning process.
    alpha     The learning rate for the QLearning process.
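
As a usage sketch, here is one way to construct a learner, assuming a hypothetical problem with 10 states and two agents with 2 actions each (all values are illustrative):

    #include <AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp>

    using namespace AIToolbox::Factored;

    int main() {
        size_t S = 10;   // size of the state space
        Action A{2, 2};  // joint action space: two agents, 2 actions each
        size_t id = 0;   // this agent's position within A

        // Discount 0.9 and learning rate 0.1 for the underlying QLearning.
        MDP::JointActionLearner learner(S, A, id, 0.9, 0.1);
        return 0;
    }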

Member Function Documentation

◆ getA()

const Action& AIToolbox::Factored::MDP::JointActionLearner::getA ( ) const

This function returns the action space on which JointActionLearner is working.

Returns
The action space.

◆ getDiscount()

double AIToolbox::Factored::MDP::JointActionLearner::getDiscount ( ) const

This function returns the currently set discount parameter.

Returns
The currently set discount parameter.

◆ getId()

size_t AIToolbox::Factored::MDP::JointActionLearner::getId ( ) const

This function returns the id of the agent represented by this class.

Returns
The id of this agent.

◆ getJointQFunction()

const AIToolbox::MDP::QFunction& AIToolbox::Factored::MDP::JointActionLearner::getJointQFunction ( ) const

This function returns the internal joint QFunction.

Returns
A reference to the internal joint QFunction.

◆ getLearningRate()

double AIToolbox::Factored::MDP::JointActionLearner::getLearningRate ( ) const

This function returns the currently set learning rate parameter.

Returns
The currently set learning rate parameter.

◆ getS()

size_t AIToolbox::Factored::MDP::JointActionLearner::getS ( ) const

This function returns the number of states on which JointActionLearner is working.

Returns
The number of states.

◆ getSingleQFunction()

const AIToolbox::MDP::QFunction& AIToolbox::Factored::MDP::JointActionLearner::getSingleQFunction ( ) const

This function returns the internal single QFunction.

Returns
A reference to the internal single QFunction.
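
For example, the single QFunction can be used to derive a greedy policy. A brief sketch, assuming an already-constructed learner, a current state s, and AIToolbox::MDP::QGreedyPolicy (verify the policy API against your version of the library):

    #include <AIToolbox/MDP/Policies/QGreedyPolicy.hpp>

    // `learner` and `s` are assumed to exist in the surrounding code.
    AIToolbox::MDP::QGreedyPolicy policy(learner.getSingleQFunction());

    // The policy holds a reference, so it reflects later learning updates.
    size_t myAction = policy.sampleAction(s); // greedy action in state s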

◆ setDiscount()

void AIToolbox::Factored::MDP::JointActionLearner::setDiscount ( double  d)

This function sets the new discount parameter.

The discount parameter controls how much future rewards are considered by QLearning. If set to 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm will optimize the overall sum of rewards. When less than 1, rewards obtained in the present are valued more than future rewards.

See also
QLearning
Parameters
    d  The new discount factor.

◆ setLearningRate()

void AIToolbox::Factored::MDP::JointActionLearner::setLearningRate ( double  a)

This function sets the learning rate parameter.

The learning rate parameter determines the speed at which the QFunction is modified with respect to new data. In fully deterministic environments (such as an agent moving through a grid, for example), this parameter can be safely set to 1.0 for maximum learning.

The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

See also
QLearning
Parameters
    a  The new learning rate parameter.
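
A small sketch of the valid range, assuming an already-constructed learner:

    #include <stdexcept>

    try {
        learner.setLearningRate(0.0); // invalid: must be strictly positive
    } catch (const std::invalid_argument & e) {
        // handle the bad parameter here
    }
    learner.setLearningRate(0.5);     // valid: within (0.0, 1.0]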

◆ stepUpdateQ()

void AIToolbox::Factored::MDP::JointActionLearner::stepUpdateQ (size_t s, const Action & a, size_t s1, double rew)

This function updates the internal joint QFunction.

This function updates the counts for the actions of the other agents, and the value of the joint QFunction based on the inputs.

Then, it updates the single-agent QFunction for the starting state only, using the internal counts to recompute its expected value under the new estimates of the other agents' policies.

Parameters
    s    The previous state.
    a    The action performed.
    s1   The new state.
    rew  The reward obtained.
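
A typical update loop might look like the sketch below; sampleJointAction and step are hypothetical environment helpers that produce the joint action, the next state and the reward at each timestep:

    size_t s = 0; // initial state
    for (unsigned t = 0; t < 10000; ++t) {
        // The joint action contains every agent's choice, including ours.
        AIToolbox::Factored::Action a = sampleJointAction(s);
        auto [s1, rew] = step(s, a);        // hypothetical env transition
        learner.stepUpdateQ(s, a, s1, rew); // update joint and single Q
        s = s1;
    }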

The documentation for this class was generated from the following file:

AIToolbox/Factored/MDP/Algorithms/JointActionLearner.hpp