AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::MDP::CooperativeQLearning Class Reference

This class represents the Cooperative QLearning algorithm. More...

#include <AIToolbox/Factored/MDP/Algorithms/CooperativeQLearning.hpp>

Public Member Functions

 CooperativeQLearning (const DDNGraph &g, const std::vector< std::vector< size_t >> &basisDomains, double discount, double alpha)
 Basic constructor. More...
 
Action stepUpdateQ (const State &s, const Action &a, const State &s1, const Rewards &rew)
 This function updates the internal QFunction based on experience. More...
 
void setLearningRate (double a)
 This function sets the learning rate parameter. More...
 
double getLearningRate () const
 This function returns the currently set learning rate parameter. More...
 
void setDiscount (double d)
 This function sets the new discount parameter. More...
 
double getDiscount () const
 This function returns the currently set discount parameter. More...
 
const DDNGraph & getGraph () const
 This function returns the DDN on which CooperativeQLearning is working. More...
 
const State & getS () const
 This function returns the state space on which CooperativeQLearning is working. More...
 
const Action & getA () const
 This function returns the action space on which CooperativeQLearning is working. More...
 
const FactoredMatrix2D & getQFunction () const
 This function returns a reference to the internal QFunction. More...
 
void setQFunction (double val)
 This function sets every entry of the QFunction to a given value. More...
 

Detailed Description

This class represents the Cooperative QLearning algorithm.

This is the same algorithm as SparseCooperativeQLearning, but it handles dense factored spaces. This is less flexible, but it is computationally much faster and can help scale SCQL to larger problems.

Constructor & Destructor Documentation

◆ CooperativeQLearning()

AIToolbox::Factored::MDP::CooperativeQLearning::CooperativeQLearning ( const DDNGraph &  g,
const std::vector< std::vector< size_t >> &  basisDomains,
double  discount,
double  alpha 
)

Basic constructor.

This constructor initializes all data structures and parameters needed for the correct functioning of CooperativeQLearning.

The Q-Function is constructed so that each of its factors has a domain equal to the DDN parents of the corresponding entry of basisDomains.

Parameters
g: The DDN of the environment.
basisDomains: The domains of the Q-Function to use.
discount: The discount for future rewards.
alpha: The learning rate parameter.
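
As an illustration only, a construction might look like the sketch below. Here `graph` is assumed to be a DDNGraph that has already been built for a hypothetical problem with three state features, and the numeric parameters are arbitrary.

#include <AIToolbox/Factored/MDP/Algorithms/CooperativeQLearning.hpp>

// Each inner vector of basisDomains lists the state feature ids covered by one
// basis of the Q-Function; here we use one basis per state feature.
const std::vector<std::vector<size_t>> basisDomains{{0}, {1}, {2}};

// 0.9 and 0.3 are illustrative values for the discount and the learning rate.
AIToolbox::Factored::MDP::CooperativeQLearning solver(graph, basisDomains, 0.9, 0.3);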

Member Function Documentation

◆ getA()

const Action& AIToolbox::Factored::MDP::CooperativeQLearning::getA ( ) const

This function returns the action space on which CooperativeQLearning is working.

Returns
The action space.

◆ getDiscount()

double AIToolbox::Factored::MDP::CooperativeQLearning::getDiscount ( ) const

This function returns the currently set discount parameter.

Returns
The currently set discount parameter.

◆ getGraph()

const DDNGraph& AIToolbox::Factored::MDP::CooperativeQLearning::getGraph ( ) const

This function returns the DDN on which CooperativeQLearning is working.

Returns
The DDN graph of the problem.

◆ getLearningRate()

double AIToolbox::Factored::MDP::CooperativeQLearning::getLearningRate ( ) const

This function returns the currently set learning rate parameter.

Returns
The currently set learning rate parameter.

◆ getQFunction()

const FactoredMatrix2D& AIToolbox::Factored::MDP::CooperativeQLearning::getQFunction ( ) const

This function returns a reference to the internal QFunction.

Returns
The internal QFunction.

◆ getS()

const State& AIToolbox::Factored::MDP::CooperativeQLearning::getS ( ) const

This function returns the state space on which CooperativeQLearning is working.

Returns
The state space.

◆ setDiscount()

void AIToolbox::Factored::MDP::CooperativeQLearning::setDiscount ( double  d)

This function sets the new discount parameter.

The discount parameter controls how much future rewards are taken into account by CooperativeQLearning. If it is 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm optimizes the total accrued reward. When it is less than 1, rewards obtained in the present are valued more than future rewards.

Parameters
d: The new discount factor.
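
As a small numerical illustration (with arbitrary values), a reward r obtained k timesteps in the future contributes d^k * r to the value of the current state:

#include <cmath>

const double d = 0.9, r = 1.0;
const double valueNow       = r;                    // k = 0: counted in full
const double valueIn10Steps = std::pow(d, 10) * r;  // ~0.35: counted much less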

◆ setLearningRate()

void AIToolbox::Factored::MDP::CooperativeQLearning::setLearningRate ( double  a)

This function sets the learning rate parameter.

The learning rate determines how quickly the QFunction is modified in response to new data. In fully deterministic environments (such as an agent moving through a grid, for example), this parameter can safely be set to 1.0 for maximum learning speed.

In stochastic environments, on the other hand, this parameter should start relatively high and decrease slowly over time in order for the estimates to converge.

Alternatively, it can be kept somewhat high if the environment dynamics change progressively, so that the algorithm keeps adapting. The final behavior of CooperativeQLearning depends strongly on this parameter.

The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

Parameters
a: The new learning rate parameter.
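
A minimal sketch of a decaying schedule is shown below, assuming `solver` is an already constructed CooperativeQLearning instance; the particular 1 / (1 + c*t) decay is only one common choice, not something prescribed by the library.

const double alpha0 = 0.5;
for (unsigned t = 0; t < 10000; ++t) {
    // Always in (0, alpha0], so it respects the (0.0, 1.0] requirement above.
    solver.setLearningRate(alpha0 / (1.0 + 0.001 * t));
    // ... gather experience and call stepUpdateQ here ...
}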

◆ setQFunction()

void AIToolbox::Factored::MDP::CooperativeQLearning::setQFunction ( double  val)

This function sets every entry of the QFunction to the given value.

This function is useful to perform optimistic initialization.

Parameters
val: The value to which all entries in the QFunction are set.
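
A minimal sketch of optimistic initialization, assuming the per-step rewards of the problem are known to be bounded above by some value rMax, and that the discount is strictly less than 1:

const double rMax = 10.0;  // assumed, problem-specific reward bound
// rMax / (1 - discount) upper-bounds any discounted return; starting above the
// true values pushes the greedy policy to try state-action pairs it has not seen.
solver.setQFunction(rMax / (1.0 - solver.getDiscount()));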

◆ stepUpdateQ()

Action AIToolbox::Factored::MDP::CooperativeQLearning::stepUpdateQ ( const State &  s,
const Action &  a,
const State &  s1,
const Rewards &  rew 
)

This function updates the internal QFunction based on experience.

This function takes a single experience point and uses it to update the QFunction. Since performing the update requires computing the best joint action for the next timestep, that action is returned in case it is needed.

Note: this algorithm expects one reward per action factor (i.e. the action input and the rewards input must have the same size)!

Parameters
s: The previous state.
a: The action performed.
s1: The new state.
rew: The reward obtained.
Returns
The best action to be performed in the next timestep.
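
A minimal sketch of the resulting learning loop, assuming `solver` is the instance constructed above; `env.reset()` and `env.step(s, a)` stand for a user-provided environment and are not part of AIToolbox, and the returned Rewards vector is assumed to contain one entry per action factor, as required above.

using namespace AIToolbox::Factored;

State s = env.reset();               // hypothetical environment helper
Action a(solver.getA().size(), 0);   // arbitrary initial joint action (all zeros)
for (unsigned t = 0; t < 10000; ++t) {
    auto [s1, rew] = env.step(s, a); // hypothetical environment helper
    // Update the Q-Function and get the greedy joint action for the next step;
    // in practice one would mix in some exploration (e.g. epsilon-greedy).
    a = solver.stepUpdateQ(s, a, s1, rew);
    s = s1;
}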

The documentation for this class was generated from the following file:
AIToolbox/Factored/MDP/Algorithms/CooperativeQLearning.hpp