AIToolbox::Factored::MDP::SparseCooperativeQLearning Class Reference

This class represents the Sparse Cooperative QLearning algorithm.

#include <AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp>

Public Member Functions

 SparseCooperativeQLearning (State S, Action A, const std::vector< QFunctionRule > &rules, double discount, double alpha)
 Basic constructor.
 
void setLearningRate (double a)
 This function sets the learning rate parameter.
 
double getLearningRate () const
 This function returns the currently set learning rate parameter.
 
void setDiscount (double d)
 This function sets the new discount parameter.
 
double getDiscount () const
 This function returns the currently set discount parameter.
 
Action stepUpdateQ (const State &s, const Action &a, const State &s1, const Rewards &rew)
 This function updates the internal QFunctionRules based on experience.
 
const State & getS () const
 This function returns the state space on which SparseCooperativeQLearning is working.
 
const Action & getA () const
 This function returns the action space on which SparseCooperativeQLearning is working.
 
const FilterMap< QFunctionRule > & getQFunctionRules () const
 This function returns a reference to the internal FilterMap of QFunctionRules.
 

Detailed Description

This class represents the Sparse Cooperative QLearning algorithm.

This algorithm is designed for cooperative multi-agent problems, but can just as easily be used for factored state/action single-agent MDPs (since the two formulations are equivalent).

Rather than maintaining a single huge QFunction covering all possible state/action pairs, SparseCooperativeQLearning keeps its QFunction split into QFunctionRules. Each rule covers a specific reward that can be obtained via a PartialState and PartialAction.

As the agent interacts with the world, these rules are updated to better reflect the rewards obtained from the environment. At each timestep, every rule applicable to the starting State and Action is updated based on the next State and on the optimal Action, which is computed from the existing rules via VariableElimination.

Aside from this, the algorithm is very similar to the single-agent MDP::QLearning (hence the name).
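
As an illustration, a minimal learning loop might look as follows. This is only a sketch: the environment object and its reset()/step() functions are hypothetical placeholders, and State/Action instances are assumed to be vectors of per-factor values; only the SparseCooperativeQLearning calls reflect the interface documented on this page.

#include <AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp>
#include <utility>

namespace afm = AIToolbox::Factored;

template <typename Environment>
void runEpisode(afm::MDP::SparseCooperativeQLearning & solver,
                Environment & env, unsigned timesteps) {
    afm::State  s = env.reset();             // hypothetical environment call
    afm::Action a(solver.getA().size());     // assumption: Action is a vector of per-agent choices; start at all zeroes
    for (unsigned t = 0; t < timesteps; ++t) {
        auto [s1, rew] = env.step(a);        // hypothetical: returns next state and per-agent rewards
        // Update all rules matching (s, a); the returned action is the greedy
        // joint action for s1, computed internally via VariableElimination.
        afm::Action a1 = solver.stepUpdateQ(s, a, s1, rew);
        s = std::move(s1);
        a = std::move(a1);                   // optionally mix in exploration here
    }
}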

Constructor & Destructor Documentation

◆ SparseCooperativeQLearning()

AIToolbox::Factored::MDP::SparseCooperativeQLearning::SparseCooperativeQLearning (State S, Action A, const std::vector< QFunctionRule > & rules, double discount, double alpha)

Basic constructor.

This constructor initializes all data structures and parameters needed for the correct functioning of SparseCooperativeQLearning.

Note: this algorithm can be used for bandit problems by simply omitting the state part (passing an empty vector as the state space), rather than using a dummy single-state space. This should speed things up a bit.

Parameters
    S         The factored state space of the environment.
    A         The factored action space for the agent.
    rules     The QFunctionRules to operate upon.
    discount  The discount for future rewards.
    alpha     The learning rate parameter.
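
For reference, a hedged construction sketch follows. It assumes that State and Action are vectors of factor sizes, that PartialState/PartialAction are pairs of (factor ids, factor values), and that QFunctionRule aggregates a PartialState, a PartialAction and an initial value in that order; the exact layout and namespace of QFunctionRule should be checked against its own documentation.

#include <AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp>
#include <vector>

using namespace AIToolbox::Factored;

int main() {
    State  S{3, 3};   // two state factors, 3 values each
    Action A{2, 2};   // two agents, 2 actions each

    // One rule per agent, each covering a single state factor and that agent's
    // action. Field order {state, action, value} is an assumption here.
    std::vector<MDP::QFunctionRule> rules{
        { PartialState{{0}, {0}}, PartialAction{{0}, {0}}, 0.0 },  // state factor 0 = 0, agent 0 plays action 0
        { PartialState{{1}, {0}}, PartialAction{{1}, {1}}, 0.0 },  // state factor 1 = 0, agent 1 plays action 1
    };

    MDP::SparseCooperativeQLearning solver(S, A, rules, 0.9, 0.3);

    // For a bandit problem one could instead pass an empty state space:
    // MDP::SparseCooperativeQLearning bandit(State{}, A, rules, 0.9, 0.3);
}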

Member Function Documentation

◆ getA()

const Action& AIToolbox::Factored::MDP::SparseCooperativeQLearning::getA ( ) const

This function returns the action space on which SparseCooperativeQLearning is working.

Returns
The action space.

◆ getDiscount()

double AIToolbox::Factored::MDP::SparseCooperativeQLearning::getDiscount ( ) const

This function returns the currently set discount parameter.

Returns
The currently set discount parameter.

◆ getLearningRate()

double AIToolbox::Factored::MDP::SparseCooperativeQLearning::getLearningRate ( ) const

This function returns the currently set learning rate parameter.

Returns
The currently set learning rate parameter.

◆ getQFunctionRules()

const FilterMap<QFunctionRule>& AIToolbox::Factored::MDP::SparseCooperativeQLearning::getQFunctionRules ( ) const

This function returns a reference to the internal FilterMap of QFunctionRules.

Returns
The internal QFunctionRules.
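
A small inspection sketch, assuming FilterMap can be iterated over its stored QFunctionRules and that each rule exposes a value field; the exact FilterMap interface should be checked in its own documentation.

#include <AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp>
#include <iostream>

void printRuleValues(const AIToolbox::Factored::MDP::SparseCooperativeQLearning & solver) {
    const auto & rules = solver.getQFunctionRules();
    // Assumption: FilterMap exposes begin()/end() over the stored rules.
    for (const auto & rule : rules)
        std::cout << rule.value << '\n';
}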

◆ getS()

const State& AIToolbox::Factored::MDP::SparseCooperativeQLearning::getS ( ) const

This function returns the state space on which SparseCooperativeQLearning is working.

Returns
The state space.

◆ setDiscount()

void AIToolbox::Factored::MDP::SparseCooperativeQLearning::setDiscount ( double  d)

This function sets the new discount parameter.

The discount parameter controls how much future rewards are taken into account by SparseCooperativeQLearning. If set to 1, a reward is worth the same whether it is obtained now or in a million timesteps, so the algorithm will optimize overall reward accretion. When less than 1, rewards obtained in the present are valued more than future rewards.

Parameters
    d  The new discount factor.
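
For intuition, this is where the discount \(\gamma\) enters the standard single-agent Q-learning target that this algorithm mirrors on a per-rule basis (textbook form, not a quote from this library's code):

\[ Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \]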

◆ setLearningRate()

void AIToolbox::Factored::MDP::SparseCooperativeQLearning::setLearningRate ( double  a)

This function sets the learning rate parameter.

The learning parameter determines the speed at which the QFunctions are modified with respect to new data. In fully deterministic environments (such as an agent moving through a grid, for example), this parameter can be safely set to 1.0 for maximum learning.

In stochastic environments, on the other hand, this parameter should start relatively high and decrease slowly over time so that the estimates can converge.

Alternatively, it can be kept somewhat high if the environment dynamics change progressively, so that the algorithm keeps adapting. The final behavior of SparseCooperativeQLearning depends heavily on this parameter.

The learning rate parameter must be > 0.0 and <= 1.0, otherwise the function will throw an std::invalid_argument.

Parameters
    a  The new learning rate parameter.
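
As discussed above, a common pattern is to decay the learning rate over time. A minimal sketch follows; the 1/(1 + decay * t) schedule and its constants are illustrative choices, not part of the library.

#include <AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp>

void decayLearningRate(AIToolbox::Factored::MDP::SparseCooperativeQLearning & solver,
                       unsigned timestep) {
    constexpr double initialAlpha = 0.5;     // illustrative starting rate
    constexpr double decay        = 0.001;   // illustrative decay constant
    // The result stays within (0.0, 1.0], so setLearningRate will not throw.
    solver.setLearningRate(initialAlpha / (1.0 + decay * timestep));
}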

◆ stepUpdateQ()

Action AIToolbox::Factored::MDP::SparseCooperativeQLearning::stepUpdateQ (const State & s, const Action & a, const State & s1, const Rewards & rew)

This function updates the internal QFunctionRules based on experience.

This function takes a single experience point and uses it to update the QFunctionRules. Since this update requires computing the best possible action for the next timestep, that action is returned in case it is needed.

Note: this algorithm expects one reward per factored action (i.e. the size of the action input and the rewards input should be the same)!

Parameters
    s    The previous state.
    a    The action performed.
    s1   The new state.
    rew  The reward obtained.
Returns
The best action to be performed in the next timestep.
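
A hedged single-step sketch illustrating the size requirement from the note above. It assumes Action is a vector-like type with size(), and Rewards a real-valued vector type constructible with a size and indexable; the per-agent reward values here are hypothetical placeholders for what the environment would provide.

#include <AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp>

using namespace AIToolbox;
using namespace AIToolbox::Factored;

Action updateOnce(MDP::SparseCooperativeQLearning & solver,
                  const State & s, const Action & a, const State & s1) {
    Rewards rew(a.size());            // one reward per action factor, as required
    for (size_t i = 0; i < a.size(); ++i)
        rew[i] = 1.0;                 // hypothetical per-agent rewards from the environment
    // Returns the greedy joint action for s1 according to the updated rules.
    return solver.stepUpdateQ(s, a, s1, rew);
}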

The documentation for this class was generated from the following file: AIToolbox/Factored/MDP/Algorithms/SparseCooperativeQLearning.hpp