AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::Bandit::Experience Class Reference

This class computes averages and counts for a multi-agent cooperative Bandit problem.
#include <AIToolbox/Factored/Bandit/Experience.hpp>
Public Types

    using VisitsTable = std::vector<std::vector<unsigned long>>
    using Indeces = std::vector<size_t>

Public Member Functions

    Experience(Action A, const std::vector<PartialKeys>& dependencies)
        Basic constructor.
    const Indeces& record(const Action& a, const Rewards& rews)
        This function updates the QFunction and counts.
    void reset()
        This function resets the QFunction and counts to zero.
    const std::vector<PartialKeys>& getDependencies() const
        This function returns the local groups of agents.
    unsigned long getTimesteps() const
        This function returns the number of times the record function has been called.
    const QFunction& getRewardMatrix() const
        This function returns a reference to the internal QFunction.
    const VisitsTable& getVisitsTable() const
        This function returns a reference to the counts for the actions.
    const std::vector<Vector>& getM2Matrix() const
        This function returns the estimated squared distance of the samples from the mean.
    const Action& getA() const
        This function returns the size of the action space.
Detailed Description

This class computes averages and counts for a multi-agent cooperative Bandit problem.

This class can be used to compute the averages and counts for all actions in a Bandit problem over time. The class assumes that the problem is factored, i.e. that the agents depend on each other only within smaller local groups.
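The following is a minimal end-to-end sketch of the intended workflow. It assumes the usual AIToolbox factored type aliases (Action and PartialKeys as vectors of size_t, Rewards as an Eigen vector); the agents, groups, and reward values shown are illustrative only, not taken from the library's documentation.

    #include <AIToolbox/Factored/Bandit/Experience.hpp>
    #include <cassert>

    int main() {
        using namespace AIToolbox;
        using namespace AIToolbox::Factored;

        // 3 agents with 2 actions each; rewards decompose over groups {0,1} and {1,2}.
        Action A{2, 2, 2};
        std::vector<PartialKeys> deps{{0, 1}, {1, 2}};

        Bandit::Experience exp(A, deps);

        // Record one joint action together with one reward per group.
        Action a{1, 0, 1};
        Rewards rews(2);
        rews << 1.0, 0.5;
        exp.record(a, rews);

        assert(exp.getTimesteps() == 1);

        // Running averages per local joint action live in the internal QFunction.
        const auto & q = exp.getRewardMatrix();
        (void)q;
    }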
Member Typedef Documentation

using AIToolbox::Factored::Bandit::Experience::Indeces = std::vector<size_t>

using AIToolbox::Factored::Bandit::Experience::VisitsTable = std::vector<std::vector<unsigned long>>
Constructor & Destructor Documentation

AIToolbox::Factored::Bandit::Experience::Experience(Action A, const std::vector<PartialKeys>& dependencies)

Basic constructor.

Parameters
    A             The size of the action space.
    dependencies  The local groups to record. Multiple groups with the same keys are allowed.
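For instance, because duplicate keys are allowed, the same pair of agents may back two separate reward components. A hedged sketch with illustrative values (not taken from the library's documentation):

    using namespace AIToolbox::Factored;

    // Two agents with 3 actions each, one agent with 2 actions.
    Action A{3, 3, 2};

    // Two distinct reward components both depend on agents {0, 1};
    // a third depends on agents {1, 2}. Repeating the same keys is allowed.
    std::vector<PartialKeys> deps{{0, 1}, {0, 1}, {1, 2}};

    Bandit::Experience exp(A, deps);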
Member Function Documentation

const Action& AIToolbox::Factored::Bandit::Experience::getA() const
This function returns the size of the action space.
const std::vector<PartialKeys>& AIToolbox::Factored::Bandit::Experience::getDependencies() const
This function returns the local groups of agents.
const std::vector<Vector>& AIToolbox::Factored::Bandit::Experience::getM2Matrix() const
This function returns the estimated squared distance of the samples from the mean.
The returned values estimate sum_i (x_i - mean_x)^2 for the rewards of each local action. Note that these values only have meaning when the respective action has at least 2 samples.
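Dividing an M2 entry by (count - 1) yields the sample variance of the rewards seen for that local joint action. A small sketch under the assumption that the M2 entries and the visit counts are aligned index-wise per group (the helper name is hypothetical):

    #include <AIToolbox/Factored/Bandit/Experience.hpp>

    // Hypothetical helper: sample variance of the rewards recorded for one
    // local joint action, computed from the Experience's M2 and visit counts.
    double sampleVariance(const AIToolbox::Factored::Bandit::Experience & exp,
                          size_t group, size_t localAction) {
        const auto n = exp.getVisitsTable()[group][localAction];
        if (n < 2) return 0.0;  // undefined with fewer than 2 samples
        const auto m2 = exp.getM2Matrix()[group][localAction];
        return m2 / (n - 1);
    }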
const QFunction& AIToolbox::Factored::Bandit::Experience::getRewardMatrix() const
This function returns a reference to the internal QFunction.
The reward matrix contains the current average rewards computed for each action.
unsigned long AIToolbox::Factored::Bandit::Experience::getTimesteps() const
This function returns the number of times the record function has been called.
const VisitsTable& AIToolbox::Factored::Bandit::Experience::getVisitsTable() const

This function returns a reference to the visit counts for the actions.
const Indeces& AIToolbox::Factored::Bandit::Experience::record(const Action& a, const Rewards& rews)
This function updates the QFunction and counts.
This function additionally returns a reference to the indices updated for each group of agents. This is useful, for example, when updating a model or a policy without needing to recompute these indices all the time.
Parameters
    a     The action taken.
    rews  The rewards obtained in the previous timestep, one per agent group.
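A short sketch of how the return value might be consumed, assuming each entry is the enumeration index of the updated local joint action within its group (illustrative values only, not taken from the library's documentation):

    #include <AIToolbox/Factored/Bandit/Experience.hpp>
    #include <iostream>

    int main() {
        using namespace AIToolbox;
        using namespace AIToolbox::Factored;

        Bandit::Experience exp(Action{2, 2, 2}, {{0, 1}, {1, 2}});

        Action a{1, 0, 1};
        Rewards rews(2);    // one reward per dependency group
        rews << 0.7, 0.2;

        // One index per group: which local joint action just got updated.
        const auto & ids = exp.record(a, rews);

        for (size_t g = 0; g < ids.size(); ++g)
            std::cout << "group " << g << " updated local action " << ids[g]
                      << " (visits so far: " << exp.getVisitsTable()[g][ids[g]] << ")\n";
    }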
void AIToolbox::Factored::Bandit::Experience::reset()
This function resets the QFunction and counts to zero.