AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::Bandit::Experience Class Reference

This class computes averages and counts for a multi-agent cooperative Bandit problem.

#include <AIToolbox/Factored/Bandit/Experience.hpp>

Public Types

using VisitsTable = std::vector< std::vector< unsigned long > >
 
using Indeces = std::vector< size_t >
 

Public Member Functions

 Experience (Action A, const std::vector< PartialKeys > &dependencies)
 Basic constructor.
 
const Indeces & record (const Action &a, const Rewards &rews)
 This function updates the QFunction and counts.
 
void reset ()
 This function resets the QFunction and counts to zero.
 
const std::vector< PartialKeys > & getDependencies () const
 This function returns the local groups of agents.
 
unsigned long getTimesteps () const
 This function returns the number of times the record function has been called.
 
const QFunction & getRewardMatrix () const
 This function returns a reference to the internal QFunction.
 
const VisitsTable & getVisitsTable () const
 This function returns a reference to the counts for the actions.
 
const std::vector< Vector > & getM2Matrix () const
 This function returns the estimated squared distance of the samples from the mean.
 
const Action & getA () const
 This function returns the size of the action space.
 

Detailed Description

This class computes averages and counts for a multi-agent cooperative Bandit problem.

This class can be used to compute the averages and counts for all actions in a Bandit problem over time. The class assumes that the problem is factored, and agents depend on each other in smaller groups.

Member Typedef Documentation

◆ Indeces

using AIToolbox::Factored::Bandit::Experience::Indeces = std::vector<size_t>

◆ VisitsTable

using AIToolbox::Factored::Bandit::Experience::VisitsTable = std::vector<std::vector<unsigned long> >

Constructor & Destructor Documentation

◆ Experience()

AIToolbox::Factored::Bandit::Experience::Experience ( Action  A,
const std::vector< PartialKeys > &  dependencies 
)

Basic constructor.

Parameters
A             The size of the action space.
dependencies  The local groups to record. Multiple groups with the same keys are allowed.
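
To make the two constructor parameters concrete, the sketch below shows one plausible way to size the per-group tables from them. It does not use the library: the aliases Action and PartialKeys are stand-ins that assume both types behave like vectors of unsigned integers, and localActionCount and tableSizes are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in aliases (assumption): A[i] is the number of actions of agent i,
// and a group is the list of agent ids that it covers.
using Action = std::vector<std::size_t>;
using PartialKeys = std::vector<std::size_t>;

// Number of joint local actions of one group: the product of the action
// counts of the agents listed in that group.
std::size_t localActionCount(const Action & A, const PartialKeys & group) {
    std::size_t count = 1;
    for (const auto agent : group) {
        assert(agent < A.size());
        count *= A[agent];
    }
    return count;
}

// One table size per group, in the same order as the dependencies.
// Groups with identical keys simply get separate tables of equal size.
std::vector<std::size_t> tableSizes(const Action & A,
                                    const std::vector<PartialKeys> & dependencies) {
    std::vector<std::size_t> sizes;
    sizes.reserve(dependencies.size());
    for (const auto & group : dependencies)
        sizes.push_back(localActionCount(A, group));
    return sizes;
}
```

For example, with three agents that have 2, 3 and 4 actions respectively, a group covering agents 0 and 1 would need 6 entries, and a group covering agents 1 and 2 would need 12.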

Member Function Documentation

◆ getA()

const Action& AIToolbox::Factored::Bandit::Experience::getA ( ) const

This function returns the size of the action space.

Returns
The size of the action space.

◆ getDependencies()

const std::vector<PartialKeys>& AIToolbox::Factored::Bandit::Experience::getDependencies ( ) const

This function returns the local groups of agents.

◆ getM2Matrix()

const std::vector<Vector>& AIToolbox::Factored::Bandit::Experience::getM2Matrix ( ) const

This function returns the estimated squared distance of the samples from the mean.

The returned values estimate sum_i (x_i - mean_x)^2 for the rewards of each local action. Note that these values only have meaning when the respective action has at least 2 samples.

Returns
A reference to the estimated squared distance from the mean.
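
As an illustration of how a quantity like sum_i (x_i - mean_x)^2 can be maintained one sample at a time, here is a minimal sketch of Welford's online algorithm. Whether the library computes its M2 values exactly this way is an assumption; RunningStats, update and sampleVariance are hypothetical names and not part of the documented interface.

```cpp
#include <cassert>

// Hypothetical per-action statistics; not the library's own data layout.
struct RunningStats {
    unsigned long count = 0; // number of samples seen so far
    double mean = 0.0;       // running average of the samples
    double m2 = 0.0;         // running sum_i (x_i - mean_x)^2
};

// Welford's online update: folds one new sample into count, mean and m2
// without storing past samples.
void update(RunningStats & s, double x) {
    ++s.count;
    const double delta = x - s.mean; // distance from the old mean
    s.mean += delta / s.count;
    s.m2 += delta * (x - s.mean);    // uses both the old and the new mean
}

// Unbiased sample variance derived from m2. As with the M2 values
// themselves, this only has meaning with at least 2 samples.
double sampleVariance(const RunningStats & s) {
    assert(s.count >= 2);
    return s.m2 / (s.count - 1);
}
```

Dividing the accumulated squared distance by count - 1 gives the sample variance, which is why a single sample is not enough to obtain a meaningful value.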

◆ getRewardMatrix()

const QFunction& AIToolbox::Factored::Bandit::Experience::getRewardMatrix ( ) const

This function returns a reference to the internal QFunction.

The reward matrix contains the current average rewards computed for each action.

Returns
A reference to the internal QFunction.

◆ getTimesteps()

unsigned long AIToolbox::Factored::Bandit::Experience::getTimesteps ( ) const

This function returns the number of times the record function has been called.

Returns
The number of recorded timesteps.

◆ getVisitsTable()

const VisitsTable& AIToolbox::Factored::Bandit::Experience::getVisitsTable ( ) const

This function returns a reference to the counts for the actions.

Returns
A reference to the counts of the actions.

◆ record()

const Indeces& AIToolbox::Factored::Bandit::Experience::record ( const Action &  a,
const Rewards &  rews
)

This function updates the QFunction and counts.

This function additionally returns a reference to the indices updated for each group of agents. This is useful, for example, when updating a model or a policy, since those indices do not need to be recomputed each time.

Parameters
a     The action taken.
rews  The rewards obtained in the previous timestep, one per agent group.
Returns
The indices of each agent group updated.
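
The recording step described above can be sketched as a small standalone class. This is not the library's implementation: MiniExperience and its accessors are hypothetical, rewards are passed as a std::vector<double> stand-in, and the mixed-radix ordering used to turn a group's local action into an index (first agent in the group is least significant) is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Action = std::vector<std::size_t>;      // stand-in (assumption)
using PartialKeys = std::vector<std::size_t>; // stand-in (assumption)
using Indeces = std::vector<std::size_t>;

// Hypothetical mini-recorder that mirrors the shape of the documented
// interface only: per-group averages, visit counts and updated indices.
class MiniExperience {
    public:
        MiniExperience(Action A, std::vector<PartialKeys> deps) :
                A_(std::move(A)), deps_(std::move(deps)),
                indeces_(deps_.size()), timesteps_(0)
        {
            for (const auto & group : deps_) {
                std::size_t size = 1;
                for (const auto agent : group) size *= A_[agent];
                averages_.emplace_back(size, 0.0);
                visits_.emplace_back(size, 0ul);
            }
        }

        // Updates the average and count of the local action of each group,
        // and returns the index that was touched in each group.
        const Indeces & record(const Action & a, const std::vector<double> & rews) {
            assert(a.size() == A_.size() && rews.size() == deps_.size());
            ++timesteps_;
            for (std::size_t g = 0; g < deps_.size(); ++g) {
                // Mixed-radix index of the group's local action (assumed order).
                std::size_t index = 0, multiplier = 1;
                for (const auto agent : deps_[g]) {
                    index += a[agent] * multiplier;
                    multiplier *= A_[agent];
                }
                indeces_[g] = index;
                // Incremental average: avg += (r - avg) / n.
                const auto n = ++visits_[g][index];
                averages_[g][index] += (rews[g] - averages_[g][index]) / n;
            }
            return indeces_;
        }

        unsigned long timesteps() const { return timesteps_; }
        const std::vector<std::vector<double>> & averages() const { return averages_; }
        const std::vector<std::vector<unsigned long>> & visits() const { return visits_; }

    private:
        Action A_;
        std::vector<PartialKeys> deps_;
        std::vector<std::vector<double>> averages_;
        std::vector<std::vector<unsigned long>> visits_;
        Indeces indeces_;
        unsigned long timesteps_;
};
```

Returning the indices by reference lets a caller reuse them, for instance to update a policy for exactly the entries that changed, without repeating the index computation.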

◆ reset()

void AIToolbox::Factored::Bandit::Experience::reset ( )

This function resets the QFunction and counts to zero.

