This class computes averages and counts for a Bandit problem.

#include <AIToolbox/Bandit/Experience.hpp>

Public Types
using	VisitsTable = std::vector< unsigned long >

Public Member Functions
	Experience (size_t A)
	Basic constructor.

void	record (size_t a, double rew)
	This function updates the reward matrix and counts.

void	reset ()
	This function resets the reward matrix (QFunction) and counts to zero.

unsigned long	getTimesteps () const
	This function returns the number of times the record function has been called.

const QFunction &	getRewardMatrix () const
	This function returns a reference to the internal reward matrix.

const VisitsTable &	getVisitsTable () const
	This function returns a reference to the counts for the actions.

const Vector &	getM2Matrix () const
	This function returns the estimated squared distance of the samples from the mean.

size_t	getA () const
	This function returns the size of the action space.

Detailed Description

This class computes averages and counts for a Bandit problem.

This class can be used to compute the averages and counts for all actions in a Bandit problem over time.
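The averages described above can be maintained incrementally, without storing past rewards. What follows is a minimal sketch under stated assumptions, not the AIToolbox implementation: the class name `ExperienceSketch` is hypothetical, `std::vector<double>` stands in for the library's `QFunction` type, and the M2 bookkeeping is left out. It also assumes that `reset()` clears the timestep counter, which this documentation does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Bandit::Experience. std::vector<double> replaces
// the library's QFunction type; this is NOT the AIToolbox implementation.
class ExperienceSketch {
    public:
        using VisitsTable = std::vector<unsigned long>;

        explicit ExperienceSketch(std::size_t A)
            : A_(A), q_(A, 0.0), counts_(A, 0UL), timesteps_(0) {}

        // Records reward `rew` for action `a`, updating the running average
        // incrementally: mean_n = mean_{n-1} + (rew - mean_{n-1}) / n.
        void record(std::size_t a, double rew) {
            ++timesteps_;
            ++counts_[a];
            q_[a] += (rew - q_[a]) / static_cast<double>(counts_[a]);
        }

        // Zeroes the averages and counts. Clearing the timestep counter
        // here is an assumption of this sketch.
        void reset() {
            std::fill(q_.begin(), q_.end(), 0.0);
            std::fill(counts_.begin(), counts_.end(), 0UL);
            timesteps_ = 0;
        }

        unsigned long getTimesteps() const { return timesteps_; }
        const std::vector<double> & getRewardMatrix() const { return q_; }
        const VisitsTable & getVisitsTable() const { return counts_; }
        std::size_t getA() const { return A_; }

    private:
        std::size_t A_;
        std::vector<double> q_;     // running average reward per action
        VisitsTable counts_;        // number of samples per action
        unsigned long timesteps_;   // total number of record() calls
};
```

Because each call to `record` touches exactly one action, in this sketch the entries of the visits table always sum to `getTimesteps()`.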

Member Typedef Documentation

◆ VisitsTable

using AIToolbox::Bandit::Experience::VisitsTable = std::vector<unsigned long>

Constructor & Destructor Documentation

◆ Experience()

AIToolbox::Bandit::Experience::Experience ( size_t A )

Basic constructor.

Parameters

A	The size of the action space.

Member Function Documentation

◆ getA()

size_t AIToolbox::Bandit::Experience::getA ( ) const

This function returns the size of the action space.

Returns: The size of the action space.

◆ getM2Matrix()

const Vector& AIToolbox::Bandit::Experience::getM2Matrix ( ) const

This function returns the estimated squared distance of the samples from the mean.

The returned values estimate sum_i (x_i - mean_x)^2 over the rewards recorded for each action. Note that these values are only meaningful when the respective action has at least 2 samples.

Returns: A reference to the estimated squared distances from the mean.
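A sum of squared deviations like this is what Welford's online algorithm maintains, and dividing it by n - 1 yields the unbiased sample variance, which is why at least 2 samples are needed. The sketch below shows that update and the conversion; it is an illustration under assumptions, not a statement of how Experience computes its values. The names `RunningStats`, `welfordUpdate` and `sampleVariances` are hypothetical, and `std::vector<double>` stands in for the library's `Vector` type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running statistics for a single action (hypothetical helper type).
struct RunningStats {
    unsigned long n = 0;  // number of samples seen
    double mean = 0.0;    // running mean of the samples
    double m2 = 0.0;      // running sum_i (x_i - mean)^2
};

// Welford's online update: folds one new sample x into the statistics.
void welfordUpdate(RunningStats & s, double x) {
    ++s.n;
    const double delta = x - s.mean;          // distance from the old mean
    s.mean += delta / static_cast<double>(s.n);
    s.m2 += delta * (x - s.mean);             // old delta times new distance
}

// Per-action sample variances from M2 values and visit counts, e.g. as they
// might be read from getM2Matrix() and getVisitsTable(). Actions with fewer
// than 2 samples have no meaningful variance and are reported as NaN.
std::vector<double> sampleVariances(const std::vector<double> & m2,
                                    const std::vector<unsigned long> & counts) {
    std::vector<double> var(m2.size(), std::nan(""));
    for (std::size_t a = 0; a < m2.size(); ++a)
        if (counts[a] >= 2)
            var[a] = m2[a] / static_cast<double>(counts[a] - 1);
    return var;
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the sample variance is 32 / 7.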

◆ getRewardMatrix()

const QFunction& AIToolbox::Bandit::Experience::getRewardMatrix ( ) const

This function returns a reference to the internal reward matrix.

The reward matrix contains the current average rewards computed for each action.

Returns: A reference to the internal reward matrix.

◆ getTimesteps()

unsigned long AIToolbox::Bandit::Experience::getTimesteps ( ) const

This function returns the number of times the record function has been called.

Returns: The number of recorded timesteps.

◆ getVisitsTable()

const VisitsTable& AIToolbox::Bandit::Experience::getVisitsTable ( ) const

This function returns a reference to the counts for the actions.

Returns: A reference to the counts of the actions.

◆ record()

void AIToolbox::Bandit::Experience::record ( size_t a, double rew )

This function updates the reward matrix and counts.