AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Bandit::T3CPolicy Class Reference

This class implements the T3C sampling policy.

#include <AIToolbox/Bandit/Policies/T3CPolicy.hpp>

Inheritance diagram for AIToolbox::Bandit::T3CPolicy:
AIToolbox::Bandit::PolicyInterface AIToolbox::PolicyInterface< void, void, size_t >

Public Member Functions

 T3CPolicy (const Experience &exp, double beta, double var)
 Basic constructor.
 
virtual size_t sampleAction () const override
 This function chooses an action using T3CPolicy.
 
size_t recommendAction () const
 This function returns the action most likely to be the best up to this point.
 
virtual double getActionProbability (const size_t &a) const override
 This function returns the probability of taking the specified action.
 
virtual Vector getPolicy () const override
 This function returns a vector containing all probabilities of the policy.
 
const Experience & getExperience () const
 This function returns a reference to the underlying Experience we use.
 
- Public Member Functions inherited from AIToolbox::PolicyInterface< void, void, size_t >
 PolicyInterface (void s, size_t a)
 Basic constructor.
 
virtual ~PolicyInterface ()
 Basic virtual destructor.
 
virtual size_t sampleAction (const void &s) const=0
 This function chooses a random action for state s, following the policy distribution.
 
virtual double getActionProbability (const void &s, const size_t &a) const=0
 This function returns the probability of taking the specified action in the specified state.
 
const void & getS () const
 This function returns the number of states of the world.
 
const size_t & getA () const
 This function returns the number of available actions to the agent.
 

Additional Inherited Members

- Public Types inherited from AIToolbox::Bandit::PolicyInterface
using Base = AIToolbox::PolicyInterface< void, void, size_t >
 
- Protected Attributes inherited from AIToolbox::PolicyInterface< void, void, size_t >
void S
 
size_t A
 
RandomEngine rand_
 

Detailed Description

This class implements the T3C sampling policy.

This class assumes that the rewards of all bandit arms are normally distributed, with all arms having the same variance.

T3C was designed as a replacement for TopTwoThompsonSamplingPolicy. The main idea is that, when we want to pull the estimated second best arm, instead of having to resample the arm means until a new unique contender appears, we can deterministically compute that contender using a measure of distance between the distributions of the arms.

This allows the algorithm to keep the computational costs low even after many pulls, while TopTwoThompsonSamplingPolicy tends to degrade in performance as time passes (as resampling is less and less likely to generate a unique second best contender).
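The deterministic contender computation can be sketched as follows. This is a minimal, self-contained illustration of the idea described above, not the library's actual implementation; all names are ours. For Gaussian arms with a known, shared variance, a standard "transportation cost" between the empirical leader and another arm shrinks as the two arms' means get closer and as the arms get fewer pulls, and the contender is the arm minimizing that cost:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative sketch (not AIToolbox's code) of the deterministic
// contender selection for Gaussian arms with known common variance `var`.
// `means[i]` is the empirical mean of arm i, `counts[i]` its pull count.
std::size_t challenger(const std::vector<double>& means,
                       const std::vector<double>& counts,
                       std::size_t leader, double var) {
    std::size_t best = leader;
    double bestCost = std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < means.size(); ++j) {
        if (j == leader) continue;
        const double gap = means[leader] - means[j];
        // Transportation cost between the leader's and arm j's
        // (Gaussian) posterior mean distributions: small when the means
        // are close or when either arm has been pulled only rarely.
        const double cost = gap * gap /
            (2.0 * var * (1.0 / counts[leader] + 1.0 / counts[j]));
        if (cost < bestCost) { bestCost = cost; best = j; }
    }
    return best;
}
```

Note that a rarely pulled arm can be selected as the contender even when its empirical mean is further from the leader's, since its higher uncertainty lowers the cost.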

Constructor & Destructor Documentation

◆ T3CPolicy()

AIToolbox::Bandit::T3CPolicy::T3CPolicy ( const Experience &  exp,
double  beta,
double  var 
)

Basic constructor.

Parameters
exp	The Experience we learn from.
beta	The probability of playing the first sampled best action instead of the second sampled best.
var	The known variance of all arms.
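The role of beta can be sketched in isolation (an illustrative stand-alone snippet, not the library's code): on each pull, a Bernoulli(beta) coin decides between the estimated best arm and the deterministically computed contender.

```cpp
#include <cstddef>
#include <random>

// Illustrative sketch of how `beta` is used: play the estimated best
// arm (the leader) with probability `beta`, otherwise play the
// contender. Names are ours, not AIToolbox's API.
std::size_t pickArm(std::size_t leader, std::size_t challenger,
                    double beta, std::mt19937& rng) {
    std::bernoulli_distribution coin(beta);
    return coin(rng) ? leader : challenger;
}
```

With beta = 0.5 (a common choice for best-arm identification), the leader and the contender are played equally often on average.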

Member Function Documentation

◆ getActionProbability()

virtual double AIToolbox::Bandit::T3CPolicy::getActionProbability ( const size_t &  a) const
overridevirtual

This function returns the probability of taking the specified action.

WARNING: The only way to compute the true probability of selecting the input action is via empirical sampling: we simply call sampleAction() many times and return the fraction of calls in which the input action was actually selected. This makes this function very, very SLOW. Do not call it lightly!

Parameters
a	The selected action.
Returns
This function returns an approximation of the probability of choosing the input action.
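The empirical estimate described in the warning amounts to the following generic sketch (our illustration, not AIToolbox's code): draw from the policy many times and return the fraction of draws matching the queried action. Every draw re-runs the full sampling procedure, which is where the cost comes from.

```cpp
#include <cstddef>
#include <functional>

// Generic sketch of the empirical probability estimate: invoke a
// sampler N times and return the fraction of draws equal to `a`.
double estimateActionProbability(const std::function<std::size_t()>& sampleAction,
                                 std::size_t a, std::size_t N) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (sampleAction() == a) ++hits;
    return static_cast<double>(hits) / static_cast<double>(N);
}
```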

◆ getExperience()

const Experience& AIToolbox::Bandit::T3CPolicy::getExperience ( ) const

This function returns a reference to the underlying Experience we use.

Returns
The internal Experience reference.

◆ getPolicy()

virtual Vector AIToolbox::Bandit::T3CPolicy::getPolicy ( ) const
overridevirtual

This function returns a vector containing all probabilities of the policy.

Ideally, call this function only when you need repeated, efficient access to the same policy values.

WARNING: This can be really expensive, as it does roughly the same work as getActionProbability(). It shouldn't be slower than that call, though, so if you do need the overall policy, call this method instead.

Implements AIToolbox::Bandit::PolicyInterface.

◆ recommendAction()

size_t AIToolbox::Bandit::T3CPolicy::recommendAction ( ) const

This function returns the action most likely to be the best up to this point.

Returns
The most likely best action.

◆ sampleAction()

virtual size_t AIToolbox::Bandit::T3CPolicy::sampleAction ( ) const
overridevirtual

This function chooses an action using T3CPolicy.

Returns
The chosen action.

The documentation for this class was generated from the following file: