AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Bandit::QGreedyPolicyWrapper< V, Gen > Class Template Reference

This class implements some basic greedy policy primitives.

#include <AIToolbox/Bandit/Policies/Utils/QGreedyPolicyWrapper.hpp>

Public Member Functions

 QGreedyPolicyWrapper (V q, std::vector< size_t > &buffer, Gen &gen)
 Basic constructor.
 
size_t sampleAction ()
 This function chooses the greediest action.
 
double getActionProbability (size_t a) const
 This function returns the probability of taking the specified action.
 
template<typename P >
void getPolicy (P &&p) const
 This function writes in a vector all probabilities of the policy.
 

Detailed Description

template<typename V, typename Gen>
class AIToolbox::Bandit::QGreedyPolicyWrapper< V, Gen >

This class implements some basic greedy policy primitives.

Since the basic operations on discrete vectors to select an action greedily are the same both in Bandits and in MDPs, we implement them once here. This class operates on references, so that it does not need to allocate memory and we can keep using the most appropriate storage for whatever problem we are actually working on.

Note that you shouldn't really need to specify the template parameters by hand.
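
The reference-based design described above can be illustrated with a minimal sketch. `GreedyView` and `firstBestAction` are hypothetical names invented for this example and are not part of AIToolbox; the sketch only assumes that the value container provides `size()` and `operator[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, NOT the AIToolbox class: a non-owning view
// that reads action values through a reference instead of copying them.
template <typename V>
class GreedyView {
    public:
        explicit GreedyView(const V & q) : q_(q) {}

        // Returns the index of the first maximal entry (ties are ignored here).
        std::size_t firstBestAction() const {
            assert(q_.size() > 0);
            std::size_t best = 0;
            for (std::size_t a = 1; a < static_cast<std::size_t>(q_.size()); ++a)
                if (q_[a] > q_[best]) best = a;
            return best;
        }

    private:
        const V & q_; // The caller keeps ownership of the storage.
};
```

Because the view stores a reference, later changes to the underlying values are visible without any copy, and the referenced container must outlive the view. With C++17 class template argument deduction, `GreedyView view(q);` deduces `V` from the argument, which is presumably the kind of convenience the note above has in mind.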

Constructor & Destructor Documentation

◆ QGreedyPolicyWrapper()

template<typename V , typename Gen >
AIToolbox::Bandit::QGreedyPolicyWrapper< V, Gen >::QGreedyPolicyWrapper ( V  q,
std::vector< size_t > &  buffer,
Gen &  gen 
)

Basic constructor.

Parameters
q       Reference to the QFunction to operate on.
buffer  A buffer space free to use (will be overwritten).
gen     A random number generator.

Member Function Documentation

◆ getActionProbability()

template<typename V , typename Gen >
double AIToolbox::Bandit::QGreedyPolicyWrapper< V, Gen >::getActionProbability ( size_t  a) const

This function returns the probability of taking the specified action.

If multiple greedy actions exist, the probability is split evenly among them, consistent with sampleAction(), which returns one of them at random.

Parameters
a       The selected action.
Returns
0 if the action is not greedy; otherwise 1 divided by the number of greedy actions.
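
As a rough illustration of the return value described above, the following sketch computes the same quantity for a plain `std::vector<double>`. The function `greedyActionProbability` is a hypothetical helper, not the library's implementation, and it assumes ties are detected with exact `==` comparison (the library may well use a tolerance).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of AIToolbox: probability that a greedy
// policy with uniform tie-breaking selects action `a`.
double greedyActionProbability(const std::vector<double> & q, std::size_t a) {
    assert(a < q.size());

    // Find the maximum value and count how many actions attain it.
    double best = q[0];
    std::size_t ties = 0;
    for (double v : q) {
        if (v > best) {
            best = v;
            ties = 1;
        } else if (v == best) { // Assumption: exact comparison.
            ++ties;
        }
    }

    // Non-greedy actions are never chosen; greedy ones share the mass evenly.
    if (q[a] != best) return 0.0;
    return 1.0 / static_cast<double>(ties);
}
```

For the values `{1.0, 3.0, 3.0, 0.5}`, actions 1 and 2 are both greedy, so each gets probability 0.5 and the others get 0.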

◆ getPolicy()

template<typename V , typename Gen >
template<typename P >
void AIToolbox::Bandit::QGreedyPolicyWrapper< V, Gen >::getPolicy ( P &&  p) const

This function writes in a vector all probabilities of the policy.

Ideally this function should be called only when there is a repeated need to access the same policy values efficiently.
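
To see why materializing the whole policy can pay off, note that the full probability vector can be produced in two linear passes, whereas querying each action separately would rescan the values every time. The sketch below uses a hypothetical helper `writeGreedyPolicy`; it assumes the output vector is already sized by the caller and that ties are detected with exact comparison, neither of which is confirmed by the documentation above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of AIToolbox: overwrites `p` with the
// greedy policy induced by `q`, using uniform tie-breaking.
void writeGreedyPolicy(const std::vector<double> & q, std::vector<double> & p) {
    assert(!q.empty());
    assert(p.size() == q.size()); // Assumption: caller preallocates the output.

    // First pass: maximum value and number of actions attaining it.
    double best = q[0];
    std::size_t ties = 0;
    for (double v : q) {
        if (v > best) {
            best = v;
            ties = 1;
        } else if (v == best) {
            ++ties;
        }
    }

    // Second pass: greedy actions share the probability mass evenly.
    const double prob = 1.0 / static_cast<double>(ties);
    for (std::size_t a = 0; a < q.size(); ++a)
        p[a] = (q[a] == best) ? prob : 0.0;
}
```

Once filled, `p` can be read any number of times without touching the values again, which matches the "repeated access" use case mentioned above.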

◆ sampleAction()

template<typename V , typename Gen >
size_t AIToolbox::Bandit::QGreedyPolicyWrapper< V, Gen >::sampleAction ( )

This function chooses the greediest action.

If multiple actions are equally greedy, one of them is returned at random.

Returns
The chosen action.
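
One plausible way to implement this kind of selection is sketched below: collect the indices of all maximal entries in a caller-supplied buffer, then pick one uniformly. `sampleGreedy` is a hypothetical free function written for this example, not the library's code, and it assumes exact comparison when detecting ties.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of AIToolbox: returns the index of a maximal
// entry of `q`, breaking ties uniformly at random.
template <typename Gen>
std::size_t sampleGreedy(const std::vector<double> & q,
                         std::vector<std::size_t> & buffer,
                         Gen & gen) {
    assert(!q.empty());

    // Reuse the caller's buffer to hold the current set of best actions.
    buffer.clear();
    double best = q[0];
    for (std::size_t a = 0; a < q.size(); ++a) {
        if (q[a] > best) {
            // Strictly better action: discard the previous candidates.
            best = q[a];
            buffer.clear();
            buffer.push_back(a);
        } else if (q[a] == best) { // Assumption: exact comparison.
            buffer.push_back(a);
        }
    }

    // The buffer is never empty here, so the distribution bounds are valid.
    std::uniform_int_distribution<std::size_t> pick(0, buffer.size() - 1);
    return buffer[pick(gen)];
}
```

Passing the buffer and the generator by reference mirrors the constructor parameters documented above: the buffer's contents are overwritten on every call, and no new storage is needed once it has grown to the number of actions.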
