AIToolbox
A library that offers tools for AI problem solving.
|
Simple typedef for most of a normal Bandit's policy needs. More...
#include <AIToolbox/Bandit/Policies/PolicyInterface.hpp>
Public Types | |
using | Base = AIToolbox::PolicyInterface< void, void, size_t > |
Public Member Functions | |
virtual Vector | getPolicy () const =0 |
This function returns a vector containing all probabilities of the policy. More... | |
Public Member Functions inherited from AIToolbox::PolicyInterface< void, void, size_t > | |
PolicyInterface (void s, size_t a) | |
Basic constructor. More... | |
virtual | ~PolicyInterface () |
Basic virtual destructor. More... | |
virtual size_t | sampleAction (const void &s) const=0 |
This function chooses a random action for state s, following the policy distribution. More... | |
virtual double | getActionProbability (const void &s, const size_t &a) const=0 |
This function returns the probability of taking the specified action in the specified state. More... | |
const void & | getS () const |
This function returns the number of states of the world. More... | |
const size_t & | getA () const |
This function returns the number of available actions to the agent. More... | |
Additional Inherited Members | |
Protected Attributes inherited from AIToolbox::PolicyInterface< void, void, size_t > | |
void | S |
size_t | A |
RandomEngine | rand_ |
Simple typedef for most of a normal Bandit's policy needs.
using AIToolbox::Bandit::PolicyInterface::Base = AIToolbox::PolicyInterface<void, void, size_t> |
|
pure virtual |
This function returns a vector containing all probabilities of the policy.
Note that this may be expensive to compute, and should not be called often (aside from the fact that it needs to allocate a new Vector each time).
Ideally this function can be called only when there is a repeated need to access the same policy values in an efficient manner.
Implemented in AIToolbox::Bandit::ESRLPolicy, AIToolbox::Bandit::LRPPolicy, AIToolbox::Bandit::SuccessiveRejectsPolicy, AIToolbox::Bandit::QSoftmaxPolicy, AIToolbox::Bandit::TopTwoThompsonSamplingPolicy, AIToolbox::Bandit::T3CPolicy, AIToolbox::Bandit::ThompsonSamplingPolicy, AIToolbox::Bandit::QGreedyPolicy, AIToolbox::Bandit::RandomPolicy, and AIToolbox::Bandit::EpsilonPolicy.