AIToolbox
A library that offers tools for AI problem solving.
QSoftmaxPolicy.hpp
#ifndef AI_TOOLBOX_BANDIT_Q_SOFTMAX_POLICY_HEADER_FILE
#define AI_TOOLBOX_BANDIT_Q_SOFTMAX_POLICY_HEADER_FILE

#include <AIToolbox/Bandit/Types.hpp>
#include <AIToolbox/Bandit/Policies/PolicyInterface.hpp>

namespace AIToolbox::Bandit {
    class QSoftmaxPolicy : public PolicyInterface {
        public:
            QSoftmaxPolicy(const QFunction & q, double temperature = 1.0);

            virtual size_t sampleAction() const override;

            virtual double getActionProbability(const size_t & a) const override;

            void setTemperature(double t);

            double getTemperature() const;

            virtual Vector getPolicy() const override;

        private:
            double temperature_;
            // Non-owning reference to the QFunction.
            const QFunction & q_;
            // Scratch buffers, kept to avoid reallocating a vector every time we sample.
            mutable std::vector<size_t> bestActions_;
            mutable Vector vbuffer_;
    };
}

#endif
AIToolbox::Bandit::QSoftmaxPolicy::getActionProbability
virtual double getActionProbability(const size_t &a) const override
This function returns the probability of taking the specified action.
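The header does not spell out the formula, but a softmax policy over a QFunction is conventionally the Boltzmann distribution below, where Q(a) is the value stored for action a and τ is the temperature (notation assumed here, not taken from the library):

```latex
\pi(a) = \frac{\exp\big(Q(a)/\tau\big)}{\sum_{b} \exp\big(Q(b)/\tau\big)}, \qquad \tau > 0
```

As τ approaches 0 the probability mass concentrates on the highest-valued actions; as τ grows it approaches the uniform distribution over actions.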
AIToolbox::Bandit::QFunction
Vector QFunction
Definition: Types.hpp:16
AIToolbox::Bandit::PolicyInterface
Simple typedef for most of a normal Bandit's policy needs.
Definition: PolicyInterface.hpp:11
AIToolbox::Vector
Eigen::Matrix< double, Eigen::Dynamic, 1 > Vector
Definition: Types.hpp:16
AIToolbox::Bandit
Definition: Experience.hpp:6
AIToolbox::Bandit::QSoftmaxPolicy
This class implements a softmax policy through a QFunction.
Definition: QSoftmaxPolicy.hpp:21
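To make the shape of the class concrete, here is a minimal standalone sketch that mirrors the member functions in the listing. It is an assumption-laden illustration, not AIToolbox's implementation: `SoftmaxPolicySketch` and `QValues` are hypothetical names, `std::vector<double>` stands in for the Eigen-based `QFunction`, the random generator is passed in explicitly, and a temperature of 0 is simply rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a bandit QFunction: one estimated value per action.
using QValues = std::vector<double>;

// Illustrative softmax policy mirroring the header's member functions.
// Not AIToolbox's implementation; the generator is supplied by the caller.
class SoftmaxPolicySketch {
    public:
        SoftmaxPolicySketch(const QValues & q, double temperature = 1.0)
            : temperature_(temperature), q_(q)
        {
            assert(!q_.empty());
            assert(temperature_ > 0.0); // this sketch does not handle temperature 0
        }

        void setTemperature(double t) { assert(t > 0.0); temperature_ = t; }
        double getTemperature() const { return temperature_; }

        // Softmax over q_ / temperature_, shifted by max(q_) to avoid overflow.
        std::vector<double> getPolicy() const {
            const double maxQ = *std::max_element(q_.begin(), q_.end());
            std::vector<double> p(q_.size());
            double sum = 0.0;
            for (std::size_t a = 0; a < q_.size(); ++a) {
                p[a] = std::exp((q_[a] - maxQ) / temperature_);
                sum += p[a];
            }
            for (double & x : p) x /= sum;
            return p;
        }

        double getActionProbability(std::size_t a) const { return getPolicy().at(a); }

        // Draws action a with probability getActionProbability(a).
        std::size_t sampleAction(std::mt19937 & rng) const {
            const std::vector<double> p = getPolicy();
            std::discrete_distribution<std::size_t> dist(p.begin(), p.end());
            return dist(rng);
        }

    private:
        double temperature_;
        const QValues & q_; // non-owning, as in the header
};
```

Recomputing the distribution on every call keeps the sketch short; the real header's mutable buffers suggest it instead reuses storage between calls.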
AIToolbox::Bandit::QSoftmaxPolicy::getTemperature
double getTemperature() const
This function will return the currently set temperature parameter.
AIToolbox::Bandit::QSoftmaxPolicy::sampleAction
virtual size_t sampleAction() const override
This function samples an action, with a probability that depends on the action's value in the QFunction.
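The header keeps a `bestActions_` buffer for sampling, which hints that ties among the best actions are handled somewhere, but the listing does not show how. The sketch below assumes one plausible behaviour and should not be read as AIToolbox's logic: `sampleSoftmaxAction` is a hypothetical helper that, at temperature 0, picks uniformly among the maximizing actions and, at positive temperature, samples by inverse CDF over the softmax weights.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not AIToolbox's implementation.
// temperature == 0: greedy, breaking ties uniformly at random among the best actions.
// temperature  > 0: inverse-CDF sampling from the softmax distribution.
std::size_t sampleSoftmaxAction(const std::vector<double> & q, double temperature, std::mt19937 & rng) {
    assert(!q.empty() && temperature >= 0.0);
    double maxQ = q[0];
    for (double v : q) if (v > maxQ) maxQ = v;

    if (temperature == 0.0) {
        // Collect every action attaining the maximum value. (The header's mutable
        // bestActions_ member presumably plays this role without reallocating.)
        std::vector<std::size_t> bestActions;
        for (std::size_t a = 0; a < q.size(); ++a)
            if (q[a] == maxQ) bestActions.push_back(a);
        std::uniform_int_distribution<std::size_t> pick(0, bestActions.size() - 1);
        return bestActions[pick(rng)];
    }

    // Unnormalized softmax weights, shifted by maxQ so the largest weight is exactly 1.
    std::vector<double> w(q.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < q.size(); ++a) {
        w[a] = std::exp((q[a] - maxQ) / temperature);
        sum += w[a];
    }
    // Inverse CDF: draw u in [0, sum) and return the first action whose
    // cumulative weight exceeds it.
    double u = std::uniform_real_distribution<double>(0.0, sum)(rng);
    for (std::size_t a = 0; a < q.size(); ++a) {
        u -= w[a];
        if (u < 0.0) return a;
    }
    return q.size() - 1; // only reached through floating-point round-off
}
```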
AIToolbox::Bandit::QSoftmaxPolicy::setTemperature
void setTemperature(double t)
This function sets the temperature parameter.
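Assuming the standard softmax form, the temperature trades exploitation against exploration: smaller values concentrate probability on the highest-valued action, larger values flatten the distribution towards uniform. The hypothetical helper `softmaxPolicy` below (operating on `std::vector<double>` rather than the library's `QFunction`) lets you check that effect numerically.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: softmax over per-action values at temperature t > 0.
std::vector<double> softmaxPolicy(const std::vector<double> & q, double t) {
    assert(!q.empty() && t > 0.0);
    const double maxQ = *std::max_element(q.begin(), q.end());
    std::vector<double> p(q.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < q.size(); ++a) {
        p[a] = std::exp((q[a] - maxQ) / t); // shifting by maxQ avoids overflow
        sum += p[a];
    }
    for (double & x : p) x /= sum;
    return p;
}
```

With values {1, 2, 3}, the best action gets almost all the mass at t = 0.1, about two thirds at t = 1, and close to one third at t = 100.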
AIToolbox::Bandit::QSoftmaxPolicy::QSoftmaxPolicy
QSoftmaxPolicy(const QFunction &q, double temperature=1.0)
Basic constructor. The policy stores a reference to q rather than a copy, so q must outlive the policy.
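The listing declares the member as `const QFunction & q_`, so the policy refers to the caller's values instead of copying them. The sketch below illustrates the consequence with a hypothetical `RefSoftmax` type and `std::vector<double>` standing in for `QFunction`: when a learner updates the values in place, the policy's probabilities follow without rebuilding the policy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type mirroring the header's design: it holds a reference, not a copy.
class RefSoftmax {
    public:
        RefSoftmax(const std::vector<double> & q, double temperature = 1.0)
            : q_(q), temperature_(temperature) { assert(temperature_ > 0.0); }

        // Probability of action a, recomputed from the current contents of q_.
        // Uses plain exp (no max shift), which is fine for the small values here.
        double probability(std::size_t a) const {
            double sum = 0.0;
            for (double v : q_) sum += std::exp(v / temperature_);
            return std::exp(q_.at(a) / temperature_) / sum;
        }

    private:
        const std::vector<double> & q_; // dangles if the vector dies before the policy
        double temperature_;
};
```

The flip side of this design is lifetime: constructing the policy from a temporary, such as `std::vector<double>{1.0, 2.0}`, compiles, but the reference dangles as soon as that statement ends.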
AIToolbox::Bandit::QSoftmaxPolicy::getPolicy
virtual Vector getPolicy() const override
This function returns a vector containing all probabilities of the policy.
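Computing the full vector naively, as exp(Q(a)/τ) normalized by the sum, overflows once Q(a)/τ exceeds roughly 709 for doubles, which happens easily at low temperatures. Subtracting the largest value first leaves the probabilities mathematically unchanged. The header does not say whether `getPolicy` does this; the two hypothetical helpers below (`naiveSoftmax`, `stableSoftmax`, on `std::vector<double>`) simply compare the approaches.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not AIToolbox API) computing the full softmax policy vector.

// Naive: exp(q/t) overflows to +inf for large q/t, and inf / inf yields NaN.
std::vector<double> naiveSoftmax(const std::vector<double> & q, double t) {
    std::vector<double> p(q.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < q.size(); ++a) { p[a] = std::exp(q[a] / t); sum += p[a]; }
    for (double & x : p) x /= sum;
    return p;
}

// Stable: shifting by max(q) makes the largest exponent exactly 0, so every weight is <= 1.
std::vector<double> stableSoftmax(const std::vector<double> & q, double t) {
    const double maxQ = *std::max_element(q.begin(), q.end());
    std::vector<double> p(q.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < q.size(); ++a) { p[a] = std::exp((q[a] - maxQ) / t); sum += p[a]; }
    for (double & x : p) x /= sum;
    return p;
}
```

For values {1000, 1001} at t = 1, the naive version returns NaNs, while the shifted version returns the correct pair, e^-1 / (1 + e^-1) and 1 / (1 + e^-1).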