AIToolbox
A library that offers tools for AI problem solving.
This class implements a softmax policy through a QFunction.
#include <AIToolbox/MDP/Policies/QSoftmaxPolicy.hpp>
Public Member Functions

    QSoftmaxPolicy(const QFunction & q, double temperature = 1.0)
        Basic constructor.

    virtual size_t sampleAction(const size_t & s) const override
        This function chooses an action for state s with probability dependent on value.

    virtual double getActionProbability(const size_t & s, const size_t & a) const override
        This function returns the probability of taking the specified action in the specified state.

    virtual Matrix2D getPolicy() const override
        This function returns a matrix containing all probabilities of the policy.

    void setTemperature(double t)
        This function sets the temperature parameter.

    double getTemperature() const
        This function returns the currently set temperature parameter.

Public Member Functions inherited from AIToolbox::MDP::QPolicyInterface

    QPolicyInterface(const QFunction & q)
        Basic constructor.

    const QFunction & getQFunction() const
        This function returns the underlying QFunction reference.

Public Member Functions inherited from AIToolbox::PolicyInterface< size_t, size_t, size_t >

    PolicyInterface(size_t s, size_t a)
        Basic constructor.

    virtual ~PolicyInterface()
        Basic virtual destructor.

    const size_t & getS() const
        This function returns the number of states of the world.

    const size_t & getA() const
        This function returns the number of available actions to the agent.

Additional Inherited Members

Public Types inherited from AIToolbox::MDP::PolicyInterface

    using Base = AIToolbox::PolicyInterface< size_t, size_t, size_t >

Protected Attributes inherited from AIToolbox::MDP::QPolicyInterface

    const QFunction & q_

Protected Attributes inherited from AIToolbox::PolicyInterface< size_t, size_t, size_t >

    size_t S
    size_t A
    RandomEngine rand_
This class implements a softmax policy through a QFunction.
A softmax policy is a policy that selects actions based on their expected reward: the more advantageous an action seems to be, the more probable its selection is. There are many ways to implement a softmax policy; this class uses the most common method, sampling from a Boltzmann distribution.
Like the epsilon-greedy policy, this type of policy is useful for forcing the agent to explore an unknown model, in order to gain new information to refine it and thus gain more reward.
AIToolbox::MDP::QSoftmaxPolicy::QSoftmaxPolicy(const QFunction & q, double temperature = 1.0)
Basic constructor.
The temperature parameter must be >= 0.0, otherwise the constructor will throw std::invalid_argument.
Parameters
    q            The QFunction this policy is linked with.
    temperature  The parameter that controls the amount of exploration.
virtual double AIToolbox::MDP::QSoftmaxPolicy::getActionProbability(const size_t & s, const size_t & a) const [override, virtual]
This function returns the probability of taking the specified action in the specified state.
Parameters
    s  The selected state.
    a  The selected action.
Implements AIToolbox::PolicyInterface< size_t, size_t, size_t >.
virtual Matrix2D AIToolbox::MDP::QSoftmaxPolicy::getPolicy() const [override, virtual]
This function returns a matrix containing all probabilities of the policy.
Note that this may be expensive to compute, and should not be called often (aside from the fact that it needs to allocate a new Matrix2D each time).
Ideally this function should be called only when there is a repeated need to access the same policy values efficiently.
Implements AIToolbox::MDP::PolicyInterface.
double AIToolbox::MDP::QSoftmaxPolicy::getTemperature() const
This function will return the currently set temperature parameter.
virtual size_t AIToolbox::MDP::QSoftmaxPolicy::sampleAction(const size_t & s) const [override, virtual]
This function chooses an action for state s with probability dependent on value.
This class implements softmax through the Boltzmann distribution. Thus an action will be chosen with probability:
\[ P(a) = \frac{e^{Q(s,a)/t}}{\sum_b e^{Q(s,b)/t}} \]
where t is the temperature. This value is not cached anywhere, so continuous sampling may not be extremely fast.
Parameters
    s  The sampled state of the policy.
Implements AIToolbox::PolicyInterface< size_t, size_t, size_t >.
void AIToolbox::MDP::QSoftmaxPolicy::setTemperature(double t)
This function sets the temperature parameter.
The temperature parameter determines the amount of exploration this policy will enforce when selecting actions. Following the Boltzmann distribution, as the temperature approaches infinity all actions will become equally probable. On the opposite side, as the temperature approaches zero, action selection will become completely greedy.
The temperature parameter must be >= 0.0, otherwise the function will throw std::invalid_argument.
Parameters
    t  The new temperature parameter.