AIToolbox
A library that offers tools for AI problem solving.
This class implements some basic softmax policy primitives.
#include <AIToolbox/Bandit/Policies/Utils/QSoftmaxPolicyWrapper.hpp>
Public Member Functions

  QSoftmaxPolicyWrapper (double t, V q, Vector &valueBuffer, std::vector< size_t > &buffer, Gen &gen)
      Basic constructor.

  size_t sampleAction ()
      This function chooses an action for state s with probability dependent on value.

  double getActionProbability (size_t a) const
      This function returns the probability of taking the specified action in the specified state.

  template<typename P >
  void getPolicy (P &&p) const
      This function writes in a vector all probabilities of the policy.
This class implements some basic softmax policy primitives.
Since the basic operations on discrete vectors to select an action with softmax are the same both in Bandits and in MDPs, we implement them once here. This class operates on references, so that it does not need to allocate memory and we can keep using the most appropriate storage for whatever problem we are actually working on.
Note that you shouldn't really need to specify the template parameters by hand.
AIToolbox::Bandit::QSoftmaxPolicyWrapper< V, Gen >::QSoftmaxPolicyWrapper (double t, V q, Vector & valueBuffer, std::vector< size_t > & buffer, Gen & gen)
Basic constructor.
Parameters
  t            The temperature to use.
  q            A reference to the QFunction to use.
  valueBuffer  A buffer used to compute the softmax values.
  buffer       A buffer used to determine which action to take in case of ties.
  gen          A random engine.
double AIToolbox::Bandit::QSoftmaxPolicyWrapper< V, Gen >::getActionProbability (size_t a) const

This function returns the probability of taking the specified action in the specified state.

Parameters
  a  The selected action.
template<typename P >
void AIToolbox::Bandit::QSoftmaxPolicyWrapper< V, Gen >::getPolicy (P && p) const
This function writes in a vector all probabilities of the policy.
Ideally, this function should be called only when there is a repeated need to access the same policy values in an efficient manner.
size_t AIToolbox::Bandit::QSoftmaxPolicyWrapper< V, Gen >::sampleAction ()
This function chooses an action for state s with probability dependent on value.
This class implements softmax through the Boltzmann distribution. Thus an action will be chosen with probability:
\[ P(a) = \frac{e^{Q(a)/t}}{\sum_b e^{Q(b)/t}} \]
where t is the temperature. This value is not cached anywhere, so continuous sampling may not be extremely fast.