AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer > Class Template Reference

This class implements PrioritizedSweeping for cooperative environments.

#include <AIToolbox/Factored/MDP/Algorithms/CooperativePrioritizedSweeping.hpp>

Public Member Functions

 CooperativePrioritizedSweeping (const M &m, std::vector< std::vector< size_t >> basisDomains, double alpha=0.3, double theta=0.001)
 Basic constructor.
 
void stepUpdateQ (const State &s, const Action &a, const State &s1, const Rewards &r)
 This function performs a single update of the Q-Function with the input data.
 
void batchUpdateQ (const unsigned N=50)
 This function performs a series of batch updates using the model to sample.
 
QGreedyPolicy< Maximizer > & getInternalQGreedyPolicy ()
 This function returns the QGreedyPolicy we use to determine a1* in the updates.
 
const QGreedyPolicy< Maximizer > & getInternalQGreedyPolicy () const
 This function returns the QGreedyPolicy we use to determine a1* in the updates.
 
const QFunction & getQFunction () const
 This function returns a reference to the internal QFunction.
 
void setQFunction (double val)
 This function sets every entry of the QFunction to the given value.
 

Detailed Description

template<typename M, typename Maximizer = Bandit::VariableElimination>
class AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >

This class implements PrioritizedSweeping for cooperative environments.

It applies prioritized sweeping to cooperative environments modeled as factored MDPs.

CooperativePrioritizedSweeping learns an approximation of the true QFunction. After each interaction with the environment, the estimated QFunction is updated. A priority queue is updated as well; it keeps the sets of the state and action spaces that are most likely to need updating.

These sets are then sampled during batch updating, and the input model (which should also be learned via environment interaction) is used to sample new state-reward pairs to further refine the QFunction.
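The interplay between the step update, the priority queue and the batch update can be sketched in a much simpler setting. The code below is not the AIToolbox implementation: it is a hypothetical single-agent, tabular stand-in (`TabularSweeper`), with a deterministic model stored inside the class and a discount `gamma` passed explicitly, both of which are assumptions made for illustration. Only the method names and the default `alpha`, `theta` and `N` mirror this class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical tabular simplification; NOT the AIToolbox class.
struct TabularSweeper {
    std::size_t nS, nA;
    double alpha, theta, gamma;                  // gamma is an assumption of this sketch
    std::vector<std::vector<double>> q;          // q[s][a]
    std::vector<std::vector<long>> next;         // learned model: next state, -1 if unknown
    std::vector<std::vector<double>> rew;        // learned model: reward
    std::priority_queue<std::pair<double, std::size_t>> queue; // (priority, s * nA + a)

    TabularSweeper(std::size_t states, std::size_t actions,
                   double a = 0.3, double t = 0.001, double g = 0.9)
        : nS(states), nA(actions), alpha(a), theta(t), gamma(g),
          q(states, std::vector<double>(actions, 0.0)),
          next(states, std::vector<long>(actions, -1)),
          rew(states, std::vector<double>(actions, 0.0)) {}

    double maxQ(std::size_t s) const {
        return *std::max_element(q[s].begin(), q[s].end());
    }

    // Q-Learning update of a single entry; returns the absolute change.
    double update(std::size_t s, std::size_t a, std::size_t s1, double r) {
        const double delta = alpha * (r + gamma * maxQ(s1) - q[s][a]);
        q[s][a] += delta;
        return std::fabs(delta);
    }

    // Queue every known (state, action) pair that leads into s,
    // but only if the change is above the inclusion threshold.
    void pushPredecessors(std::size_t s, double priority) {
        if (priority <= theta) return;
        for (std::size_t ps = 0; ps < nS; ++ps)
            for (std::size_t pa = 0; pa < nA; ++pa)
                if (next[ps][pa] == static_cast<long>(s))
                    queue.emplace(priority, ps * nA + pa);
    }

    // One real interaction: record it in the model, update Q, fill the queue.
    void stepUpdateQ(std::size_t s, std::size_t a, std::size_t s1, double r) {
        next[s][a] = static_cast<long>(s1);
        rew[s][a] = r;
        pushPredecessors(s, update(s, a, s1, r));
    }

    // Up to n simulated updates, highest priority first.
    void batchUpdateQ(unsigned n = 50) {
        for (unsigned i = 0; i < n && !queue.empty(); ++i) {
            const std::size_t idx = queue.top().second;
            queue.pop();
            const std::size_t s = idx / nA, a = idx % nA;
            const std::size_t s1 = static_cast<std::size_t>(next[s][a]);
            pushPredecessors(s, update(s, a, s1, rew[s][a]));
        }
    }
};
```

On a three-state chain 0 -> 1 -> 2 with a reward only on the last transition, the two step updates leave the value of state 0 untouched and queue its only entry; a following `batchUpdateQ()` call then propagates the reward back to state 0 without further environment interaction.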

Template Parameters
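M          The type of the model to sample from.
Maximizer  The type of the maximizer used by the internal QGreedyPolicy; defaults to Bandit::VariableElimination.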

Constructor & Destructor Documentation

◆ CooperativePrioritizedSweeping()

template<typename M , typename Maximizer >
AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::CooperativePrioritizedSweeping ( const M &  m,
std::vector< std::vector< size_t >>  basisDomains,
double  alpha = 0.3,
double  theta = 0.001 
)

Basic constructor.

Parameters
m             The model to use for learning.
basisDomains  The domains of the Q-Function to use.
alpha         The alpha parameter of the Q-Learning update.
theta         The threshold for queue inclusion.
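To give an intuition for what a list of domains buys, the sketch below assumes that each inner vector of `basisDomains` lists the indices of the state factors one local Q-Function component depends on. That reading, the helper names `domainSize` and `partialIndex`, and the index ordering are assumptions made for illustration; they are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of entries a local table needs when it only depends on the
// state factors listed in `domain`.
std::size_t domainSize(const std::vector<std::size_t>& factorSizes,
                       const std::vector<std::size_t>& domain) {
    std::size_t n = 1;
    for (std::size_t f : domain) n *= factorSizes[f];
    return n;
}

// Position of the full state `s` inside the local table of `domain`.
// The first factor of the domain varies fastest (a choice of this sketch).
std::size_t partialIndex(const std::vector<std::size_t>& factorSizes,
                         const std::vector<std::size_t>& domain,
                         const std::vector<std::size_t>& s) {
    std::size_t idx = 0, mult = 1;
    for (std::size_t f : domain) {
        idx += s[f] * mult;
        mult *= factorSizes[f];
    }
    return idx;
}
```

With three state factors of sizes 2, 3 and 4, the domains {0, 1} and {1, 2} need 6 and 12 entries respectively, and each full state maps to one entry in each local table.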

Member Function Documentation

◆ batchUpdateQ()

template<typename M , typename Maximizer >
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::batchUpdateQ ( const unsigned  N = 50)

This function performs a series of batch updates using the model to sample.

The updates are generated from the contents of the queue, so that the updates are done in priority order.

Parameters
N  The number of priority updates to perform.

◆ getInternalQGreedyPolicy() [1/2]

template<typename M , typename Maximizer >
QGreedyPolicy< Maximizer > & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getInternalQGreedyPolicy ( )

This function returns the QGreedyPolicy we use to determine a1* in the updates.

This function is useful to set the parameters of the Maximizer used by the policy, or even to use it to sample actions greedily from the QFunction without necessarily constructing another policy.
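What "sampling greedily" means for a QFunction split into local components can be illustrated with a brute-force search. The sketch below is a stand-in for the Maximizer, not the VariableElimination algorithm and not library code: it assumes the state is already fixed, so that each hypothetical `LocalQ` component is a small table over the actions of a few agents, and it enumerates every joint action. This is only practical for tiny problems.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical local component for a fixed state: a table over the
// joint actions of a subset of agents (first listed agent varies fastest).
struct LocalQ {
    std::vector<std::size_t> agents;
    std::vector<double> values;
};

// Value of a full joint action: the sum of all local components.
double evaluate(const std::vector<LocalQ>& q,
                const std::vector<std::size_t>& actionSizes,
                const std::vector<std::size_t>& a) {
    double total = 0.0;
    for (const auto& c : q) {
        std::size_t idx = 0, mult = 1;
        for (std::size_t ag : c.agents) {
            idx += a[ag] * mult;
            mult *= actionSizes[ag];
        }
        total += c.values[idx];
    }
    return total;
}

// Exhaustive search over all joint actions (every agent needs >= 1 action).
std::vector<std::size_t> greedyJointAction(const std::vector<LocalQ>& q,
                                           const std::vector<std::size_t>& actionSizes) {
    std::vector<std::size_t> a(actionSizes.size(), 0), best = a;
    double bestValue = -std::numeric_limits<double>::infinity();
    while (true) {
        const double v = evaluate(q, actionSizes, a);
        if (v > bestValue) { bestValue = v; best = a; }
        // Advance to the next joint action, odometer style.
        std::size_t i = 0;
        while (i < a.size() && ++a[i] == actionSizes[i]) a[i++] = 0;
        if (i == a.size()) break;
    }
    return best;
}
```

The cost of this enumeration grows exponentially with the number of agents, which is why a dedicated maximizer that exploits the local structure is used instead.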

◆ getInternalQGreedyPolicy() [2/2]

template<typename M , typename Maximizer = Bandit::VariableElimination>
const QGreedyPolicy<Maximizer>& AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getInternalQGreedyPolicy ( ) const

This function returns the QGreedyPolicy we use to determine a1* in the updates.

◆ getQFunction()

template<typename M , typename Maximizer >
const QFunction & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getQFunction ( ) const

This function returns a reference to the internal QFunction.

◆ setQFunction()

template<typename M , typename Maximizer >
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::setQFunction ( double  val)

This function sets every entry of the QFunction to the given value.

This function is useful to perform optimistic initialization.

Parameters
val  The value to set all entries in the QFunction.
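The effect of optimistic initialization is easiest to see on a single state with a flat table of action values. The sketch below is a hypothetical stand-in, not library code: `setAll`, `greedy` and `update` are illustrative helpers, and the update ignores future rewards for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Set every entry of the table to the same value.
void setAll(std::vector<double>& q, double val) {
    std::fill(q.begin(), q.end(), val);
}

// Index of the first action with the highest value.
std::size_t greedy(const std::vector<double>& q) {
    return static_cast<std::size_t>(std::max_element(q.begin(), q.end()) - q.begin());
}

// Move the value of action a towards the observed reward r.
void update(std::vector<double>& q, std::size_t a, double r, double alpha = 0.3) {
    q[a] += alpha * (r - q[a]);
}
```

When the table starts at a value above any reachable reward, each tried action drops below the untried ones, so a purely greedy policy still visits every action at least once.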

◆ stepUpdateQ()

template<typename M , typename Maximizer >
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::stepUpdateQ ( const State &  s,
const Action &  a,
const State &  s1,
const Rewards &  r 
)

This function performs a single update of the Q-Function with the input data.

Parameters
s   The initial state.
a   The action performed.
s1  The final state.
r   The rewards obtained (one per state factor).

The documentation for this class was generated from the following file: