AIToolbox
A library that offers tools for AI problem solving.
This class implements PrioritizedSweeping for cooperative environments.
#include <AIToolbox/Factored/MDP/Algorithms/CooperativePrioritizedSweeping.hpp>
Public Member Functions

CooperativePrioritizedSweeping (const M & m, std::vector< std::vector< size_t >> basisDomains, double alpha = 0.3, double theta = 0.001)
    Basic constructor.

void stepUpdateQ (const State & s, const Action & a, const State & s1, const Rewards & r)
    This function performs a single update of the Q-Function with the input data.

void batchUpdateQ (const unsigned N = 50)
    This function performs a series of batch updates using the model to sample.

QGreedyPolicy< Maximizer > & getInternalQGreedyPolicy ()
    This function returns the QGreedyPolicy we use to determine a1* in the updates.

const QGreedyPolicy< Maximizer > & getInternalQGreedyPolicy () const
    This function returns the QGreedyPolicy we use to determine a1* in the updates.

const QFunction & getQFunction () const
    This function returns a reference to the internal QFunction.

void setQFunction (double val)
    This function sets every entry of the QFunction to the given value.
This class implements PrioritizedSweeping for cooperative environments.
This class allows one to perform prioritized sweeping in cooperative environments.

CooperativePrioritizedSweeping learns an approximation of the true QFunction. After each interaction with the environment, the estimated QFunction is updated. Additionally, a priority queue is maintained that tracks the subsets of the state and action spaces most likely to need updating.

These subsets are then sampled during batch updating, and the input model (which should also be learned via environment interaction) is used to sample new state-reward pairs to further refine the QFunction.
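As a rough sketch of how these pieces fit together, the snippet below interleaves single-step updates with batch updates. The model type (CooperativeMaximumLikelihoodModel), its header path, the basisDomains groupings, and the act/step helpers are all assumptions made for illustration, not prescribed by this class:

    #include <AIToolbox/Factored/MDP/Algorithms/CooperativePrioritizedSweeping.hpp>
    #include <AIToolbox/Factored/MDP/CooperativeMaximumLikelihoodModel.hpp> // path assumed

    using namespace AIToolbox::Factored;

    // `model` is assumed to be learned online via environment interaction;
    // `act` and `step` are hypothetical stand-ins for action selection and
    // the environment dynamics (returning next state and per-factor rewards).
    template <typename Act, typename Step>
    void learn(const MDP::CooperativeMaximumLikelihoodModel & model, Act act, Step step, State s) {
        MDP::CooperativePrioritizedSweeping<MDP::CooperativeMaximumLikelihoodModel> sweeping(
            model, {{0, 1}, {1, 2}});           // basisDomains: illustrative groupings

        for (unsigned t = 0; t < 10000; ++t) {
            Action a = act(s);                  // pick an action (e.g. greedy + exploration)
            auto [s1, r] = step(s, a);          // observe transition and per-factor rewards
            sweeping.stepUpdateQ(s, a, s1, r);  // single Q update + priority queue bookkeeping
            sweeping.batchUpdateQ(50);          // 50 priority-ordered, model-based updates
            s = s1;
        }
    }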
Template Parameters
    M The type of the model to sample from.
AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::CooperativePrioritizedSweeping (const M & m, std::vector< std::vector< size_t >> basisDomains, double alpha = 0.3, double theta = 0.001)
Basic constructor.
Parameters
    m            The model to use for learning.
    basisDomains The domains of the Q-Function to use.
    alpha        The alpha parameter of the Q-Learning update.
    theta        The threshold for queue inclusion.
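For instance, each inner vector of basisDomains gives the domain (a set of factor indices) of one basis of the factored Q-Function. The groupings below are hypothetical and purely illustrative; a real choice should mirror the structure of the problem:

    // One inner vector per Q-Function basis, listing the factor indices
    // that basis covers. These particular groupings are made up.
    std::vector<std::vector<size_t>> basisDomains{
        {0, 1},  // first basis covers factors 0 and 1
        {1, 2},  // second basis covers factors 1 and 2
    };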
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::batchUpdateQ (const unsigned N = 50)
This function performs a series of batch updates using the model to sample.
The updates are generated from the contents of the queue, so they are performed in priority order.
Parameters
    N The number of priority updates to perform.
QGreedyPolicy< Maximizer > & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getInternalQGreedyPolicy ( )
This function returns the QGreedyPolicy we use to determine a1* in the updates.
This function is useful to set the parameters of the Maximizer used by the policy, or even to use it to sample actions greedily from the QFunction without necessarily constructing another policy.
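For example, continuing the sketch in the description above, the internal policy can act greedily with respect to the current QFunction directly (sampleAction comes from the policy interface; `sweeping` and `s` are the objects from that sketch):

    // Query the greedy joint action for the current state without
    // constructing a separate policy object.
    auto & policy = sweeping.getInternalQGreedyPolicy();
    AIToolbox::Factored::Action a = policy.sampleAction(s);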
const QGreedyPolicy< Maximizer > & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getInternalQGreedyPolicy ( ) const
This function returns the QGreedyPolicy we use to determine a1* in the updates.
const QFunction & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getQFunction ( ) const
This function returns a reference to the internal QFunction.
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::setQFunction (double val)
This function sets every entry of the QFunction to the given value.

This function is useful to perform optimistic initialization.

Parameters
    val The value to which all entries of the QFunction are set.
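For example, to initialize optimistically before any learning (the value 10.0 is an arbitrary illustrative choice, ideally an upper bound on the true Q-values):

    // Make unexplored state-action pairs look attractive, so the greedy
    // policy is driven to try them.
    sweeping.setQFunction(10.0);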
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::stepUpdateQ (const State & s, const Action & a, const State & s1, const Rewards & r)
This function performs a single update of the Q-Function with the input data.
Parameters
    s  The initial state.
    a  The action performed.
    s1 The final state.
    r  The rewards obtained (one per state factor).
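Note that r carries one reward component per state factor. Assuming Rewards is the library's Eigen-based vector type (as elsewhere in AIToolbox), a call might look like:

    // Hypothetical 3-factor problem: one reward entry per state factor.
    AIToolbox::Factored::Rewards r(3);
    r << 0.0, 1.0, 0.5;
    sweeping.stepUpdateQ(s, a, s1, r);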