AIToolbox
A library that offers tools for AI problem solving.
This class implements PrioritizedSweeping for cooperative environments.
#include <AIToolbox/Factored/MDP/Algorithms/CooperativePrioritizedSweeping.hpp>
Public Member Functions

CooperativePrioritizedSweeping (const M & m, std::vector< std::vector< size_t >> basisDomains, double alpha = 0.3, double theta = 0.001)
    Basic constructor.

void stepUpdateQ (const State & s, const Action & a, const State & s1, const Rewards & r)
    This function performs a single update of the Q-Function with the input data.

void batchUpdateQ (const unsigned N = 50)
    This function performs a series of batch updates using the model to sample.

QGreedyPolicy< Maximizer > & getInternalQGreedyPolicy ()
    This function returns the QGreedyPolicy we use to determine a1* in the updates.

const QGreedyPolicy< Maximizer > & getInternalQGreedyPolicy () const
    This function returns the QGreedyPolicy we use to determine a1* in the updates.

const QFunction & getQFunction () const
    This function returns a reference to the internal QFunction.

void setQFunction (double val)
    This function sets every entry of the QFunction to the given value.
This class implements PrioritizedSweeping for cooperative environments.
This class allows one to perform prioritized sweeping in cooperative environments.

CooperativePrioritizedSweeping learns an approximation of the true QFunction. After each interaction with the environment, the estimated QFunction is updated. Additionally, a priority queue is maintained that tracks the subsets of the state and action spaces most likely to need updating.

These subsets are then sampled during batch updating, and the input model (which should also be learned via environment interaction) is used to sample new state-reward pairs to further refine the QFunction.
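As a rough sketch of how these pieces fit together, the snippet below interleaves single-step updates with batch updates. The model type (CooperativeMaximumLikelihoodModel), its header path, the basisDomains groupings, and the act/step helpers are all assumptions made for illustration, not prescribed by this class:

    #include <AIToolbox/Factored/MDP/Algorithms/CooperativePrioritizedSweeping.hpp>
    #include <AIToolbox/Factored/MDP/CooperativeMaximumLikelihoodModel.hpp> // path assumed

    using namespace AIToolbox::Factored;

    // `model` is assumed to be learned online via environment interaction;
    // `act` and `step` are hypothetical stand-ins for action selection and
    // the environment dynamics (returning next state and per-factor rewards).
    template <typename Act, typename Step>
    void learn(const MDP::CooperativeMaximumLikelihoodModel & model, Act act, Step step, State s) {
        MDP::CooperativePrioritizedSweeping<MDP::CooperativeMaximumLikelihoodModel> sweeping(
            model, {{0, 1}, {1, 2}});           // basisDomains: illustrative groupings

        for (unsigned t = 0; t < 10000; ++t) {
            Action a = act(s);                  // pick an action (e.g. greedy + exploration)
            auto [s1, r] = step(s, a);          // observe transition and per-factor rewards
            sweeping.stepUpdateQ(s, a, s1, r);  // single Q update + priority queue bookkeeping
            sweeping.batchUpdateQ(50);          // 50 priority-ordered, model-based updates
            s = s1;
        }
    }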
Template Parameters
    M The type of the model to sample from.
AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::CooperativePrioritizedSweeping (const M & m, std::vector< std::vector< size_t >> basisDomains, double alpha = 0.3, double theta = 0.001)
Basic constructor.
Parameters
    m            The model to use for learning.
    basisDomains The domains of the Q-Function to use.
    alpha        The alpha parameter of the Q-Learning update.
    theta        The threshold for queue inclusion.
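For instance, each inner vector of basisDomains gives the domain (a set of factor indices) of one basis of the factored Q-Function. The groupings below are hypothetical and purely illustrative; a real choice should mirror the structure of the problem:

    // One inner vector per Q-Function basis, listing the factor indices
    // that basis covers. These particular groupings are made up.
    std::vector<std::vector<size_t>> basisDomains{
        {0, 1},  // first basis covers factors 0 and 1
        {1, 2},  // second basis covers factors 1 and 2
    };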
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::batchUpdateQ (const unsigned N = 50)
This function performs a series of batch updates using the model to sample.
The updates are generated from the contents of the queue, so they are performed in priority order.
Parameters
    N The number of priority updates to perform.
QGreedyPolicy< Maximizer > & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getInternalQGreedyPolicy ( )
This function returns the QGreedyPolicy we use to determine a1* in the updates.
This function is useful to set the parameters of the Maximizer used by the policy, or even to use it to sample actions greedily from the QFunction without necessarily constructing another policy.
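For example, continuing the sketch in the description above, the internal policy can act greedily with respect to the current QFunction directly (sampleAction comes from the policy interface; `sweeping` and `s` are the objects from that sketch):

    // Query the greedy joint action for the current state without
    // constructing a separate policy object.
    auto & policy = sweeping.getInternalQGreedyPolicy();
    AIToolbox::Factored::Action a = policy.sampleAction(s);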
const QGreedyPolicy< Maximizer > & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getInternalQGreedyPolicy ( ) const
This function returns the QGreedyPolicy we use to determine a1* in the updates.
const QFunction & AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::getQFunction ( ) const
This function returns a reference to the internal QFunction.
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::setQFunction (double val)
This function sets every entry of the QFunction to the given value.

This function is useful to perform optimistic initialization.

Parameters
    val The value to which all entries of the QFunction are set.
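For example, to initialize optimistically before any learning (the value 10.0 is an arbitrary illustrative choice, ideally an upper bound on the true Q-values):

    // Make unexplored state-action pairs look attractive, so the greedy
    // policy is driven to try them.
    sweeping.setQFunction(10.0);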
void AIToolbox::Factored::MDP::CooperativePrioritizedSweeping< M, Maximizer >::stepUpdateQ (const State & s, const Action & a, const State & s1, const Rewards & r)
This function performs a single update of the Q-Function with the input data.
Parameters
    s  The initial state.
    a  The action performed.
    s1 The final state.
    r  The rewards obtained (one per state factor).
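Note that r carries one reward component per state factor. Assuming Rewards is the library's Eigen-based vector type (as elsewhere in AIToolbox), a call might look like:

    // Hypothetical 3-factor problem: one reward entry per state factor.
    AIToolbox::Factored::Rewards r(3);
    r << 0.0, 1.0, 0.5;
    sweeping.stepUpdateQ(s, a, s1, r);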