AIToolbox
A library that offers tools for AI problem solving.
|
This class represents the PrioritizedSweeping algorithm. More...
#include <AIToolbox/MDP/Algorithms/PrioritizedSweeping.hpp>
Public Member Functions | |
PrioritizedSweeping (const M &m, double theta=0.5, unsigned n=50) | |
Basic constructor. More... | |
void | stepUpdateQ (size_t s, size_t a) |
This function updates the PrioritizedSweeping internal update queue. More... | |
void | batchUpdateQ () |
This function updates a QFunction based on simulated experience. More... | |
void | setQueueThreshold (double t) |
This function sets the theta parameter. More... | |
double | getQueueThreshold () const |
This function will return the currently set theta parameter. More... | |
void | setN (unsigned n) |
This function sets the number of sampling passes during batchUpdateQ(). More... | |
unsigned | getN () const |
This function returns the currently set number of sampling passes during batchUpdateQ(). More... | |
size_t | getQueueLength () const |
This function returns the current number of elements unprocessed in the queue. More... | |
const M & | getModel () const |
This function returns a reference to the referenced Model. More... | |
const QFunction & | getQFunction () const |
This function returns a reference to the internal QFunction. More... | |
void | setQFunction (const QFunction &q) |
This function allows you to set the value of the internal QFunction. More... | |
const ValueFunction & | getValueFunction () const |
This function returns a reference to the internal ValueFunction. More... | |
This class represents the PrioritizedSweeping algorithm.
This algorithm is a refinement of the DynaQ algorithm. Instead of randomly sampling experienced state action pairs to get more information, we order each pair based on an estimate of how much information we can still extract from them.
In particular, pairs are sorted based on the amount they modified the estimated ValueFunction on their last sample. This ensures that we always try to sample from useful pairs instead of randomly, extracting knowledge much faster.
At the same time, this algorithm keeps a threshold for each state-action pair, so that it does not have to internally store all the pairs and save some memory/cpu time keeping the queue updated. Only pairs which obtained an amount of change higher than this treshold are kept in the queue.
Differently from the QLearning and DynaQ algorithm, this class automatically computes the ValueFunction since it is useful to determine which state-action pairs are actually useful, so there's no need to compute it manually.
Given how this algorithm updates the QFunction, the only problems supported by this approach are ones with an infinite horizon.
AIToolbox::MDP::PrioritizedSweeping< M >::PrioritizedSweeping | ( | const M & | m, |
double | theta = 0.5 , |
||
unsigned | n = 50 |
||
) |
Basic constructor.
m | The model to be used to update the QFunction. |
theta | The queue threshold. |
n | The number of sampling passes to do on the model upon batchUpdateQ(). |
void AIToolbox::MDP::PrioritizedSweeping< M >::batchUpdateQ |
This function updates a QFunction based on simulated experience.
In PrioritizedSweeping we sample from the queue at most N times for state action pairs that need updating. For each one of them we update the QFunction and recursively check whether this produces new changes worth updating. If so, they are inserted in the queue_ and the function proceeds to the next most urgent iteration.
const M & AIToolbox::MDP::PrioritizedSweeping< M >::getModel |
unsigned AIToolbox::MDP::PrioritizedSweeping< M >::getN |
This function returns the currently set number of sampling passes during batchUpdateQ().
const QFunction & AIToolbox::MDP::PrioritizedSweeping< M >::getQFunction |
This function returns a reference to the internal QFunction.
size_t AIToolbox::MDP::PrioritizedSweeping< M >::getQueueLength |
This function returns the current number of elements unprocessed in the queue.
double AIToolbox::MDP::PrioritizedSweeping< M >::getQueueThreshold |
This function will return the currently set theta parameter.
const ValueFunction & AIToolbox::MDP::PrioritizedSweeping< M >::getValueFunction |
This function returns a reference to the internal ValueFunction.
void AIToolbox::MDP::PrioritizedSweeping< M >::setN | ( | unsigned | n | ) |
This function sets the number of sampling passes during batchUpdateQ().
n | The new number of updates. |
void AIToolbox::MDP::PrioritizedSweeping< M >::setQFunction | ( | const QFunction & | q | ) |
This function allows you to set the value of the internal QFunction.
This function can be useful in case you are starting with an already populated Experience/Model, which you can solve (for example with ValueIteration) and then improve the solution with new experience.
q | The QFunction that will be copied. |
void AIToolbox::MDP::PrioritizedSweeping< M >::setQueueThreshold | ( | double | t | ) |
This function sets the theta parameter.
The discount parameter must be >= 0.0. otherwise the function will throw an std::invalid_argument.
t | The new theta parameter. |
void AIToolbox::MDP::PrioritizedSweeping< M >::stepUpdateQ | ( | size_t | s, |
size_t | a | ||
) |
This function updates the PrioritizedSweeping internal update queue.
This function updates the QFunction for the specified pair, and decides whether any parent couple that can lead to this state is worth pushing into the queue.
s | The previous state. |
a | The action performed. |