AIToolbox
A library that offers tools for AI problem solving.
This class represents the DynaQ algorithm.
#include <AIToolbox/MDP/Algorithms/DynaQ.hpp>
Public Member Functions

    DynaQ (const M & m, double alpha = 0.5, unsigned n = 50)
        Basic constructor.
    void stepUpdateQ (size_t s, size_t a, size_t s1, double rew)
        This function updates the internal QFunction.
    void batchUpdateQ ()
        This function updates a QFunction based on simulated experience.
    void setLearningRate (double a)
        This function sets the learning rate parameter.
    double getLearningRate () const
        This function returns the currently set learning rate parameter.
    void setN (unsigned n)
        This function sets the current sample number parameter.
    unsigned getN () const
        This function returns the currently set number of sampling passes during batchUpdateQ().
    const QFunction & getQFunction () const
        This function returns a reference to the internal QFunction.
    const M & getModel () const
        This function returns a reference to the referenced Model.
This class represents the DynaQ algorithm.
This algorithm is a simple extension of the QLearning algorithm. It keeps track of every experienced state-action pair. Each QFunction update is exactly equivalent to the QLearning one; however, this algorithm allows for an additional learning phase that can take place, time permitting, before the agent takes another action.
The state-action pairs we have already explored are known to be possible, so we can use the generative model to obtain more and more data about them. This, of course, requires that the model can be sampled from, in contrast with QLearning, which does not require it.
The algorithm selects at random which of the experienced state-action pairs to sample again.
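The typical usage pattern alternates a direct update from real experience with a batch of simulated updates. The following is a minimal sketch of that loop, built only on the members documented on this page; the model type M, sampleAction() and sampleTransition() are hypothetical stand-ins for a real AIToolbox model, an action-selection policy and an environment.

    // Illustrative Dyna-Q control loop; sampleAction() and
    // sampleTransition() are hypothetical helpers, not library API.
    #include <AIToolbox/MDP/Algorithms/DynaQ.hpp>
    #include <cstddef>

    template <typename M>
    void runEpisode(const M & model, std::size_t s, unsigned steps) {
        // alpha = 0.5, with n = 50 planning passes after each real step.
        AIToolbox::MDP::DynaQ<M> solver(model, 0.5, 50);

        for (unsigned t = 0; t < steps; ++t) {
            const auto a = sampleAction(solver.getQFunction(), s); // hypothetical policy
            const auto [s1, rew] = sampleTransition(s, a);         // hypothetical environment

            solver.stepUpdateQ(s, a, s1, rew); // direct update from real experience
            solver.batchUpdateQ();             // n simulated updates from the model
            s = s1;
        }
    }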
AIToolbox::MDP::DynaQ< M >::DynaQ ( const M & m, double alpha = 0.5, unsigned n = 50 )  [explicit]
Basic constructor.
Parameters:
    m     - The model to be used to update the QFunction.
    alpha - The learning rate of the QLearning method.
    n     - The number of sampling passes to do on the model upon batchUpdateQ().
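For example, a construction with explicit parameters could look as follows; the model type and instance names are placeholders for whatever generative model is actually used.

    // MyModel and model are hypothetical; any type M satisfying the
    // library's generative model interface can be used.
    AIToolbox::MDP::DynaQ<MyModel> solver(model, /* alpha = */ 0.3, /* n = */ 100);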
void AIToolbox::MDP::DynaQ< M >::batchUpdateQ ( )
This function updates a QFunction based on simulated experience.
In DynaQ we sample n times from already experienced state-action pairs, and we update the QFunction as if this experience were real.
The idea is that, since we know which state-action pairs we have already explored, we know that those pairs are actually possible. Thus we use the generative model to sample them again, obtaining a better estimate of the QFunction.
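Conceptually, each of the n sampling passes looks like the sketch below. This is not the library's actual implementation: it assumes the visited pairs are stored in a set-like container accessed through a hypothetical pickRandomVisitedPair() helper, and that the model exposes a generative sampleSR(s, a) call returning a sampled next state and reward.

    // Conceptual sketch of batchUpdateQ(); all names are illustrative.
    for (unsigned i = 0; i < n_; ++i) {
        const auto [s, a]    = pickRandomVisitedPair();  // hypothetical helper
        const auto [s1, rew] = model_.sampleSR(s, a);    // resample the model
        stepUpdateQ(s, a, s1, rew);  // identical to an update on real experience
    }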
double AIToolbox::MDP::DynaQ< M >::getLearningRate ( ) const
This function returns the currently set learning rate parameter.
const M & AIToolbox::MDP::DynaQ< M >::getModel ( ) const
This function returns a reference to the referenced Model.
unsigned AIToolbox::MDP::DynaQ< M >::getN ( ) const
This function returns the currently set number of sampling passes during batchUpdateQ().
const QFunction & AIToolbox::MDP::DynaQ< M >::getQFunction ( ) const
This function returns a reference to the internal QFunction.
void AIToolbox::MDP::DynaQ< M >::setLearningRate ( double a )
This function sets the learning rate parameter.
The learning rate parameter must be > 0.0 and <= 1.0; otherwise the function will throw an std::invalid_argument.
Parameters:
    a - The new learning rate parameter.
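Because out-of-range values throw, code that takes the rate from configuration may want to guard the call; a minimal sketch, where solver and userAlpha are hypothetical:

    #include <stdexcept>

    try {
        solver.setLearningRate(userAlpha);  // throws if not in (0.0, 1.0]
    } catch (const std::invalid_argument &) {
        solver.setLearningRate(0.5);        // fall back to a sensible default
    }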
void AIToolbox::MDP::DynaQ< M >::setN ( unsigned n )
This function sets the current sample number parameter.
Parameters:
    n - The new sample number parameter.
void AIToolbox::MDP::DynaQ< M >::stepUpdateQ ( size_t s, size_t a, size_t s1, double rew )
This function updates the internal QFunction.
This function takes a single experience point and uses it to update a QFunction. This is a very efficient method to keep the QFunction up to date with the latest experience.
In addition, the sampling list is updated so that batch updating becomes possible as a second phase.
The sampling list in DynaQ is simply the collection of all visited state-action pairs; this function inserts each new pair into a set, keeping the entries unique.
Parameters:
    s   - The previous state.
    a   - The action performed.
    s1  - The new state.
    rew - The reward obtained.
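As the class description notes, this update is exactly the QLearning one. In code it amounts to something like the sketch below, where q_, alpha_ and gamma are illustrative names for the internal QFunction, the learning rate and the model's discount factor; this is not the library's literal implementation.

    // Standard QLearning update performed on a single experience point.
    // maxQ is the best estimated value attainable from the new state s1.
    const double maxQ = q_.row(s1).maxCoeff();
    q_(s, a) += alpha_ * (rew + gamma * maxQ - q_(s, a));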