AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::Bandit::Experience Class Reference

This class computes averages and counts for a multi-agent cooperative Bandit problem.
#include <AIToolbox/Factored/Bandit/Experience.hpp>
Public Types

    using VisitsTable = std::vector<std::vector<unsigned long>>
    using Indeces = std::vector<size_t>

Public Member Functions

    Experience(Action A, const std::vector<PartialKeys>& dependencies)
        Basic constructor.
    const Indeces& record(const Action& a, const Rewards& rews)
        This function updates the QFunction and counts.
    void reset()
        This function resets the QFunction and counts to zero.
    const std::vector<PartialKeys>& getDependencies() const
        This function returns the local groups of agents.
    unsigned long getTimesteps() const
        This function returns the number of times the record function has been called.
    const QFunction& getRewardMatrix() const
        This function returns a reference to the internal QFunction.
    const VisitsTable& getVisitsTable() const
        This function returns a reference to the counts for the actions.
    const std::vector<Vector>& getM2Matrix() const
        This function returns the estimated squared distance of the samples from the mean.
    const Action& getA() const
        This function returns the size of the action space.
Detailed Description

This class computes averages and counts for a multi-agent cooperative Bandit problem.

This class can be used to compute the averages and counts for all actions in a Bandit problem over time. The class assumes that the problem is factored, i.e. that the agents depend on each other only within smaller local groups.
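The following is a minimal end-to-end sketch of the intended workflow. It assumes the usual AIToolbox factored type aliases (Action and PartialKeys as vectors of size_t, Rewards as an Eigen vector); the agents, groups, and reward values shown are illustrative only, not taken from the library's documentation.

    #include <AIToolbox/Factored/Bandit/Experience.hpp>
    #include <cassert>

    int main() {
        using namespace AIToolbox;
        using namespace AIToolbox::Factored;

        // 3 agents with 2 actions each; rewards decompose over groups {0,1} and {1,2}.
        Action A{2, 2, 2};
        std::vector<PartialKeys> deps{{0, 1}, {1, 2}};

        Bandit::Experience exp(A, deps);

        // Record one joint action together with one reward per group.
        Action a{1, 0, 1};
        Rewards rews(2);
        rews << 1.0, 0.5;
        exp.record(a, rews);

        assert(exp.getTimesteps() == 1);

        // Running averages per local joint action live in the internal QFunction.
        const auto & q = exp.getRewardMatrix();
        (void)q;
    }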
Member Typedef Documentation

using AIToolbox::Factored::Bandit::Experience::Indeces = std::vector<size_t>

using AIToolbox::Factored::Bandit::Experience::VisitsTable = std::vector<std::vector<unsigned long>>
Constructor & Destructor Documentation

AIToolbox::Factored::Bandit::Experience::Experience(Action A, const std::vector<PartialKeys>& dependencies)

Basic constructor.

Parameters
    A             The size of the action space.
    dependencies  The local groups to record. Multiple groups with the same keys are allowed.
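For instance, because duplicate keys are allowed, the same pair of agents may back two separate reward components. A hedged sketch with illustrative values (not taken from the library's documentation):

    using namespace AIToolbox::Factored;

    // Two agents with 3 actions each, one agent with 2 actions.
    Action A{3, 3, 2};

    // Two distinct reward components both depend on agents {0, 1};
    // a third depends on agents {1, 2}. Repeating the same keys is allowed.
    std::vector<PartialKeys> deps{{0, 1}, {0, 1}, {1, 2}};

    Bandit::Experience exp(A, deps);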
Member Function Documentation

const Action& AIToolbox::Factored::Bandit::Experience::getA() const
This function returns the size of the action space.
const std::vector<PartialKeys>& AIToolbox::Factored::Bandit::Experience::getDependencies() const
This function returns the local groups of agents.
const std::vector<Vector>& AIToolbox::Factored::Bandit::Experience::getM2Matrix() const
This function returns the estimated squared distance of the samples from the mean.
The returned values estimate sum_i (x_i - mean_x)^2 for the rewards of each local action. Note that these values only have meaning when the respective action has at least 2 samples.
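Dividing an M2 entry by (count - 1) yields the sample variance of the rewards seen for that local joint action. A small sketch under the assumption that the M2 entries and the visit counts are aligned index-wise per group (the helper name is hypothetical):

    #include <AIToolbox/Factored/Bandit/Experience.hpp>

    // Hypothetical helper: sample variance of the rewards recorded for one
    // local joint action, computed from the Experience's M2 and visit counts.
    double sampleVariance(const AIToolbox::Factored::Bandit::Experience & exp,
                          size_t group, size_t localAction) {
        const auto n = exp.getVisitsTable()[group][localAction];
        if (n < 2) return 0.0;  // undefined with fewer than 2 samples
        const auto m2 = exp.getM2Matrix()[group][localAction];
        return m2 / (n - 1);
    }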
const QFunction& AIToolbox::Factored::Bandit::Experience::getRewardMatrix() const
This function returns a reference to the internal QFunction.
The reward matrix contains the current average rewards computed for each action.
unsigned long AIToolbox::Factored::Bandit::Experience::getTimesteps() const
This function returns the number of times the record function has been called.
const VisitsTable& AIToolbox::Factored::Bandit::Experience::getVisitsTable() const

This function returns a reference to the visit counts for the actions.
const Indeces& AIToolbox::Factored::Bandit::Experience::record(const Action& a, const Rewards& rews)
This function updates the QFunction and counts.
This function additionally returns a reference to the indices updated for each group of agents. This is useful, for example, when updating a model or a policy without needing to recompute these indices all the time.
Parameters
    a     The action taken.
    rews  The rewards obtained in the previous timestep, one per agent group.
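A short sketch of how the return value might be consumed, assuming each entry is the enumeration index of the updated local joint action within its group (illustrative values only, not taken from the library's documentation):

    #include <AIToolbox/Factored/Bandit/Experience.hpp>
    #include <iostream>

    int main() {
        using namespace AIToolbox;
        using namespace AIToolbox::Factored;

        Bandit::Experience exp(Action{2, 2, 2}, {{0, 1}, {1, 2}});

        Action a{1, 0, 1};
        Rewards rews(2);    // one reward per dependency group
        rews << 0.7, 0.2;

        // One index per group: which local joint action just got updated.
        const auto & ids = exp.record(a, rews);

        for (size_t g = 0; g < ids.size(); ++g)
            std::cout << "group " << g << " updated local action " << ids[g]
                      << " (visits so far: " << exp.getVisitsTable()[g][ids[g]] << ")\n";
    }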
void AIToolbox::Factored::Bandit::Experience::reset()
This function resets the QFunction and counts to zero.