AIToolbox
A library that offers tools for AI problem solving.
This class keeps track of registered events and rewards.
#include <AIToolbox/Factored/MDP/CooperativeExperience.hpp>
Public Types | |
using | RewardMatrix = std::vector< Vector > |
using | VisitsTable = std::vector< Table2D > |
using | Indeces = std::vector< size_t > |
Public Member Functions | |
CooperativeExperience (const DDNGraph &graph) | |
Basic constructor. | |
const Indeces & | record (const State &s, const Action &a, const State &s1, const Rewards &rew) |
This function adds a new event to the recordings. | |
void | reset () |
This function resets all experienced rewards and transitions. | |
unsigned long | getTimesteps () const |
This function returns the number of times the record function has been called. | |
const VisitsTable & | getVisitsTable () const |
This function returns the visits table for inspection. | |
const RewardMatrix & | getRewardMatrix () const |
This function returns the rewards matrix for inspection. | |
const RewardMatrix & | getM2Matrix () const |
This function returns the rewards squared matrix for inspection. | |
const State & | getS () const |
This function returns the state space of the world. | |
const Action & | getA () const |
This function returns the action space available to the agents. | |
const DDNGraph & | getGraph () const |
This function returns the underlying DDNGraph of the CooperativeExperience. | |
This class keeps track of registered events and rewards.
This class is a simple logger of events. It keeps track of both the number of times a particular transition has happened and the average reward gained in any particular transition (i.e. the maximum likelihood estimator of a QFunction from the data). It also computes the M2 statistic for the rewards (the average of the squared rewards minus the square of the average reward).
However, it does not record each event separately (i.e. you can't extract the results of a particular transition in the past).
The events are recorded with respect to a given structure, which should match that of the generative model.
Note that since this class contains data in a DDN format, it's probably only usable by directly inspecting the stored VisitsTable and RewardMatrix. Thus we don't yet provide general getters for state/action pairs.
using AIToolbox::Factored::MDP::CooperativeExperience::Indeces = std::vector<size_t> |
using AIToolbox::Factored::MDP::CooperativeExperience::RewardMatrix = std::vector<Vector> |
using AIToolbox::Factored::MDP::CooperativeExperience::VisitsTable = std::vector<Table2D> |
AIToolbox::Factored::MDP::CooperativeExperience::CooperativeExperience | ( | const DDNGraph & | graph | ) |
Basic constructor.
Note that the input structure does not need to pre-allocate the value matrices, nor to fill in their values, since we do that internally. Here we only need the structure of the problem.
graph | The coordination graph of the cooperative problem. |
const Action& AIToolbox::Factored::MDP::CooperativeExperience::getA | ( | ) | const |
This function returns the action space available to the agents.
const DDNGraph& AIToolbox::Factored::MDP::CooperativeExperience::getGraph | ( | ) | const |
This function returns the underlying DDNGraph of the CooperativeExperience.
const RewardMatrix& AIToolbox::Factored::MDP::CooperativeExperience::getM2Matrix | ( | ) | const |
This function returns the rewards squared matrix for inspection.
const RewardMatrix& AIToolbox::Factored::MDP::CooperativeExperience::getRewardMatrix | ( | ) | const |
This function returns the rewards matrix for inspection.
const State& AIToolbox::Factored::MDP::CooperativeExperience::getS | ( | ) | const |
This function returns the state space of the world.
unsigned long AIToolbox::Factored::MDP::CooperativeExperience::getTimesteps | ( | ) | const |
This function returns the number of times the record function has been called.
const VisitsTable& AIToolbox::Factored::MDP::CooperativeExperience::getVisitsTable | ( | ) | const |
This function returns the visits table for inspection.
const Indeces& AIToolbox::Factored::MDP::CooperativeExperience::record | ( | const State & | s, |
const Action & | a, | ||
const State & | s1, | ||
const Rewards & | rew | ||
) |
This function adds a new event to the recordings.
Note that here we expect a vector of rewards of the same size as the state space, i.e. one reward per state factor.
This function additionally returns a reference to the indices updated for each element of the underlying DDN. This is useful, for example, when updating the CoordinatedRLModel, since these indices then do not need to be recomputed.
s | Old state. |
a | Performed action. |
s1 | New state. |
rew | Obtained rewards. |
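To illustrate what an index of this kind might look like: each element of a DDN depends on a subset of the state and action factors, and the joint value of that subset can be flattened into a single integer that selects a table row. The sketch below uses a mixed-radix encoding. This is an assumption made for illustration only; `toIndex` is a hypothetical helper, and the layout the library actually uses is not described here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of AIToolbox: flattens the values of a
// subset of factors into one index using mixed-radix encoding.
//   space[i]  : number of values factor i can take
//   ids       : the factors belonging to the subset
//   values    : a full assignment, one value per factor
std::size_t toIndex(const std::vector<std::size_t> & space,
                    const std::vector<std::size_t> & ids,
                    const std::vector<std::size_t> & values) {
    std::size_t index = 0;
    std::size_t multiplier = 1;
    for (const auto id : ids) {
        index += values[id] * multiplier; // contribution of this factor
        multiplier *= space[id];          // radix for the next factor
    }
    return index;
}
```

Computing such an index once per recorded event and reusing it is the kind of saving the returned reference makes possible.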
void AIToolbox::Factored::MDP::CooperativeExperience::reset | ( | ) |
This function resets all experienced rewards and transitions.