AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::Factored::Bandit::MiningBandit Class Reference

This class represents the mining bandit problem. More...

#include <AIToolbox/Factored/Bandit/Environments/MiningProblem.hpp>

Public Member Functions

 MiningBandit (Action A, std::vector< unsigned > workersPerVillage, std::vector< double > productivityPerMine, bool normalizeToOne=true)
 Basic constructor. More...
 
const Rewards & sampleR (const Action &a) const
 This function samples the rewards for each mine from a set of Bernoulli distributions. More...
 
double getRegret (const Action &a) const
 This function computes the deterministic regret of the input joint action. More...
 
const Action & getOptimalAction () const
 This function returns the optimal action for this bandit. More...
 
const Action & getA () const
 This function returns the joint action space. More...
 
const std::vector< PartialKeys > & getGroups () const
 This function returns, for each mine, which villages are connected to it. More...
 
std::vector< QFunctionRule > getDeterministicRules () const
 This function returns a set of QFunctionRule for the bandit, ignoring stochasticity. More...
 
double getNormalizationConstant () const
 This function returns the normalization constant used. More...
 

Detailed Description

This class represents the mining bandit problem.

This problem was introduced in the paper

"Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems"

by Bargiacchi et al.

There are a set of villages and mines. Each village has a number of mine workers. At each timestep, the village sends all its mine workers to a single mine. Each timestep, each mine produces an amount of minerals proportional to its hidden productivity and the number of workers sent to it.

For each index 'i', village 'i' is always connected to the mines with indices from 'i' onwards. The last village is always connected to 4 mines.

Thus, the action for a given village 'i' is a number from 0 to N, where action 0 corresponds to sending all the workers to mine 'i', and action N corresponds to sending all the workers to mine 'i' + N.

The mineral amounts produced by each mine are computed with this formula:

  • 0 if no workers are sent to it
  • (productivity * 1.03^workers) otherwise.

Since these amounts are deterministic for each joint action, discovering the optimal action would be too easy. To generate a proper bandit, we convert these amounts into stochastic rewards through Bernoulli distributions.

First, we optionally normalize the outputs of each mine so that the maximum joint mineral amount that can be produced is 1. This is sometimes useful, as it results in convenient values for regrets and rewards (the optimal action then has an expected reward of exactly 1).

In any case, each mine will be associated with a number between 0 and 1. We use this number as the parameter of a Bernoulli distribution, which is sampled to generate the mine's actual reward (either 0 or 1).

Note that this means an action can randomly produce a total reward higher than 1 (since multiple Bernoullis are sampled). On average, however, the optimal action has an expected reward of 1.

Constructor & Destructor Documentation

◆ MiningBandit()

AIToolbox::Factored::Bandit::MiningBandit::MiningBandit ( Action  A,
std::vector< unsigned >  workersPerVillage,
std::vector< double >  productivityPerMine,
bool  normalizeToOne = true 
)

Basic constructor.

Parameters
A    The action space. There is one action per village, representing which mine the village sends its workers to.
workersPerVillage    How many workers there are in each village.
productivityPerMine    The productivity factor of each mine (between 0 and 1).
normalizeToOne    Whether to normalize rewards so that the optimal action has an expected reward of 1.

Member Function Documentation

◆ getA()

const Action& AIToolbox::Factored::Bandit::MiningBandit::getA ( ) const

This function returns the joint action space.

◆ getDeterministicRules()

std::vector<QFunctionRule> AIToolbox::Factored::Bandit::MiningBandit::getDeterministicRules ( ) const

This function returns a set of QFunctionRule for the bandit, ignoring stochasticity.

This function is provided for testing maximization algorithms, to automatically generate rules for a given set of parameters.

The rules contain the true underlying rewards of the problem, ignoring the sampling stochasticity that is present in the sampleR() function.

In other words, finding the joint action that maximizes these rules is equivalent to finding the optimal action of the bandit.

◆ getGroups()

const std::vector<PartialKeys>& AIToolbox::Factored::Bandit::MiningBandit::getGroups ( ) const

This function returns, for each mine, which villages are connected to it.

This function returns, for each local reward function (i.e. each mine), the group of agents (villages) connected to it.

◆ getNormalizationConstant()

double AIToolbox::Factored::Bandit::MiningBandit::getNormalizationConstant ( ) const

This function returns the normalization constant used.

This class ensures that the optimal action has an expected reward of 1. To do this, we divide each local reward by the unnormalized maximum expected joint output.

This function returns that value.

Returns
The normalization constant used so that the optimal action has expected reward of 1.

◆ getOptimalAction()

const Action& AIToolbox::Factored::Bandit::MiningBandit::getOptimalAction ( ) const

This function returns the optimal action for this bandit.

◆ getRegret()

double AIToolbox::Factored::Bandit::MiningBandit::getRegret ( const Action & a ) const

This function computes the deterministic regret of the input joint action.

This function bypasses the Bernoulli distributions and directly computes the true regret of any given joint action.

Parameters
a    The joint action of all villages.
Returns
The joint regret of the input action.

◆ sampleR()

const Rewards & AIToolbox::Factored::Bandit::MiningBandit::sampleR ( const Action & a ) const

This function samples the rewards for each mine from a set of Bernoulli distributions.

Parameters
a    The joint action of all villages.
Returns
The rewards generated by the mines.

The documentation for this class was generated from the following file:

AIToolbox/Factored/Bandit/Environments/MiningProblem.hpp