AIToolbox
A library that offers tools for AI problem solving.
This class represents the mining bandit problem.
#include <AIToolbox/Factored/Bandit/Environments/MiningProblem.hpp>
Public Member Functions

MiningBandit (Action A, std::vector< unsigned > workersPerVillage, std::vector< double > productivityPerMine, bool normalizeToOne = true)
    Basic constructor.
const Rewards & sampleR (const Action & a) const
    This function samples the rewards for each mine from a set of Bernoulli distributions.
double getRegret (const Action & a) const
    This function computes the deterministic regret of the input joint action.
const Action & getOptimalAction () const
    This function returns the optimal action for this bandit.
const Action & getA () const
    This function returns the joint action space.
const std::vector< PartialKeys > & getGroups () const
    This function returns, for each mine, which villages are connected to it.
std::vector< QFunctionRule > getDeterministicRules () const
    This function returns a set of QFunctionRule for the bandit, ignoring stochasticity.
double getNormalizationConstant () const
    This function returns the normalization constant used.
This class represents the mining bandit problem.
This problem was introduced in the paper
"Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems"
by Bargiacchi et al.
There is a set of villages and a set of mines. Each village has a number of mine workers, and at each timestep it sends all of them to a single mine. Each timestep, every mine produces an amount of minerals proportional to its hidden productivity and to the number of workers sent to it.
Each village 'i' is always connected to the mines with indices from 'i' onwards. The last village is always connected to 4 mines.
Thus, the action of a given village 'i' is a number from 0 to N, where action 0 corresponds to sending all its workers to mine 'i', and action N to sending them to mine 'i' + N.
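For illustration only, the mapping from a village's local action to the mine that receives its workers could be written as follows (mineForAction is a hypothetical helper, not part of the library):

    #include <cstddef>

    // Hypothetical helper sketching the action-to-mine convention described above.
    // Action 0 sends the workers of village 'villageIndex' to mine 'villageIndex';
    // action N sends them to mine 'villageIndex' + N.
    size_t mineForAction(size_t villageIndex, size_t localAction) {
        return villageIndex + localAction;
    }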
The amount of minerals produced by each mine is computed deterministically from its hidden productivity and from the total number of workers sent to it.
Since these amounts are deterministic for each joint action, discovering the optimal action would be too easy. To generate a proper bandit, we convert these amounts into stochastic rewards through Bernoulli distributions.
First, we optionally normalize the outputs of each mine so that the maximum joint mineral amount that can be produced is 1. This is sometimes useful, as it results in pretty values for regrets and rewards (since the optimal action then has an expected reward of exactly 1).
In any case, each mine will be associated with a number between 0 and 1, which is used as the parameter of a Bernoulli distribution; the reward produced by that mine is then a sample from this distribution.
Note that this means that an action can randomly produce a reward higher than 1 (since multiple Bernoullis are sampled). However, the expected reward of the optimal action remains 1.
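The sketch below shows one way such a conversion could work, assuming the expected (already normalized) output of each mine for a fixed joint action is known; it is only an illustration of the idea, not the library's implementation, and the names expectedOutputPerMine and rng are made up for this example:

    #include <random>
    #include <vector>

    // Illustrative sketch: sample one Bernoulli reward per mine and sum them.
    // Each entry of expectedOutputPerMine must lie in [0, 1]; in expectation the
    // sum equals the total expected output, so the optimal action averages to 1
    // when normalization is enabled.
    double sampleJointReward(const std::vector<double> & expectedOutputPerMine, std::mt19937 & rng) {
        double total = 0.0;
        for (double p : expectedOutputPerMine) {
            std::bernoulli_distribution mineReward(p);
            total += mineReward(rng) ? 1.0 : 0.0;
        }
        return total;
    }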
AIToolbox::Factored::Bandit::MiningBandit::MiningBandit (Action A, std::vector< unsigned > workersPerVillage, std::vector< double > productivityPerMine, bool normalizeToOne = true)
Basic constructor.
Parameters
    A                    The action space. There is one action per village, which represents to which mine to send the workers.
    workersPerVillage    How many workers there are in each village.
    productivityPerMine  The productivity factor for each mine (between 0 and 1).
    normalizeToOne       Whether to normalize rewards so that the optimal action has expected reward 1.
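A minimal usage sketch follows. The sizes below (three villages, six mines) and the assumption that Action is a per-village list of action space sizes are illustrative; adjust them to your own instance.

    #include <AIToolbox/Factored/Bandit/Environments/MiningProblem.hpp>
    #include <vector>

    int main() {
        using namespace AIToolbox::Factored;

        // Assumed example instance: village 0 can reach 2 mines, village 1 can
        // reach 3, and the last village 4, giving 6 mines in total.
        Action A{2, 3, 4};
        std::vector<unsigned> workersPerVillage{5, 3, 4};
        std::vector<double> productivityPerMine{0.2, 0.6, 0.4, 0.9, 0.5, 0.3};

        // normalizeToOne defaults to true, so the optimal action has expected reward 1.
        Bandit::MiningBandit bandit(A, workersPerVillage, productivityPerMine);

        const Action & optimal = bandit.getOptimalAction();
        const auto & rewards = bandit.sampleR(optimal);  // one sampled reward per mine
        (void)rewards;
        return 0;
    }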
const Action & AIToolbox::Factored::Bandit::MiningBandit::getA () const
This function returns the joint action space.
std::vector< QFunctionRule > AIToolbox::Factored::Bandit::MiningBandit::getDeterministicRules () const
This function returns a set of QFunctionRule for the bandit, ignoring stochasticity.
This function is provided for testing maximization algorithms, to automatically generate rules for a given set of parameters.
The rules contain the true underlying rewards of the problem, ignoring the sampling stochasticity that is present in the sampleR() function.
In other words, finding the joint action that maximizes these rules is equivalent to finding the optimal action of the bandit.
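As an illustration of that equivalence, the sketch below scores a full joint action by summing the values of all matching rules. The field names rule.action (a pair of village indices and their local actions) and rule.value are assumptions about QFunctionRule's layout; check the library headers before relying on them.

    // Sketch under assumptions: sums the values of the deterministic rules that a
    // full joint action matches. Maximizing this quantity over all joint actions
    // is equivalent to finding the bandit's optimal action.
    double evaluateJointAction(const AIToolbox::Factored::Action & a,
                               const std::vector<AIToolbox::Factored::Bandit::QFunctionRule> & rules) {
        double total = 0.0;
        for (const auto & rule : rules) {
            const auto & villages = rule.action.first;   // assumed: indices of the involved villages
            const auto & locals   = rule.action.second;  // assumed: their local actions
            bool matches = true;
            for (size_t i = 0; i < villages.size(); ++i)
                if (a[villages[i]] != locals[i]) { matches = false; break; }
            if (matches) total += rule.value;            // assumed field name
        }
        return total;
    }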
const std::vector< PartialKeys > & AIToolbox::Factored::Bandit::MiningBandit::getGroups () const
This function returns, for each mine, which villages are connected to it.
This function returns, for each local reward function (a mine), the group of agents (villages) connected to it.
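For example, given a bandit constructed as in the usage sketch above, the groups could be printed as follows (assuming PartialKeys is an iterable list of village indices):

    #include <iostream>

    const auto & groups = bandit.getGroups();
    for (size_t mine = 0; mine < groups.size(); ++mine) {
        std::cout << "Mine " << mine << " receives workers from villages:";
        for (auto village : groups[mine]) std::cout << ' ' << village;
        std::cout << '\n';
    }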
double AIToolbox::Factored::Bandit::MiningBandit::getNormalizationConstant () const
This function returns the normalization constant used.
When normalization is enabled, this class ensures that the optimal action has an expected reward of 1. To do this, each local reward is divided by the maximum expected reward obtainable in the unnormalized problem.
This function returns that normalization constant.
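For instance, one could rescale sampled rewards back to the original, unnormalized mineral scale by multiplying by this constant (a sketch assuming Rewards behaves like an Eigen vector with a .sum() method):

    const double c = bandit.getNormalizationConstant();
    const auto & rewards = bandit.sampleR(bandit.getOptimalAction());
    // In expectation this recovers the unnormalized total mineral amount.
    std::cout << "Unnormalized total (one sample): " << rewards.sum() * c << '\n';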
const Action & AIToolbox::Factored::Bandit::MiningBandit::getOptimalAction () const
This function returns the optimal action for this bandit.
double AIToolbox::Factored::Bandit::MiningBandit::getRegret (const Action & a) const
This function computes the deterministic regret of the input joint action.
This function bypasses the Bernoulli distributions and directly computes the true regret of any given joint action.
Parameters
    a    The joint action of all villages.
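As an example, the regret of the optimal action is zero, while perturbing a single village's action yields the corresponding deterministic loss (a sketch reusing the bandit from the usage example above; indexing Action and getA() as vectors is an assumption):

    #include <iostream>

    const Action & best = bandit.getOptimalAction();
    std::cout << "Regret of optimal action: " << bandit.getRegret(best) << '\n';  // 0

    Action other = best;
    other[0] = (other[0] + 1) % bandit.getA()[0];  // send village 0's workers elsewhere
    std::cout << "Regret of perturbed action: " << bandit.getRegret(other) << '\n';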
const Rewards & AIToolbox::Factored::Bandit::MiningBandit::sampleR (const Action & a) const
This function samples the rewards for each mine from a set of Bernoulli distributions.
Parameters
    a    The joint action of all villages.
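As a usage sketch, averaging many samples of the optimal action should approach the expected reward of 1 when normalization is enabled (this again assumes Rewards supports .sum(), e.g. as an Eigen vector, and reuses the bandit and includes from the examples above):

    double total = 0.0;
    constexpr unsigned samples = 10000;
    for (unsigned i = 0; i < samples; ++i)
        total += bandit.sampleR(bandit.getOptimalAction()).sum();
    std::cout << "Estimated expected reward: " << total / samples << '\n';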