AIToolbox
A library that offers tools for AI problem solving.
This class represents the MCTS online planner using UCB1.
#include <AIToolbox/MDP/Algorithms/MCTS.hpp>
Classes
    struct ActionNode
    struct StateNode

Public Types
    using StateNodes = std::unordered_map<size_t, StateNode>
    using ActionNodes = std::vector<ActionNode>

Public Member Functions
    MCTS(const M &m, unsigned iterations, double exp)
        Basic constructor.
    size_t sampleAction(const State &s, unsigned horizon)
        This function resets the internal graph and samples for the provided state and horizon.
    size_t sampleAction(size_t a, const State &s1, unsigned horizon)
        This function uses the internal graph to plan.
    void setIterations(unsigned iter)
        This function sets the number of rollouts performed by MCTS.
    void setExploration(double exp)
        This function sets the new exploration constant for MCTS.
    const M &getModel() const
        This function returns the MDP generative model being used.
    const StateNode &getGraph() const
        This function returns a reference to the internal graph structure holding the results of rollouts.
    unsigned getIterations() const
        This function returns the number of iterations performed to plan for an action.
    double getExploration() const
        This function returns the currently set exploration constant.
    template <typename Iterator>
    Iterator findBestA(Iterator begin, Iterator end)
    template <typename Iterator>
    Iterator findBestBonusA(Iterator begin, Iterator end, const unsigned count)
This class represents the MCTS online planner using UCB1.
This algorithm is an online planner for MDPs. As an online planner, it needs a generative model of the problem: it only requires a way to sample transitions and rewards from the model, and does not need direct knowledge of their probability distributions.
MCTS plans for a single state at a time. It progressively builds a tree structure, and action values are estimated as averages of the rewards obtained over rollouts. In the limit of infinitely many sample episodes, it converges to the optimal solution.
At each rollout, we follow actions and resulting states within the tree, from the root towards the leaves. Along this path we choose actions using an algorithm called UCT, which privileges the most promising actions while guaranteeing that, in the limit, every action is still tried an infinite number of times.
Once we arrive at a leaf of the tree, we expand it with a single new node, representing the new state reached along the path we just followed. We then proceed outside the tree following a random policy, this time without tracking the actions and states we experience. The final reward obtained by this random rollout policy is used to update the value estimates of all the nodes visited inside the tree during this rollout, before leaving it.
Since MCTS expands a tree, it can reuse previous work when action requests arrive in sequence. To do so, it is simply told which action was actually performed and which new state resulted; it then makes that branch the new root of the tree, prunes the rest, and starts again.
using AIToolbox::MDP::MCTS<M, StateHash>::ActionNodes = std::vector<ActionNode>
using AIToolbox::MDP::MCTS<M, StateHash>::StateNodes = std::unordered_map<size_t, StateNode>
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
AIToolbox::MDP::MCTS<M, StateHash>::MCTS(const M &m, unsigned iterations, double exp)
Basic constructor.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
Iterator AIToolbox::MDP::MCTS<M, StateHash>::findBestA(Iterator begin, Iterator end)
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
Iterator AIToolbox::MDP::MCTS<M, StateHash>::findBestBonusA(Iterator begin, Iterator end, const unsigned count)
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
double AIToolbox::MDP::MCTS<M, StateHash>::getExploration() const
This function returns the currently set exploration constant.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
const MCTS<M, StateHash>::StateNode & AIToolbox::MDP::MCTS<M, StateHash>::getGraph() const
This function returns a reference to the internal graph structure holding the results of rollouts.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
unsigned AIToolbox::MDP::MCTS<M, StateHash>::getIterations() const
This function returns the number of iterations performed to plan for an action.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
const M & AIToolbox::MDP::MCTS<M, StateHash>::getModel() const
This function returns the MDP generative model being used.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
size_t AIToolbox::MDP::MCTS<M, StateHash>::sampleAction(const State &s, unsigned horizon)
This function resets the internal graph and samples for the provided state and horizon.
    s: The initial state for the environment.
    horizon: The horizon to plan for.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
size_t AIToolbox::MDP::MCTS<M, StateHash>::sampleAction(size_t a, const State &s1, unsigned horizon)
This function uses the internal graph to plan.
This function is meant to be called after a previous call to sampleAction with a state. If no graph is present, it simply invokes that overload with the provided new state.
If a graph is already present, this function selects the branch defined by the input action and new state, and prunes the rest. The search is then started from the existing graph, which should make it faster.
    a: The action taken in the last timestep.
    s1: The state experienced after the action was taken.
    horizon: The horizon to plan for.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
void AIToolbox::MDP::MCTS<M, StateHash>::setExploration(double exp)
This function sets the new exploration constant for MCTS.
This parameter is extremely important for MCTS's performance and, ultimately, its convergence. It is generally best determined empirically: test a few values and keep the one that performs best. Tune this parameter; it really matters!
    exp: The new exploration constant.
requires AIToolbox::IsGenerativeModel<M> && HasIntegralActionSpace<M>
void AIToolbox::MDP::MCTS<M, StateHash>::setIterations(unsigned iter)
This function sets the number of rollouts performed by MCTS.
    iter: The new number of rollouts.