This class represents the Policy Iteration algorithm.

#include <AIToolbox/MDP/Algorithms/PolicyIteration.hpp>

Public Member Functions
	PolicyIteration (unsigned horizon, double tolerance=0.001)
	Basic constructor.

template<IsModel M>
QFunction	operator() (const M &m)
	This function applies policy iteration on an MDP to solve it.

void	setTolerance (double t)
	This function sets the tolerance parameter.

void	setHorizon (unsigned h)
	This function sets the horizon parameter.

double	getTolerance () const
	This function returns the currently set tolerance parameter.

unsigned	getHorizon () const
	This function returns the currently set horizon parameter.

Detailed Description

This class represents the Policy Iteration algorithm.

This algorithm begins with an arbitrary (random) policy and uses the PolicyEvaluation algorithm to compute the value of each state under that policy.

Once this is done, the policy is improved by acting greedily with respect to the resulting QFunction. The new policy is then evaluated in turn, and the process is repeated.

When the policy no longer changes, it is guaranteed to be optimal, and the QFunction found is returned.
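The evaluate/improve loop described above can be sketched in a few lines of standalone C++. This is only an illustration and not the AIToolbox implementation: `TinyMDP`, `QTable`, `computeQ`, `evaluatePolicy`, `policyIteration` and the `maxRounds` safety cap are hypothetical names introduced here, the model is a plain dense table, and the starting policy is "action 0 everywhere" rather than a random one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense MDP (not the AIToolbox model type):
// T[s][a][s1] is a transition probability, R[s][a] an expected reward.
struct TinyMDP {
    std::size_t S, A;
    double discount;
    std::vector<std::vector<std::vector<double>>> T;
    std::vector<std::vector<double>> R;
};

using QTable = std::vector<std::vector<double>>;

// Q(s,a) = R(s,a) + discount * sum_s1 T(s,a,s1) * V(s1)
QTable computeQ(const TinyMDP& m, const std::vector<double>& V) {
    QTable Q(m.S, std::vector<double>(m.A, 0.0));
    for (std::size_t s = 0; s < m.S; ++s)
        for (std::size_t a = 0; a < m.A; ++a) {
            double q = m.R[s][a];
            for (std::size_t s1 = 0; s1 < m.S; ++s1)
                q += m.discount * m.T[s][a][s1] * V[s1];
            Q[s][a] = q;
        }
    return Q;
}

// Iteratively evaluates a fixed deterministic policy. Stops after `horizon`
// sweeps, or earlier once no state value moves by more than `tolerance`.
std::vector<double> evaluatePolicy(const TinyMDP& m,
                                   const std::vector<std::size_t>& policy,
                                   unsigned horizon, double tolerance) {
    assert(policy.size() == m.S);
    std::vector<double> V(m.S, 0.0);
    for (unsigned i = 0; i < horizon; ++i) {
        const QTable Q = computeQ(m, V);
        double delta = 0.0;
        for (std::size_t s = 0; s < m.S; ++s) {
            delta = std::max(delta, std::fabs(Q[s][policy[s]] - V[s]));
            V[s] = Q[s][policy[s]];
        }
        if (delta <= tolerance) break;
    }
    return V;
}

// Alternates evaluation and greedy improvement until the policy is stable.
// maxRounds is a safety cap added for this sketch only.
QTable policyIteration(const TinyMDP& m, unsigned horizon, double tolerance,
                       unsigned maxRounds = 1000) {
    std::vector<std::size_t> policy(m.S, 0);  // arbitrary starting policy
    QTable Q = computeQ(m, std::vector<double>(m.S, 0.0));
    for (unsigned round = 0; round < maxRounds; ++round) {
        const std::vector<double> V = evaluatePolicy(m, policy, horizon, tolerance);
        Q = computeQ(m, V);
        bool changed = false;
        for (std::size_t s = 0; s < m.S; ++s) {
            std::size_t best = policy[s];  // ties keep the current action
            for (std::size_t a = 0; a < m.A; ++a)
                if (Q[s][a] > Q[s][best] + 1e-12) best = a;
            if (best != policy[s]) { policy[s] = best; changed = true; }
        }
        if (!changed) break;  // stable policy: stop improving
    }
    return Q;
}
```

Keeping the current action on ties is a deliberate choice in this sketch: it stops the loop from flipping between equally good actions and so lets the "policy did not change" test terminate.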

Constructor & Destructor Documentation

◆ PolicyIteration()

AIToolbox::MDP::PolicyIteration::PolicyIteration ( unsigned horizon, double tolerance = 0.001 )

Basic constructor.

Parameters

horizon	The horizon parameter to use during the PolicyEvaluation phase.
tolerance	The tolerance parameter to use during the PolicyEvaluation phase.
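To make the interplay of the two parameters concrete, here is a minimal standalone sketch. It assumes, and this is an assumption rather than something stated on this page, that the horizon caps the number of evaluation sweeps while the tolerance is a threshold on the largest value change between sweeps. `EvalResult` and `evaluateSelfLoop` are hypothetical names, and the "model" is a single state that loops onto itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: the value estimate and the sweeps actually used.
struct EvalResult {
    double value;
    unsigned sweeps;
};

// Evaluates V = reward + discount * V for a single self-looping state by
// repeated backups, stopping at `horizon` sweeps or once the change between
// two sweeps is no larger than `tolerance`.
EvalResult evaluateSelfLoop(double reward, double discount,
                            unsigned horizon, double tolerance) {
    assert(tolerance >= 0.0);
    double v = 0.0;
    unsigned sweeps = 0;
    while (sweeps < horizon) {
        const double next = reward + discount * v;
        const double delta = std::fabs(next - v);
        v = next;
        ++sweeps;
        if (delta <= tolerance) break;  // converged before the horizon
    }
    return {v, sweeps};
}
```

Under these assumptions, a small horizon with a tight tolerance stops on the sweep count, while a large horizon with a loose tolerance stops on convergence.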

Member Function Documentation

◆ getHorizon()

unsigned AIToolbox::MDP::PolicyIteration::getHorizon ( ) const

This function returns the currently set horizon parameter.

◆ getTolerance()

double AIToolbox::MDP::PolicyIteration::getTolerance ( ) const

This function returns the currently set tolerance parameter.

◆ operator()()

template<IsModel M>

QFunction AIToolbox::MDP::PolicyIteration::operator() ( const M & m )

This function applies policy iteration on an MDP to solve it.

The run is bounded by the currently set horizon and tolerance parameters, which are used during each PolicyEvaluation phase.

Parameters

m	The MDP that needs to be solved.

Returns: The QFunction of the optimal policy found.
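Since the call returns a QFunction rather than a policy, a caller will often want the greedy action for each state. The sketch below shows that step on a plain nested vector; `QTable` and `greedyPolicy` are hypothetical names used for illustration, and the real QFunction type in the library may expose a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Q-function: q[s][a] is the value of action a
// in state s.
using QTable = std::vector<std::vector<double>>;

// Returns, for every state, the index of the action with the highest value.
// Ties resolve to the lowest action index.
std::vector<std::size_t> greedyPolicy(const QTable& q) {
    std::vector<std::size_t> policy;
    policy.reserve(q.size());
    for (const auto& row : q) {
        assert(!row.empty());
        std::size_t best = 0;
        for (std::size_t a = 1; a < row.size(); ++a)
            if (row[a] > row[best]) best = a;
        policy.push_back(best);
    }
    return policy;
}
```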

◆ setHorizon()

void AIToolbox::MDP::PolicyIteration::setHorizon ( unsigned h )

This function sets the horizon parameter.

◆ setTolerance()

void AIToolbox::MDP::PolicyIteration::setTolerance ( double t )

This function sets the tolerance parameter.

The tolerance parameter must be >= 0 or the function will throw.
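A validating setter of this kind could look like the sketch below. The class name `ToleranceHolder` is hypothetical, and the exception type `std::invalid_argument` is an assumption: this page only says that the function throws, not what it throws.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical holder class illustrating a setter that rejects negative
// tolerances and leaves the stored value untouched on failure.
class ToleranceHolder {
public:
    explicit ToleranceHolder(double t = 0.001) { setTolerance(t); }

    void setTolerance(double t) {
        // Assumed exception type; the page does not name it.
        if (t < 0.0) throw std::invalid_argument("Tolerance must be >= 0");
        tolerance_ = t;
    }

    double getTolerance() const {
        assert(tolerance_ >= 0.0);  // invariant maintained by the setter
        return tolerance_;
    }

private:
    double tolerance_ = 0.0;
};
```

Validating before assigning means a rejected call leaves the previously set tolerance in place.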

The documentation for this class was generated from the following file:

include/AIToolbox/MDP/Algorithms/PolicyIteration.hpp
