This class represents an MDP Policy. More...

#include <AIToolbox/MDP/Policies/Policy.hpp>

Public Types
using	PolicyMatrix = Matrix2D

Public Types inherited from AIToolbox::MDP::PolicyWrapper
using	PolicyMatrix = Matrix2D

Public Types inherited from AIToolbox::MDP::PolicyInterface
using	Base = AIToolbox::PolicyInterface< size_t, size_t, size_t >

Public Member Functions
	Policy (size_t s, size_t a)
	Basic constructor. More...

	Policy (const PolicyInterface::Base &p)
	Basic constructor. More...

	Policy (const PolicyInterface &p)
	Basic constructor. More...

	Policy (size_t s, size_t a, const ValueFunction &v)
	Basic constructor. More...

	Policy (const PolicyMatrix &p)
	Basic constructor. More...

Public Member Functions inherited from AIToolbox::MDP::PolicyWrapper
	PolicyWrapper (const PolicyMatrix &p)
	Basic constructor. More...

virtual size_t	sampleAction (const size_t &s) const override
	This function chooses a random action for state s, following the policy distribution. More...

virtual double	getActionProbability (const size_t &s, const size_t &a) const override
	This function returns the probability of taking the specified action in the specified state. More...

const PolicyMatrix &	getPolicyMatrix () const
	This function enables inspection of the internal policy. More...

virtual Matrix2D	getPolicy () const override
	This function returns a matrix containing all probabilities of the policy. More...

Public Member Functions inherited from AIToolbox::PolicyInterface< size_t, size_t, size_t >
	PolicyInterface (size_t s, size_t a)
	Basic constructor. More...

virtual	~PolicyInterface ()
	Basic virtual destructor. More...

const size_t &	getS () const
	This function returns the number of states of the world. More...

const size_t &	getA () const
	This function returns the number of available actions to the agent. More...

Friends
std::istream &	operator>> (std::istream &is, Policy &p)
	This function reads a policy from a file. More...

Additional Inherited Members
Protected Attributes inherited from AIToolbox::PolicyInterface< size_t, size_t, size_t >
size_t	S

size_t	A

RandomEngine	rand_

Detailed Description

This class represents an MDP Policy.

This class is one of the many ways to represent an MDP Policy. In particular, it maintains a two-dimensional matrix whose entries give the probability of choosing each action in each state.

The class offers facilities to sample from these distributions, so that you can directly embed it into a decision-making process.
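As an illustration of what sampling from such a matrix involves, here is a minimal sketch that does not use the library. The names PolicyMatrixSketch and sampleActionSketch are hypothetical, a nested std::vector stands in for Matrix2D, and the sampling method shown is an assumption, not necessarily what sampleAction() does internally.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Stand-in for the library's Matrix2D (assumption): one row per state,
// one column per action, each row holding a probability distribution.
using PolicyMatrixSketch = std::vector<std::vector<double>>;

// Draws an action for state s according to the probabilities in row s.
std::size_t sampleActionSketch(const PolicyMatrixSketch & policy,
                               std::size_t s, std::mt19937 & rng) {
    const auto & row = policy.at(s);
    // discrete_distribution returns index i with probability row[i] / sum(row).
    std::discrete_distribution<std::size_t> dist(row.begin(), row.end());
    return dist(rng);
}
```

A decision loop would call sampleActionSketch(policy, currentState, rng) once per step and pass the returned index to the environment.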

Building this object is somewhat expensive, so it is best constructed once it is known that the final solution will not change again.

Note that this class is meant to be read-only after construction. If you need to modify the policy matrix manually, keep the matrix in a separate variable and use the PolicyWrapper class instead.

Member Typedef Documentation

◆ PolicyMatrix

using AIToolbox::MDP::Policy::PolicyMatrix = Matrix2D

Constructor & Destructor Documentation

◆ Policy() [1/5]

AIToolbox::MDP::Policy::Policy ( size_t s, size_t a )

Basic constructor.

This constructor initializes the internal policy matrix so that each action in each state has the same probability of being chosen (random policy). This class guarantees that at any point the internal policy is a true probability distribution, i.e. for each state the probabilities of choosing each action sum to 1.

Parameters

s	The number of states of the world.
a	The number of actions available to the agent.
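The uniform initialization described above amounts to giving every entry the value 1/a. A minimal sketch, assuming a nested std::vector in place of Matrix2D and using the hypothetical name makeUniformPolicySketch:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the library's Matrix2D (assumption).
using PolicyMatrixSketch = std::vector<std::vector<double>>;

// Builds an s-by-a matrix in which every action has probability 1/a,
// so that each row sums to 1. Assumes a > 0.
PolicyMatrixSketch makeUniformPolicySketch(std::size_t s, std::size_t a) {
    const double p = 1.0 / static_cast<double>(a);
    return PolicyMatrixSketch(s, std::vector<double>(a, p));
}
```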

◆ Policy() [2/5]

AIToolbox::MDP::Policy::Policy ( const PolicyInterface::Base & p )

Basic constructor.

This constructor copies the policy probability values from any other compatible PolicyInterface and stores them internally. This is probably the most common way to use this class.

This may be a useful thing to do in case the policy that is being copied is very costly to use (for example, QGreedyPolicy) and it is known that it will not change anymore.

Parameters

p	The policy which is being copied.
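The benefit of copying a costly policy can be pictured as querying each state-action probability once and storing the result in a table. The sketch below is an illustration under assumptions: the std::function parameter stands in for the source policy's getActionProbability(), and cachePolicySketch is a hypothetical name, not part of the library.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for the library's Matrix2D (assumption).
using PolicyMatrixSketch = std::vector<std::vector<double>>;

// Calls the (possibly expensive) probability lookup exactly once per
// state-action pair and stores the results in a plain table.
PolicyMatrixSketch cachePolicySketch(
        std::size_t S, std::size_t A,
        const std::function<double(std::size_t, std::size_t)> & actionProbability) {
    PolicyMatrixSketch table(S, std::vector<double>(A, 0.0));
    for (std::size_t s = 0; s < S; ++s)
        for (std::size_t a = 0; a < A; ++a)
            table[s][a] = actionProbability(s, a);
    return table;
}
```

After the copy, every later lookup is a table read, regardless of how expensive the original policy was to evaluate.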

◆ Policy() [3/5]

AIToolbox::MDP::Policy::Policy ( const PolicyInterface & p )

Basic constructor.

This constructor copies the policy probability values from any other compatible PolicyInterface and stores them internally. This is probably the most common way to use this class.

This may be a useful thing to do in case the policy that is being copied is very costly to use (for example, QGreedyPolicy) and it is known that it will not change anymore.

This overload is optimized: it uses the getPolicy() function of the input.

Parameters

p	The policy which is being copied.

◆ Policy() [4/5]

AIToolbox::MDP::Policy::Policy ( size_t s, size_t a, const ValueFunction & v )

Basic constructor.

This constructor copies the policy implied by a ValueFunction. Keep in mind that the policy stored within a ValueFunction is deterministic, since it can only store a single action for each state.

Parameters

s	The number of states of the world.
a	The number of actions available to the agent.
v	The ValueFunction used as a basis for the Policy.
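Because only one action is stored per state, the resulting matrix has exactly one entry equal to 1 in each row. A minimal sketch of that conversion follows. It assumes the chosen actions are available as a plain vector with one index per state, which is a simplification of the actual ValueFunction type, and policyFromActionsSketch is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for the library's Matrix2D (assumption).
using PolicyMatrixSketch = std::vector<std::vector<double>>;

// Builds a deterministic policy: row s has probability 1 on
// greedyActions[s] and 0 everywhere else.
PolicyMatrixSketch policyFromActionsSketch(
        std::size_t a, const std::vector<std::size_t> & greedyActions) {
    PolicyMatrixSketch policy(greedyActions.size(), std::vector<double>(a, 0.0));
    for (std::size_t s = 0; s < greedyActions.size(); ++s) {
        if (greedyActions[s] >= a)
            throw std::out_of_range("action index exceeds action count");
        policy[s][greedyActions[s]] = 1.0;
    }
    return policy;
}
```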

◆ Policy() [5/5]

AIToolbox::MDP::Policy::Policy ( const PolicyMatrix & p )

Basic constructor.

This constructor copies the input matrix inside the Policy.

This constructor checks whether the input is a valid set of probabilities. If it is not, the constructor throws a std::invalid_argument exception.

Parameters

p	The policy matrix to copy.
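The kind of check this implies can be sketched as follows. The exact criteria and tolerance used by the library are not stated here, so the rules below (each entry in [0, 1], each row summing to 1 within a small tolerance) and the name validatePolicySketch are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for the library's Matrix2D (assumption).
using PolicyMatrixSketch = std::vector<std::vector<double>>;

// Throws std::invalid_argument unless every entry lies in [0, 1] and
// every row sums to 1 within the given tolerance (assumed value).
void validatePolicySketch(const PolicyMatrixSketch & p, double tolerance = 1e-9) {
    for (const auto & row : p) {
        double sum = 0.0;
        for (double prob : row) {
            // Written this way so that NaN is rejected as well.
            if (!(prob >= 0.0 && prob <= 1.0))
                throw std::invalid_argument("probability outside [0, 1]");
            sum += prob;
        }
        if (std::abs(sum - 1.0) > tolerance)
            throw std::invalid_argument("row does not sum to 1");
    }
}
```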

Friends And Related Function Documentation

◆ operator>>

std::istream& operator>> ( std::istream & is, Policy & p )

friend

This function reads a policy from a file.

This function reads files that have been written through operator<<(). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.

States and actions are also verified, and this function does not accept a randomly shuffled policy file. The file must be sorted by state, and each state must be sorted by action.

As an additional precaution, the function normalizes the policy once it has been read, to ensure that the internal policy is a true probability distribution.

Parameters

is	The stream where the policy is being read from.
p	The policy that is being assigned.

Returns: The input stream.

The documentation for this class was generated from the following file:

include/AIToolbox/MDP/Policies/Policy.hpp
