AIToolbox
A library that offers tools for AI problem solving.
AIToolbox::MDP::Policy Class Reference

This class represents an MDP Policy.

#include <AIToolbox/MDP/Policies/Policy.hpp>

Inheritance diagram for AIToolbox::MDP::Policy:
AIToolbox::MDP::PolicyWrapper AIToolbox::MDP::PolicyInterface AIToolbox::PolicyInterface< size_t, size_t, size_t >

Public Types

using PolicyMatrix = Matrix2D
 
- Public Types inherited from AIToolbox::MDP::PolicyWrapper
using PolicyMatrix = Matrix2D
 
- Public Types inherited from AIToolbox::MDP::PolicyInterface
using Base = AIToolbox::PolicyInterface< size_t, size_t, size_t >
 

Public Member Functions

 Policy (size_t s, size_t a)
 Basic constructor.
 
 Policy (const PolicyInterface::Base &p)
 Basic constructor.
 
 Policy (const PolicyInterface &p)
 Basic constructor.
 
 Policy (size_t s, size_t a, const ValueFunction &v)
 Basic constructor.
 
 Policy (const PolicyMatrix &p)
 Basic constructor.
 
- Public Member Functions inherited from AIToolbox::MDP::PolicyWrapper
 PolicyWrapper (const PolicyMatrix &p)
 Basic constructor.
 
virtual size_t sampleAction (const size_t &s) const override
 This function chooses a random action for state s, following the policy distribution.
 
virtual double getActionProbability (const size_t &s, const size_t &a) const override
 This function returns the probability of taking the specified action in the specified state.
 
const PolicyMatrix & getPolicyMatrix () const
 This function enables inspection of the internal policy.
 
virtual Matrix2D getPolicy () const override
 This function returns a matrix containing all probabilities of the policy.
 
- Public Member Functions inherited from AIToolbox::PolicyInterface< size_t, size_t, size_t >
 PolicyInterface (size_t s, size_t a)
 Basic constructor.
 
virtual ~PolicyInterface ()
 Basic virtual destructor.
 
const size_t & getS () const
 This function returns the number of states of the world.
 
const size_t & getA () const
 This function returns the number of actions available to the agent.
 

Friends

std::istream & operator>> (std::istream &is, Policy &p)
 This function reads a policy from a file.
 

Additional Inherited Members

- Protected Attributes inherited from AIToolbox::PolicyInterface< size_t, size_t, size_t >
size_t S
 
size_t A
 
RandomEngine rand_
 

Detailed Description

This class represents an MDP Policy.

This class is one of the many ways to represent an MDP Policy. In particular, it maintains a two-dimensional matrix whose entries give the probability of choosing each action in each state.

The class offers facilities to sample from these distributions, so that you can directly embed it into a decision-making process.

Building this object is somewhat expensive, so it is best done once the final solution is known and is not expected to change again.

Note that this class is meant to be read-only after construction. If you want to modify the policy matrix manually, keep the matrix yourself and use the PolicyWrapper class instead.
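
As an illustration of the idea described above, the following is a minimal, library-independent sketch of a state-by-action probability table and of sampling from it. It does not use AIToolbox: the names SketchPolicyMatrix, sampleActionSketch and actionProbabilitySketch are hypothetical, and a nested std::vector stands in for the real Matrix2D type.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a policy matrix: one row per state, one column
// per action. Each row is assumed to be a probability distribution.
using SketchPolicyMatrix = std::vector<std::vector<double>>;

// Draws an action for state s, weighted by the probabilities in row s.
std::size_t sampleActionSketch(const SketchPolicyMatrix & policy,
                               std::size_t s, std::mt19937 & rng) {
    std::discrete_distribution<std::size_t> dist(policy[s].begin(),
                                                 policy[s].end());
    return dist(rng);
}

// Looks up the probability of choosing action a in state s.
double actionProbabilitySketch(const SketchPolicyMatrix & policy,
                               std::size_t s, std::size_t a) {
    return policy[s][a];
}
```

Because the table is filled once and then only read, every later query is a lookup or a single weighted draw, which is the trade-off the description above points at.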

Member Typedef Documentation

◆ PolicyMatrix

Constructor & Destructor Documentation

◆ Policy() [1/5]

AIToolbox::MDP::Policy::Policy ( size_t  s,
size_t  a 
)

Basic constructor.

This constructor initializes the internal policy matrix so that each action in each state has the same probability of being chosen (random policy). This class guarantees that at any point the internal policy is a true probability distribution, i.e. for each state the probabilities of choosing each action sum to 1.

Parameters
s: The number of states of the world.
a: The number of actions available to the agent.
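
A uniform random policy of this kind can be sketched without the library as follows. The function name makeUniformPolicySketch is hypothetical, and a nested std::vector stands in for the real matrix type; each of the a entries in a row is set to 1/a, so every row sums to 1.

```cpp
#include <cstddef>
#include <vector>

// Builds an S x A table where every action in every state has probability 1/A.
// Assumes A > 0; this is an illustrative sketch, not the library's code.
std::vector<std::vector<double>> makeUniformPolicySketch(std::size_t S,
                                                         std::size_t A) {
    const double p = 1.0 / static_cast<double>(A);
    return std::vector<std::vector<double>>(S, std::vector<double>(A, p));
}
```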

◆ Policy() [2/5]

AIToolbox::MDP::Policy::Policy ( const PolicyInterface::Base & p )

Basic constructor.

This constructor simply copies policy probability values from any other compatible PolicyInterface, and stores them internally. This is probably the main way you may want to use this class.

This may be a useful thing to do in case the policy that is being copied is very costly to use (for example, QGreedyPolicy) and it is known that it will not change anymore.

Parameters
p: The policy which is being copied.
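
The caching idea behind this constructor can be sketched as follows, assuming only that the source policy can report a probability for every (state, action) pair. The function cachePolicySketch and its callback parameter are hypothetical stand-ins and are not part of AIToolbox.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Queries a (possibly expensive) probability callback once per state-action
// pair and stores the answers in a plain table for cheap later lookups.
std::vector<std::vector<double>> cachePolicySketch(
        std::size_t S, std::size_t A,
        const std::function<double(std::size_t, std::size_t)> & probability) {
    std::vector<std::vector<double>> cached(S, std::vector<double>(A, 0.0));
    for (std::size_t s = 0; s < S; ++s)
        for (std::size_t a = 0; a < A; ++a)
            cached[s][a] = probability(s, a);
    return cached;
}
```

The callback is invoked exactly S * A times at construction, after which no further calls to the costly source policy are needed.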

◆ Policy() [3/5]

AIToolbox::MDP::Policy::Policy ( const PolicyInterface & p )

Basic constructor.

This constructor simply copies policy probability values from any other compatible PolicyInterface, and stores them internally. This is probably the main way you may want to use this class.

This may be a useful thing to do in case the policy that is being copied is very costly to use (for example, QGreedyPolicy) and it is known that it will not change anymore.

This is an optimized method using the getPolicy() function of the input.

Parameters
p: The policy which is being copied.

◆ Policy() [4/5]

AIToolbox::MDP::Policy::Policy ( size_t  s,
size_t  a,
const ValueFunction &  v 
)

Basic constructor.

This constructor copies the implied policy contained in a ValueFunction. Keep in mind that the policy stored within a ValueFunction is non-stochastic in nature, since for each state it can only save a single action.

Parameters
s: The number of states of the world.
a: The number of actions available to the agent.
v: The ValueFunction used as a basis for the Policy.
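
Since a ValueFunction stores a single action per state, the resulting policy puts probability 1 on that action and 0 on all others. The sketch below illustrates this with a hypothetical vector of per-state actions in place of the real ValueFunction type; policyFromGreedyActionsSketch is an assumed name, not a library function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Builds a deterministic S x A policy table: in each state, the single
// action listed in greedyActions gets probability 1, all others get 0.
std::vector<std::vector<double>> policyFromGreedyActionsSketch(
        std::size_t S, std::size_t A,
        const std::vector<std::size_t> & greedyActions) {
    if (greedyActions.size() != S)
        throw std::invalid_argument("expected one action per state");
    std::vector<std::vector<double>> policy(S, std::vector<double>(A, 0.0));
    for (std::size_t s = 0; s < S; ++s) {
        if (greedyActions[s] >= A)
            throw std::invalid_argument("action index out of range");
        policy[s][greedyActions[s]] = 1.0;
    }
    return policy;
}
```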

◆ Policy() [5/5]

AIToolbox::MDP::Policy::Policy ( const PolicyMatrix & p )

Basic constructor.

This constructor copies the input matrix inside the Policy.

This constructor checks whether the input is a valid set of probabilities. If it is not, the constructor throws a std::invalid_argument exception.

Parameters
p: The policy matrix to copy.
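
A validity check of this kind might look like the sketch below. The exact rules and tolerance the library applies are not stated here, so the function checkPolicyMatrixSketch, its default tolerance, and the nested std::vector representation are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Throws std::invalid_argument unless every entry lies in [0, 1] and every
// row sums to 1 within the given tolerance. NaN entries are rejected too.
void checkPolicyMatrixSketch(const std::vector<std::vector<double>> & policy,
                             double tolerance = 1e-9) {
    for (const auto & row : policy) {
        double sum = 0.0;
        for (double p : row) {
            if (!(p >= 0.0 && p <= 1.0))
                throw std::invalid_argument("probability outside [0, 1]");
            sum += p;
        }
        if (!(std::abs(sum - 1.0) <= tolerance))
            throw std::invalid_argument("row does not sum to 1");
    }
}
```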

Friends And Related Function Documentation

◆ operator>>

std::istream& operator>> ( std::istream &  is,
Policy &  p 
)
friend

This function reads a policy from a file.

This function reads files that have been written through operator<<(). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.

States and actions are also verified: this function does not accept a randomly shuffled policy file. The file must be sorted by state, and within each state by action.

As an additional precaution, the function normalizes the policy once it has been read, to ensure that the internal policy is a true probability distribution.

Parameters
is: The stream where the policy is being read from.
p: The policy that is being assigned.
Returns
The input stream.
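
The checks described above can be sketched as follows. The on-disk layout is not specified in this page, so the sketch assumes whitespace-separated "state action probability" triples; that layout, the function name readPolicySketch, and the bool return value (the real operator returns the stream) are all assumptions. Treating every failed check as "leave the target untouched" is also a simplification of this sketch.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <utility>
#include <vector>

// Reads S*A triples sorted by state, then by action. On any failure the
// target policy is left unmodified and false is returned.
bool readPolicySketch(std::istream & is, std::size_t S, std::size_t A,
                      std::vector<std::vector<double>> & policy) {
    std::vector<std::vector<double>> tmp(S, std::vector<double>(A, 0.0));
    for (std::size_t s = 0; s < S; ++s) {
        double sum = 0.0;
        for (std::size_t a = 0; a < A; ++a) {
            std::size_t readS = 0, readA = 0;
            double p = 0.0;
            if (!(is >> readS >> readA >> p)) return false;  // too few values
            if (readS != s || readA != a) return false;      // not sorted
            if (!(p >= 0.0 && p <= 1.0)) return false;       // out of range
            tmp[s][a] = p;
            sum += p;
        }
        if (!(sum > 0.0)) return false;           // row cannot be normalized
        for (double & p : tmp[s]) p /= sum;       // normalize the row
    }
    policy = std::move(tmp);
    return true;
}
```

Reading into a temporary table and assigning it only at the end is one simple way to guarantee that a partially read file never leaves the target policy half-updated.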
