AIToolbox
A library that offers tools for AI problem solving.
This class represents a POMDP Policy.
#include <AIToolbox/POMDP/Policies/Policy.hpp>
Public Types
using Base = PolicyInterface<size_t, Belief, size_t>

Public Member Functions
Policy(size_t s, size_t a, size_t o)
    Basic constructor.
Policy(size_t s, size_t a, size_t o, const ValueFunction & v)
    Basic constructor.
virtual size_t sampleAction(const Belief & b) const override
    This function chooses a random action for belief b, following the policy distribution.
std::tuple<size_t, size_t> sampleAction(const Belief & b, unsigned horizon) const
    This function chooses a random action for belief b when horizon steps are missing, following the policy distribution.
std::tuple<size_t, size_t> sampleAction(size_t id, size_t o, unsigned horizon) const
    This function chooses a random action after performing a sampled action and observing observation o, for a particular horizon.
virtual double getActionProbability(const Belief & b, const size_t & a) const override
    This function returns the probability of taking the specified action in the specified belief.
double getActionProbability(const Belief & b, size_t a, unsigned horizon) const
    This function returns the probability of taking the specified action in the specified belief.
size_t getO() const
    This function returns the number of observations possible for the agent.
size_t getH() const
    This function returns the highest horizon available within this Policy.
const ValueFunction & getValueFunction() const
    This function returns the internally stored ValueFunction.

Public Member Functions inherited from AIToolbox::PolicyInterface<size_t, Belief, size_t>
PolicyInterface(size_t s, size_t a)
    Basic constructor.
virtual ~PolicyInterface()
    Basic virtual destructor.
virtual size_t sampleAction(const Belief & s) const = 0
    This function chooses a random action for state s, following the policy distribution.
virtual double getActionProbability(const Belief & s, const size_t & a) const = 0
    This function returns the probability of taking the specified action in the specified state.
const size_t & getS() const
    This function returns the number of states of the world.
const size_t & getA() const
    This function returns the number of available actions to the agent.

Friends
std::istream & operator>>(std::istream & is, Policy & p)
    This function reads a policy from a file.

Additional Inherited Members
Protected Attributes inherited from AIToolbox::PolicyInterface<size_t, Belief, size_t>
size_t S
size_t A
RandomEngine rand_
This class represents a POMDP Policy.
This class currently represents a basic Policy adaptor for a POMDP::ValueFunction: it extracts the policy tree contained within the ValueFunction. The idea is that, at each horizon, the ValueFunction contains a set of applicable solutions (alpha vectors) for the POMDP, and at each Belief point only one of those vectors applies.
This class determines, for every belief, which vector applies and returns the corresponding action. At the same time, it provides facilities to follow the chosen vector along the tree, since future actions depend on the observations obtained by the agent.
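For example, a minimal sketch of the intended usage might look as follows (the ValueFunction is assumed to have been computed elsewhere, e.g. by one of the library's POMDP solvers, the uniform belief is purely illustrative, and the Eigen-based Belief type of recent library versions is assumed):

#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// `vf` is assumed to have been produced by a POMDP solver for a model with
// S states, A actions and O observations.
size_t chooseAction(size_t S, size_t A, size_t O, const ValueFunction & vf) {
    Policy policy(S, A, O, vf);      // extract the policy tree implied by vf
    Belief b(S);                     // a probability distribution over states
    b.fill(1.0 / S);                 // illustrative uniform belief
    return policy.sampleAction(b);   // sample at the highest available horizon
}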
using AIToolbox::POMDP::Policy::Base = PolicyInterface<size_t, Belief, size_t>
AIToolbox::POMDP::Policy::Policy(size_t s, size_t a, size_t o)
Basic constructor.
This constructor initializes the internal ValueFunction with only the default horizon 0 solution, which contains no values. This is most useful if the Policy needs to be read from a file.
    s - The number of states of the world.
    a - The number of actions available to the agent.
    o - The number of possible observations the agent could make.
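A rough sketch of that use case could look like this (the dimensions and the file name are assumptions made for the sake of the example):

#include <fstream>
#include <iostream>
#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

int main() {
    Policy policy(10, 4, 3);               // 10 states, 4 actions, 3 observations (assumed)
    std::ifstream input("policy.txt");     // hypothetical file written via operator<<
    if (!(input >> policy)) {
        std::cerr << "Could not read the policy; it was left unmodified.\n";
        return 1;
    }
    std::cout << "Loaded policy with horizon " << policy.getH() << '\n';
    return 0;
}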
AIToolbox::POMDP::Policy::Policy(size_t s, size_t a, size_t o, const ValueFunction & v)
Basic constructor.
This constructor copies the implied policy contained in a ValueFunction. Keep in mind that the policy stored within a ValueFunction is non-stochastic in nature, since for each state it can only save a single action.
    s - The number of states of the world.
    a - The number of actions available to the agent.
    o - The number of possible observations the agent could make.
    v - The ValueFunction used as a basis for the Policy.
virtual double AIToolbox::POMDP::Policy::getActionProbability(const Belief & b, const size_t & a) const [override, virtual]
This function returns the probability of taking the specified action in the specified belief.
    b - The selected belief.
    a - The selected action.
double AIToolbox::POMDP::Policy::getActionProbability(const Belief & b, size_t a, unsigned horizon) const
This function returns the probability of taking the specified action in the specified belief.
    b - The selected belief.
    a - The selected action.
    horizon - The requested horizon, meaning the number of timesteps missing until the end of the "episode".
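Since the policy stored within a ValueFunction is non-stochastic, the returned value is expected to be either 0.0 or 1.0. A small sketch of how this could be used (the helper name and all arguments are illustrative):

#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// Returns true if `action` is the action prescribed for belief `b` at the
// given horizon. The 0.5 threshold works because the stored policy is deterministic.
bool prescribes(const Policy & policy, const Belief & b, size_t action, unsigned horizon) {
    return policy.getActionProbability(b, action, horizon) > 0.5;
}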
size_t AIToolbox::POMDP::Policy::getH() const
This function returns the highest horizon available within this Policy.
Note that all functions that accept a horizon as a parameter DO NOT check the bounds of that variable. In addition, note that while the S, A and O getters return a value one greater than the maximum allowed index (since counting starts from 0), here the bound is actually included in the limit, as horizon 0 does not really do anything.
Example: getH() returns 5. This means that 5 is the highest allowed horizon parameter in any other Policy method.
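For instance, a sketch that samples one action per valid horizon while staying within the bounds described above (the belief is illustrative, and the helper name is not part of the library):

#include <tuple>
#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// No bounds are checked by the Policy itself, so we iterate from getH() down to 1.
void sampleAllHorizons(const Policy & policy, const Belief & b) {
    for (unsigned h = policy.getH(); h >= 1; --h) {
        const auto [action, id] = policy.sampleAction(b, h);
        (void)action; (void)id;   // use these as needed
    }
}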
size_t AIToolbox::POMDP::Policy::getO() const
This function returns the number of observations possible for the agent.
const ValueFunction & AIToolbox::POMDP::Policy::getValueFunction() const
This function returns the internally stored ValueFunction.
virtual size_t AIToolbox::POMDP::Policy::sampleAction(const Belief & b) const [override, virtual]
This function chooses a random action for belief b, following the policy distribution.
Note that this will sample from the highest horizon that the Policy was computed for.
    b - The sampled belief of the policy.
std::tuple<size_t, size_t> AIToolbox::POMDP::Policy::sampleAction(const Belief & b, unsigned horizon) const
This function chooses a random action for belief b when horizon steps are missing, following the policy distribution.
There are a couple of differences between this sampling function and the simpler version. The first one is that this function is actually able to sample from different timesteps, since this class is able to maintain a full policy tree over time.
The second difference is that it returns two values. The first is the requested action. The second is an id that, if provided to the Policy together with the obtained observation, allows it to compute the sampled action for the next timestep more efficiently.
    b - The sampled belief of the policy.
    horizon - The requested horizon, meaning the number of timesteps missing until the end of the "episode". Horizon 0 will return a valid but unspecified action.
std::tuple<size_t, size_t> AIToolbox::POMDP::Policy::sampleAction(size_t id, size_t o, unsigned horizon) const
This function chooses a random action after performing a sampled action and observing observation o, for a particular horizon.
This sampling function is provided in case an already sampled action has been performed, an observation has been registered, and a new action is now needed for the next timestep. Using this function is highly recommended, as no belief update is necessary and no lookup in a possibly very long list of VEntries is required.
Note that this function works correctly if and only if the horizon is exactly 1 (one) less than the value used in the previous sampling call; otherwise anything could happen. This does not mean that the calls depend on each other (the function is "pure" in that sense), just that the horizon must be decreased to obtain meaningful values.
To keep things simple, the id does not internally store the needed horizon value, so you are required to keep track of it yourself.
An example of usage for this function could look like the following sketch, where performAction() and getObservation() are hypothetical stand-ins for the agent's interaction with the environment:
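#include <tuple>
#include <AIToolbox/POMDP/Policies/Policy.hpp>

using namespace AIToolbox::POMDP;

// Hypothetical environment hooks, assumed for the sake of the example.
void performAction(size_t) { /* act in the real environment */ }
size_t getObservation() { return 0; /* placeholder observation */ }

void followPolicy(const Policy & policy, const Belief & initialBelief) {
    unsigned horizon = policy.getH();
    // First step: sample from the belief directly, keeping the returned id.
    auto [a, id] = policy.sampleAction(initialBelief, horizon);
    while (horizon > 1) {
        performAction(a);
        const size_t o = getObservation();
        --horizon;                                          // one less step remaining
        // Follow the policy tree using the id and the observation; no belief
        // update and no search through the ValueFunction is needed.
        std::tie(a, id) = policy.sampleAction(id, o, horizon);
    }
    performAction(a);                                       // perform the last sampled action
}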
    id - An id returned from a previous call of sampleAction.
    o - The observation obtained after performing a previously sampled action.
    horizon - The new horizon, equal to the previously sampled horizon minus 1.
std::istream & operator>>(std::istream & is, Policy & p) [friend]
This function reads a policy from a file.
This function reads files that have been written through operator<<(std::ostream&, const Policy&). If not enough values can be extracted from the stream, the function stops and the input policy is not modified. In addition, it checks whether the probability values are between 0 and 1.
    is - The stream where the policy is being read from.
    p - The policy that is being assigned.