Spaces
Space

class rl_coach.spaces.Space(shape: Union[int, tuple, list, numpy.ndarray], low: Union[None, int, float, numpy.ndarray] = -inf, high: Union[None, int, float, numpy.ndarray] = inf)
A space defines a set of valid values.
 Parameters
shape – the shape of the space
low – the lowest values possible in the space. Can be an array defining the lowest value per point, or a single value that applies to all points
high – the highest values possible in the space. Can be an array defining the highest value per point, or a single value that applies to all points

contains(val: Union[int, float, numpy.ndarray]) → bool
Checks whether a value is contained in this space. The shape must match and all of the values must be within the low and high bounds.
 Parameters
val – the value to check
 Returns
True if val matches the space definition, False otherwise
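The containment rule described above (matching shape, every value within the bounds) can be sketched with plain NumPy. This is a minimal illustration and not the library's implementation: the standalone `contains` function and its argument order are hypothetical, and treating the bounds as inclusive is an assumption.

```python
import numpy as np


def contains(val, shape, low=-np.inf, high=np.inf):
    """Hypothetical helper: True if val has the given shape and lies in [low, high]."""
    val = np.asarray(val)
    if val.shape != tuple(shape):
        return False
    # low / high may be scalars or per-element arrays; NumPy broadcasts either form.
    # Inclusive bounds are an assumption of this sketch.
    return bool(np.all(val >= low) and np.all(val <= high))


shape = (3,)
low = np.array([-1.0, -1.0, 0.0])  # per-element lower bounds
high = 1.0                         # a single upper bound shared by all elements

inside = contains(np.array([0.5, -1.0, 0.0]), shape, low, high)
outside = contains(np.array([0.5, -1.0, -0.1]), shape, low, high)
wrong_shape = contains(np.array([0.5, 0.5]), shape, low, high)
```

The third call fails on shape alone, before any bound is compared.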
Observation Spaces

class rl_coach.spaces.ObservationSpace(shape: Union[int, numpy.ndarray], low: Union[None, int, float, numpy.ndarray] = -inf, high: Union[None, int, float, numpy.ndarray] = inf)

contains(val: Union[int, float, numpy.ndarray]) → bool
Checks whether a value is contained in this space. The shape must match and all of the values must be within the low and high bounds.
 Parameters
val – the value to check
 Returns
True if val matches the space definition, False otherwise

is_valid_index(index: numpy.ndarray) → bool
Checks whether a given multidimensional index is within the bounds of the shape of the space.
 Parameters
index – a multidimensional index
 Returns
True if the index is within the shape of the space. False otherwise
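As a rough illustration of the index check, the sketch below compares an index against a shape element by element. The standalone function is hypothetical, and rejecting negative indices is an assumption; the library's own behaviour for such inputs is not stated here.

```python
import numpy as np


def is_valid_index(index, shape):
    """Hypothetical helper: True if index addresses an element of an array with this shape."""
    index = np.asarray(index)
    shape = np.asarray(shape)
    if index.shape != shape.shape:
        # The index must have one entry per dimension of the space.
        return False
    # Assumption: negative (wrap-around) indices are treated as invalid.
    return bool(np.all(index >= 0) and np.all(index < shape))


image_shape = (84, 84, 3)
last_pixel = is_valid_index([83, 83, 2], image_shape)
past_the_end = is_valid_index([84, 0, 0], image_shape)
too_few_dims = is_valid_index([1, 2], image_shape)
```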

sample() → numpy.ndarray
Samples the defined space: uniformly if the space bounds are defined, or from a normal distribution if no bounds are defined.
 Returns
A numpy array sampled from the space
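The two sampling modes can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the standalone `sample` function and its `rng` argument are hypothetical, "bounds are defined" is read as "all bounds are finite", and falling back to a standard normal when only some bounds are finite is a choice made for this sketch.

```python
import numpy as np


def sample(shape, low=-np.inf, high=np.inf, rng=None):
    """Hypothetical helper: uniform sample if fully bounded, standard normal otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    low = np.broadcast_to(np.asarray(low, dtype=float), shape)
    high = np.broadcast_to(np.asarray(high, dtype=float), shape)
    if np.all(np.isfinite(low)) and np.all(np.isfinite(high)):
        # Every element has finite bounds: draw uniformly between them.
        return rng.uniform(low, high, size=shape)
    # At least one bound is infinite: draw from N(0, 1) instead.
    return rng.standard_normal(size=shape)


rng = np.random.default_rng(0)
bounded = sample((4,), low=-1.0, high=1.0, rng=rng)
unbounded = sample((4,), rng=rng)
```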

VectorObservationSpace

class rl_coach.spaces.VectorObservationSpace(shape: int, low: Union[None, int, float, numpy.ndarray] = -inf, high: Union[None, int, float, numpy.ndarray] = inf, measurements_names: List[str] = None)
An observation space defined as a vector of elements. This is particularly useful for environments that return measurements, such as robotic environments.
PlanarMapsObservationSpace
Action Spaces

class rl_coach.spaces.ActionSpace(shape: Union[int, numpy.ndarray], low: Union[None, int, float, numpy.ndarray] = -inf, high: Union[None, int, float, numpy.ndarray] = inf, descriptions: Union[None, List, Dict] = None, default_action: Union[int, float, numpy.ndarray, List] = None)

clip_action_to_space(action: Union[int, float, numpy.ndarray, List]) → Union[int, float, numpy.ndarray, List]
Clips the values of a given action to fit within the ranges of the action space.
 Parameters
action – a given action
 Returns
the clipped action
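Clipping an action to per-dimension ranges can be illustrated with `numpy.clip`. The bounds and the action below are made-up values, and this shows only the general idea, not how the method is implemented internally.

```python
import numpy as np

# Made-up per-dimension bounds for a 2-dimensional continuous action.
low = np.array([-1.0, 0.0])
high = np.array([1.0, 0.5])

action = np.array([1.7, -0.2])  # both components fall outside their ranges

# Each component is moved to the nearest bound it violates.
clipped = np.clip(action, low, high)
```

Components already inside their range are returned unchanged.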

contains(val: Union[int, float, numpy.ndarray]) → bool
Checks whether a value is contained in this space. The shape must match and all of the values must be within the low and high bounds.
 Parameters
val – the value to check
 Returns
True if val matches the space definition, False otherwise

is_valid_index(index: numpy.ndarray) → bool
Checks whether a given multidimensional index is within the bounds of the shape of the space.
 Parameters
index – a multidimensional index
 Returns
True if the index is within the shape of the space. False otherwise

sample() → numpy.ndarray
Samples the defined space: uniformly if the space bounds are defined, or from a normal distribution if no bounds are defined.
 Returns
A numpy array sampled from the space

AttentionActionSpace

class rl_coach.spaces.AttentionActionSpace(shape: int, low: Union[None, int, float, numpy.ndarray] = -inf, high: Union[None, int, float, numpy.ndarray] = inf, descriptions: Union[None, List, Dict] = None, default_action: numpy.ndarray = None, forced_attention_size: Union[None, int, float, numpy.ndarray] = None)
A continuous box-selection action space, in which each action selects a multidimensional box from a given range. Actions take the form: [[low_x, low_y, …], [high_x, high_y, …]]
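To make the action encoding concrete, the sketch below builds a two-dimensional box action in the [[low_x, low_y], [high_x, high_y]] form and uses it to crop an array. Only the encoding comes from the description above; the 6x6 array, the mapping of the first coordinate to the first array axis, and the half-open slicing are assumptions made for illustration.

```python
import numpy as np

# A made-up 6x6 "image" whose entries are simply 0..35.
image = np.arange(36).reshape(6, 6)

# One box action: first row is the low corner, second row is the high corner.
action = np.array([[1, 2],
                   [4, 5]])
low_corner, high_corner = action

# A well-formed box has low <= high in every dimension.
is_box = bool(np.all(low_corner <= high_corner))

# Assumption: coordinate 0 indexes axis 0, and the high corner is exclusive.
crop = image[low_corner[0]:high_corner[0], low_corner[1]:high_corner[1]]
```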
BoxActionSpace

class rl_coach.spaces.BoxActionSpace(shape: Union[int, numpy.ndarray], low: Union[None, int, float, numpy.ndarray] = -inf, high: Union[None, int, float, numpy.ndarray] = inf, descriptions: Union[None, List, Dict] = None, default_action: numpy.ndarray = None)
A multidimensional continuous action space, either bounded or unbounded.
DiscreteActionSpace
MultiSelectActionSpace

class rl_coach.spaces.MultiSelectActionSpace(size: int, max_simultaneous_selected_actions: int = 1, descriptions: Union[None, List, Dict] = None, default_action: numpy.ndarray = None, allow_no_action_to_be_selected=True)
A discrete action space in which multiple actions can be selected at once. The actions are encoded as multi-hot vectors.
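A multi-hot vector has one entry per available action, set to 1 for each action selected in that step. The sketch below enumerates every such vector for a small space; the `multihot_actions` function is hypothetical, and its parameters only mirror the meaning of `size`, `max_simultaneous_selected_actions` and `allow_no_action_to_be_selected` as suggested by their names.

```python
from itertools import combinations


def multihot_actions(size, max_selected=1, allow_no_action=True):
    """Hypothetical helper: list every multi-hot vector with at most max_selected ones."""
    actions = []
    smallest = 0 if allow_no_action else 1
    for count in range(smallest, max_selected + 1):
        for chosen in combinations(range(size), count):
            actions.append([1 if i in chosen else 0 for i in range(size)])
    return actions


# 3 available actions, at most 2 selected at once, "no action" allowed.
actions = multihot_actions(size=3, max_selected=2)
without_noop = multihot_actions(size=3, max_selected=2, allow_no_action=False)
```

With three actions and up to two selected at once, this yields the all-zero vector, three single selections, and three pairs.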
CompoundActionSpace

class rl_coach.spaces.CompoundActionSpace(sub_spaces: List[rl_coach.spaces.ActionSpace])
An action space that consists of multiple sub-action spaces. For example, in Starcraft the agent must choose an action identifier from ~550 options (Discrete(550)), but it must also choose 13 different arguments for the selected action identifier, where each argument is itself an action space. In Starcraft, the arguments are Discrete action spaces as well, but this is not mandatory.
Goal Spaces

class rl_coach.spaces.GoalsSpace(goal_name: str, reward_type: rl_coach.spaces.GoalToRewardConversion, distance_metric: Union[rl_coach.spaces.GoalsSpace.DistanceMetric, Callable])
A multidimensional space with a goal type definition. It also behaves as an action space, so that hierarchical agents can use it as an output action space. The class acts as a wrapper around the target space, so once the target space is set, all of the class's values (shape, low, high, etc.) match those of the target space.
 Parameters
goal_name – the name of the observation space to use as the achieved goal
reward_type – the reward type to use for converting distances from the goal into rewards
distance_metric – the distance metric to use. Can be either one of the distances in the DistanceMetric enum, or a custom function that takes two vectors as input and returns the distance between them

clip_action_to_space(action: Union[int, float, numpy.ndarray, List]) → Union[int, float, numpy.ndarray, List]
Clips the values of a given action to fit within the ranges of the action space.
 Parameters
action – a given action
 Returns
the clipped action

contains(val: Union[int, float, numpy.ndarray]) → bool
Checks whether a value is contained in this space. The shape must match and all of the values must be within the low and high bounds.
 Parameters
val – the value to check
 Returns
True if val matches the space definition, False otherwise

distance_from_goal(goal: numpy.ndarray, state: dict) → float
Given a state, computes its distance from the goal.
 Parameters
goal – a numpy array representing the goal
state – a dict representing the state
 Returns
the distance from the goal

get_reward_for_goal_and_state(goal: numpy.ndarray, state: dict) → Tuple[float, bool]
Given a state, checks whether the goal was reached and returns a reward accordingly.
 Parameters
goal – a numpy array representing the goal
state – a dict representing the state
 Returns
the reward for the current goal and state pair and a boolean representing if the goal was reached
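The relationship between the goal, the state dictionary, the distance and the reward can be sketched as below. Everything here is a hypothetical stand-in and not the library's implementation: the standalone functions, the Euclidean metric, the `threshold`, `reached_reward` and `default_reward` parameters, and the `"achieved_goal"` key are all assumptions chosen for illustration.

```python
import numpy as np


def goal_from_state(state, goal_name):
    """Pick the observation named goal_name out of the state dictionary."""
    return np.asarray(state[goal_name], dtype=float)


def distance_from_goal(goal, state, goal_name):
    """Euclidean distance between the achieved goal and the desired goal (assumed metric)."""
    return float(np.linalg.norm(goal_from_state(state, goal_name) - goal))


def get_reward_for_goal_and_state(goal, state, goal_name,
                                  threshold=0.1, reached_reward=0.0, default_reward=-1.0):
    """Return (reward, reached); the threshold and reward values are made up."""
    reached = distance_from_goal(goal, state, goal_name) <= threshold
    return (reached_reward if reached else default_reward), reached


goal = np.array([1.0, 1.0])
near = {"achieved_goal": np.array([1.0, 0.95]), "observation": np.zeros(4)}
far = {"achieved_goal": np.array([0.0, 0.0]), "observation": np.zeros(4)}

near_reward, near_reached = get_reward_for_goal_and_state(goal, near, "achieved_goal")
far_reward, far_reached = get_reward_for_goal_and_state(goal, far, "achieved_goal")
```

A custom distance function, as permitted by the distance_metric parameter, would replace the Euclidean norm in `distance_from_goal`.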

goal_from_state(state: Dict)
Given a state, extracts the observation that corresponds to goal_name.
 Parameters
state – a dictionary of observations
 Returns
the observation corresponding to the goal_name

is_valid_index(index: numpy.ndarray) → bool
Checks whether a given multidimensional index is within the bounds of the shape of the space.
 Parameters
index – a multidimensional index
 Returns
True if the index is within the shape of the space. False otherwise

sample() → numpy.ndarray
Samples the defined space: uniformly if the space bounds are defined, or from a normal distribution if no bounds are defined.
 Returns
A numpy array sampled from the space

sample_with_info() → rl_coach.core_types.ActionInfo
Gets a random action with additional “fake” info.
 Returns
An action info instance