Note
From Ray 2.6.0 onwards, RLlib is adopting a new stack for training and model customization, gradually replacing the ModelV2 API and some convoluted parts of the Policy API with the RLModule API. See the RLModule API documentation for details.
Policy API#
The Policy class contains functionality to compute
actions for decision making in an environment, to compute loss(es) and gradients,
to update a neural network model, and to postprocess a collected environment trajectory.
One or more Policy objects sit inside a
RolloutWorker’s PolicyMap and,
if there is more than one, are selected based on a multi-agent policy_mapping_fn,
which maps agent IDs to policy IDs (a minimal sketch follows).
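For example, a minimal policy_mapping_fn could look like the following sketch. The "agent_<n>" ID convention and the policy IDs are assumptions for illustration, and the exact callback signature depends on your Ray version:

    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        # Hypothetical convention: agent IDs look like "agent_0", "agent_1", ...
        agent_index = int(agent_id.split("_")[-1])
        # Route even-numbered agents to one policy, odd-numbered to another.
        return "policy_even" if agent_index % 2 == 0 else "policy_odd"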
RLlib’s Policy class hierarchy: Policies are deep-learning framework specific, as they hold functionality to handle a computation graph (e.g., a TensorFlow 1.x graph in a session). You can define custom policy behavior by sub-classing either of the available built-in classes, depending on your needs.#
Building Custom Policy Classes#
Warning
As of Ray >= 1.9, it is no longer recommended to use the build_policy_class() or
build_tf_policy() utility functions to create custom Policy sub-classes.
Instead, follow the simple guidelines below for directly sub-classing
either of the built-in types: EagerTFPolicyV2 or TorchPolicyV2.
To create a custom Policy, sub-class Policy (for a generic,
framework-agnostic policy), TorchPolicyV2 (for a PyTorch-specific policy),
or EagerTFPolicyV2 (for a TensorFlow-specific policy) and override one or
more of their methods, in particular:
compute_actions_from_input_dict()
postprocess_trajectory()
loss()
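For example, here is a minimal sketch of a custom PyTorch policy that overrides loss(). The REINFORCE-style objective is an illustrative assumption, not a loss RLlib prescribes:

    import torch
    from ray.rllib.policy.sample_batch import SampleBatch
    from ray.rllib.policy.torch_policy_v2 import TorchPolicyV2

    class MyTorchPolicy(TorchPolicyV2):
        def loss(self, model, dist_class, train_batch):
            # Forward pass: the model maps the batch to action-distribution inputs.
            logits, _ = model(train_batch)
            action_dist = dist_class(logits, model)
            # Log-likelihood of the actions actually taken in the batch.
            log_probs = action_dist.logp(train_batch[SampleBatch.ACTIONS])
            # Illustrative REINFORCE-style loss: reward-weighted negative log-probs.
            return -torch.mean(log_probs * train_batch[SampleBatch.REWARDS])

Overriding postprocess_trajectory() in the same way lets you transform a collected SampleBatch (for example, to compute advantages) before it reaches loss().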