Note

From Ray 2.6.0 onwards, RLlib is adopting a new stack for training and model customization, gradually replacing the ModelV2 API and some convoluted parts of the Policy API with the RLModule API. See the RLModule API documentation for details.

Policy API#

The Policy class contains functionality to compute actions for decision making in an environment, compute loss(es) and gradients, update a neural network model, and postprocess a collected environment trajectory. One or more Policy objects sit inside a RolloutWorker’s PolicyMap and are - if more than one exists - selected based on a multi-agent policy_mapping_fn, which maps agent IDs to policy IDs.
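As an illustration of the mapping step, a policy_mapping_fn is just a callable that receives an agent ID and returns the ID of the policy that should control that agent. The sketch below is a minimal, self-contained example; the agent ID format ("player_0", "player_1", …) and the policy IDs are illustrative assumptions, not fixed by RLlib.

```python
def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Illustrative scheme: agents with even-numbered IDs share one policy,
    # agents with odd-numbered IDs share another.
    # Assumes agent IDs of the form "<name>_<index>" (a hypothetical convention).
    idx = int(agent_id.split("_")[-1])
    return "policy_even" if idx % 2 == 0 else "policy_odd"
```

In a multi-agent config, a function with this signature is what RLlib consults each time it needs to decide which policy in the PolicyMap acts for a given agent.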

[Figure: policy_classes_overview.svg]

RLlib’s Policy class hierarchy: Policies are deep-learning framework specific as they hold functionality to handle a computation graph (e.g. a TensorFlow 1.x graph in a session). You can define custom policy behavior by sub-classing either of the available, built-in classes, depending on your needs.#

Building Custom Policy Classes#

Warning

As of Ray >= 1.9, it is no longer recommended to use the build_policy_class() or build_tf_policy() utility functions for creating custom Policy sub-classes. Instead, follow the simple guidelines here for directly sub-classing from one of the built-in types: EagerTFPolicyV2 or TorchPolicyV2.

To create a custom Policy, sub-class Policy (for a generic, framework-agnostic policy), TorchPolicyV2 (for a PyTorch-specific policy), or EagerTFPolicyV2 (for a TensorFlow-specific policy) and override one or more of their methods, in particular:

  • compute_actions_from_input_dict()

  • postprocess_trajectory()

  • loss()

See here for an example of how to override TorchPolicyV2.
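To make the postprocess_trajectory() hook concrete: this is typically where per-timestep quantities such as discounted returns or advantages are computed over a collected trajectory before the loss sees it. The following is a framework-free sketch of the discounted-return computation only - plain Python, not the RLlib API, and the function name is our own.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute discounted cumulative returns for one trajectory.

    Illustrative helper (not part of RLlib): walks the reward sequence
    backwards, accumulating reward + gamma * future_return at each step.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Inside a real postprocess_trajectory() override, a computation like this would read the reward column from the sample batch and write the result back as a new column for the loss function to consume.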

Base Policy classes#

Making models#

Base Policy#

Torch Policy#

Tensorflow Policy#

Inference#

Base Policy#

Torch Policy#

Tensorflow Policy#

Computing, processing, and applying gradients#

Base Policy#

Torch Policy#

Tensorflow Policy#

Updating the Policy’s model#

Base Policy#

Loss, Logging, optimizers, and trajectory processing#

Base Policy#

Torch Policy#

Tensorflow Policy#

Saving and restoring#

Base Policy#

Connectors#

Base Policy#

Recurrent Policies#

Base Policy#

Miscellaneous#

Base Policy#

Torch Policy#

Tensorflow Policy#