.. _train-horovod:

Get Started with Horovod
========================

Ray Train configures the Horovod environment and Rendezvous server for you,
allowing you to run your ``DistributedOptimizer`` training script. See the
`Horovod documentation <https://horovod.readthedocs.io/en/stable/index.html>`_
for more information.

Quickstart
----------

.. literalinclude:: ./doc_code/hvd_trainer.py
    :language: python

Update your training function
-----------------------------

First, update your :ref:`training function <train-overview-training-function>`
to support distributed training.

If you have a training function that already runs with the
`Horovod Ray Executor <https://horovod.readthedocs.io/en/stable/ray_include.html>`_,
you shouldn't need to make any additional changes. To onboard onto Horovod,
visit the `Horovod guide <https://horovod.readthedocs.io/en/stable/>`_. A
sketch of what such a training function can look like appears in the example
at the end of this page.

Create a HorovodTrainer
-----------------------

``Trainer``\s are the primary Ray Train classes for managing state and
executing training. For Horovod, use a
:class:`~ray.train.horovod.HorovodTrainer`, which you can set up like this:

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.horovod import HorovodTrainer

    # For GPU training, set `use_gpu` to True.
    use_gpu = False

    trainer = HorovodTrainer(
        train_func,
        scaling_config=ScalingConfig(use_gpu=use_gpu, num_workers=2),
    )

When training with Horovod, always use a ``HorovodTrainer``, irrespective of
the training framework, for example, PyTorch or TensorFlow.

To customize the backend setup, you can pass a
:class:`~ray.train.horovod.HorovodConfig`:

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.horovod import HorovodTrainer, HorovodConfig

    trainer = HorovodTrainer(
        train_func,
        horovod_config=HorovodConfig(...),
        scaling_config=ScalingConfig(num_workers=2),
    )

For more configurability, see the
:py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API.

Run a training function
-----------------------

With a distributed training function and a Ray Train ``Trainer``, you are now
ready to start training:

.. code-block:: python

    trainer.fit()

Further reading
---------------

Ray Train's :class:`~ray.train.horovod.HorovodTrainer` replaces the
distributed communication backend of the native libraries with its own
implementation, so the remaining integration points stay the same. If you're
using Horovod with :ref:`PyTorch <train-pytorch>` or
:ref:`TensorFlow <train-tensorflow>`, refer to the respective guides for
further configuration and information.

If you're implementing your own Horovod-based training routine without using
any of the training libraries, read through the :ref:`User Guides <train-user-guides>`,
as you can apply much of their content to generic use cases and adapt it
easily.
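
Example: a Horovod training function
------------------------------------

The following is a minimal sketch of what the training function described in
`Update your training function`_ can look like, assuming PyTorch as the
framework. The ``Net`` model and the randomly generated batches are
hypothetical placeholders; substitute your own model and data loading.

.. code-block:: python

    import horovod.torch as hvd
    import torch
    import torch.nn as nn
    import torch.optim as optim

    import ray.train


    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 1)

        def forward(self, x):
            return self.fc(x)


    def train_func():
        # Ray Train has already set up the Horovod rendezvous, so each
        # worker only needs to initialize Horovod.
        hvd.init()

        model = Net()
        # Scale the learning rate by the number of workers, a common
        # Horovod convention.
        optimizer = optim.SGD(model.parameters(), lr=0.01 * hvd.size())

        # Start all workers from the same model and optimizer state.
        hvd.broadcast_parameters(model.state_dict(), root_rank=0)
        hvd.broadcast_optimizer_state(optimizer, root_rank=0)

        # Wrap the optimizer so gradients are averaged across workers.
        optimizer = hvd.DistributedOptimizer(
            optimizer, named_parameters=model.named_parameters()
        )

        loss_fn = nn.MSELoss()
        for epoch in range(3):
            # Placeholder batch; replace with your own data loader.
            inputs = torch.randn(32, 10)
            labels = torch.randn(32, 1)

            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()

            # Report metrics from each worker back to Ray Train.
            ray.train.report({"epoch": epoch, "loss": loss.item()})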
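
You can then pass this function to the
:class:`~ray.train.horovod.HorovodTrainer` from the sections above and inspect
the :class:`~ray.train.Result` that ``fit()`` returns:

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.horovod import HorovodTrainer

    trainer = HorovodTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()

    # `result.metrics` contains the most recently reported metrics,
    # for example {"epoch": 2, "loss": ...}.
    print(result.metrics)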