.. _train-deepspeed:

Get Started with DeepSpeed
==========================

The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `DeepSpeed <https://github.com/microsoft/DeepSpeed>`_ training across a distributed Ray cluster.

Code example
------------

You only need to run your existing training code with a TorchTrainer. You can expect the final code to look like this:

.. code-block:: python

    import deepspeed
    from deepspeed.accelerator import get_accelerator

    def train_func(config):
        # Instantiate your model and dataset
        model = ...
        train_dataset = ...
        eval_dataset = ...
        deepspeed_config = {...}  # Your DeepSpeed config

        # Prepare everything for distributed training
        model, optimizer, train_dataloader, lr_scheduler = deepspeed.initialize(
            model=model,
            model_parameters=model.parameters(),
            training_data=train_dataset,
            config=deepspeed_config,
        )

        # Define the GPU device for the current worker
        device = get_accelerator().device_name(model.local_rank)

        # Start training
        ...

    from ray.train.torch import TorchTrainer
    from ray.train import ScalingConfig

    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(...),
        # ...
    )
    trainer.fit()

Below is a simple example of ZeRO-3 training with DeepSpeed only.

.. tabs::

    .. group-tab:: Example with Ray Data

        .. dropdown:: Show Code

            .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer.py
                :language: python
                :start-after: __deepspeed_torch_basic_example_start__
                :end-before: __deepspeed_torch_basic_example_end__

    .. group-tab:: Example with PyTorch DataLoader

        .. dropdown:: Show Code

            .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer_no_raydata.py
                :language: python
                :start-after: __deepspeed_torch_basic_example_no_raydata_start__
                :end-before: __deepspeed_torch_basic_example_no_raydata_end__

.. tip::

    To run DeepSpeed with pure PyTorch, you **don't need to** provide any additional Ray Train utilities
    like :meth:`~ray.train.torch.prepare_model` or :meth:`~ray.train.torch.prepare_data_loader` in your
    training function. Instead, keep using
    `deepspeed.initialize() <https://deepspeed.readthedocs.io/en/latest/initialize.html>`_ as usual to
    prepare everything for distributed training.

Run DeepSpeed with other frameworks
-----------------------------------

Many deep learning frameworks have integrated with DeepSpeed, including Lightning, Transformers, Accelerate, and more. You can run all these combinations in Ray Train.

Check the examples below for more details:

.. list-table::
    :header-rows: 1

    * - Framework
      - Example
    * - Accelerate (:ref:`User Guide `)
      - `Fine-tune Llama-2 series models with DeepSpeed, Accelerate, and Ray Train. `_
    * - Transformers (:ref:`User Guide `)
      - :ref:`Fine-tune GPT-J-6b with DeepSpeed and Hugging Face Transformers `
    * - Lightning (:ref:`User Guide `)
      - :ref:`Fine-tune vicuna-13b with DeepSpeed and PyTorch Lightning `
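
For reference, the following is a minimal sketch of what the ``deepspeed_config`` and ``ScalingConfig`` from the code example above might look like for ZeRO-3 training. The specific values (ZeRO stage options, learning rate, batch size, number of workers) are illustrative assumptions; adjust them to your model and cluster.

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    # Illustrative ZeRO-3 DeepSpeed config. In the example above, you would build
    # this dict inside `train_func` and pass it to `deepspeed.initialize()`.
    deepspeed_config = {
        "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
        "fp16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 shards parameters, gradients, and optimizer states across workers.
            "stage": 3,
            "offload_optimizer": {"device": "none"},
            "offload_param": {"device": "none"},
        },
        "gradient_accumulation_steps": 1,
        "train_micro_batch_size_per_gpu": 16,
    }

    # Launch the `train_func` defined in the code example above on 4 GPU workers.
    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    )
    result = trainer.fit()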