.. _train-deepspeed:

Get Started with DeepSpeed
==========================
The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `DeepSpeed <https://www.deepspeed.ai/>`_ training across a distributed Ray cluster.

Code example
------------

You only need to run your existing training code with a TorchTrainer. You can expect the final code to look like this:

.. code-block:: python

    import deepspeed
    from deepspeed.accelerator import get_accelerator


    def train_func(config):
        # Instantiate your model, datasets, and collate function
        model = ...
        train_dataset = ...
        eval_dataset = ...
        collate_fn = ...
        deepspeed_config = {...}  # Your DeepSpeed config

        # Prepare everything for distributed training
        model, optimizer, train_dataloader, lr_scheduler = deepspeed.initialize(
            model=model,
            model_parameters=model.parameters(),
            training_data=train_dataset,
            collate_fn=collate_fn,
            config=deepspeed_config,
        )

        # Define the GPU device for the current worker
        device = get_accelerator().device_name(model.local_rank)

        # Start training
        ...


    from ray.train.torch import TorchTrainer
    from ray.train import ScalingConfig

    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(...),
        # ...
    )
    trainer.fit()
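
The ``deepspeed_config`` above is an ordinary DeepSpeed configuration dictionary.
As a point of reference, a minimal ZeRO-3 setup might look like the following
sketch; the batch size, precision, and optimizer values here are illustrative
assumptions, not recommendations:

.. code-block:: python

    # A minimal ZeRO-3 configuration sketch. The concrete values are
    # placeholders; tune them for your model and hardware.
    deepspeed_config = {
        "train_micro_batch_size_per_gpu": 8,
        "gradient_accumulation_steps": 1,
        "fp16": {"enabled": True},
        # Stage 3 partitions optimizer states, gradients, and parameters
        # across the training workers.
        "zero_optimization": {"stage": 3},
        "optimizer": {
            "type": "AdamW",
            "params": {"lr": 2e-5},
        },
    }
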

Below is a simple example of ZeRO-3 training with DeepSpeed only.

.. tabs::

    .. group-tab:: Example with Ray Data

        .. dropdown:: Show Code

            .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer.py
                :language: python
                :start-after: __deepspeed_torch_basic_example_start__
                :end-before: __deepspeed_torch_basic_example_end__

    .. group-tab:: Example with PyTorch DataLoader

        .. dropdown:: Show Code

            .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer_no_raydata.py
                :language: python
                :start-after: __deepspeed_torch_basic_example_no_raydata_start__
                :end-before: __deepspeed_torch_basic_example_no_raydata_end__

.. tip::

    To run DeepSpeed with pure PyTorch, you **don't need to** provide any additional Ray Train utilities
    like :meth:`~ray.train.torch.prepare_model` or :meth:`~ray.train.torch.prepare_data_loader` in your
    training function. Instead, keep using `deepspeed.initialize() <https://deepspeed.readthedocs.io/en/latest/initialize.html>`_
    as usual to prepare everything for distributed training.
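
When you write the training loop yourself, the engine returned by
``deepspeed.initialize()`` drives the backward pass and the optimizer step.
The following sketch shows one possible loop body; the dictionary-style batch
and the assumption that the model returns an object with a ``loss`` attribute
(as Hugging Face models do) are illustrative, not part of the DeepSpeed API:

.. code-block:: python

    # A sketch of the training loop, assuming `model`, `train_dataloader`,
    # and `device` were set up by deepspeed.initialize() as shown above.
    for batch in train_dataloader:
        # Move each tensor in the batch to this worker's device.
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        model.backward(loss)  # DeepSpeed scales the loss and runs backward
        model.step()          # Optimizer step (and LR scheduler, if configured)
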

Run DeepSpeed with other frameworks
-----------------------------------

Many deep learning frameworks integrate with DeepSpeed, including Lightning, Transformers, Accelerate, and more. You can run all of these combinations in Ray Train.
Check the examples below for more details:

.. list-table::
    :header-rows: 1

    * - Framework
      - Example
    * - Accelerate (:ref:`User Guide <train-hf-accelerate>`)
      - `Fine-tune Llama-2 series models with DeepSpeed, Accelerate, and Ray Train <https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed>`_
    * - Transformers (:ref:`User Guide <train-pytorch-transformers>`)
      - :ref:`Fine-tune GPT-J-6b with DeepSpeed and Hugging Face Transformers <gptj_deepspeed_finetune>`
    * - Lightning (:ref:`User Guide <train-pytorch-lightning>`)
      - :ref:`Fine-tune vicuna-13b with DeepSpeed and PyTorch Lightning <vicuna_lightning_deepspeed_finetuning>`