(gentle-intro)=
# Getting Started
Use Ray to scale applications on your laptop or in the cloud. Choose the right guide for your task.
* Scale ML workloads: [Ray Libraries Quickstart](#libraries-quickstart)
* Scale general Python applications: [Ray Core Quickstart](#ray-core-quickstart)
* Deploy to the cloud: [Ray Clusters Quickstart](#ray-clusters-quickstart)
* Debug and monitor applications: [Debugging and Monitoring Quickstart](#debugging-and-monitoring-quickstart)
(libraries-quickstart)=
## Ray AI Libraries Quickstart
Use individual libraries for ML workloads. Click the dropdown that matches your workload below.
`````{dropdown} Data: Scalable Datasets for ML
:animate: fade-in-slide-down
Scale offline inference and training ingest with [Ray Data](data_key_concepts) --
a data processing library designed for ML.
To learn more, see [Offline batch inference](batch_inference_overview) and
[Data preprocessing and ingest for ML training](batch_inference_overview).
````{note}
To run this example, install Ray Data:
```bash
pip install -U "ray[data]"
```
````
```{testcode}
from typing import Dict
import numpy as np
import ray
# Create datasets from on-disk files, Python objects, and cloud storage like S3.
ds = ray.data.read_csv("s3://anonymous@ray-example-data/iris.csv")
# Apply functions to transform data. Ray Data executes transformations in parallel.
def compute_area(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    length = batch["petal length (cm)"]
    width = batch["petal width (cm)"]
    batch["petal area (cm^2)"] = length * width
    return batch
transformed_ds = ds.map_batches(compute_area)
# Iterate over batches of data.
for batch in transformed_ds.iter_batches(batch_size=4):
    print(batch)
# Save dataset contents to on-disk files or cloud storage.
transformed_ds.write_parquet("local:///tmp/iris/")
```
```{testoutput}
:hide:
...
```
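Datasets don't have to start from files. As a minimal sketch, you can also create one from in-memory Python objects (the rows below are illustrative):
```{code-block} python
import ray

# Build a dataset from in-memory Python dicts (values are illustrative).
ds = ray.data.from_items(
    [{"petal length (cm)": 1.4, "petal width (cm)": 0.2}] * 8
)
print(ds.take(2))
```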
```{button-ref} ../data/data
:color: primary
:outline:
:expand:
Learn more about Ray Data
```
`````
``````{dropdown} Train: Distributed Model Training
:animate: fade-in-slide-down
Ray Train abstracts away the complexity of setting up a distributed training system.
`````{tab-set}
````{tab-item} PyTorch
This example shows how you can use Ray Train with PyTorch.
To run this example, install Ray Train and PyTorch packages:
:::{note}
```bash
pip install -U "ray[train]" torch torchvision
```
:::
Set up your dataset and model.
```{literalinclude} /../../python/ray/train/examples/pytorch/torch_quick_start.py
:language: python
:start-after: __torch_setup_begin__
:end-before: __torch_setup_end__
```
Now define your single-worker PyTorch training function.
```{literalinclude} /../../python/ray/train/examples/pytorch/torch_quick_start.py
:language: python
:start-after: __torch_single_begin__
:end-before: __torch_single_end__
```
You can execute this training function with:
```{literalinclude} /../../python/ray/train/examples/pytorch/torch_quick_start.py
:language: python
:start-after: __torch_single_run_begin__
:end-before: __torch_single_run_end__
:dedent: 0
```
Convert this to a distributed multi-worker training function.
Use the `ray.train.torch.prepare_model` and `ray.train.torch.prepare_data_loader` utility functions to set up your model and data for distributed training.
This automatically wraps the model with `DistributedDataParallel` and places it on the right device,
and adds `DistributedSampler` to the `DataLoaders`.
```{literalinclude} /../../python/ray/train/examples/pytorch/torch_quick_start.py
:language: python
:start-after: __torch_distributed_begin__
:end-before: __torch_distributed_end__
```
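In sketch form, the distributed training function looks roughly like this (a toy linear-regression model stands in for the dataset and model defined above; run it through a `TorchTrainer` as shown next):
```{code-block} python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import ray.train.torch

def train_func_distributed():
    # Toy regression data: y = 2x (an illustrative stand-in for a real dataset).
    X = torch.randn(128, 1)
    dataloader = DataLoader(TensorDataset(X, 2 * X), batch_size=16, shuffle=True)
    # Adds a DistributedSampler and moves batches to the right device.
    dataloader = ray.train.torch.prepare_data_loader(dataloader)

    model = nn.Linear(1, 1)
    # Wraps the model in DistributedDataParallel and places it on the device.
    model = ray.train.torch.prepare_model(model)

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(2):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
```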
Instantiate a ``TorchTrainer`` with 4 workers,
and use it to run the new training function.
```{literalinclude} /../../python/ray/train/examples/pytorch/torch_quick_start.py
:language: python
:start-after: __torch_trainer_begin__
:end-before: __torch_trainer_end__
:dedent: 0
```
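Continuing the sketch above, the trainer wiring looks roughly like this (a CPU-only run is assumed; set `use_gpu=True` to train on GPUs):
```{code-block} python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
results = trainer.fit()
```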
````
````{tab-item} TensorFlow
This example shows how you can use Ray Train to set up [Multi-worker training
with Keras](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras).
To run this example, install Ray Train and TensorFlow packages:
:::{note}
```bash
pip install -U "ray[train]" tensorflow
```
:::
Set up your dataset and model.
```{literalinclude} /../../python/ray/train/examples/tf/tensorflow_quick_start.py
:language: python
:start-after: __tf_setup_begin__
:end-before: __tf_setup_end__
```
Now define your single-worker TensorFlow training function.
```{literalinclude} /../../python/ray/train/examples/tf/tensorflow_quick_start.py
:language: python
:start-after: __tf_single_begin__
:end-before: __tf_single_end__
```
You can execute this training function with:
```{literalinclude} /../../python/ray/train/examples/tf/tensorflow_quick_start.py
:language: python
:start-after: __tf_single_run_begin__
:end-before: __tf_single_run_end__
:dedent: 0
```
Now convert this to a distributed multi-worker training function.
1. Set the *global* batch size - each worker processes the same size
   batch as in the single-worker code.
2. Choose your TensorFlow distributed training strategy. This example
   uses the ``MultiWorkerMirroredStrategy``.
```{literalinclude} /../../python/ray/train/examples/tf/tensorflow_quick_start.py
:language: python
:start-after: __tf_distributed_begin__
:end-before: __tf_distributed_end__
```
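In sketch form, those two steps look roughly like this (a toy model stands in for the dataset and model defined above; assumes Ray 2.7+ for `ray.train.get_context`):
```{code-block} python
import numpy as np
import tensorflow as tf
from ray import train

def train_func_distributed():
    # Step 2: each worker builds the same strategy; Ray Train sets up TF_CONFIG.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    # Step 1: global batch size = per-worker batch size x number of workers.
    per_worker_batch_size = 16  # illustrative value
    global_batch_size = per_worker_batch_size * train.get_context().get_world_size()
    # Toy regression data: y = 2x.
    X = np.random.rand(256, 1).astype("float32")
    y = 2 * X
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
        model.compile(optimizer="sgd", loss="mse")
    model.fit(X, y, batch_size=global_batch_size, epochs=2)
```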
Instantiate a ``TensorflowTrainer`` with 4 workers,
and use it to run the new training function.
```{literalinclude} /../../python/ray/train/examples/tf/tensorflow_quick_start.py
:language: python
:start-after: __tf_trainer_begin__
:end-before: __tf_trainer_end__
:dedent: 0
```
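Continuing the sketch above, the trainer wiring looks roughly like this (CPU-only run assumed):
```{code-block} python
from ray.train import ScalingConfig
from ray.train.tensorflow import TensorflowTrainer

trainer = TensorflowTrainer(
    train_func_distributed,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
results = trainer.fit()
```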
```{button-ref} ../train/train
:color: primary
:outline:
:expand:
Learn more about Ray Train
```
````
`````
``````
`````{dropdown} Tune: Hyperparameter Tuning at Scale
:animate: fade-in-slide-down
[Tune](../tune/index.rst) is a library for hyperparameter tuning at any scale.
With Tune, you can launch a multi-node distributed hyperparameter sweep in fewer than 10 lines of code.
Tune supports any deep learning framework, including PyTorch, TensorFlow, and Keras.
````{note}
To run this example, install Ray Tune:
```bash
pip install -U "ray[tune]"
```
````
This example runs a small grid search with an iterative training function.
```{literalinclude} ../../../python/ray/tune/tests/example.py
:end-before: __quick_start_end__
:language: python
:start-after: __quick_start_begin__
```
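If you aren't running the included example file, a comparable minimal sketch looks like this (the objective function and search space are illustrative):
```{code-block} python
from ray import tune

def objective(config):
    # Toy objective: a quadratic with its minimum at a=2.
    return {"score": (config["a"] - 2) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"a": tune.grid_search([0, 1, 2, 3])},
)
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)
```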
If TensorBoard is installed, automatically visualize all trial results:
```bash
tensorboard --logdir ~/ray_results
```
```{button-ref} ../tune/index
:color: primary
:outline:
:expand:
Learn more about Ray Tune
```
`````
`````{dropdown} Serve: Scalable Model Serving
:animate: fade-in-slide-down
[Ray Serve](../serve/index) is a scalable model serving library built on Ray.
````{note}
To run this example, install Ray Serve and scikit-learn:
```{code-block} bash
pip install -U "ray[serve]" scikit-learn
```
````
This example serves a scikit-learn gradient boosting classifier.
```{literalinclude} ../serve/doc_code/sklearn_quickstart.py
:language: python
:start-after: __serve_example_begin__
:end-before: __serve_example_end__
```
As a result you will see `{"result": "versicolor"}`.
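For a bare-bones starting point without scikit-learn, a minimal sketch of a deployment looks like this (the `Echo` class and route are illustrative):
```{code-block} python
import requests
from ray import serve
from starlette.requests import Request

@serve.deployment
class Echo:
    async def __call__(self, request: Request) -> dict:
        # Echo back the "text" field of the JSON request body.
        return {"result": (await request.json())["text"]}

serve.run(Echo.bind(), route_prefix="/")
print(requests.post("http://localhost:8000/", json={"text": "hello"}).json())
```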
```{button-ref} ../serve/index
:color: primary
:outline:
:expand:
Learn more about Ray Serve
```
`````
`````{dropdown} RLlib: Industry-Grade Reinforcement Learning
:animate: fade-in-slide-down
[RLlib](../rllib/index.rst) is an industry-grade reinforcement learning (RL) library built on Ray.
RLlib offers high scalability and unified APIs for a variety of industry and research applications.
````{note}
To run this example, install `rllib` and either `tensorflow` or `pytorch`:
```bash
pip install -U "ray[rllib]" tensorflow # or torch
```
````
```{literalinclude} ../../../rllib/examples/documentation/rllib_on_ray_readme.py
:end-before: __quick_start_end__
:language: python
:start-after: __quick_start_begin__
```
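In sketch form, a comparable minimal training loop using the `PPOConfig` builder API looks like this (the environment and iteration count are illustrative; assumes Ray 2.x):
```{code-block} python
from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO algorithm for the classic CartPole control task.
config = PPOConfig().environment("CartPole-v1")
algo = config.build()
for _ in range(3):
    result = algo.train()
    print(f"Finished iteration {result['training_iteration']}")
```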
```{button-ref} ../rllib/index
:color: primary
:outline:
:expand:
Learn more about Ray RLlib
```
`````
## Ray Core Quickstart
Turn functions and classes easily into Ray tasks and actors,
for Python and Java, with simple primitives for building and running distributed applications.
``````{dropdown} Core: Parallelizing Functions with Ray Tasks
:animate: fade-in-slide-down
`````{tab-set}
````{tab-item} Python
:::{note}
To run this example, install Ray Core:
```bash
pip install -U "ray"
```
:::
Import Ray and initialize it with `ray.init()`.
Then decorate your function with ``@ray.remote`` to declare that you want to run this function remotely.
Lastly, call the function with ``.remote()`` instead of calling it normally.
This remote call yields a future, a Ray _object reference_, that you can then fetch with ``ray.get``.
```{code-block} python
import ray
ray.init()
@ray.remote
def f(x):
    return x * x
futures = [f.remote(i) for i in range(4)]
print(ray.get(futures)) # [0, 1, 4, 9]
```
````
````{tab-item} Java
```{note}
To run this example, add the [ray-api](https://mvnrepository.com/artifact/io.ray/ray-api) and [ray-runtime](https://mvnrepository.com/artifact/io.ray/ray-runtime) dependencies in your project.
```
Use `Ray.init` to initialize the Ray runtime.
Then use `Ray.task(...).remote()` to convert any Java static method into a Ray task.
The task runs asynchronously in a remote worker process. The `remote` method returns an ``ObjectRef``,
and you can fetch the actual result with ``get``.
```{code-block} java
import io.ray.api.ObjectRef;
import io.ray.api.Ray;
import java.util.ArrayList;
import java.util.List;
public class RayDemo {
  public static int square(int x) {
    return x * x;
  }

  public static void main(String[] args) {
    // Initialize Ray runtime.
    Ray.init();
    List<ObjectRef<Integer>> objectRefList = new ArrayList<>();
    // Invoke the `square` method 4 times remotely as Ray tasks.
    // The tasks will run in parallel in the background.
    for (int i = 0; i < 4; i++) {
      objectRefList.add(Ray.task(RayDemo::square, i).remote());
    }
    // Get the actual results of the tasks.
    System.out.println(Ray.get(objectRefList)); // [0, 1, 4, 9]
  }
}
```
In the above code block we defined some Ray Tasks.
While these are great for stateless operations, sometimes you must maintain the state of your application. You can do that with Ray Actors.
```{button-ref} ../ray-core/walkthrough
:color: primary
:outline:
:expand:
Learn more about Ray Core
```
````
`````
``````
``````{dropdown} Core: Parallelizing Classes with Ray Actors
:animate: fade-in-slide-down
Ray provides actors to allow you to parallelize an instance of a class in Python or Java.
When you instantiate a class that is a Ray actor, Ray starts a remote instance of that class in the cluster.
This actor can then execute remote method calls and maintain its own internal state.
`````{tab-set}
````{tab-item} Python
:::{note}
To run this example, install Ray Core:
```bash
pip install -U "ray"
```
:::
```{code-block} python
import ray
ray.init() # Only call this once.

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures)) # [1, 1, 1, 1]
```
````
````{tab-item} Java
```{note}
To run this example, add the [ray-api](https://mvnrepository.com/artifact/io.ray/ray-api) and [ray-runtime](https://mvnrepository.com/artifact/io.ray/ray-runtime) dependencies in your project.
```
```{code-block} java
import io.ray.api.ActorHandle;
import io.ray.api.ObjectRef;
import io.ray.api.Ray;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class RayDemo {
  public static class Counter {
    private int value = 0;

    public void increment() {
      this.value += 1;
    }

    public int read() {
      return this.value;
    }
  }

  public static void main(String[] args) {
    // Initialize Ray runtime.
    Ray.init();
    List<ActorHandle<Counter>> counters = new ArrayList<>();
    // Create 4 actors from the `Counter` class.
    // They will run in remote worker processes.
    for (int i = 0; i < 4; i++) {
      counters.add(Ray.actor(Counter::new).remote());
    }

    // Invoke the `increment` method on each actor.
    // This will send an actor task to each remote actor.
    for (ActorHandle<Counter> counter : counters) {
      counter.task(Counter::increment).remote();
    }
    // Invoke the `read` method on each actor, and print the results.
    List<ObjectRef<Integer>> objectRefList = counters.stream()
        .map(counter -> counter.task(Counter::read).remote())
        .collect(Collectors.toList());
    System.out.println(Ray.get(objectRefList)); // [1, 1, 1, 1]
  }
}
```
```{button-ref} ../ray-core/walkthrough
:color: primary
:outline:
:expand:
Learn more about Ray Core
```
````
`````
``````
## Ray Clusters Quickstart
Deploy your applications on Ray clusters, often with minimal code changes to your existing code.
`````{dropdown} Clusters: Launching a Ray Cluster on AWS
:animate: fade-in-slide-down
Ray programs can run on a single machine, and also seamlessly scale to large clusters.
Take this simple example that waits for individual nodes to join the cluster.
````{dropdown} example.py
:animate: fade-in-slide-down
```{literalinclude} ../../yarn/example.py
:language: python
```
````
You can also download this example from our [GitHub repository](https://github.com/ray-project/ray/blob/master/doc/yarn/example.py).
Go ahead and store it locally in a file called `example.py`.
To execute this script in the cloud, download [this configuration file](https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/aws/example-full.yaml),
or copy it here:
````{dropdown} cluster.yaml
:animate: fade-in-slide-down
```{literalinclude} ../../../python/ray/autoscaler/aws/example-full.yaml
:language: yaml
```
````
Assuming you have stored this configuration in a file called `cluster.yaml`, you can now launch an AWS cluster as follows:
```bash
ray submit cluster.yaml example.py --start
```
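When you're done, you can tear the cluster down with the same configuration file:
```bash
ray down cluster.yaml
```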
```{button-ref} cluster-index
:color: primary
:outline:
:expand:
Learn more about launching Ray Clusters
```
`````
## Debugging and Monitoring Quickstart
Use built-in observability tools to monitor and debug Ray applications and clusters.
`````{dropdown} Ray Dashboard: Web GUI to monitor and debug Ray
:animate: fade-in-slide-down
The Ray dashboard provides a visual interface that displays real-time system metrics, node-level resource monitoring, job profiling, and task visualizations. It is designed to help users understand the performance of their Ray applications and identify potential issues.
```{image} https://raw.githubusercontent.com/ray-project/Images/master/docs/new-dashboard/Dashboard-overview.png
:align: center
```
````{note}
To get started with the dashboard, install the default installation as follows:
```bash
pip install -U "ray[default]"
```
````
Access the dashboard through the default URL, http://localhost:8265.
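When you run Ray locally, `ray.init()` also returns a context object that reports where the dashboard is being served (a small sketch; assumes Ray 2.x):
```{code-block} python
import ray

context = ray.init()
# Prints something like 127.0.0.1:8265.
print(context.dashboard_url)
```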
```{button-ref} observability-getting-started
:color: primary
:outline:
:expand:
Learn more about Ray Dashboard
```
`````
`````{dropdown} Ray State APIs: CLI to access cluster states
:animate: fade-in-slide-down
Ray state APIs allow users to conveniently access the current state (snapshot) of Ray through the CLI or the Python SDK.
````{note}
To get started with the state APIs, install the default installation as follows:
```bash
pip install -U "ray[default]"
```
````
Run the following code.
```{code-block} python
import ray
import time
ray.init(num_cpus=4)
@ray.remote
def task_running_300_seconds():
    print("Start!")
    time.sleep(300)

@ray.remote
class Actor:
    def __init__(self):
        print("Actor created")
# Create 2 tasks
tasks = [task_running_300_seconds.remote() for _ in range(2)]
# Create 2 actors
actors = [Actor.remote() for _ in range(2)]
ray.get(tasks)
```
See the summarized statistics of Ray tasks using ``ray summary tasks``.
```{code-block} bash
ray summary tasks
```
```{code-block} text
======== Tasks Summary: 2022-07-22 08:54:38.332537 ========
Stats:
------------------------------------
total_actor_scheduled: 2
total_actor_tasks: 0
total_tasks: 2
Table (group by func_name):
------------------------------------
FUNC_OR_CLASS_NAME STATE_COUNTS TYPE
0 task_running_300_seconds RUNNING: 2 NORMAL_TASK
1 Actor.__init__ FINISHED: 2 ACTOR_CREATION_TASK
```
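The same snapshot is also available through the Python SDK (a minimal sketch; assumes Ray 2.x):
```{code-block} python
from ray.util.state import list_tasks, summarize_tasks

# Programmatic equivalents of `ray summary tasks` and `ray list tasks`.
print(summarize_tasks())
print(list_tasks(filters=[("state", "=", "RUNNING")]))
```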
```{button-ref} observability-programmatic
:color: primary
:outline:
:expand:
Learn more about Ray State APIs
```
`````
```{include} learn-more.md
```