ray.data.Dataset.mean
- Dataset.mean(on: Optional[Union[str, List[str]]] = None, ignore_nulls: bool = True) -> Union[Any, Dict[str, Any]]
Compute the mean of one or more columns.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
>>> import ray
>>> ray.data.range(100).mean("id")
49.5
>>> ray.data.from_items([
...     {"A": i, "B": i**2}
...     for i in range(100)
... ]).mean(["A", "B"])
{'mean(A)': 49.5, 'mean(B)': 3283.5}
- Parameters
on – a column name or a list of column names to aggregate.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the mean; if False, when a null value is encountered, the output is None. This method considers np.nan, None, and pd.NaT to be null values. Default is True.
- Returns
The mean result.
For different values of on, the return varies:
on=None: a dict containing the column-wise mean of all columns,
on="col": a scalar representing the mean of all items in column "col",
on=["col_1", ..., "col_n"]: an n-column dict containing the column-wise mean of the provided columns.
If the dataset is empty, all values are null. If ignore_nulls is False and any value is null, then the output is None.
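The return shapes and null handling described above can be modeled in plain Python. The following is only a sketch of the documented semantics, not Ray's implementation: `sketch_mean`, `_column_mean`, and `_is_null` are hypothetical helper names, rows are assumed to be a list of dicts, only `None` and float NaN are treated as null (`pd.NaT` is not covered), and it assumes that with `ignore_nulls=False` a null makes that column's result `None`.

```python
import math
from typing import Any, Dict, List, Optional, Union

Row = Dict[str, Any]


def _is_null(value: Any) -> bool:
    # Assumption: this sketch only recognizes None and float NaN as null.
    return value is None or (isinstance(value, float) and math.isnan(value))


def _column_mean(rows: List[Row], col: str, ignore_nulls: bool) -> Optional[float]:
    values = [row[col] for row in rows]
    # With ignore_nulls=False, a single null makes the result None.
    if not ignore_nulls and any(_is_null(v) for v in values):
        return None
    kept = [v for v in values if not _is_null(v)]
    # An empty column (or one with only nulls) has a null mean.
    if not kept:
        return None
    return sum(kept) / len(kept)


def sketch_mean(
    rows: List[Row],
    on: Optional[Union[str, List[str]]] = None,
    ignore_nulls: bool = True,
) -> Union[Optional[float], Dict[str, Optional[float]]]:
    # on="col": return a scalar.
    if isinstance(on, str):
        return _column_mean(rows, on, ignore_nulls)
    # on=None: use every column found in the first row (a sketch-only
    # assumption); on=[...]: use the listed columns.
    if on is None:
        columns = list(rows[0].keys()) if rows else []
    else:
        columns = list(on)
    # Dict keys follow the "mean(<column>)" naming shown in the example output.
    return {f"mean({c})": _column_mean(rows, c, ignore_nulls) for c in columns}


rows = [{"A": i, "B": i**2} for i in range(100)]
print(sketch_mean(rows, "A"))         # 49.5
print(sketch_mean(rows, ["A", "B"]))  # {'mean(A)': 49.5, 'mean(B)': 3283.5}

with_null = [{"A": 1.0}, {"A": None}, {"A": 3.0}]
print(sketch_mean(with_null, "A"))                      # 2.0
print(sketch_mean(with_null, "A", ignore_nulls=False))  # None
```

The sketch computes each column independently, which keeps the scalar case and the dict case on the same code path.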