ray.data.preprocessor.Preprocessor
class ray.data.preprocessor.Preprocessor
Bases: abc.ABC
Implements an ML preprocessing operation.
Preprocessors are stateful objects that can be fitted against a Dataset and then used to transform both local data batches and distributed data. For example, a normalization preprocessor may calculate the mean and standard deviation of a field during fitting and then use these values to implement its normalization transform.
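As a rough illustration of the fit-then-transform pattern, the following stdlib-only sketch models a stateful scaler. It is not Ray's API: the class name ToyStandardScaler, the dict-of-lists batch format, and the stats_ attribute are hypothetical, chosen only to show state being learned in fit() and reused in transform_batch().

```python
import statistics
from typing import Dict, List

# Hypothetical batch format for this sketch (NOT a Ray type): column name -> values.
Batch = Dict[str, List[float]]


class ToyStandardScaler:
    """Hypothetical stateful preprocessor: learns mean/stdev in fit(), applies them later."""

    def __init__(self, columns: List[str]):
        self.columns = columns
        # Fitted state; the trailing underscore is an sklearn-style convention assumed here.
        self.stats_ = None

    def fit(self, batches: List[Batch]) -> "ToyStandardScaler":
        self.stats_ = {}
        for col in self.columns:
            # Gather the column across all batches, mimicking one pass over a dataset.
            values = [v for batch in batches for v in batch[col]]
            self.stats_[col] = (statistics.mean(values), statistics.pstdev(values))
        return self

    def transform_batch(self, batch: Batch) -> Batch:
        if self.stats_ is None:
            raise RuntimeError("fit() must be called before transform_batch()")
        out = dict(batch)  # columns not being scaled pass through unchanged
        for col in self.columns:
            mean, std = self.stats_[col]
            # A constant column has zero stdev; map it to 0.0 instead of dividing by zero.
            out[col] = [(v - mean) / std if std else 0.0 for v in batch[col]]
        return out


scaler = ToyStandardScaler(["x"]).fit([{"x": [1.0, 2.0]}, {"x": [3.0]}])
scaled = scaler.transform_batch({"x": [2.0, 4.0], "y": [7, 8]})
```

The statistics are computed once over all batches and then reused for every later batch, which is what makes this preprocessor stateful.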
Preprocessors can also be stateless and transform data without needing to be fitted. For example, a preprocessor may simply remove a column, which does not require any fitted state.
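A stateless preprocessor, by contrast, has nothing to learn. The sketch below is again a hypothetical stdlib stand-in (ToyDropColumns is not a Ray class); it borrows only the _is_fittable flag name that Ray uses to mark preprocessors that need no fitting.

```python
from typing import Dict, List

# Hypothetical batch format for this sketch (NOT a Ray type): column name -> values.
Batch = Dict[str, list]


class ToyDropColumns:
    """Hypothetical stateless preprocessor: removes columns, learns nothing."""

    _is_fittable = False  # flag name borrowed from Ray; here it only documents intent

    def __init__(self, columns: List[str]):
        self.columns = set(columns)

    def fit(self, batches: List[Batch]) -> "ToyDropColumns":
        # Nothing to learn, so fitting is a no-op in this sketch.
        return self

    def transform_batch(self, batch: Batch) -> Batch:
        # Output depends only on the input batch and the constructor arguments.
        return {name: values for name, values in batch.items() if name not in self.columns}


dropper = ToyDropColumns(["id"])
result = dropper.transform_batch({"id": [1, 2], "x": [3, 4]})
```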
If you are implementing your own Preprocessor subclass, override the following:
- _fit, if your preprocessor is stateful. Otherwise, set _is_fittable=False.
- _transform_pandas and/or _transform_numpy. For best performance, implement both; otherwise, the data is converted to match the implemented method.
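To make the conversion rule concrete, here is a hedged, stdlib-only sketch of how a base class might dispatch a batch to whichever transform method a subclass overrides, converting the batch when only the other format is implemented. Lists of row dicts stand in for pandas DataFrames and dicts of lists stand in for NumPy batches; the names ToyBase, AddOne, _transform_rows, _transform_columns, and the conversion helpers are hypothetical and are not Ray's internals.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-ins for the two batch formats (NOT Ray types).
Rows = List[Dict[str, Any]]     # plays the role of a pandas DataFrame
Columns = Dict[str, List[Any]]  # plays the role of a dict of NumPy arrays
Batch = Union[Rows, Columns]


def rows_to_columns(rows: Rows) -> Columns:
    cols: Columns = {}
    for row in rows:
        for name, value in row.items():
            cols.setdefault(name, []).append(value)
    return cols


def columns_to_rows(cols: Columns) -> Rows:
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*cols.values())]


class ToyBase:
    """Hypothetical base class: routes each batch to an overridden transform method."""

    def _transform_rows(self, rows: Rows) -> Rows:
        raise NotImplementedError

    def _transform_columns(self, cols: Columns) -> Columns:
        raise NotImplementedError

    def _overrides(self, name: str) -> bool:
        # True if the subclass replaced the base-class method with this name.
        return getattr(type(self), name) is not getattr(ToyBase, name)

    def transform_batch(self, batch: Batch) -> Batch:
        is_rows = isinstance(batch, list)
        # Prefer the method that matches the incoming format: no conversion needed.
        if is_rows and self._overrides("_transform_rows"):
            return self._transform_rows(batch)
        if not is_rows and self._overrides("_transform_columns"):
            return self._transform_columns(batch)
        # Only the other format is implemented: convert the batch to match it.
        if self._overrides("_transform_rows"):
            return self._transform_rows(columns_to_rows(batch))
        if self._overrides("_transform_columns"):
            return self._transform_columns(rows_to_columns(batch))
        raise NotImplementedError("override _transform_rows and/or _transform_columns")


class AddOne(ToyBase):
    # Implements only the columnar path; row batches are converted first.
    def _transform_columns(self, cols: Columns) -> Columns:
        return {name: [v + 1 for v in values] for name, values in cols.items()}


adder = AddOne()
from_columns = adder.transform_batch({"x": [1, 2]})
from_rows = adder.transform_batch([{"x": 1}, {"x": 2}])
```

In this sketch AddOne returns columnar output even for a row batch, because the batch was converted to the only format it implements; implementing both methods avoids that conversion cost.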
PublicAPI (beta): This API is in beta and may change before becoming stable.
Methods
- __init__()
- deserialize(serialized): Load the original preprocessor serialized via self.serialize().
- fit(ds): Fit this Preprocessor to the Dataset.
- fit_transform(ds): Fit this Preprocessor to the Dataset and then transform the Dataset.
- preferred_batch_format(): Batch format hint for upstream producers to try to yield the best block format.
- serialize(): Return this preprocessor serialized as a string.
- transform(ds): Transform the given dataset.
- transform_batch(data): Transform a single batch of data.
- transform_stats(): Return Dataset stats for the most recent transform call, if any.
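The fit_transform, serialize, and deserialize methods can be illustrated with another hypothetical stdlib-only class. ToyMinMaxScaler and its JSON encoding are assumptions for this sketch; the method list above says only that serialize() returns a string, so Ray's actual encoding may differ.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Rows = List[Dict[str, Any]]  # hypothetical row-based batch format for this sketch


class ToyMinMaxScaler:
    """Hypothetical stateful preprocessor used to show fit_transform and (de)serialization."""

    def __init__(self, column: str):
        self.column = column
        self.range_: Optional[Tuple[float, float]] = None  # (min, max), learned by fit()

    def fit(self, rows: Rows) -> "ToyMinMaxScaler":
        values = [row[self.column] for row in rows]
        self.range_ = (min(values), max(values))
        return self

    def transform(self, rows: Rows) -> Rows:
        lo, hi = self.range_
        span = (hi - lo) or 1  # avoid dividing by zero for a constant column
        return [{**row, self.column: (row[self.column] - lo) / span} for row in rows]

    def fit_transform(self, rows: Rows) -> Rows:
        # Same result as calling fit() and then transform() on the same data.
        return self.fit(rows).transform(rows)

    def serialize(self) -> str:
        # Assumption: encode the constructor argument and fitted state as JSON text.
        return json.dumps({"column": self.column, "range_": self.range_})

    @staticmethod
    def deserialize(serialized: str) -> "ToyMinMaxScaler":
        state = json.loads(serialized)
        obj = ToyMinMaxScaler(state["column"])
        # JSON stores the tuple as a list, so restore the tuple type.
        obj.range_ = tuple(state["range_"]) if state["range_"] is not None else None
        return obj


scaler = ToyMinMaxScaler("x")
scaled = scaler.fit_transform([{"x": 0}, {"x": 5}, {"x": 10}])
restored = ToyMinMaxScaler.deserialize(scaler.serialize())
```

Because deserialize() rebuilds the fitted range_, the restored object can transform new data without being refitted.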