ray.data.preprocessors.MaxAbsScaler

class ray.data.preprocessors.MaxAbsScaler(columns: List[str])

Bases: ray.data.preprocessor.Preprocessor

Scale each column by its absolute max value.

The general formula is given by

\[x' = \frac{x}{\max{\vert x \vert}}\]

where \(x\) is the column and \(x'\) is the transformed column. If \(\max{\vert x \vert} = 0\) (i.e., the column contains all zeros), then the column is unmodified.
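The formula, including the all-zero case, can be sketched in plain Python. This is an illustration only, not Ray's implementation; `max_abs_scale` is a hypothetical helper name that is not part of the Ray API.

```python
from typing import List


def max_abs_scale(column: List[float]) -> List[float]:
    """Divide each value by the column's largest absolute value.

    Hypothetical helper for illustration; not part of Ray's API.
    """
    abs_max = max((abs(x) for x in column), default=0.0)
    if abs_max == 0:
        # All-zero (or empty) column: return the values unchanged.
        return list(column)
    return [x / abs_max for x in column]


x1 = max_abs_scale([-6, 3])  # absolute max is 6
x3 = max_abs_scale([0, 0])   # absolute max is 0, so nothing is divided
print(x1)  # [-1.0, 0.5]
print(x3)  # [0, 0]
```

Every scaled value falls in the range [-1, 1], and the sign of each value is preserved.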

Tip

This is the recommended way to scale sparse data. If your data isn’t sparse, you can use MinMaxScaler or StandardScaler instead.
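One reason max-abs scaling suits sparse data is that dividing by a constant maps zero to zero, whereas the usual min-max formula subtracts the minimum first and can turn zeros into nonzero values. The sketch below compares the two in plain Python; `max_abs` and `min_max` are hypothetical helpers that follow the textbook formulas, not Ray's implementations.

```python
def max_abs(values):
    """x / max(|x|); hypothetical helper, not Ray's API."""
    m = max(abs(v) for v in values)
    return [v / m for v in values] if m else list(values)


def min_max(values):
    """(x - min) / (max - min); hypothetical helper, not Ray's API."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span for v in values] if span else list(values)


def zeros_preserved(original, scaled):
    """Count entries that were zero before scaling and are still zero after."""
    return sum(1 for o, s in zip(original, scaled) if o == 0 and s == 0)


sparse = [0, 0, -2, 0, 4]  # mostly zeros, as in a sparse column
kept_by_max_abs = zeros_preserved(sparse, max_abs(sparse))
kept_by_min_max = zeros_preserved(sparse, min_max(sparse))
print(kept_by_max_abs, kept_by_min_max)  # 3 0
```

In this example all three zeros survive max-abs scaling, while min-max scaling shifts each of them to a nonzero value, which would destroy the sparsity of the column.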

Examples

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import MaxAbsScaler
>>>
>>> df = pd.DataFrame({"X1": [-6, 3], "X2": [2, -4], "X3": [0, 0]})
>>> ds = ray.data.from_pandas(df)  
>>> ds.to_pandas()  
   X1  X2  X3
0  -6   2   0
1   3  -4   0

Columns are scaled separately.

>>> preprocessor = MaxAbsScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()  
    X1   X2  X3
0 -1.0  0.5   0
1  0.5 -1.0   0

Zero-valued columns aren’t scaled.

>>> preprocessor = MaxAbsScaler(columns=["X3"])
>>> preprocessor.fit_transform(ds).to_pandas()  
   X1  X2   X3
0  -6   2  0.0
1   3  -4  0.0

Parameters: columns – The columns to separately scale.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

`deserialize`(serialized)	Load the original preprocessor serialized via `self.serialize()`.
`fit`(ds)	Fit this Preprocessor to the Dataset.
`fit_transform`(ds)	Fit this Preprocessor to the Dataset and then transform the Dataset.
`preferred_batch_format`()	Batch format hint for upstream producers to try yielding best block format.
`serialize`()	Return this preprocessor serialized as a string.
`transform`(ds)	Transform the given dataset.
`transform_batch`(data)	Transform a single batch of data.
`transform_stats`()	Return Dataset stats for the most recent transform call, if any.

Ray 2.7.2
