ray.data.preprocessors.MaxAbsScaler
- class ray.data.preprocessors.MaxAbsScaler(columns: List[str])
Bases: ray.data.preprocessor.Preprocessor
Scale each column by its absolute max value.
The general formula is given by
\[x' = \frac{x}{\max{\vert x \vert}}\]
where \(x\) is the column and \(x'\) is the transformed column. If \(\max{\vert x \vert} = 0\) (i.e., the column contains all zeros), then the column is unmodified.
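The formula can be sketched in plain Python, independent of Ray. The helper `max_abs_scale` below is a hypothetical name used only for illustration; it isn't part of the Ray API, and Ray's own implementation may differ.

```python
from typing import List


def max_abs_scale(values: List[float]) -> List[float]:
    """Scale values by their largest absolute value (illustrative helper, not Ray API)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0:
        # All-zero (or empty) column: return it unmodified, as described above.
        return list(values)
    return [v / max_abs for v in values]


x1 = max_abs_scale([-6, 3])  # max |x| is 6
x3 = max_abs_scale([0, 0])   # max |x| is 0, so nothing changes
print(x1)  # [-1.0, 0.5]
print(x3)  # [0, 0]
```

Because every value is divided by the same positive constant, signs are preserved and the result always lies in the range [-1, 1].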
Tip
This is the recommended way to scale sparse data. If your data isn’t sparse, you can use
MinMaxScaler or StandardScaler instead.
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import MaxAbsScaler
>>>
>>> df = pd.DataFrame({"X1": [-6, 3], "X2": [2, -4], "X3": [0, 0]})  # noqa: E501
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0  -6   2   0
1   3  -4   0
Columns are scaled separately.
>>> preprocessor = MaxAbsScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0 -1.0  0.5   0
1  0.5 -1.0   0
Zero-valued columns aren’t scaled.
>>> preprocessor = MaxAbsScaler(columns=["X3"])
>>> preprocessor.fit_transform(ds).to_pandas()
   X1  X2   X3
0  -6   2  0.0
1   3  -4  0.0
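To illustrate why the tip above recommends max-abs scaling for sparse data, the sketch below compares it with a generic min-max formula in plain Python. Both helpers are hypothetical, written only for illustration, and don't call Ray. Dividing by a constant keeps zeros at zero, whereas min-max scaling subtracts the minimum first and so generally turns zeros into nonzero values.

```python
from typing import List


def max_abs_scale(values: List[float]) -> List[float]:
    # x' = x / max|x|; an all-zero column is returned unchanged.
    max_abs = max(abs(v) for v in values)
    return list(values) if max_abs == 0 else [v / max_abs for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    # Generic min-max formula: x' = (x - min) / (max - min).
    lo, hi = min(values), max(values)
    return list(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def count_zeros(values: List[float]) -> int:
    return sum(1 for v in values if v == 0)


sparse_column = [0, 0, -4, 0, 2, 0]  # four of six entries are zero

print(count_zeros(max_abs_scale(sparse_column)))  # 4: every zero stays zero
print(count_zeros(min_max_scale(sparse_column)))  # 1: only the minimum (-4) maps to 0
```

In a sparse storage format only the nonzero entries are kept, so a transform that preserves zeros leaves the storage layout intact.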
- Parameters
columns – The columns to separately scale.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
deserialize(serialized) – Load the original preprocessor serialized via self.serialize().
fit(ds) – Fit this Preprocessor to the Dataset.
fit_transform(ds) – Fit this Preprocessor to the Dataset and then transform the Dataset.
preferred_batch_format() – Batch format hint for upstream producers to try yielding best block format.
serialize() – Return this preprocessor serialized as a string.
transform(ds) – Transform the given dataset.
transform_batch(data) – Transform a single batch of data.
transform_stats() – Return Dataset stats for the most recent transform call, if any.