ray.data.preprocessors.Normalizer
- class ray.data.preprocessors.Normalizer(columns: List[str], norm='l2')
Bases: ray.data.preprocessor.Preprocessor
Scales each sample to have unit norm.
This preprocessor works by dividing each sample (i.e., row) by the sample’s norm. The general formula is given by
\[s' = \frac{s}{\lVert s \rVert_p}\]
where \(s\) is the sample, \(s'\) is the transformed sample, \(\lVert s \rVert_p\) is the \(L^p\)-norm of \(s\), and \(p\) is the norm type.
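As a minimal sketch of the formula above, the following plain-Python helpers compute \(\lVert s \rVert_p\) for a single sample and divide by it. The names `lp_norm` and `normalize_sample` are hypothetical and not part of Ray; the sketch assumes \(p \ge 1\) and treats `math.inf` as the \(L^\infty\) case.

```python
import math
from typing import List, Sequence


def lp_norm(s: Sequence[float], p: float) -> float:
    """Return the L^p norm of sample s (assumes p >= 1, or p = math.inf)."""
    if math.isinf(p):
        # L^inf norm: the largest absolute entry.
        return max(abs(x) for x in s)
    # General case: (sum of |x|^p) ** (1/p).
    return sum(abs(x) ** p for x in s) ** (1.0 / p)


def normalize_sample(s: Sequence[float], p: float) -> List[float]:
    """Apply s' = s / ||s||_p to one sample (hypothetical helper)."""
    norm = lp_norm(s, p)
    return [x / norm for x in s]


print([round(x, 6) for x in normalize_sample([3.0, 4.0], 2)])  # [0.6, 0.8]
```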
The following norms are supported:
- "l1" (\(L^1\)): Sum of the absolute values.
- "l2" (\(L^2\)): Square root of the sum of the squared values.
- "max" (\(L^\infty\)): Maximum absolute value.
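The three supported norm names can be mapped to functions as below. This is an illustrative sketch, not Ray's internal code: the `NORM_FNS` dictionary is hypothetical, and "max" is computed over absolute values, which is the standard \(L^\infty\) definition. The sample `[3.0, -4.0]` shows how the same row gets a different divisor under each norm.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical mapping from the supported `norm` strings to norm functions.
NORM_FNS: Dict[str, Callable[[Sequence[float]], float]] = {
    "l1": lambda s: sum(abs(x) for x in s),           # sum of absolute values
    "l2": lambda s: math.sqrt(sum(x * x for x in s)),  # root of sum of squares
    "max": lambda s: max(abs(x) for x in s),           # largest absolute value
}

sample = [3.0, -4.0]
norms = {name: fn(sample) for name, fn in NORM_FNS.items()}
print(norms)  # {'l1': 7.0, 'l2': 5.0, 'max': 4.0}

# For any sample, max <= l2 <= l1, so "max" shrinks a row the least
# and "l1" shrinks it the most.
assert norms["max"] <= norms["l2"] <= norms["l1"]
```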
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Normalizer
>>>
>>> df = pd.DataFrame({"X1": [1, 1], "X2": [1, 0], "X3": [0, 1]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0   1   1   0
1   1   0   1
Computed over the selected columns X1 and X2, the \(L^2\)-norm of the first sample is \(\sqrt{2}\), and the \(L^2\)-norm of the second sample is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
         X1        X2  X3
0  0.707107  0.707107   0
1  1.000000  0.000000   1
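To make the column selection concrete, this plain-Python sketch reproduces the \(L^2\) example on the same two rows, represented as a list of dictionaries rather than a Ray Dataset. Only the listed columns contribute to the norm and get rescaled; X3 is copied through unchanged. The helper `l2_normalize_rows` is hypothetical, and it does not guard against rows whose selected values are all zero.

```python
import math
from typing import Dict, List

# The two rows from the example DataFrame above.
rows: List[Dict[str, float]] = [
    {"X1": 1, "X2": 1, "X3": 0},
    {"X1": 1, "X2": 0, "X3": 1},
]


def l2_normalize_rows(rows: List[Dict[str, float]], columns: List[str]) -> List[Dict[str, float]]:
    """Scale only `columns` in each row to unit L2 norm (hypothetical helper)."""
    out = []
    for row in rows:
        # The norm is computed over the selected columns only.
        norm = math.sqrt(sum(row[c] ** 2 for c in columns))
        new_row = dict(row)  # unlisted columns such as X3 keep their values
        for c in columns:
            new_row[c] = row[c] / norm
        out.append(new_row)
    return out


result = l2_normalize_rows(rows, ["X1", "X2"])
for row in result:
    print(row)
```

The first row becomes approximately 0.707107 in both X1 and X2, and the second row keeps its values, matching the table above.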
Computed over X1 and X2, the \(L^1\)-norm of the first sample is \(2\), and the \(L^1\)-norm of the second sample is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="l1")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  0.5  0.5   0
1  1.0  0.0   1
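A short sketch of the \(L^1\) case, assuming plain Python lists for the X1/X2 values of each row; the helper `l1_normalize` is hypothetical. It also checks a property that follows directly from the formula: after \(L^1\) scaling, the absolute values in each row sum to \(1\).

```python
from typing import List


def l1_normalize(s: List[float]) -> List[float]:
    """Divide a sample by its L1 norm, the sum of absolute values."""
    total = sum(abs(x) for x in s)
    return [x / total for x in s]


# The X1/X2 values of the two example rows.
first = l1_normalize([1.0, 1.0])   # norm 2 -> [0.5, 0.5]
second = l1_normalize([1.0, 0.0])  # norm 1 -> [1.0, 0.0]
print(first, second)

# After L1 scaling the absolute values sum to 1, so rows with
# non-negative entries can be read as proportions.
mixed = l1_normalize([2.0, -1.0, 1.0])
assert abs(sum(abs(x) for x in mixed) - 1.0) < 1e-12
```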
The \(L^\infty\)-norm of both samples is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="max")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  1.0  1.0   0
1  1.0  0.0   1
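The example above contains only non-negative values, so it does not show how the \(L^\infty\) norm treats signs. The sketch below assumes the standard definition (largest absolute value); `max_normalize` is a hypothetical helper, not a Ray function.

```python
from typing import List


def max_normalize(s: List[float]) -> List[float]:
    """Divide a sample by its L-infinity norm, i.e. its largest absolute value."""
    largest = max(abs(x) for x in s)
    return [x / largest for x in s]


# Both example rows already have a largest entry of 1, so they are unchanged.
print(max_normalize([1.0, 1.0]), max_normalize([1.0, 0.0]))

# With a negative entry, the largest-magnitude value maps to -1 and
# every other value falls inside [-1, 1].
print(max_normalize([-4.0, 2.0, 1.0]))  # [-1.0, 0.5, 0.25]
```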
- Parameters
columns – The columns to scale. For each row, these columns are scaled to unit norm.
norm – The norm to use. The supported values are "l1", "l2", or "max". Defaults to "l2".
- Raises
ValueError – if norm is not "l1", "l2", or "max".
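The documented validation rule can be sketched as follows. `check_norm` and `SUPPORTED_NORMS` are hypothetical names, and the error message text is an assumption; this page does not show the message Ray actually raises.

```python
SUPPORTED_NORMS = ("l1", "l2", "max")


def check_norm(norm: str) -> str:
    """Hypothetical validator mirroring the documented ValueError behavior."""
    if norm not in SUPPORTED_NORMS:
        raise ValueError(
            f"Norm {norm!r} is not supported. Supported values are: {SUPPORTED_NORMS}"
        )
    return norm


check_norm("l2")  # accepted
try:
    check_norm("l3")  # not one of the supported values
except ValueError as err:
    print(err)
```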
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
deserialize(serialized) – Load the original preprocessor serialized via self.serialize().
fit(ds) – Fit this Preprocessor to the Dataset.
fit_transform(ds) – Fit this Preprocessor to the Dataset and then transform the Dataset.
preferred_batch_format() – Batch format hint for upstream producers to try yielding best block format.
serialize() – Return this preprocessor serialized as a string.
transform(ds) – Transform the given dataset.
transform_batch(data) – Transform a single batch of data.
transform_stats() – Return Dataset stats for the most recent transform call, if any.
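Because each row is divided by its own norm, there are no dataset-wide statistics that fitting would need to learn, and transforming a dataset batch by batch gives the same result as transforming it all at once. The toy class below illustrates that contract for the fit, transform, fit_transform, and transform_batch methods listed above. It is a hypothetical stand-in, not Ray's implementation: the class name `ToyNormalizer`, the `batch_size` parameter, and the list-of-dictionaries batch format are assumptions, and it supports only the "l2" norm.

```python
import math
from typing import Dict, List

Row = Dict[str, float]


class ToyNormalizer:
    """Hypothetical stand-in illustrating the fit/transform contract; not Ray's class."""

    def __init__(self, columns: List[str]):
        self.columns = columns

    def fit(self, rows: List[Row]) -> "ToyNormalizer":
        # Row-wise scaling needs no dataset-level statistics, so this is a no-op.
        return self

    def transform_batch(self, batch: List[Row]) -> List[Row]:
        # Scale the selected columns of each row to unit L2 norm.
        out = []
        for row in batch:
            norm = math.sqrt(sum(row[c] ** 2 for c in self.columns))
            out.append({**row, **{c: row[c] / norm for c in self.columns}})
        return out

    def transform(self, rows: List[Row], batch_size: int = 2) -> List[Row]:
        # Process the dataset batch by batch, then concatenate the results.
        result: List[Row] = []
        for i in range(0, len(rows), batch_size):
            result.extend(self.transform_batch(rows[i : i + batch_size]))
        return result

    def fit_transform(self, rows: List[Row]) -> List[Row]:
        # Fit, then transform the same data.
        return self.fit(rows).transform(rows)


rows = [
    {"X1": 1, "X2": 1, "X3": 0},
    {"X1": 1, "X2": 0, "X3": 1},
    {"X1": 3, "X2": 4, "X3": 5},
]
toy = ToyNormalizer(columns=["X1", "X2"])
# Batch boundaries do not affect the row-wise result.
assert toy.fit_transform(rows) == toy.transform_batch(rows)
```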