Preprocessor#

Preprocessor Interface#

Constructor#

Preprocessor()

Implements an ML preprocessing operation.

Fit/Transform APIs#

fit(ds)

Fit this Preprocessor to the Dataset.

fit_transform(ds)

Fit this Preprocessor to the Dataset and then transform the Dataset.

transform(ds)

Transform the given Dataset.

transform_batch(data)

Transform a single batch of data.

transform_stats()

Return Dataset stats for the most recent transform call, if any.
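
The fit/transform split listed above can be illustrated with a toy class in plain Python. This is only a sketch of the pattern: `ToyMeanCenterer` is a hypothetical name, it works on a list of dict rows rather than a real Dataset, and it is not how the library implements `Preprocessor`.

```python
from statistics import mean


class ToyMeanCenterer:
    """Hypothetical stand-in for a fittable preprocessor; not part of the library."""

    def __init__(self, column):
        self.column = column
        self.mean_ = None  # learned state, populated by fit()

    def fit(self, rows):
        # Learn state from the data and return self so calls can be chained.
        self.mean_ = mean(row[self.column] for row in rows)
        return self

    def transform_batch(self, batch):
        # Apply the learned state to one batch (here, a list of dict rows).
        if self.mean_ is None:
            raise RuntimeError("call fit() before transforming")
        return [{**row, self.column: row[self.column] - self.mean_} for row in batch]

    def transform(self, rows):
        # In this toy version the whole dataset is treated as a single batch.
        return self.transform_batch(rows)

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


rows = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
centered = ToyMeanCenterer("x").fit_transform(rows)
print(centered)  # [{'x': -1.0}, {'x': 0.0}, {'x': 1.0}]
```

Keeping the learned state separate from the transform step means the same fitted object can be applied to training data and, later, to individual batches at inference time.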

Generic Preprocessors#

Concatenator([output_column_name, include, ...])

Combine numeric columns into a column of type TensorDtype.

SimpleImputer(columns[, strategy, fill_value])

Replace missing values with imputed values.
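
As a rough illustration of what these two operations compute, the sketch below imputes missing values with the column mean and then packs columns into one vector-valued column. The helper names, the use of `None` for missing values, the mean strategy, and the choice to drop the original columns after concatenation are all assumptions made for the example, not the library's documented behavior.

```python
from statistics import mean


def impute_mean(rows, columns):
    # "Fit": compute each column's mean over its non-missing values.
    fill = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    # "Transform": replace missing entries with the learned fill value.
    return [
        {**r, **{c: (fill[c] if r[c] is None else r[c]) for c in columns}}
        for r in rows
    ]


def concatenate(rows, columns, output_column_name):
    # Pack the selected columns into one list-valued column.
    # Dropping the originals is an assumption of this sketch.
    out = []
    for r in rows:
        rest = {k: v for k, v in r.items() if k not in columns}
        rest[output_column_name] = [r[c] for c in columns]
        out.append(rest)
    return out


rows = [
    {"a": 1.0, "b": 10.0},
    {"a": None, "b": 20.0},
    {"a": 3.0, "b": None},
]
imputed = impute_mean(rows, ["a", "b"])
features = concatenate(imputed, ["a", "b"], "features")
print(features)
# [{'features': [1.0, 10.0]}, {'features': [2.0, 20.0]}, {'features': [3.0, 15.0]}]
```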

Categorical Encoders#

Categorizer(columns[, dtypes])

Convert columns to pd.CategoricalDtype.

LabelEncoder(label_column)

Encode labels as integer targets.

MultiHotEncoder(columns, *[, max_categories])

Multi-hot encode categorical data.

OneHotEncoder(columns, *[, max_categories])

One-hot encode categorical data.

OrdinalEncoder(columns, *[, encode_lists])

Encode values within columns as ordered integer values.
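
The difference between ordinal, one-hot, and multi-hot encoding can be sketched in plain Python. The function names are hypothetical, categories are indexed in sorted order for determinism, and the multi-hot variant counts occurrences. The listing above does not say whether the library emits counts or 0/1 flags, nor what output layout it uses, so treat those details as assumptions.

```python
def fit_categories(values):
    # Map each distinct value to an index, in sorted order.
    return {v: i for i, v in enumerate(sorted(set(values)))}


def ordinal_encode(values, index):
    # One integer per value.
    return [index[v] for v in values]


def one_hot_encode(values, index):
    # One indicator vector per value, with a single 1 at the category's index.
    return [[1 if index[v] == i else 0 for i in range(len(index))] for v in values]


def multi_hot_encode(lists, index):
    # One vector per list of values; each slot counts that category's occurrences.
    encoded = []
    for items in lists:
        counts = [0] * len(index)
        for item in items:
            counts[index[item]] += 1
        encoded.append(counts)
    return encoded


colors = ["red", "green", "blue", "green"]
index = fit_categories(colors)
print(index)                          # {'blue': 0, 'green': 1, 'red': 2}
print(ordinal_encode(colors, index))  # [2, 1, 0, 1]
print(one_hot_encode(colors, index))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

tags = [["red", "green"], ["blue", "blue"], []]
print(multi_hot_encode(tags, index))  # [[0, 1, 1], [2, 0, 0], [0, 0, 0]]
```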

Feature Scalers#

MaxAbsScaler(columns)

Scale each column by its maximum absolute value.

MinMaxScaler(columns)

Scale each column by its range, mapping the minimum to 0 and the maximum to 1.

Normalizer(columns[, norm])

Scale each sample to have unit norm.

PowerTransformer(columns, power[, method])

Apply a power transform to make your data more normally distributed.

RobustScaler(columns[, quantile_range])

Scale and translate each column using quantiles.

StandardScaler(columns)

Translate and scale each column by its mean and standard deviation, respectively.
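
The arithmetic behind most of these scalers is short enough to sketch in plain Python. The functions below are illustrative only. The helper names are hypothetical, the standard deviation is the population form, the robust scaler uses the interquartile range with inclusive quantiles, and the normalizer uses the L2 norm. Which variants the library uses by default is not stated above, so these are assumptions. The power transform is omitted.

```python
import math
from statistics import mean, pstdev, quantiles


def max_abs_scale(xs):
    # Divide by the largest absolute value; results fall in [-1, 1].
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]


def min_max_scale(xs):
    # Subtract the minimum and divide by the range; results fall in [0, 1].
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standard_scale(xs):
    # Subtract the mean and divide by the (population) standard deviation.
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def robust_scale(xs):
    # Subtract the median and divide by the interquartile range.
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return [(x - q2) / (q3 - q1) for x in xs]


def l2_normalize(row):
    # Operates on one sample (a row), not on a column.
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(max_abs_scale(xs))
print(min_max_scale(xs))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_scale(xs))
print(robust_scale(xs))    # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(l2_normalize([3.0, 4.0]))
```

None of these helpers guard against a zero denominator (a constant column or an all-zero row); a real implementation has to decide how to handle that case.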

K-Bins Discretizers#

CustomKBinsDiscretizer(columns, bins, *[, ...])

Bin values into discrete intervals using custom bin edges.

UniformKBinsDiscretizer(columns, bins, *[, ...])

Bin values into discrete intervals (bins) of uniform width.
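
The two binning strategies can be sketched as follows. This is a plain-Python illustration with hypothetical helper names. It treats bins as left-closed intervals and returns `None` for values outside custom edges, and both conventions are assumptions of the sketch rather than the library's documented defaults.

```python
from bisect import bisect_right


def custom_bins(values, edges):
    # edges must be sorted; bin i covers [edges[i], edges[i + 1]).
    out = []
    for v in values:
        i = bisect_right(edges, v) - 1
        # Values below the first edge or at/above the last edge get no bin.
        out.append(i if 0 <= i < len(edges) - 1 else None)
    return out


def uniform_bins(values, bins):
    # "Fit": derive equal-width bins from the observed minimum and maximum.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # "Transform": the maximum is clamped into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


print(custom_bins([1, 5, 12, 30], edges=[0, 10, 20]))  # [0, 0, 1, None]
print(uniform_bins([0, 5, 10, 15, 20], bins=4))        # [0, 1, 2, 3, 3]
```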