Base preprocessor that builds an easily modifiable pipeline based on feature data types.
|  | Type | Default | Details |
|---|---|---|---|
| task | Optional[str] | None |  |
| scaler | Optional[Union[str, TransformerMixin]] | None | Numeric scaler method. Either "standard", "minmax", "robust" or a scikit-learn transformer. |
| high_cardinality_encoder | Optional[Union[str, TransformerMixin]] | None | Encoder for categorical features with high cardinality. Either "target", "ordinal" or a scikit-learn transformer. |
| numeric_imputer | Optional[Union[str, TransformerMixin]] | None | Imputation method. Either "simple", "iterative" or a scikit-learn transformer. |
| custom_preprocessor | Union[None, Pipeline, TransformerMixin] | None |  |
| numeric_threshold | Union[int, float] | 0.1 | Numeric features with more unique values than this threshold are treated as numeric. If float, the threshold is numeric_threshold times the number of samples. |
| cardinality_threshold | Union[int, float] | 20 | Non-numeric features with cardinality above this threshold are ordinal encoded instead of one-hot encoded. If float, the threshold is cardinality_threshold times the number of samples. |
| verbose | int | 0 | Verbosity level. Propagated to every scikit-learn function and estimator. |
| random_state | Optional[int] | None | Random seed. Propagated to every scikit-learn function and estimator. The default None sets random_state to 0 so that cross_validate results are comparable. |
| n_jobs | Optional[int] | None | Controls parallel processing. -1 uses all available cores. Propagated to every scikit-learn function. |
| cache_transformations | bool | False | Whether to cache transformations by setting the memory parameter for Pipelines. This can speed up slow transformations since they are not recalculated for each estimator. |
PoniardPreprocessor’s job is to build a preprocessing pipeline that fits the input data, both features and target. It does this by inferring the types of the features and selecting an appropriate family of transformers for each group. The user is free to choose which particular transformer to use for each group, for example by changing the default numeric scaler from StandardScaler to RobustScaler.
Customization is done through 3 parameters related to transformers (scaler, high_cardinality_encoder and numeric_imputer), which accept either string aliases or sklearn-compatible transformers, and 2 parameters related to type inference (numeric_threshold and cardinality_threshold).
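For example, a minimal sketch of a customized preprocessor; the import path is an assumption and may differ between versions, but every parameter used here is documented in the table above:

```python
from sklearn.preprocessing import RobustScaler

# Import path assumed from the library's layout; it may differ across versions.
from poniard.preprocessing import PoniardPreprocessor

prep = PoniardPreprocessor(
    scaler=RobustScaler(),             # or the string alias scaler="robust"
    high_cardinality_encoder="target",
    numeric_imputer="iterative",
    numeric_threshold=0.1,             # floats are a fraction of the samples
    cardinality_threshold=20,          # ints are an absolute unique-value count
    random_state=0,
)
```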
The latter work by separating features into buckets. In particular, numeric (int, float) features can either be left as numeric or cast to a high cardinality categorical (if the number of unique values is below numeric_threshold), while categorical features can be either low or high cardinality (high if the number of unique values exceeds cardinality_threshold).
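This bucketing can be pictured with a small standalone sketch; infer_feature_types is a hypothetical helper written for illustration, not Poniard's actual implementation:

```python
from typing import Union

import pandas as pd


def infer_feature_types(
    X: pd.DataFrame,
    numeric_threshold: Union[int, float] = 0.1,
    cardinality_threshold: Union[int, float] = 20,
) -> dict:
    """Illustrative type bucketing; not Poniard's actual implementation."""
    n_samples = len(X)
    # Float thresholds are a fraction of the number of samples; ints are absolute.
    num_thresh = numeric_threshold * n_samples if isinstance(numeric_threshold, float) else numeric_threshold
    card_thresh = cardinality_threshold * n_samples if isinstance(cardinality_threshold, float) else cardinality_threshold
    buckets = {"numeric": [], "low_cardinality": [], "high_cardinality": []}
    for col in X.columns:
        n_unique = X[col].nunique()
        if pd.api.types.is_numeric_dtype(X[col]):
            # Numeric columns with few unique values behave like categories.
            key = "numeric" if n_unique > num_thresh else "high_cardinality"
        else:
            key = "high_cardinality" if n_unique > card_thresh else "low_cardinality"
        buckets[key].append(col)
    return buckets
```

Expressing a threshold as a float keeps it proportional to dataset size, so the same preprocessor behaves consistently across datasets of different lengths.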
Builds the preprocessor according to the input data.
Gets the data from the main PoniardBaseEstimator (if available) or processes the input data, calls the type inference method, sets up the transformers and builds the pipeline.
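As a rough usage sketch, reusing the prep instance from the earlier example; the fit/transform method names follow the scikit-learn convention and are assumed here:

```python
import pandas as pd

# Hypothetical toy data with mixed numeric and categorical columns.
X = pd.DataFrame(
    {
        "age": [25, 32, 47, 51],
        "city": ["NY", "SF", "NY", "LA"],
        "income": [50_000.0, 80_000.0, 120_000.0, None],
    }
)
y = pd.Series([0, 1, 1, 0])

prep.fit(X, y)          # infers types, sets up transformers, builds the pipeline
Xt = prep.transform(X)  # apply the assembled pipeline (method names assumed)
```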