Base estimator

PoniardBaseEstimator sets up 95% of the functionality for PoniardClassifier and PoniardRegressor.

source

PoniardBaseEstimator

 PoniardBaseEstimator (estimators:Optional[Union[Sequence[ClassifierMixin],
                       Dict[str,ClassifierMixin],Sequence[RegressorMixin],
                       Dict[str,RegressorMixin]]]=None,
                       metrics:Optional[Union[str,Dict[str,Callable],Sequence[str]]]=None,
                       preprocess:bool=True,
                       custom_preprocessor:Union[None,Pipeline,TransformerMixin,PoniardPreprocessor]=None,
                       cv:Union[int,BaseCrossValidator,BaseShuffleSplit,Sequence]=None,
                       verbose:int=0,
                       random_state:Optional[int]=None,
                       n_jobs:Optional[int]=None,
                       plugins:Optional[Sequence[Any]]=None,
                       plot_options:Optional[PoniardPlotFactory]=None)

Base estimator that sets up all the functionality for the classifier and regressor.

Type Default Details
estimators Optional[Union[Sequence[ClassifierMixin], Dict[str, ClassifierMixin], Sequence[RegressorMixin], Dict[str, RegressorMixin]]] None Estimators to evaluate.
metrics Optional[Union[str, Dict[str, Callable], Sequence[str]]] None Metrics to compute for each estimator. This is more restrictive than sklearn’s scoring parameter, as it does not allow callable scorers. Single strings are cast to lists automatically.
preprocess bool True If True, impute missing values, standard scale numeric data and one-hot or ordinal encode categorical data.
custom_preprocessor Union[None, Pipeline, TransformerMixin, PoniardPreprocessor] None Preprocessor used instead of the default preprocessing pipeline. It must be able to be included directly in a scikit-learn Pipeline.
cv Union[int, BaseCrossValidator, BaseShuffleSplit, Sequence] None Cross validation strategy. Either an integer, a scikit-learn cross validation object, or an iterable.
verbose int 0 Verbosity level. Propagated to every scikit-learn function and estimator.
random_state Optional[int] None RNG. Propagated to every scikit-learn function and estimator. The default None sets random_state to 0 so that cross_validate results are comparable.
n_jobs Optional[int] None Controls parallel processing. -1 uses all cores. Propagated to every scikit-learn function.
plugins Optional[Sequence[Any]] None Plugin instances that run at set moments during setup, fit and plotting.
plot_options Optional[PoniardPlotFactory] None :class:poniard.plot.plot_factory.PoniardPlotFactory instance specifying Plotly format options, or None, which sets the default factory.

See the guides on Getting started, Main parameters and Preprocessing for examples on how the constructor parameters work.
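For instance, a minimal constructor sketch (the specific estimators, metric names and cv value below are illustrative choices, not defaults):

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

from poniard import PoniardClassifier

# Evaluate two estimators only, compute two metrics, and use 10-fold cross validation.
pnd = PoniardClassifier(
    estimators={"LR": LogisticRegression(), "RF": RandomForestClassifier()},
    metrics=["accuracy", "f1"],
    cv=10,
    random_state=42,
)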

Main methods


source

PoniardBaseEstimator.setup

 PoniardBaseEstimator.setup (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List],
                             y:Union[pandas.core.frame.DataFrame,numpy.ndarray,List],
                             show_info:bool=True)

Acts as an orchestrator for Poniard estimators by setting up everything needed for PoniardBaseEstimator.fit.

Converts inputs to arrays if necessary, sets metrics, preprocessor, cv and pipelines.

After running PoniardBaseEstimator.setup, both X and y will be held as attributes.

Type Default Details
X Union[pd.DataFrame, np.ndarray, List] Features.
y Union[pd.DataFrame, np.ndarray, List] Target.
show_info bool True Whether to print information about the target, metrics and type inference.
Returns PoniardBaseEstimator

PoniardBaseEstimator.setup takes features and target as parameters, while PoniardBaseEstimator.fit does not accept any. This runs contrary to the convention established by scikit-learn, where there is no setup step and fit takes the data as parameters.

This is because Poniard does not only fit the models, but also infers feature types and creates the preprocessor based on those types. While this could all be stuffed inside PoniardBaseEstimator.fit (as was initially the case), keeping it separate allows the user to check whether Poniard’s assumptions are correct and adjust them if needed before running fit, which can take a long time depending on how many models were passed to estimators, the cross validation strategy and the size of the dataset.

PoniardBaseEstimator by default includes a PoniardPreprocessor that handles building the preprocessor that will go into final estimation pipelines. However, a PoniardPreprocessor with custom parameters can be used as a custom_preprocessor.
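For example, a hedged sketch of a customized preprocessor (the import path, the threshold values and the "robust" scaler option are assumptions based on the parameters mentioned further down this page):

from poniard import PoniardClassifier
from poniard.preprocessing import PoniardPreprocessor  # import path assumed

# Stricter type inference thresholds and a robust scaler; values are illustrative.
custom_prep = PoniardPreprocessor(
    numeric_threshold=100,
    cardinality_threshold=10,
    scaler="robust",
)
pnd = PoniardClassifier(custom_preprocessor=custom_prep)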

An example

Let’s load some random data and setup a PoniardClassifier, which inherits from PoniardBaseEstimator.

import random

import numpy as np
import pandas as pd

from poniard import PoniardClassifier

random.seed(0)
rng = np.random.default_rng(0)

data = pd.DataFrame(
    {
        "type": random.choices(["house", "apartment"], k=500),
        "age": rng.uniform(1, 200, 500).astype(int),
        "date": pd.date_range("2022-01-01", freq="M", periods=500),
        "rating": random.choices(range(50), k=500),
        "target": random.choices([0, 1], k=500),
    }
)
data.head()
type age date rating target
0 apartment 127 2022-01-31 1 1
1 apartment 54 2022-02-28 17 1
2 house 9 2022-03-31 0 1
3 house 4 2022-04-30 48 1
4 apartment 162 2022-05-31 40 0

Information about the data will be shown so it can be reviewed and changes can be made.

X, y = data.drop("target", axis=1), data["target"]
pnd = PoniardClassifier()
pnd.setup(X, y)

Setup info

Target

Type: binary

Shape: (500,)

Unique values: 2

Metrics

Main metric: roc_auc

Feature type inference

Minimum unique values to consider a number-like feature numeric: 50

Minimum unique values to consider a categorical feature high cardinality: 20

Inferred feature types:

numeric categorical_high categorical_low datetime
0 age rating type date
PoniardClassifier()

After passing data to Poniard estimators through setup, multiple attributes become available.

feature_types is a dict that sorts features into 4 categories (numeric, categorical_high, categorical_low and datetime) using some basic heuristics. This attribute is computed in PoniardPreprocessor.build, and will not be available if a non-PoniardPreprocessor transformer is passed to custom_preprocessor.

Feature types depend on the feature dtypes and on the numeric_threshold and cardinality_threshold parameters used in PoniardPreprocessor’s construction.

pnd.feature_types
{'numeric': ['age'],
 'categorical_high': ['rating'],
 'categorical_low': ['type'],
 'datetime': ['date']}

The preprocessor is either the transformer produced by a PoniardPreprocessor, which in turn depends on feature_types and on the scaler, numeric_imputer and high_cardinality_encoder parameters, or a user-supplied scikit-learn compatible transformer.

As will be seen further on, the PoniardPreprocessor can be modified significantly to fit multiple use cases and datasets.

pnd.preprocessor
Pipeline(steps=[('type_preprocessor',
                 ColumnTransformer(transformers=[('numeric_preprocessor',
                                                  Pipeline(steps=[('numeric_imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  ['age']),
                                                 ('categorical_low_preprocessor',
                                                  Pipeline(steps=[('categorical_imputer',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('one-hot_encoder',
                                                                   OneHotEncoder(drop='if_binary',
                                                                                 hand...
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('high_cardinality_encoder',
                                                                   TargetEncoder(handle_unknown='ignore',
                                                                                 task='classification'))]),
                                                  ['rating']),
                                                 ('datetime_preprocessor',
                                                  Pipeline(steps=[('datetime_encoder',
                                                                   DatetimeEncoder()),
                                                                  ('datetime_imputer',
                                                                   SimpleImputer(strategy='most_frequent'))]),
                                                  ['date'])])),
                ('remove_invariant', VarianceThreshold())],
         verbose=0)

Each estimator has a set of default metrics, but others can be passed during construction.

pnd.metrics
['roc_auc', 'accuracy', 'precision', 'recall', 'f1']

Likewise, cv has sane defaults but can be modified accordingly.

pnd.cv
StratifiedKFold(n_splits=5, random_state=0, shuffle=True)
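For example, any scikit-learn splitter can be passed during construction (a sketch; RepeatedStratifiedKFold is just one possible choice):

from sklearn.model_selection import RepeatedStratifiedKFold

from poniard import PoniardClassifier

# A plain integer or any scikit-learn CV splitter works as the cv parameter.
pnd_custom_cv = PoniardClassifier(
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
)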

target_info lists information about y.

pnd.target_info
{'type_': 'binary', 'ndim': 1, 'shape': (500,), 'nunique': 2}

pipelines is a dict containing each pipeline which will be trained during fit. Each Poniard estimator has a limited set of default estimators that are used if none are specified during initialization.

pnd.pipelines["SVC"]
Pipeline(steps=[('preprocessor',
                 Pipeline(steps=[('type_preprocessor',
                                  ColumnTransformer(transformers=[('numeric_preprocessor',
                                                                   Pipeline(steps=[('numeric_imputer',
                                                                                    SimpleImputer()),
                                                                                   ('scaler',
                                                                                    StandardScaler())]),
                                                                   ['age']),
                                                                  ('categorical_low_preprocessor',
                                                                   Pipeline(steps=[('categorical_imputer',
                                                                                    SimpleImputer(strategy='most_frequent')),
                                                                                   ('one-hot_encoder',
                                                                                    One...
                                                                                    TargetEncoder(handle_unknown='ignore',
                                                                                                  task='classification'))]),
                                                                   ['rating']),
                                                                  ('datetime_preprocessor',
                                                                   Pipeline(steps=[('datetime_encoder',
                                                                                    DatetimeEncoder()),
                                                                                   ('datetime_imputer',
                                                                                    SimpleImputer(strategy='most_frequent'))]),
                                                                   ['date'])])),
                                 ('remove_invariant', VarianceThreshold())],
                          verbose=0)),
                ('SVC',
                 SVC(kernel='linear', probability=True, random_state=0,
                     verbose=0))])

source

PoniardBaseEstimator.fit

 PoniardBaseEstimator.fit ()

This is the main Poniard method. It uses scikit-learn’s cross_validate function to score all metrics for every pipeline, using cv for cross validation.

pnd.fit()
PoniardClassifier()

Because features and target are passed to the Poniard estimator, fit does not take any parameters.

After fitting the pipelines, cross validated results can be accessed by running PoniardBaseEstimator.get_results.


source

PoniardBaseEstimator.get_results

 PoniardBaseEstimator.get_results (return_train_scores:bool=False,
                                   std:bool=False, wrt_dummy:bool=False)

Return dataframe containing scoring results. By default returns the mean score and fit and score times. Optionally returns standard deviations as well.

Type Default Details
return_train_scores bool False If False, only return test scores.
std bool False Whether to return standard deviation of the scores. Default False.
wrt_dummy bool False Whether to compute each score/time with respect to the dummy estimator results. Default False.
Returns Union[Tuple[pd.DataFrame, pd.DataFrame], pd.DataFrame] Results
pnd.get_results()
test_roc_auc test_accuracy test_precision test_recall test_f1 fit_time score_time
DecisionTreeClassifier 0.510256 0.510 0.531145 0.503846 0.516707 0.010714 0.007243
DummyClassifier 0.500000 0.520 0.520000 1.000000 0.684211 0.009618 0.007332
KNeighborsClassifier 0.496675 0.492 0.509150 0.534615 0.519465 0.009883 0.008536
SVC 0.472356 0.476 0.499007 0.688462 0.575907 0.715862 0.008426
LogisticRegression 0.468990 0.488 0.509234 0.573077 0.536862 0.019850 0.007661
XGBClassifier 0.460417 0.486 0.502401 0.500000 0.499330 0.046362 0.009421
HistGradientBoostingClassifier 0.456571 0.488 0.505975 0.484615 0.494283 0.405131 0.019346
RandomForestClassifier 0.435056 0.462 0.479861 0.476923 0.477449 0.070931 0.014314
GaussianNB 0.423317 0.468 0.492473 0.565385 0.525371 0.010134 0.007401
means, stds = pnd.get_results(std=True, return_train_scores=True)
stds
test_roc_auc train_roc_auc test_accuracy train_accuracy test_precision train_precision test_recall train_recall test_f1 train_f1 fit_time score_time
DecisionTreeClassifier 0.060706 0.000000e+00 0.060332 0.000000 0.059942 0.000000 0.058835 0.000000 0.057785 0.000000 0.000303 0.000047
DummyClassifier 0.000000 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000404 0.000100
KNeighborsClassifier 0.021105 8.429609e-03 0.019391 0.010840 0.019140 0.008157 0.081043 0.022053 0.049760 0.012869 0.000341 0.000070
SVC 0.038609 3.600720e-02 0.042708 0.032496 0.031965 0.028405 0.085485 0.073140 0.036968 0.026864 0.110736 0.000229
LogisticRegression 0.068079 2.545484e-02 0.041183 0.027946 0.037992 0.024759 0.065948 0.021371 0.036585 0.022583 0.004623 0.000269
XGBClassifier 0.065278 0.000000e+00 0.035553 0.000000 0.033315 0.000000 0.091826 0.000000 0.061108 0.000000 0.001688 0.000196
HistGradientBoostingClassifier 0.059681 7.749323e-04 0.041183 0.007483 0.039938 0.011912 0.070291 0.005607 0.054859 0.007046 0.049279 0.005965
RandomForestClassifier 0.060809 7.021667e-17 0.039192 0.000000 0.038392 0.000000 0.077307 0.000000 0.056132 0.000000 0.000342 0.000267
GaussianNB 0.045845 2.494438e-02 0.042143 0.018303 0.037330 0.015830 0.031246 0.038051 0.025456 0.018727 0.000729 0.000126

get_estimator is a convenience method that gets a pipeline from pipelines by name, and optionally trains it on X and y.


source

PoniardBaseEstimator.get_estimator

 PoniardBaseEstimator.get_estimator (estimator_name:str,
                                     include_preprocessor:bool=True,
                                     retrain:bool=False)

Obtain an estimator in pipelines by name. This is useful for extracting default estimators or hyperparameter-optimized estimators (after using PoniardBaseEstimator.tune_estimator).

Type Default Details
estimator_name str Estimator name.
include_preprocessor bool True Whether to return a pipeline with a preprocessor or just the estimator. Default True.
retrain bool False Whether to retrain with full data. Default False.
Returns Union[Pipeline, ClassifierMixin, RegressorMixin] Estimator.
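For example, a short sketch using the estimator fitted above:

# Full pipeline (preprocessor + model), retrained on the X and y held by the estimator.
svc_pipeline = pnd.get_estimator("SVC", retrain=True)

# Just the bare estimator, without the preprocessor and without retraining.
svc = pnd.get_estimator("SVC", include_preprocessor=False)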

source

PoniardBaseEstimator.analyze_estimator

 PoniardBaseEstimator.analyze_estimator (estimator_name:str,
                                         height:int=800, width:int=800)

Print a selection of metrics and plots for a given estimator.

By default, orders estimators according to the first metric.

Type Default Details
estimator_name str Name of estimator to analyze.
height int 800 Height of output Figure.
width int 800 Width of output Figure.
Returns Figure Figure

PoniardBaseEstimator.analyze_estimator provides a quick overview of an estimator’s performance.

from sklearn.datasets import load_breast_cancer
from poniard import PoniardClassifier
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
pnd = PoniardClassifier().setup(X, y, show_info=False)
pnd.fit()
PoniardClassifier()
pnd.analyze_estimator("SVC", height=1000, width=1000)

Modifying estimators after initialization

Estimators can be added and removed directly. Note that if other estimators have already been fit, only the added ones will be processed during PoniardBaseEstimator.fit.


source

PoniardBaseEstimator.add_estimators

 PoniardBaseEstimator.add_estimators (estimators:Union[Dict[str,sklearn.base.ClassifierMixin],
                                      Sequence[sklearn.base.ClassifierMixin]])

Include new estimators. This is the recommended way of adding an estimator (as opposed to modifying pipelines directly), since it also injects random_state, n_jobs and verbosity.

Type Details
estimators Union[Dict[str, ClassifierMixin], Sequence[ClassifierMixin]] Estimators to add.
Returns PoniardBaseEstimator Self.

source

PoniardBaseEstimator.remove_estimators

 PoniardBaseEstimator.remove_estimators (estimator_names:Sequence[str],
                                         drop_results:bool=True)

Remove estimators. This is the recommended way of removing an estimator (as opposed to modifying pipelines directly), since it also removes the associated rows from the results tables.

Type Default Details
estimator_names Sequence[str] Estimators to remove.
drop_results bool True Whether to remove the results associated with the estimators. Default True.
Returns PoniardBaseEstimator Self.
from sklearn.ensemble import ExtraTreesClassifier

pnd.add_estimators(ExtraTreesClassifier())
pnd.remove_estimators("RandomForestClassifier")
pnd.fit()
pnd.get_results()
test_roc_auc test_accuracy test_precision test_recall test_f1 fit_time score_time
LogisticRegression 0.995456 0.978916 0.975411 0.991549 0.983351 0.007645 0.002424
SVC 0.994139 0.975408 0.975111 0.985955 0.980477 0.008037 0.003919
HistGradientBoostingClassifier 0.994128 0.970129 0.967263 0.985955 0.976433 0.539054 0.016192
XGBClassifier 0.994123 0.970129 0.967554 0.985915 0.976469 0.049444 0.004278
ExtraTreesClassifier 0.991055 0.968359 0.969925 0.980321 0.974955 0.042767 0.008918
GaussianNB 0.988730 0.929700 0.940993 0.949413 0.944300 0.003169 0.004466
KNeighborsClassifier 0.980610 0.964881 0.955018 0.991628 0.972746 0.002539 0.016843
DecisionTreeClassifier 0.920983 0.926223 0.941672 0.941080 0.941054 0.005269 0.002359
DummyClassifier 0.500000 0.627418 0.627418 1.000000 0.771052 0.001970 0.002919
pnd.pipelines.keys()
dict_keys(['LogisticRegression', 'GaussianNB', 'SVC', 'KNeighborsClassifier', 'DecisionTreeClassifier', 'HistGradientBoostingClassifier', 'XGBClassifier', 'DummyClassifier', 'ExtraTreesClassifier'])

Modifying the preprocessor after initialization

The preprocessor can be modified from within PoniardBaseEstimator in two ways after PoniardBaseEstimator.setup:

  1. reassign_types so that features are processed by other transformers, e.g., a numeric feature could be cast to a high cardinality categorical (for example, a store ID).
  2. add_preprocessing_step adds a transformer or pipeline to the existing preprocessor.

See the Preprocessing guide for examples.


source

PoniardBaseEstimator.reassign_types

 PoniardBaseEstimator.reassign_types (numeric:Optional[List[Union[str,int]]]=None,
                                      categorical_high:Optional[List[Union[str,int]]]=None,
                                      categorical_low:Optional[List[Union[str,int]]]=None,
                                      datetime:Optional[List[Union[str,int]]]=None,
                                      keep_remainder:bool=True)

Reassign feature types. By default, leaves omitted features as they were.

Type Default Details
numeric Optional[List[Union[str, int]]] None List of column names or indices. Default None.
categorical_high Optional[List[Union[str, int]]] None List of column names or indices. Default None.
categorical_low Optional[List[Union[str, int]]] None List of column names or indices. Default None.
datetime Optional[List[Union[str, int]]] None List of column names or indices. Default None.
keep_remainder bool True Whether to keep features not specified in the method parameters as is, or drop them.
Returns PoniardBaseEstimator self.
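As a sketch, going back to the first example, where rating was inferred as a high cardinality categorical, it could be reassigned to numeric after setup:

# Features not listed here keep their previously inferred types (keep_remainder=True).
pnd.reassign_types(numeric=["rating"])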

source

PoniardBaseEstimator.add_preprocessing_step

 PoniardBaseEstimator.add_preprocessing_step (step:Union[sklearn.pipeline.Pipeline,
                                              sklearn.base.TransformerMixin,
                                              sklearn.compose._column_transformer.ColumnTransformer,
                                              Tuple[str,Union[sklearn.pipeline.Pipeline,
                                              sklearn.base.TransformerMixin,
                                              sklearn.compose._column_transformer.ColumnTransformer]]],
                                              position:Union[str,int]='end')

Add a preprocessing step.

Type Default Details
step Union[Union[Pipeline, TransformerMixin, ColumnTransformer], Tuple[str, Union[Pipeline, TransformerMixin, ColumnTransformer]]] A tuple of (str, transformer) or a scikit-learn transformer. Note that the transformer can also be a Pipeline or ColumnTransformer.
position Union[str, int] end Either an integer denoting before which step in the existing preprocessing pipeline the new step should be added, or ‘start’ or ‘end’.
Returns Pipeline self
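For instance, a hedged sketch that appends a feature selection step to the end of the existing preprocessor (the step name is arbitrary):

from sklearn.feature_selection import SelectKBest, f_classif

# A (name, transformer) tuple; position="end" appends it after the current steps.
pnd.add_preprocessing_step(("select_k_best", SelectKBest(f_classif, k=10)), position="end")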

Prediction methods

Cross validated predictions (using scikit-learn’s cross_val_predict) can be obtained by calling the predict, predict_proba, decision_function or predict_all methods. Each of them takes an estimator_names parameter that specifies which models should be used.
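For example, a short sketch restricted to two of the fitted estimators:

# Dicts keyed by estimator name, holding numpy arrays of cross validated predictions/probabilities.
preds = pnd.predict(estimator_names=["LogisticRegression", "SVC"])
probas = pnd.predict_proba(estimator_names=["LogisticRegression", "SVC"])
preds["SVC"][:5]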

Estimators without specific prediction methods

Not every scikit-learn model implements the predict_proba method. For those estimators, Poniard will return numpy.nan instead of raising an error.


source

PoniardBaseEstimator.predict

 PoniardBaseEstimator.predict (estimator_names:Optional[Sequence[str]]=None)

Get cross validated target predictions where each sample belongs to a single test set.

Type Default Details
estimator_names Optional[Sequence[str]] None Estimators to include. If None, predict all estimators.
Returns Dict[str, np.ndarray] Dict where keys are estimator names and values are numpy arrays of predictions.

source

PoniardBaseEstimator.predict_proba

 PoniardBaseEstimator.predict_proba (estimator_names:Optional[Sequence[str]]=None)

Get cross validated target probability predictions where each sample belongs to a single test set.

Type Default Details
estimator_names Optional[Sequence[str]] None Estimators to include. If None, predict all estimators.
Returns Dict[str, np.ndarray] Dict where keys are estimator names and values are numpy arrays of prediction probabilities.

source

PoniardBaseEstimator.decision_function

 PoniardBaseEstimator.decision_function (estimator_names:Optional[Sequence[str]]=None)

Get cross validated decision function predictions where each sample belongs to a single test set.

Type Default Details
estimator_names Optional[Sequence[str]] None Estimators to include. If None, predict all estimators.
Returns Dict[str, np.ndarray] Dict where keys are estimator names and values are numpy arrays of decision functions.

source

PoniardBaseEstimator.predict_all

 PoniardBaseEstimator.predict_all (estimator_names:Optional[Sequence[str]]=None)

Get cross validated target predictions, probabilities and decision functions where each sample belongs to a test set.

Type Default Details
estimator_names Optional[Sequence[str]] None Estimators to include. If None, predict all estimators.
Returns Tuple[Dict[str, np.ndarray]] Tuple of dicts where keys are estimator names and values are numpy arrays of predictions.

Ensembles and hyperparameter tuning


source

PoniardBaseEstimator.build_ensemble

 PoniardBaseEstimator.build_ensemble (method:str='stacking',
                                      estimator_names:Optional[Sequence[str]]=None,
                                      top_n:Optional[int]=3,
                                      sort_by:Optional[str]=None,
                                      ensemble_name:Optional[str]=None, **kwargs)

Combine estimators into an ensemble.

By default, orders estimators according to the first metric.

Type Default Details
method str stacking Ensemble method. Either “stacking” or “voting”. Default “stacking”.
estimator_names Optional[Sequence[str]] None Names of estimators to include. Default None, which uses top_n.
top_n Optional[int] 3 How many of the best estimators to include.
sort_by Optional[str] None Which metric to consider for ordering results. Default None, which uses the first metric.
ensemble_name Optional[str] None Ensemble name when adding to pipelines. Default None.
kwargs Passed to the ensemble class constructor.
Returns PoniardBaseEstimator Self.
pnd.build_ensemble(
    method="stacking",
    estimator_names=["DecisionTreeClassifier", "KNeighborsClassifier", "SVC"],
)
pnd.get_estimator("StackingClassifier")
Pipeline(steps=[('preprocessor',
                 Pipeline(steps=[('type_preprocessor',
                                  Pipeline(steps=[('numeric_imputer',
                                                   SimpleImputer()),
                                                  ('scaler',
                                                   StandardScaler())])),
                                 ('remove_invariant', VarianceThreshold())],
                          verbose=0)),
                ('StackingClassifier',
                 StackingClassifier(cv=StratifiedKFold(n_splits=5, random_state=0, shuffle=True),
                                    estimators=[('DecisionTreeClassifier',
                                                 DecisionTreeClassifier(random_state=0)),
                                                ('KNeighborsClassifier',
                                                 KNeighborsClassifier()),
                                                ('SVC',
                                                 SVC(kernel='linear',
                                                     probability=True,
                                                     random_state=0,
                                                     verbose=0))]))])
pnd.fit()
pnd.get_results()
test_roc_auc test_accuracy test_precision test_recall test_f1 fit_time score_time
LogisticRegression 0.995456 0.978916 0.975411 0.991549 0.983351 0.007645 0.002424
SVC 0.994139 0.975408 0.975111 0.985955 0.980477 0.008037 0.003919
HistGradientBoostingClassifier 0.994128 0.970129 0.967263 0.985955 0.976433 0.539054 0.016192
XGBClassifier 0.994123 0.970129 0.967554 0.985915 0.976469 0.049444 0.004278
StackingClassifier 0.993999 0.973653 0.967485 0.991588 0.979308 0.053218 0.005176
ExtraTreesClassifier 0.991055 0.968359 0.969925 0.980321 0.974955 0.042767 0.008918
GaussianNB 0.988730 0.929700 0.940993 0.949413 0.944300 0.003169 0.004466
KNeighborsClassifier 0.980610 0.964881 0.955018 0.991628 0.972746 0.002539 0.016843
DecisionTreeClassifier 0.920983 0.926223 0.941672 0.941080 0.941054 0.005269 0.002359
DummyClassifier 0.500000 0.627418 0.627418 1.000000 0.771052 0.001970 0.002919

Use get_predictions_similarity to compute how correlated the estimators’ predictions are. This can be useful for building ensembles with PoniardBaseEstimator.build_ensemble.


source

PoniardBaseEstimator.get_predictions_similarity

 PoniardBaseEstimator.get_predictions_similarity (on_errors:bool=True)

Compute correlation/association between cross validated predictions for each estimator.

This can be useful for ensembling.

Type Default Details
on_errors bool True Whether to compute similarity on prediction errors instead of predictions. Default True.
Returns pd.DataFrame Similarity.
pnd.get_predictions_similarity()
LogisticRegression GaussianNB SVC KNeighborsClassifier DecisionTreeClassifier HistGradientBoostingClassifier XGBClassifier ExtraTreesClassifier StackingClassifier
LogisticRegression 1.000000 0.315978 0.726194 0.401876 0.211925 0.367325 0.294833 0.426033 0.547327
GaussianNB 0.315978 1.000000 0.331160 0.524911 0.354022 0.454955 0.495528 0.518582 0.489758
SVC 0.726194 0.331160 1.000000 0.368042 0.277664 0.403438 0.336311 0.390700 0.574735
KNeighborsClassifier 0.401876 0.524911 0.368042 1.000000 0.363762 0.497702 0.497702 0.482094 0.712582
DecisionTreeClassifier 0.211925 0.354022 0.277664 0.363762 1.000000 0.362908 0.521706 0.427338 0.392178
HistGradientBoostingClassifier 0.367325 0.454955 0.403438 0.497702 0.362908 1.000000 0.726570 0.645759 0.582252
XGBClassifier 0.294833 0.495528 0.336311 0.497702 0.521706 0.726570 1.000000 0.704906 0.517572
ExtraTreesClassifier 0.426033 0.518582 0.390700 0.482094 0.427338 0.645759 0.704906 1.000000 0.564618
StackingClassifier 0.547327 0.489758 0.574735 0.712582 0.392178 0.582252 0.517572 0.564618 1.000000

Poniard offers light hyperparameter tuning through tune_estimator, as well as hyperparameter grids for its default estimators. You are, however, free to specify whichever grid you want.


source

PoniardBaseEstimator.tune_estimator

 PoniardBaseEstimator.tune_estimator (estimator_name:str,
                                      grid:Optional[Dict]=None,
                                      mode:str='grid',
                                      tuned_estimator_name:Optional[str]=None, **kwargs)

Hyperparameter tuning for a single estimator.

Type Default Details
estimator_name str Estimator to tune.
grid Optional[Dict] None Hyperparameter grid. Default None, which uses the grids available for default estimators.
mode str grid Type of search. Either “grid”, “halving” or “random”. Default “grid”.
tuned_estimator_name Optional[str] None Estimator name when adding to pipelines. Default None.
kwargs Passed to the search class constructor.
Returns Union[GridSearchCV, RandomizedSearchCV] Self.
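For example, a hedged sketch that tunes one of the default estimators with its built-in grid and then extracts the tuned pipeline (the tuned estimator name is arbitrary):

# Grid search over the built-in grid for LogisticRegression, stored under a new name in pipelines.
pnd.tune_estimator("LogisticRegression", mode="grid", tuned_estimator_name="LogisticRegression_tuned")

# Retrieve the tuned pipeline; a subsequent fit() would score it alongside the other estimators.
tuned = pnd.get_estimator("LogisticRegression_tuned")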