Error analysis

Understand your model’s shortcomings.

Inspecting which classes or target ranges a model struggles with most is a vital step in the iterative model-building process. ErrorAnalyzer aims to streamline it.


source

ErrorAnalyzer

 ErrorAnalyzer (task:str)

An error analyzer for predictive models.

Compares the ground truth with the predicted target and ranks the largest deviations (by predicted probabilities for classifiers and by predicted values for regressors).

This class is tightly integrated with PoniardBaseEstimator, but does not require it.

Type Details
task str The machine learning task. Either ‘regression’ or ‘classification’.

source

ErrorAnalyzer.from_poniard

 ErrorAnalyzer.from_poniard (poniard:poniard.estimators.core.PoniardBaseEstimator,
                             estimator_names:Union[str,Sequence[str]])

Use a Poniard instance to instantiate ErrorAnalyzer.

Automatically sets the task and gives access to the underlying data.

Type Details
poniard PoniardBaseEstimator A PoniardClassifier or PoniardRegressor instance.
estimator_names typing.Union[str, typing.Sequence[str]] Array of estimators for which to compute errors.
Returns An instance of the class.

source

ErrorAnalyzer.rank_errors

 ErrorAnalyzer.rank_errors (y:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None,
                            predictions:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None,
                            probas:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None,
                            exclude_correct:bool=True)

Compare the y ground truth with predictions and probas and sort by the largest deviations.

If ErrorAnalyzer.from_poniard was used, no data needs to be passed to this method.

In this context “error” refers to:

  • misclassified samples in binary and multiclass problems.
  • misclassified samples in any of the labels for multilabel problems.
  • samples with predicted values outside the [truth - 1 SD, truth + 1 SD] range for regression problems.
  • samples with predicted values outside the [truth - 1 SD, truth + 1 SD] range in any of the targets for multioutput regression problems.
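The regression criterion can be sketched in a few lines. This is an illustrative assumption about how the threshold works, not the library's actual implementation:

```python
import numpy as np

# Hypothetical sketch of the regression error rule described above:
# a sample counts as an error when its prediction falls outside
# [truth - 1 SD, truth + 1 SD], with SD the standard deviation of the ground truth.
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
predictions = np.array([12.0, 19.0, 90.0, 41.0, 48.0])

sd = y.std()  # ~14.14 for this toy target
is_error = (predictions < y - sd) | (predictions > y + sd)
# only the third sample (prediction 90 vs. truth 30) falls outside its band
```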
Type Default Details
y typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] None Ground truth target.
predictions typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] None Predicted target.
probas typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] None Predicted probabilities for each class in classification tasks.
exclude_correct bool True Whether to exclude correctly predicted samples in the output ranking. Default True.
Returns Dict Ranked errors

ErrorAnalyzer.rank_errors works for simple classification…

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

from poniard import PoniardClassifier
x, y = load_breast_cancer(return_X_y=True, as_frame=True)
pnd = (
    PoniardClassifier(estimators=[KNeighborsClassifier(), LogisticRegression()])
    .setup(x, y, show_info=False)
    .fit()
)
error_analysis = ErrorAnalyzer.from_poniard(
    pnd, ["KNeighborsClassifier", "LogisticRegression"]
)
ranked_errors = error_analysis.rank_errors()
ranked_errors["LogisticRegression"]["values"]
y prediction proba_0 proba_1 error
297 0 1 0.002394 0.997606 0.997606
73 0 1 0.060207 0.939793 0.939793
40 0 1 0.062019 0.937981 0.937981
135 0 1 0.115223 0.884777 0.884777
190 0 1 0.215570 0.784430 0.784430
263 0 1 0.278617 0.721383 0.721383
68 1 0 0.694271 0.305729 0.694271
213 0 1 0.344298 0.655702 0.655702
146 0 1 0.397514 0.602486 0.602486
541 1 0 0.585451 0.414549 0.585451
363 1 0 0.524384 0.475616 0.524384
255 0 1 0.490481 0.509519 0.509519
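The error column above is consistent with taking the probability mass not assigned to the true class. The following is an inference from the output table, not a statement about the library's implementation:

```python
import numpy as np

# Assumed reconstruction of the "error" column for two rows of the table above
# (rows 297 and 68): error = 1 - probability assigned to the true class.
y = np.array([0, 1])
probas = np.array([[0.002394, 0.997606],
                   [0.694271, 0.305729]])
error = 1 - probas[np.arange(len(y)), y]
# matches the table: 0.997606 and 0.694271
```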

As well as more complicated setups, such as multioutput regression.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.multioutput import MultiOutputRegressor

from poniard import PoniardRegressor
from poniard.preprocessing import PoniardPreprocessor
x, y = make_regression(
    n_samples=1000,
    n_features=10,
    n_targets=2,
    n_informative=3,
    noise=50,
    random_state=0,
)
x += np.random.normal()
prep = PoniardPreprocessor(numeric_threshold=10)
pnd = (
    PoniardRegressor(
        estimators={
            "lr": MultiOutputRegressor(LinearRegression()),
            "knn": MultiOutputRegressor(KNeighborsRegressor()),
        },
        custom_preprocessor=prep,
    )
    .setup(x, y, show_info=False)
    .fit()
)
error_analysis = ErrorAnalyzer.from_poniard(pnd, ["lr", "knn"])
ranked_errors = error_analysis.rank_errors()
ranked_errors["knn"]["values"]
/Users/rafxavier/Documents/Repos/personal/poniard/poniard/preprocessing/core.py:145: UserWarning: TargetEncoder is not supported for multilabel or multioutput targets. Switching to OrdinalEncoder.
  ) = self._setup_transformers()
y_0 y_1 prediction_0 prediction_1 error_0 error_1 error
679 -285.183722 -361.210314 -83.567331 -174.502727 201.616391 186.707587 194.161989
580 -206.914531 -316.893779 -68.562490 -119.157106 138.352042 197.736673 168.044357
466 -193.559624 -270.201613 -47.648875 -83.285570 145.910749 186.916043 166.413396
543 166.559570 307.797957 44.254690 110.607538 122.304880 197.190419 159.747649
110 199.955678 175.857068 0.293321 62.923251 199.662358 112.933817 156.298087
... ... ... ... ... ... ... ...
563 138.150171 34.382971 60.332156 27.100214 77.818015 7.282756 42.550386
911 -94.526886 -32.665129 -15.892835 -27.263218 78.634051 5.401912 42.017982
441 -6.393304 65.144850 73.175490 61.991206 79.568794 3.153644 41.361219
582 1.808688 -64.975064 -76.728970 -61.481938 78.537658 3.493126 41.015392
794 60.170481 -36.460608 -17.018031 -35.620963 77.188512 0.839645 39.014078

302 rows × 7 columns

It can also be used without Poniard.

import numpy as np
from sklearn.datasets import make_regression
x, y = make_regression(
    n_samples=2000,
    n_features=10,
    n_targets=1,
    n_informative=3,
    noise=50,
    random_state=0,
)
x += np.random.normal()
y_pred = y.copy()
y_pred[np.random.randint(0, y.shape[0], 50)] = np.random.normal()

error_analysis = ErrorAnalyzer(task="regression")
ranked_errors = error_analysis.rank_errors(y, y_pred)
ranked_errors["values"]
y prediction error
1050 -444.569706 -0.686456 443.883250
765 370.930877 -0.686456 371.617332
1228 263.743690 -0.686456 264.430145
1596 -256.521733 -0.686456 255.835277
1585 231.575997 -0.686456 232.262453
274 -224.219953 -0.686456 223.533497
1580 -214.900591 -0.686456 214.214136
379 -196.567862 -0.686456 195.881407
1376 188.622308 -0.686456 189.308763
87 -183.694218 -0.686456 183.007762
189 -165.369727 -0.686456 164.683272
780 162.247316 -0.686456 162.933772
655 -159.524622 -0.686456 158.838167
1421 155.527671 -0.686456 156.214126
277 -142.867757 -0.686456 142.181301

source

ErrorAnalyzer.merge_errors

 ErrorAnalyzer.merge_errors (errors:Dict[str,Dict[str,Union[pandas.core.frame.DataFrame,pandas.core.series.Series]]])

Merge multiple error rankings. This is particularly useful when evaluating multiple estimators.

Compute how many estimators had the specific error and the average error between them.

This method works best when using ErrorAnalyzer.from_poniard, since errors can be taken directly from the output of ErrorAnalyzer.rank_errors. However, this is not required: any errors dictionary that follows the expected structure ({str: {str: pd.DataFrame, str: pd.Series}}) will work.

Type Details
errors typing.Dict[str, typing.Dict[str, typing.Union[pandas.core.frame.DataFrame, pandas.core.series.Series]]] Dictionary of errors and error indexes.
Returns Dict Merged errors
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier


from poniard import PoniardClassifier
x, y = load_iris(return_X_y=True, as_frame=True)
pnd = (
    PoniardClassifier(
        estimators=[
            LogisticRegression(),
            RandomForestClassifier(),
            HistGradientBoostingClassifier(),
        ]
    )
    .setup(x, y, show_info=False)
    .fit()
)
error_analysis = ErrorAnalyzer.from_poniard(
    pnd,
    ["RandomForestClassifier", "LogisticRegression", "HistGradientBoostingClassifier"],
)
ranked_errors = error_analysis.rank_errors()
merged_errors = error_analysis.merge_errors(ranked_errors)
merged_errors["values"]
mean_error freq estimators
index
106 0.848222 3 [RandomForestClassifier, LogisticRegression, H...
70 0.842455 3 [RandomForestClassifier, LogisticRegression, H...
77 0.824483 3 [RandomForestClassifier, LogisticRegression, H...
119 0.817890 3 [RandomForestClassifier, LogisticRegression, H...
133 0.741060 3 [RandomForestClassifier, LogisticRegression, H...
83 0.909402 2 [RandomForestClassifier, HistGradientBoostingC...
72 0.793324 2 [RandomForestClassifier, HistGradientBoostingC...
129 0.763608 2 [RandomForestClassifier, HistGradientBoostingC...
138 0.610000 1 [RandomForestClassifier]
134 0.529012 1 [LogisticRegression]

source

ErrorAnalyzer.analyze_target

 ErrorAnalyzer.analyze_target (errors_idx:pandas.core.series.Series,
                               y:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None,
                               reg_bins:int=5, as_ratio:bool=False,
                               wrt_target:bool=False)

Analyze which target classes/ranges have the most errors and compare with observed target distribution.

Type Default Details
errors_idx Series Index of ranked errors.
y typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] None Ground truth. Not needed if using ErrorAnalyzer.from_poniard.
reg_bins int 5 Number of bins in which to place ground truth targets for regression tasks.
as_ratio bool False Whether to show error ratios instead of error counts per class/bin. Default False.
wrt_target bool False Whether to compute error counts or error ratios with respect to the ground truth. Default False.
Returns pd.DataFrame Error counts (or ratios) per class/bin.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

from poniard import PoniardRegressor
x, y = load_diabetes(return_X_y=True, as_frame=True)
pnd = PoniardRegressor(estimators=LinearRegression()).setup(x, y, show_info=False).fit()
error_analysis = ErrorAnalyzer.from_poniard(pnd, ["LinearRegression"])
ranked_errors = error_analysis.rank_errors()

error_analysis.analyze_target(ranked_errors["LinearRegression"]["idx"])
0_errors 0_target
bins
(232.0, 346.0] 33 88
(24.999, 77.0] 18 91
(77.0, 115.0] 9 87
(168.0, 232.0] 9 87
(115.0, 168.0] 5 89
error_analysis.analyze_target(ranked_errors["LinearRegression"]["idx"], wrt_target=True)
bins
(232.0, 346.0]    0.375000
(24.999, 77.0]    0.197802
(77.0, 115.0]     0.103448
(168.0, 232.0]    0.103448
(115.0, 168.0]    0.056180
dtype: float64
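The wrt_target ratios above (e.g. 33 errors out of 88 samples in the top bin, 0.375) can be reproduced conceptually with pandas. This is an illustrative sketch of the assumed meaning of wrt_target=True, not the library's code:

```python
import pandas as pd

# Bin the ground truth, count errors per bin, and divide by the number of
# target samples per bin.
y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
errors_idx = pd.Series([0, 1, 2])  # indexes of mispredicted samples

bins = pd.cut(y, bins=2)
target_counts = bins.value_counts()
error_counts = bins.loc[errors_idx].value_counts()
ratio = (error_counts / target_counts).sort_values(ascending=False)
# the lower bin holds all three errors out of four samples: ratio 0.75
```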

source

ErrorAnalyzer.analyze_features

 ErrorAnalyzer.analyze_features (errors_idx:pandas.core.series.Series,
                                 X:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None,
                                 features:Optional[Sequence[Union[str,int]]]=None,
                                 estimator_name:Union[str,sklearn.base.BaseEstimator,NoneType]=None,
                                 n_features:Union[int,float,NoneType]=None)

Cross tabulate features with prediction errors.

Type Default Details
errors_idx Series Index of ranked errors.
X typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] None Features array. Not needed if using ErrorAnalyzer.from_poniard.
features typing.Optional[typing.Sequence[typing.Union[str, int]]] None Array of features to analyze. If None, all features will be analyzed.
estimator_name typing.Union[str, sklearn.base.BaseEstimator, NoneType] None Only valid if using ErrorAnalyzer.from_poniard. Allows using an estimator to compute permutation importances and analyzing only the top n_features.
n_features typing.Union[int, float, NoneType] None How many features to analyze based on permutation importances.
Returns Dict[str, pd.DataFrame] Per feature summary.
error_analysis.analyze_features(ranked_errors["LinearRegression"]["idx"])["age"]
count mean std min 25% 50% 75% max
error
0 368.0 0.000467 0.047873 -0.107226 -0.035483 0.005383 0.038076 0.110727
1 74.0 -0.002324 0.046585 -0.096328 -0.037299 0.005383 0.026270 0.110727
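The per-feature summary above looks like a describe() conditioned on an error flag. A rough reconstruction under that assumption (not the library's actual code):

```python
import numpy as np
import pandas as pd

# Assumed logic behind the table above: mark each sample as error (1) or
# correct (0) and describe every analyzed feature conditional on that flag.
rng = np.random.default_rng(0)
X = pd.DataFrame({"age": rng.normal(size=100)})
errors_idx = pd.Series([3, 10, 42])  # indexes of mispredicted samples

error_flag = X.index.isin(errors_idx).astype(int)
summary = X["age"].groupby(error_flag).describe()
# rows indexed 0 (correct) and 1 (error), columns count/mean/std/min/.../max
```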