```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from poniard import PoniardClassifier
```
Error analysis
Inspecting which classes or target ranges a model struggles with the most is a vital step in the iterative model building process. ErrorAnalyzer aims to streamline this process.
ErrorAnalyzer
ErrorAnalyzer (task:str)
An error analyzer for predictive models.
Compare the ground truth and predicted targets and rank the largest deviations (by predicted probabilities for classifiers and by predicted values for regressors).
This class is tightly integrated with PoniardBaseEstimator, but does not require it.
| | Type | Details |
|---|---|---|
| task | str | The machine learning task. Either ‘regression’ or ‘classification’. |
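As an illustration of the ranking idea (a hypothetical sketch, not Poniard's implementation), a binary classifier's misclassified samples can be ranked by the probability the model assigned to its incorrect prediction:

```python
import numpy as np

# Illustrative sketch only, not Poniard's actual implementation.
# Rank binary classification errors by the probability the model
# assigned to its (incorrect) predicted class.
y = np.array([0, 0, 1, 1])
proba = np.array([[0.1, 0.9], [0.6, 0.4], [0.7, 0.3], [0.45, 0.55]])
pred = proba.argmax(axis=1)  # predicted class per sample
wrong = pred != y  # samples 0 and 2 are misclassified
confidence = proba[np.arange(len(y)), pred]  # probability of the predicted class
# indices of misclassified samples, most confidently wrong first
ranked = np.where(wrong)[0][np.argsort(-confidence[wrong])]
```

The most confidently wrong samples surface first, which mirrors the ordering visible in the example outputs below.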
ErrorAnalyzer.from_poniard
ErrorAnalyzer.from_poniard (poniard:poniard.estimators.core.PoniardBaseEstimator, estimator_names:Union[str,Sequence[str]])
Use a Poniard instance to instantiate ErrorAnalyzer.
Automatically sets the task and gives access to the underlying data.
| | Type | Details |
|---|---|---|
| poniard | PoniardBaseEstimator | A PoniardClassifier or PoniardRegressor instance. |
| estimator_names | typing.Union[str, typing.Sequence[str]] | Array of estimators for which to compute errors. |
| Returns | | An instance of the class. |
ErrorAnalyzer.rank_errors
ErrorAnalyzer.rank_errors (y:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None, predictions:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None, probas:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None, exclude_correct:bool=True)
Compare the y ground truth with predictions and probas and sort by the largest deviations.
If ErrorAnalyzer.from_poniard was used, no data needs to be passed to this method.
In this context “error” refers to:
- misclassified samples in binary and multiclass problems.
- samples misclassified in any of the labels for multilabel problems.
- samples with predicted values outside the truth - 1SD <-> truth + 1SD range for regression problems.
- samples with predicted values outside the truth - 1SD <-> truth + 1SD range in any of the targets for multioutput regression problems.
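The regression criterion above can be sketched in a few lines of NumPy. This is a minimal illustration of the stated rule (one standard deviation of the observed target around each true value), not Poniard's actual code:

```python
import numpy as np

# Minimal illustration of the +-1 SD error criterion described above
# (not Poniard's actual code): a sample counts as an error when its
# prediction lands outside [truth - std(y), truth + std(y)].
y = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 45.0, 29.0, 5.0])
sd = y.std()  # ~11.18
deviation = np.abs(y_pred - y)
is_error = deviation > sd  # [False, True, False, True]
ranked = np.argsort(-deviation[is_error])  # largest deviations first
```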
| | Type | Default | Details |
|---|---|---|---|
| y | typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] | None | Ground truth target. |
| predictions | typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] | None | Predicted target. |
| probas | typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] | None | Predicted probabilities for each class in classification tasks. |
| exclude_correct | bool | True | Whether to exclude correctly predicted samples from the output ranking. |
| Returns | Dict | | Ranked errors. |
ErrorAnalyzer.rank_errors works for simple classification…

```python
x, y = load_breast_cancer(return_X_y=True, as_frame=True)
pnd = (
    PoniardClassifier(estimators=[KNeighborsClassifier(), LogisticRegression()])
    .setup(x, y, show_info=False)
    .fit()
)
error_analysis = ErrorAnalyzer.from_poniard(
    pnd, ["KNeighborsClassifier", "LogisticRegression"]
)
ranked_errors = error_analysis.rank_errors()
ranked_errors["LogisticRegression"]["values"]
```
| | y | prediction | proba_0 | proba_1 | error |
|---|---|---|---|---|---|
| 297 | 0 | 1 | 0.002394 | 0.997606 | 0.997606 |
| 73 | 0 | 1 | 0.060207 | 0.939793 | 0.939793 |
| 40 | 0 | 1 | 0.062019 | 0.937981 | 0.937981 |
| 135 | 0 | 1 | 0.115223 | 0.884777 | 0.884777 |
| 190 | 0 | 1 | 0.215570 | 0.784430 | 0.784430 |
| 263 | 0 | 1 | 0.278617 | 0.721383 | 0.721383 |
| 68 | 1 | 0 | 0.694271 | 0.305729 | 0.694271 |
| 213 | 0 | 1 | 0.344298 | 0.655702 | 0.655702 |
| 146 | 0 | 1 | 0.397514 | 0.602486 | 0.602486 |
| 541 | 1 | 0 | 0.585451 | 0.414549 | 0.585451 |
| 363 | 1 | 0 | 0.524384 | 0.475616 | 0.524384 |
| 255 | 0 | 1 | 0.490481 | 0.509519 | 0.509519 |
As well as more complicated setups, such as multioutput regression.
```python
import numpy as np

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.multioutput import MultiOutputRegressor
from poniard import PoniardRegressor
from poniard.preprocessing import PoniardPreprocessor
```
```python
x, y = make_regression(
    n_samples=1000,
    n_features=10,
    n_targets=2,
    n_informative=3,
    noise=50,
    random_state=0,
)
x += np.random.normal()
prep = PoniardPreprocessor(numeric_threshold=10)
pnd = (
    PoniardRegressor(
        estimators={
            "lr": MultiOutputRegressor(LinearRegression()),
            "knn": MultiOutputRegressor(KNeighborsRegressor()),
        },
        custom_preprocessor=prep,
    )
    .setup(x, y, show_info=False)
    .fit()
)
error_analysis = ErrorAnalyzer.from_poniard(pnd, ["lr", "knn"])
ranked_errors = error_analysis.rank_errors()
ranked_errors["knn"]["values"]
```
```
/Users/rafxavier/Documents/Repos/personal/poniard/poniard/preprocessing/core.py:145: UserWarning: TargetEncoder is not supported for multilabel or multioutput targets. Switching to OrdinalEncoder.
  self._setup_transformers()
```
| | y_0 | y_1 | prediction_0 | prediction_1 | error_0 | error_1 | error |
|---|---|---|---|---|---|---|---|
| 679 | -285.183722 | -361.210314 | -83.567331 | -174.502727 | 201.616391 | 186.707587 | 194.161989 |
| 580 | -206.914531 | -316.893779 | -68.562490 | -119.157106 | 138.352042 | 197.736673 | 168.044357 |
| 466 | -193.559624 | -270.201613 | -47.648875 | -83.285570 | 145.910749 | 186.916043 | 166.413396 |
| 543 | 166.559570 | 307.797957 | 44.254690 | 110.607538 | 122.304880 | 197.190419 | 159.747649 |
| 110 | 199.955678 | 175.857068 | 0.293321 | 62.923251 | 199.662358 | 112.933817 | 156.298087 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 563 | 138.150171 | 34.382971 | 60.332156 | 27.100214 | 77.818015 | 7.282756 | 42.550386 |
| 911 | -94.526886 | -32.665129 | -15.892835 | -27.263218 | 78.634051 | 5.401912 | 42.017982 |
| 441 | -6.393304 | 65.144850 | 73.175490 | 61.991206 | 79.568794 | 3.153644 | 41.361219 |
| 582 | 1.808688 | -64.975064 | -76.728970 | -61.481938 | 78.537658 | 3.493126 | 41.015392 |
| 794 | 60.170481 | -36.460608 | -17.018031 | -35.620963 | 77.188512 | 0.839645 | 39.014078 |

302 rows × 7 columns
It can also be used without Poniard.
```python
import numpy as np

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from poniard import PoniardRegressor
from poniard.preprocessing import PoniardPreprocessor
```
```python
x, y = make_regression(
    n_samples=2000,
    n_features=10,
    n_targets=1,
    n_informative=3,
    noise=50,
    random_state=0,
)
x += np.random.normal()
y_pred = y.copy()
y_pred[np.random.randint(0, y.shape[0], 50)] = np.random.normal()
error_analysis = ErrorAnalyzer(task="regression")
ranked_errors = error_analysis.rank_errors(y, y_pred)
ranked_errors["values"]
```
| | y | prediction | error |
|---|---|---|---|
| 1050 | -444.569706 | -0.686456 | 443.883250 |
| 765 | 370.930877 | -0.686456 | 371.617332 |
| 1228 | 263.743690 | -0.686456 | 264.430145 |
| 1596 | -256.521733 | -0.686456 | 255.835277 |
| 1585 | 231.575997 | -0.686456 | 232.262453 |
| 274 | -224.219953 | -0.686456 | 223.533497 |
| 1580 | -214.900591 | -0.686456 | 214.214136 |
| 379 | -196.567862 | -0.686456 | 195.881407 |
| 1376 | 188.622308 | -0.686456 | 189.308763 |
| 87 | -183.694218 | -0.686456 | 183.007762 |
| 189 | -165.369727 | -0.686456 | 164.683272 |
| 780 | 162.247316 | -0.686456 | 162.933772 |
| 655 | -159.524622 | -0.686456 | 158.838167 |
| 1421 | 155.527671 | -0.686456 | 156.214126 |
| 277 | -142.867757 | -0.686456 | 142.181301 |
ErrorAnalyzer.merge_errors
ErrorAnalyzer.merge_errors (errors:Dict[str,Dict[str,Union[pandas.core.frame.DataFrame,pandas.core.series.Series]]])
Merge multiple error rankings. This is particularly useful when evaluating multiple estimators: it computes how many estimators made each specific error and the average error between them.
This method works best when using ErrorAnalyzer.from_poniard, since errors can be taken directly from the output of ErrorAnalyzer.rank_errors. However, this is not required; any input works as long as errors follows the expected structure ({str: {str: pd.DataFrame, str: pd.Series}}).
| | Type | Details |
|---|---|---|
| errors | typing.Dict[str, typing.Dict[str, typing.Union[pandas.core.frame.DataFrame, pandas.core.series.Series]]] | Dictionary of errors and error indexes. |
| Returns | Dict | Merged errors. |
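The kind of merge this method performs can be approximated with plain pandas. The snippet below is a hypothetical sketch of the computation (the estimator names and error values are made up, and this is not the library's implementation): it derives a mean error, a frequency, and the list of offending estimators per sample index from per-estimator error series.

```python
import pandas as pd

# Hypothetical sketch of what a merge like ErrorAnalyzer.merge_errors computes
# (not the library's implementation; names and values are made up).
errors = {
    "model_a": pd.Series({106: 0.85, 70: 0.84, 83: 0.91}),
    "model_b": pd.Series({106: 0.85, 70: 0.84}),
}
# Stack per-estimator error series into one long frame.
long = pd.concat(errors, names=["estimator", "index"]).rename("error").reset_index()
merged = (
    long.groupby("index")
    .agg(
        mean_error=("error", "mean"),    # average error across estimators
        freq=("estimator", "count"),     # how many estimators made this error
        estimators=("estimator", list),  # which estimators made it
    )
    .sort_values(["freq", "mean_error"], ascending=False)
)
```

Samples that every estimator gets wrong rise to the top, matching the shape of the merged output shown below.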
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from poniard import PoniardClassifier
```
```python
x, y = load_iris(return_X_y=True, as_frame=True)
pnd = (
    PoniardClassifier(
        estimators=[
            LogisticRegression(),
            RandomForestClassifier(),
            HistGradientBoostingClassifier(),
        ]
    )
    .setup(x, y, show_info=False)
    .fit()
)
error_analysis = ErrorAnalyzer.from_poniard(
    pnd,
    ["RandomForestClassifier", "LogisticRegression", "HistGradientBoostingClassifier"],
)
ranked_errors = error_analysis.rank_errors()
merged_errors = error_analysis.merge_errors(ranked_errors)
merged_errors["values"]
```
| index | mean_error | freq | estimators |
|---|---|---|---|
| 106 | 0.848222 | 3 | [RandomForestClassifier, LogisticRegression, H... |
| 70 | 0.842455 | 3 | [RandomForestClassifier, LogisticRegression, H... |
| 77 | 0.824483 | 3 | [RandomForestClassifier, LogisticRegression, H... |
| 119 | 0.817890 | 3 | [RandomForestClassifier, LogisticRegression, H... |
| 133 | 0.741060 | 3 | [RandomForestClassifier, LogisticRegression, H... |
| 83 | 0.909402 | 2 | [RandomForestClassifier, HistGradientBoostingC... |
| 72 | 0.793324 | 2 | [RandomForestClassifier, HistGradientBoostingC... |
| 129 | 0.763608 | 2 | [RandomForestClassifier, HistGradientBoostingC... |
| 138 | 0.610000 | 1 | [RandomForestClassifier] |
| 134 | 0.529012 | 1 | [LogisticRegression] |
ErrorAnalyzer.analyze_target
ErrorAnalyzer.analyze_target (errors_idx:pandas.core.series.Series, y:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None, reg_bins:int=5, as_ratio:bool=False, wrt_target:bool=False)
Analyze which target classes/ranges have the most errors and compare with the observed target distribution.

| | Type | Default | Details |
|---|---|---|---|
| errors_idx | Series | | Index of ranked errors. |
| y | typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] | None | Ground truth. Not needed if using ErrorAnalyzer.from_poniard. |
| reg_bins | int | 5 | Number of bins in which to place ground truth targets for regression tasks. |
| as_ratio | bool | False | Whether to show error ratios instead of error counts per class/bin. |
| wrt_target | bool | False | Whether to compute counts of errors or error ratios with respect to the ground truth. |
| Returns | pd.DataFrame | | Counts per error. |
```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from poniard import PoniardRegressor
```
```python
x, y = load_diabetes(return_X_y=True, as_frame=True)
pnd = PoniardRegressor(estimators=LinearRegression()).setup(x, y, show_info=False).fit()
error_analysis = ErrorAnalyzer.from_poniard(pnd, ["LinearRegression"])
ranked_errors = error_analysis.rank_errors()

error_analysis.analyze_target(ranked_errors["LinearRegression"]["idx"])
```
| bins | 0_errors | 0_target |
|---|---|---|
| (232.0, 346.0] | 33 | 88 |
| (24.999, 77.0] | 18 | 91 |
| (77.0, 115.0] | 9 | 87 |
| (168.0, 232.0] | 9 | 87 |
| (115.0, 168.0] | 5 | 89 |
```python
error_analysis.analyze_target(ranked_errors["LinearRegression"]["idx"], wrt_target=True)
```
```
bins
(232.0, 346.0]    0.375000
(24.999, 77.0]    0.197802
(77.0, 115.0]     0.103448
(168.0, 232.0]    0.103448
(115.0, 168.0]    0.056180
dtype: float64
```
ErrorAnalyzer.analyze_features
ErrorAnalyzer.analyze_features (errors_idx:pandas.core.series.Series, X:Union[numpy.ndarray,pandas.core.series.Series,pandas.core.frame.DataFrame,NoneType]=None, features:Optional[Sequence[Union[str,int]]]=None, estimator_name:Union[str,sklearn.base.BaseEstimator,NoneType]=None, n_features:Union[int,float,NoneType]=None)
Cross-tabulate features with prediction errors.

| | Type | Default | Details |
|---|---|---|---|
| errors_idx | Series | | Index of ranked errors. |
| X | typing.Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame, NoneType] | None | Features array. Not needed if using ErrorAnalyzer.from_poniard. |
| features | typing.Optional[typing.Sequence[typing.Union[str, int]]] | None | Array of features to analyze. If None, all features will be analyzed. |
| estimator_name | typing.Union[str, sklearn.base.BaseEstimator, NoneType] | None | Only valid if using ErrorAnalyzer.from_poniard. Allows using an estimator to compute permutation importances and analyze only the top n_features. |
| n_features | typing.Union[int, float, NoneType] | None | How many features to analyze based on permutation importances. |
| Returns | Dict[str, pd.DataFrame] | | Per-feature summary. |
```python
error_analysis.analyze_features(ranked_errors["LinearRegression"]["idx"])["age"]
```
| error | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 368.0 | 0.000467 | 0.047873 | -0.107226 | -0.035483 | 0.005383 | 0.038076 | 0.110727 |
| 1 | 74.0 | -0.002324 | 0.046585 | -0.096328 | -0.037299 | 0.005383 | 0.026270 | 0.110727 |