# Version 0.22.0¶

**In Development**

## Changed models¶

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

- Fix: `cluster.KMeans` when `n_jobs=1`.
- Fix: `decomposition.SparseCoder`, `decomposition.DictionaryLearning`, and `decomposition.MiniBatchDictionaryLearning`.
- Fix: `decomposition.SparseCoder` with `algorithm='lasso_lars'`.
- Fix: `decomposition.SparsePCA`, where `normalize_components` has no effect due to deprecation.
- Fix, Feature, Enhancement: `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor`.
- Fix: `linear_model.Ridge` when `X` is sparse.
- Fix: `model_selection.StratifiedKFold` and any use of `cv=int` with a classifier.
- Feature: `impute.IterativeImputer` when `X` has features with no missing values.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

## Changelog¶

### `sklearn.base`¶

- API Change: From version 0.24, `BaseEstimator.get_params` will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.

### `sklearn.calibration`¶

- Fix: Fixed a bug that made `calibration.CalibratedClassifierCV` fail when given a `sample_weight` parameter of type `list` (in the case where `sample_weight` is not supported by the wrapped estimator). #13575 by William de Vazelhes.

### `sklearn.compose`¶

- Fix: Fixed a bug in `compose.ColumnTransformer` which failed to select the proper columns when using a boolean list, with NumPy older than 1.12. #14510 by Guillaume Lemaitre.
- Fix: Fixed a bug in `compose.TransformedTargetRegressor` which did not pass `**fit_params` to the underlying regressor. #14890 by Miguel Cabrera.

### `sklearn.datasets`¶

- Feature: `datasets.fetch_openml` now supports heterogeneous data using pandas by setting `as_frame=True`. #13902 by Thomas Fan.
- Enhancement: The parameter `return_X_y` was added to `datasets.fetch_20newsgroups` and `datasets.fetch_olivetti_faces`. #14259 by Sourav Singh.
- Fix: Fixed a bug in `datasets.fetch_openml`, which failed to load an OpenML dataset that contains an ignored feature. #14623 by Sarra Habchi.

### `sklearn.decomposition`¶

- Fix: `decomposition.sparse_encode` now passes `max_iter` to the underlying `LassoLars` when `algorithm='lasso_lars'`. #12650 by Adrin Jalali.
- Enhancement: `decomposition.dict_learning` and `decomposition.dict_learning_online` now accept `method_max_iter` and pass it to `sparse_encode`. #12650 by Adrin Jalali.
- Enhancement: `decomposition.SparseCoder`, `decomposition.DictionaryLearning`, and `decomposition.MiniBatchDictionaryLearning` now take a `transform_max_iter` parameter and pass it to either `decomposition.dict_learning` or `decomposition.sparse_encode`. #12650 by Adrin Jalali.
- Enhancement: `decomposition.IncrementalPCA` now accepts sparse matrices as input, converting them to dense in batches, thereby avoiding the need to store the entire dense matrix at once. #13960 by Scott Gigante.
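As an illustration of the sparse-input support mentioned above, here is a minimal sketch (the data and batch size are arbitrary choices for the example):

```python
import numpy as np
from scipy import sparse

from sklearn.decomposition import IncrementalPCA

# Build a mostly-zero matrix and store it in SciPy sparse format.
rng = np.random.RandomState(0)
X_dense = rng.rand(100, 5)
X_dense[X_dense < 0.8] = 0.0
X = sparse.csr_matrix(X_dense)

# IncrementalPCA densifies one batch at a time, so the full dense
# matrix never needs to be held in memory at once.
ipca = IncrementalPCA(n_components=2, batch_size=25)
X_reduced = ipca.fit_transform(X)
```

`X_reduced` is a dense `(100, 2)` array; only `batch_size` rows are ever densified at a time.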

### `sklearn.dummy`¶

- Fix: `dummy.DummyClassifier` now handles checking the existence of the provided constant in multioutput cases. #14908 by Martina G. Vilas.

### `sklearn.ensemble`¶

Many improvements were made to `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor`:

- Major Feature: Estimators now natively support dense data with missing values, both for training and predicting. They also support infinite values. #13911 and #14406 by Nicolas Hug, Adrin Jalali and Olivier Grisel.
- Feature: Estimators now have an additional `warm_start` parameter that enables warm starting. #14012 by Johann Faouzi.
- Enhancement: For `ensemble.HistGradientBoostingClassifier`, the training loss or score is now monitored on a class-wise stratified subsample to preserve the class balance of the original training set. #14194 by Johann Faouzi.
- Feature: `inspection.partial_dependence` and `inspection.plot_partial_dependence` now support the fast 'recursion' method for both estimators. #13769 by Nicolas Hug.
- Enhancement: `ensemble.HistGradientBoostingRegressor` now supports the 'least_absolute_deviation' loss. #13896 by Nicolas Hug.
- Fix: Estimators now bin the training and validation data separately to avoid any data leak. #13933 by Nicolas Hug.
- Fix: Fixed a bug where early stopping would break with string targets. #14710 by Guillaume Lemaitre.

Note that pickles from 0.21 will not work in 0.22.
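The native missing-value support can be sketched as follows; the data is made up for the example, and note that in 0.22 these estimators are experimental and must be enabled explicitly (the import is a harmless no-op in later versions):

```python
import numpy as np

# Required in 0.22 because the estimators are experimental.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier

# Training data containing NaN; no imputation step is needed.
X = np.array([[0.0], [1.0], [np.nan], [2.0], [np.nan], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = HistGradientBoostingClassifier(min_samples_leaf=1).fit(X, y)

# Missing values are routed at each split to the child that reduced
# the loss most during training, so predict also accepts NaN.
preds = clf.predict([[np.nan], [0.5], [4.5]])
```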

- Fix: `ensemble.VotingClassifier.predict_proba` will no longer be present when `voting='hard'`. #14287 by Thomas Fan.
- Fix: `utils.estimator_checks.check_estimator` is now run by default on both `ensemble.VotingClassifier` and `ensemble.VotingRegressor`. This resolves issues regarding shape consistency during `predict`, which failed when the underlying estimators did not output arrays of consistent dimensions. Note that this should be replaced by refactoring the common tests in the future. #14305 by Guillaume Lemaitre.
- Efficiency: `impute.MissingIndicator.fit_transform` now calls the internal `_get_missing_features_info` function only once. #14356 by Harsh Soni.
- Fix: `ensemble.AdaBoostClassifier` computes probabilities based on the decision function, as in the literature. Thus, `predict` and `predict_proba` give consistent results. #14114 by Guillaume Lemaitre.

### `sklearn.feature_extraction`¶

- Enhancement: A warning will now be raised if a parameter choice means that another parameter will be unused when calling the `fit()` method of `feature_extraction.text.HashingVectorizer`, `feature_extraction.text.CountVectorizer`, and `feature_extraction.text.TfidfVectorizer`. #14602 by Gaurav Chawla.
- Fix: Functions created by `build_preprocessor` and `build_analyzer` of `feature_extraction.text.VectorizerMixin` can now be pickled. #14430 by Dillon Niederhut.
- API Change: Deprecated the unused `copy` parameter of `feature_extraction.text.TfidfVectorizer.transform`; it will be removed in v0.24. #14520 by Guillem G. Subies.

### `sklearn.gaussian_process`¶

- Feature: `gaussian_process.GaussianProcessClassifier.log_marginal_likelihood` and `gaussian_process.GaussianProcessRegressor.log_marginal_likelihood` now accept a `clone_kernel=True` keyword argument. When set to `False`, the kernel attribute is modified in place, which may result in a performance improvement. #14378 by Masashi Shibata.
- API Change: From version 0.24, `Kernel.get_params` will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.

### `sklearn.impute`¶

- Major Feature: Added `impute.KNNImputer` to impute missing values using k-Nearest Neighbors. #12852 by Ashim Bhattarai and Thomas Fan.
- Fix: `impute.IterativeImputer` now works when there is only one feature. By Sergey Feldman.
- Feature: `impute.IterativeImputer` has a new `skip_complete` flag, False by default, which, when True, skips computation on features that had no missing values during the fit phase. #13773 by Sergey Feldman.
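A minimal sketch of the new `KNNImputer` on made-up data:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [np.nan, 6.0],
              [8.0, 8.0]])

# Each missing entry is replaced by the mean of that feature over the
# n_neighbors nearest rows, using a NaN-aware Euclidean distance.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

Here the rows `[3., 4.]` and `[8., 8.]` are the two nearest neighbours of `[nan, 6.]` (measured on the second feature), so the gap is filled with `(3 + 8) / 2 = 5.5`.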

### `sklearn.inspection`¶

- Major Feature: `inspection.permutation_importance` has been added to measure the importance of each feature in an arbitrary trained model with respect to a given scoring function. #13146 by Thomas Fan.
- Feature: `inspection.partial_dependence` and `inspection.plot_partial_dependence` now support the fast 'recursion' method for `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor`. #13769 by Nicolas Hug.
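A minimal sketch of `permutation_importance`; the synthetic dataset and the choice of `LogisticRegression` are arbitrary for the example:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic data: only 2 of the 4 features carry signal.
X, y = make_classification(n_samples=200, n_features=4,
                           n_informative=2, random_state=0)
model = LogisticRegression().fit(X, y)

# Importance = drop in score when a single column is shuffled,
# averaged over n_repeats shuffles.
result = permutation_importance(model, X, y, n_repeats=10,
                                random_state=0)
```

`result.importances_mean` holds one averaged importance per feature.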

### `sklearn.kernel_approximation`¶

- Fix: Fixed a bug where `kernel_approximation.Nystroem` raised a `KeyError` when using `kernel="precomputed"`. #14706 by Venkatachalam N.

### `sklearn.linear_model`¶

- Enhancement: `linear_model.BayesianRidge` now accepts the hyperparameters `alpha_init` and `lambda_init`, which can be used to set the initial values of the maximization procedure in `fit`. #13618 by Yoshihiro Uchida.
- Fix: `linear_model.Ridge` now correctly fits an intercept when `X` is sparse, `solver="auto"` and `fit_intercept=True`, because the default solver in this configuration has changed to `sparse_cg`, which can fit an intercept with sparse data. #13995 by Jérôme Dockès.
- Efficiency: The 'liblinear' logistic regression solver is now faster and requires less memory. #14108, #14170, #14296 by Alex Henrie.
- Fix: `linear_model.Ridge` with `solver='sag'` now accepts F-ordered and non-contiguous arrays and makes a conversion instead of failing. #14458 by Guillaume Lemaitre.
- Fix: `linear_model.LassoCV` no longer forces `precompute=False` when fitting the final model. #14591 by Andreas Müller.
- Fix: `linear_model.RidgeCV` and `linear_model.RidgeClassifierCV` now correctly score when `cv=None`. #14864 by Venkatachalam N.
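The new `BayesianRidge` initialisation hyperparameters can be used as follows; the data and initial values are arbitrary choices for the sketch:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, 2.0, 0.5]) + 0.01 * rng.randn(50)

# alpha_init / lambda_init seed the iterative evidence maximisation;
# good initial guesses can speed up or stabilise convergence.
reg = BayesianRidge(alpha_init=1.0, lambda_init=1e-3).fit(X, y)
```

After fitting, `reg.alpha_` and `reg.lambda_` hold the values refined from those initial guesses.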

### `sklearn.manifold`¶

- Fix: Fixed a bug where `manifold.spectral_embedding` (and therefore `manifold.SpectralEmbedding` and `cluster.SpectralClustering`) computed wrong eigenvalues with `eigen_solver='amg'` when `n_samples < 5 * n_components`. #14647 by Andreas Müller.
- Fix: Fixed a bug in `manifold.spectral_embedding`, used in `manifold.SpectralEmbedding` and `cluster.SpectralClustering`, where `eigen_solver="amg"` would sometimes result in a LinAlgError. #13393 by Andrew Knyazev; #13707 by Scott White.

### `sklearn.metrics`¶

- Feature: Added the `metrics.nan_euclidean_distances` metric, which calculates Euclidean distances in the presence of missing values. #12852 by Ashim Bhattarai and Thomas Fan.
- Feature: New ranking metrics `metrics.ndcg_score` and `metrics.dcg_score` have been added to compute Discounted Cumulative Gain and Normalized Discounted Cumulative Gain. #9951 by Jérôme Dockès.
- Major Feature: `metrics.plot_roc_curve` has been added to plot ROC curves. This function introduces the visualization API described in the User Guide. #14357 by Thomas Fan.
- Feature: Added multiclass support to `metrics.roc_auc_score`. #12789 by Kathy Chen, Mohamed Maskani, and Thomas Fan.
- Feature: Added `metrics.mean_tweedie_deviance`, measuring the Tweedie deviance for a power parameter `p`. Also added the mean Poisson deviance `metrics.mean_poisson_deviance` and mean Gamma deviance `metrics.mean_gamma_deviance`, which are special cases of the Tweedie deviance for `p=1` and `p=2` respectively. #13938 by Christian Lorentzen and Roman Yurchak.
- Enhancement: The parameter `beta` in `metrics.fbeta_score` is updated to accept zero and `float('+inf')` values. #13231 by Dong-hee Na.
- Enhancement: Added the parameter `squared` to `metrics.mean_squared_error`, to optionally return the root mean squared error. #13467 by Urvang Patel.
- Enhancement: Allow computing averaged metrics in the case of no true positives. #14595 by Andreas Müller.
- Fix: Raise a ValueError in `metrics.silhouette_score` when a precomputed distance matrix contains non-zero diagonal entries. #12258 by Stephen Tierney.
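A minimal sketch of the new ranking metric `ndcg_score` on made-up relevance judgments:

```python
from sklearn.metrics import ndcg_score

# One query: true relevance of three documents, and the scores a
# hypothetical ranker assigned to them.
true_relevance = [[3, 2, 1]]
perfect_scores = [[0.9, 0.5, 0.1]]   # same ordering as the truth
swapped_scores = [[0.1, 0.5, 0.9]]   # reversed ordering

# NDCG is DCG normalised by the ideal DCG, so a perfect ranking
# scores exactly 1.0 and any mis-ordering scores strictly less.
best = ndcg_score(true_relevance, perfect_scores)
worse = ndcg_score(true_relevance, swapped_scores)
```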

### `sklearn.model_selection`¶

- Enhancement: `model_selection.learning_curve` now accepts the parameter `return_times`, which can be used to retrieve computation times in order to plot model scalability (see the learning_curve example). #13938 by Hadrien Reboul.
- Enhancement: `model_selection.RandomizedSearchCV` now accepts lists of parameter distributions. #14549 by Andreas Müller.
- Fix: Reimplemented `model_selection.StratifiedKFold` to fix an issue where one test set could be `n_classes` larger than another. Test sets should now be near-equally sized. #14704 by Joel Nothman.
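A minimal sketch of `return_times`; the estimator, dataset and train sizes are arbitrary choices for the example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With return_times=True, two extra arrays are returned: the time
# spent fitting and the time spent scoring, per train size and fold.
sizes, train_scores, test_scores, fit_times, score_times = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=[0.3, 0.6, 1.0], cv=3, return_times=True)
```

`fit_times` and `score_times` have shape `(n_train_sizes, n_cv_folds)`, which here is `(3, 3)` -- suitable for plotting fit time against training-set size.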

### `sklearn.multioutput`¶

- Fix: `multioutput.MultiOutputClassifier` now has the attribute `classes_`. #14629 by Agamemnon Krasoulis.

### `sklearn.pipeline`¶

- Enhancement: `pipeline.Pipeline` now supports `score_samples` if the final estimator does. #13806 by Anaël Beaugnon.
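For instance, since `IsolationForest` exposes `score_samples`, a pipeline ending in one now does too (the scaler and data here are arbitrary choices for the sketch):

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=100, centers=1, random_state=0)

# The pipeline delegates score_samples to the final estimator after
# running the input through the preceding transformers.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("iforest", IsolationForest(random_state=0)),
]).fit(X)

scores = pipe.score_samples(X)
```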

### `sklearn.svm`¶

- Enhancement: `svm.SVC` and `svm.NuSVC` now accept a `break_ties` parameter. With this parameter, `predict` breaks ties according to the confidence values of `decision_function`, if `decision_function_shape='ovr'` and the number of target classes > 2. #12557 by Adrin Jalali.
- Enhancement: `svm.BaseLibSVM` now throws a more specific error when fit on non-square data with `kernel='precomputed'`. #14336 by Gregory Dexter.
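A minimal sketch of `break_ties` on a made-up 3-class problem:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# A 3-class problem; with decision_function_shape='ovr' (the default),
# several classes can receive the same number of votes.
X, y = make_classification(n_samples=150, n_classes=3,
                           n_informative=4, random_state=0)

clf = SVC(break_ties=True, decision_function_shape="ovr",
          random_state=0).fit(X, y)

# Ties are now resolved using the decision_function confidences
# instead of arbitrarily picking the first tied class.
preds = clf.predict(X[:5])
```

Note that `break_ties=True` makes `predict` somewhat slower, since it must evaluate `decision_function`.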

### `sklearn.tree`¶

- Feature: Adds minimal cost-complexity pruning, controlled by `ccp_alpha`, to `tree.DecisionTreeClassifier`, `tree.DecisionTreeRegressor`, `tree.ExtraTreeClassifier`, `tree.ExtraTreeRegressor`, `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.ExtraTreesClassifier`, `ensemble.ExtraTreesRegressor`, `ensemble.RandomTreesEmbedding`, `ensemble.GradientBoostingClassifier`, and `ensemble.GradientBoostingRegressor`. #12887 by Thomas Fan.
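A minimal sketch of `ccp_alpha`; the dataset and the alpha value are arbitrary choices for the example:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
# Larger ccp_alpha prunes more aggressively: subtrees whose
# cost-complexity improvement falls below alpha are collapsed.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.05).fit(X, y)
```

Here `pruned.tree_.node_count` is smaller than `full.tree_.node_count`. The candidate alphas for a given training set can be inspected with `cost_complexity_pruning_path`.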

### `sklearn.preprocessing`¶

- Enhancement: Avoid an unnecessary data copy when fitting the preprocessors `preprocessing.StandardScaler`, `preprocessing.MinMaxScaler`, `preprocessing.MaxAbsScaler`, `preprocessing.RobustScaler`, and `preprocessing.QuantileTransformer`, which results in a slight performance improvement. #13987 by Roman Yurchak.
- Fix: `preprocessing.KernelCenterer` now throws an error when fit on a non-square matrix. #14336 by Gregory Dexter.

### `sklearn.cluster`¶

- Enhancement: `cluster.SpectralClustering` now accepts an `n_components` parameter. This parameter extends the `SpectralClustering` class functionality to match `spectral_clustering`. #13726 by Shuzhe Xiao.
- Fix: Fixed a bug where `cluster.KMeans` produced inconsistent results between `n_jobs=1` and `n_jobs>1` due to the handling of the random state. #9288 by Bryan Yang.

### `sklearn.feature_selection`¶

- Fix: Fixed a bug where `feature_selection.VarianceThreshold` with `threshold=0` did not remove constant features due to numerical instability, by using range rather than variance in this case. #13704 by Roddy MacSween.
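The fixed behaviour can be sketched on a made-up matrix with one constant column:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# The first feature is constant, the second is not.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])

# threshold=0 (the default) keeps only features whose spread is
# strictly positive, i.e. it drops constant columns.
selector = VarianceThreshold(threshold=0)
X_reduced = selector.fit_transform(X)
```

Only the second column survives, so `X_reduced` has shape `(3, 1)`.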

### `sklearn.utils`¶

- Enhancement: `utils.safe_indexing` accepts an `axis` parameter to index array-likes across rows and columns. Column indexing can be done on NumPy arrays, SciPy sparse matrices, and pandas DataFrames. An additional refactoring was done. #14035 and #14475 by Guillaume Lemaitre.
- Feature: `check_estimator` can now generate checks by setting `generate_only=True`. Previously, running `check_estimator` would stop when the first check failed. With `generate_only=True`, all checks can run independently and report the ones that are failing. Read more in rolling_your_own_estimator. #14381 by Thomas Fan.
- Feature: Added a pytest-specific decorator, `parametrize_with_checks`, to parametrize estimator checks for a list of estimators. #14381 by Thomas Fan.
- API Change: The `requires_positive_X` estimator tag (for models that require X to be non-negative) is now used by `check_estimator` to make sure a proper error message is raised if X contains some negative entries. #14680 by Alex Gramfort.
- Enhancement: `utils.safe_sparse_dot` works between 3D+ ndarrays and sparse matrices. #14538 by Jérémie du Boisberranger.

### `sklearn.neighbors`¶

- Feature: `neighbors.RadiusNeighborsClassifier` now supports predicting probabilities via `predict_proba`, and supports more `outlier_label` options: 'most_frequent', or different outlier labels for multi-output problems. #9597 by Wenbo Zhao.
- Efficiency: Efficiency improvements for `neighbors.RadiusNeighborsClassifier.predict`. #9597 by Wenbo Zhao.
- Fix: `neighbors.NeighborsBase`-based regressors now throw an error when fit on non-square data with `metric='precomputed'`. #14336 by Gregory Dexter.
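A minimal sketch of the new `predict_proba` support, on made-up 1-D data:

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

X = np.array([[0.0], [0.5], [5.0], [5.5]])
y = [0, 0, 1, 1]

clf = RadiusNeighborsClassifier(radius=1.0).fit(X, y)

# Probabilities are the class frequencies among all training points
# that fall inside the query radius.
proba = clf.predict_proba([[0.2]])
```

For the query point `0.2`, only the two class-0 points lie within radius 1.0, so the returned probabilities are `[[1., 0.]]`.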

### `sklearn.neural_network`¶

- Feature: Added a `max_fun` parameter to `neural_network.BaseMultilayerPerceptron`, `neural_network.MLPRegressor`, and `neural_network.MLPClassifier` to give control over the maximum number of function evaluations when the `tol` improvement is not met. #9274 by Daniel Perry.

### `sklearn.cross_decomposition`¶

- Fix: Fixed a bug where `cross_decomposition.PLSCanonical` and `cross_decomposition.PLSRegression` were raising an error when fitted with a target matrix `Y` in which the first column was constant. #13609 by Camila Williamson.

### Miscellaneous¶

- API Change: Replaced manual checks with `check_is_fitted`. Errors thrown when using a non-fitted estimator are now more uniform. #13013 by Agamemnon Krasoulis.
- Fix: Ported `lobpcg` from SciPy, which implements some bug fixes only available in SciPy 1.3+. #13609 by Guillaume Lemaitre.

## Changes to estimator checks¶

These changes mostly affect library developers.

- Estimators are now expected to raise a `NotFittedError` if `predict` or `transform` is called before `fit`; previously an `AttributeError` or `ValueError` was acceptable. #13013 by Agamemnon Krasoulis.
- Fix: Added `check_transformer_data_not_an_array` to checks where it was missing.
- Fix: Added a check that pairwise estimators raise an error on non-square data. #14336 by Gregory Dexter.
- Enhancement: Binary-only classifiers are now supported in estimator checks. Such classifiers need to have the `binary_only=True` estimator tag. #13875 by Trevor Stephens.