Python – Randomized stratified k-fold cross-validation in scikit-learn

cross-validation, machine-learning, python, scikit-learn

Is there any built-in way to get scikit-learn to perform shuffled stratified k-fold cross-validation? This is one of the most common CV methods, and I am surprised I couldn't find a built-in method to do this.

I saw that cross_validation.KFold() has a shuffling flag, but it is not stratified. Unfortunately cross_validation.StratifiedKFold() does not have such an option, and cross_validation.StratifiedShuffleSplit() does not produce disjoint folds.

Am I missing something? Is this planned?

(Obviously, I can implement this myself.)
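For concreteness, the hand-rolled version I have in mind would be roughly the sketch below. The function name shuffled_stratified_kfold, its parameters, and the round-robin assignment are my own choices for illustration, not anything from scikit-learn:

```python
import random
from collections import defaultdict


def shuffled_stratified_kfold(labels, n_folds, seed=None):
    """Yield (train, test) index lists with disjoint, class-balanced test folds.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0  # rotates the starting fold so leftovers do not pile up in fold 0
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize within each class
        # Deal the shuffled indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_folds].append(idx)
        offset += len(indices)

    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


if __name__ == "__main__":
    toy_labels = [0] * 6 + [1] * 3  # made-up labels with a 2:1 class ratio
    for train, test in shuffled_stratified_kfold(toy_labels, n_folds=3, seed=1):
        print(train, test)
```

Every sample ends up in exactly one test fold, and each class is spread as evenly as possible across the folds, which is the behaviour I was hoping to get from a built-in.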

Best Solution

A shuffle option for cross_validation.StratifiedKFold was introduced in version 0.15:

http://scikit-learn.org/0.15/modules/generated/sklearn.cross_validation.StratifiedKFold.html

This can be found in the Changelog:

http://scikit-learn.org/stable/whats_new.html#new-features

Shuffle option for cross_validation.StratifiedKFold. By Jeffrey Blackburne.
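As a rough illustration of what the shuffled, stratified split looks like in use, here is a minimal sketch. It assumes a newer scikit-learn release, where the class lives in sklearn.model_selection and takes n_splits; in 0.15 the class is in sklearn.cross_validation and has a different constructor signature, so check the 0.15 page linked above if you are on that version. The toy data is made up.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (made up): 12 samples, two classes in a 2:1 ratio.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 8 + [1] * 4)

# shuffle=True randomizes which samples land in which fold;
# random_state makes that shuffle reproducible.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Collect the test indices of each fold.
folds = [test_idx for _, test_idx in skf.split(X, y)]

for i, test_idx in enumerate(folds):
    # Print the fold number, its test indices, and the per-class counts.
    print(i, test_idx, np.bincount(y[test_idx]))
```

Unlike StratifiedShuffleSplit, the test folds here are disjoint and together cover every sample once, while each fold keeps the class ratio of y.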
