@@ -143,7 +143,7 @@ As noted above, this code demonstrates a nested K-Fold cross validation example
[K-Fold Cross validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is a method that splits a dataset into k pieces ("folds") that are then rotated between test and training sets. The idea is to minimize the potential bias of any single test dataset and thereby better estimate how a model or statistical analysis will generalize to an independent dataset.
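For concreteness, a minimal sketch of this splitting pattern using scikit-learn's `KFold` might look like the following (the feature matrix here is an arbitrary placeholder); each fold serves as the test set exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # placeholder feature matrix (10 samples)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # each of the k folds is held out as the test set once,
    # with the remaining folds forming the training set
    print(f"fold {fold}: train={train_idx}, test={test_idx}")
```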
While K-Fold CV can be used to estimate how a statistical model will perform on an independent dataset, it is also commonly used to select the "best model" from a set of candidates (either choosing the "best" hyperparameter set for a model and/or comparing completely different model types). However, when the same K-Fold CV procedure is used both for accuracy assessment and for model selection, bias/"data leakage" can occur. Useful starting points for reading more about these issues are [this paper](https://www.jmlr.org/papers/volume11/cawley10a/cawley10a) and [this paper](https://www.sciencedirect.com/science/article/pii/S0957417421006540).
To reduce the effects of such bias, nested K-Fold cross validation can be used instead. In this method, an outer loop of folds is held out for accuracy assessment while, within each outer training set, an inner loop of folds handles model selection, so each part of the dataset serves in turn as train, validation, and test data. This results in many more model fits than a "flat" CV procedure, as sketched below.
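As a rough illustration (separate from the code referenced above), the nested procedure can be sketched with scikit-learn: an inner `GridSearchCV` performs model selection and an outer `cross_val_score` loop performs accuracy assessment. The dataset, the `SVC` estimator, and the parameter grid below are arbitrary placeholders, not the settings used elsewhere in this document.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC

# placeholder data; substitute the real feature matrix and labels
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # model selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # accuracy assessment

# inner loop: pick the "best" hyperparameters within each outer training set
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=inner_cv)

# outer loop: score the entire selection procedure on held-out folds
scores = cross_val_score(search, X, y, cv=outer_cv)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because the outer test folds are never seen by the inner selection loop, the reported scores reflect the whole "select-then-fit" procedure rather than a single already-tuned model, which is what reduces the leakage described above.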