Commit 01e494d1 authored by Devin Routh

Fixed link in README

parent 077920e7
@@ -143,7 +143,7 @@ As noted above, this code demonstrates a nested K-Fold cross validation example
[K-Fold Cross validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is a method that splits a dataset into k pieces that are rotated between the test set and the training set. The idea is to minimize the bias of any single test set and so better estimate how a model or statistical analysis will generalize to an independent dataset.
-While K-fold CV can be used as a method to generalize the results of a statistical model on an independent dataset, it is also commonly used as a method to select the "best model" from a selection of potential models (either choosing the "best" hyperparameter set for a model and/or comparing completely different model types). However, when K-Fold CV is used both for accuracy assessment as well as model selection, bias/"data leakage" can occur. A useful starting point for reading more about these discussions is [this paper](https://www.jmlr.org/papers/volume11/cawley10a/cawley10a) and [this paper](https://www.sciencedirect.com/science/article/pii/S0957417421006540).
+While K-fold CV can be used as a method to generalize the results of a statistical model on an independent dataset, it is also commonly used to select the "best model" from a set of candidates (either choosing the "best" hyperparameter values for a model and/or comparing completely different model types). However, when the same K-Fold CV is used both for accuracy assessment and for model selection, bias ("data leakage") can occur. A useful starting point for reading more about these discussions is [this paper](https://www.sciencedirect.com/science/article/pii/S0957417421006540).
To reduce the effects of such bias, nested K-Fold cross validation can be used instead. In this method, each outer fold is held out as a test set, while the remaining data is further split into training and validation folds used to select the model; only the selected model is then scored on the held-out fold. This results in many more model fits than a "flat" CV procedure.
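The nested procedure described above can be sketched with a minimal, library-free Python example. The fold counts, the candidate hyperparameters, and the placeholder scoring function are illustrative assumptions, not part of the repository's code; a real implementation would fit and score an actual model where the comments indicate.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs splitting range(n) into k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def nested_cv(n, outer_k=5, inner_k=3, candidates=(0.1, 1.0, 10.0)):
    """Outer folds estimate generalization error; inner folds choose a
    hyperparameter, so the outer test fold never influences selection."""
    outer_scores = []
    for outer_train, outer_test in k_fold_indices(n, outer_k):
        # Inner loop: pick the "best" hyperparameter using outer_train only.
        best_param, best_score = None, float("-inf")
        for param in candidates:
            inner_scores = []
            for inner_train, inner_val in k_fold_indices(len(outer_train), inner_k):
                # Placeholder score; a real model would be fit on inner_train
                # and evaluated on inner_val here.
                inner_scores.append(-abs(param - 1.0))
            mean_score = sum(inner_scores) / len(inner_scores)
            if mean_score > best_score:
                best_param, best_score = param, mean_score
        # Refit with best_param on all of outer_train, then score on
        # outer_test (again a placeholder for a real model evaluation).
        outer_scores.append(-abs(best_param - 1.0))
    return sum(outer_scores) / len(outer_scores)
```

Because hyperparameter selection happens entirely inside each outer training split, the outer test folds give an (approximately) unbiased estimate of the selected model's performance, at the cost of roughly `outer_k * inner_k * len(candidates)` model fits.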