What is cross-validation and what is overfitting?

A model that has very high accuracy on the training set but very poor performance on the test set is considered to have overfit the data. This generally means that a highly complex model was chosen to drive the training error down to almost zero, at the cost of high variance under the bias-variance trade-off. To detect and avoid overfitting, data scientists employ cross-validation. This technique divides the training dataset into several parts (folds), say N, and in each of N iterations trains the model on a different set of N-1 folds while measuring its accuracy on the remaining fold, called the validation data. Because the model is always evaluated on data it was not trained on, this gives an honest estimate of performance on new data (i.e. the validation data) and helps avoid overfitting.
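For illustration, a minimal sketch of N-fold cross-validation using scikit-learn; the iris dataset, N=5, and the logistic-regression model are arbitrary choices here, not part of the original answer:

# 5-fold cross-validation sketch with scikit-learn (illustrative choices only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cross_val_score splits the data into 5 folds, trains on 4 of them,
# and evaluates on the held-out fold, rotating until every fold has
# served once as the validation set.
scores = cross_val_score(model, X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean validation accuracy:", scores.mean())

The mean of the per-fold scores is what you would report as the model's estimated performance on unseen data.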

Submitted by fenil.doshi

 

Cross-validation is a technique for testing a model by training and evaluating it on different slices of the data and comparing its results across those sets. Overfitting is when the model has "memorized" the training data: it produces what looks like an accurate model, but it cannot generalize and is far less reliable on new data it is not familiar with.
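As a rough sketch of what this "memorization" looks like in practice (the unconstrained decision tree below is just one example of a model complex enough to memorize its training data, not something from the original answer):

# Overfitting shows up as a large gap between training and test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # no depth limit: free to memorize
tree.fit(X_train, y_train)

print("Training accuracy:", tree.score(X_train, y_train))  # typically ~1.0
print("Test accuracy:    ", tree.score(X_test, y_test))    # noticeably lower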

Submitted by john.rosenfelder