Inference, estimation, and decision-making from data
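You can't judge a model by its training error: it has already seen that data, so it can score well simply by memorizing it. What matters is its error on data it has never seen. A single held-out test set gives you that, but it keeps part of the data out of training, and the resulting estimate is noisy because it depends on which examples happened to land in the test set. Cross-validation mitigates both problems.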
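In k-fold cross-validation, split the data into k folds of roughly equal size. Train on k−1 of them, validate on the held-out one, and rotate so every fold serves as the validation set exactly once. This trains k separate models, and every example is used for validation exactly once and for training k−1 times. Average the k validation errors for an estimate of how the model generalizes that is more stable than the error from any single split.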
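The rotation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (`k_fold_indices`, `cross_validate`, `fit_line`, `predict_line`), the toy straight-line data, the use of mean squared error, and the seeded shuffle before splitting are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Striding deals indices round-robin, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the per-fold mean squared errors and their average."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held_out_set = set(held_out)
        # Train on every example that is NOT in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held_out_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Score the model only on the fold it has never seen.
        squared = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(mean(squared))
    return fold_errors, mean(fold_errors)


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (illustrative model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def predict_line(model, x):
    a, b = model
    return a * x + b


# Toy data (assumed for illustration): a noisy straight line.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

fold_errors, cv_error = cross_validate(xs, ys, fit_line, predict_line, k=5)
print("per-fold MSE:", [round(e, 3) for e in fold_errors])
print("cross-validated MSE:", round(cv_error, 3))
```

The per-fold errors will typically differ from one another; that spread is the noise a single held-out split would expose you to, and the average is the figure to report. Passing `fit` and `predict` as arguments keeps the splitting logic independent of the model being evaluated.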
Cross-validation is like sitting several practice exams to predict your real-exam score. If you only graded yourself on questions you'd already memorized the answers to, you'd overestimate wildly, so you set aside a fresh batch of questions each time, score yourself on those, and rotate which batch is held back. Averaging your scores across all the practice sittings gives a far steadier forecast of how you'll do on the day than any single mock exam would.