Inference, estimation, and decision-making from data
Hypothesis testing is a disciplined way to answer "is this effect real, or could it just be noise?" That is exactly the question behind "is model A actually better than model B?" You start by assuming nothing is going on, then ask how surprising your data would be if that were true.
Two competing claims. The null hypothesis H₀ is the boring default: no effect, no difference. The alternative H₁ is what you suspect: there is an effect. You compute a test statistic from the data and ask: if H₀ were true, how extreme is this value?
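To make "how extreme is this value under H₀?" concrete, here is a minimal sketch of one possible test (an exact sign-flip permutation test, chosen here for illustration, not the only option). It assumes the data are paired score differences (model A minus model B) from a handful of seeds or folds. The function name `sign_flip_test` and the example numbers are hypothetical. The test statistic is the absolute mean difference. Under H₀ (no real difference), each difference is as likely to be negative as positive, so enumerating every sign assignment gives the exact distribution of the statistic when H₀ holds.

```python
from itertools import product
from statistics import mean


def sign_flip_test(diffs):
    """Exact two-sided sign-flip permutation test (hypothetical helper).

    diffs: paired differences, e.g. per-seed accuracy of A minus B.
    H0 assumed here: differences are symmetric around zero, so each
    sign is a fair coin flip. Returns (observed statistic, p-value).
    """
    observed = abs(mean(diffs))  # test statistic computed from the data
    n_extreme = 0
    n_total = 0
    # Enumerate all 2**n sign assignments: the null distribution.
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        # Count assignments at least as extreme as what we observed
        # (small tolerance guards against float round-off in ties).
        if abs(mean(flipped)) >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return observed, n_extreme / n_total


# Hypothetical per-seed accuracy differences (A minus B).
stat, p = sign_flip_test([0.02, 0.03, 0.01, 0.04, 0.02, 0.03])
print(stat, p)
```

With all six differences positive, only the all-plus and all-minus assignments are as extreme as the observed mean, so the p-value is 2/64 = 0.03125. Enumeration costs 2ⁿ evaluations, so this exact version only suits small n; larger samples usually switch to random sampling of sign flips.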
If the statistic is so extreme that it would rarely happen under H₀, you reject H₀. Otherwise you fail to reject it (note: never "accept", since absence of evidence isn't evidence of absence).
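The reject / fail-to-reject rule can be sketched as a comparison of a p-value against a threshold fixed in advance. The threshold name `alpha` and its default of 0.05 are a common convention assumed here, not something this section specifies. The sketch also assumes the test statistic is a z-score that follows a standard normal distribution under H₀. The helper names `two_sided_p_value` and `decide` are hypothetical.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for standard normal Z, i.e. 2 * (1 - Phi(|z|)).

    Assumes the statistic is standard normal under H0.
    """
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha=0.05):
    """Reject H0 only when the data would be this extreme with
    probability at most alpha (assumed conventional default 0.05)."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"  # never "accept H0"


print(decide(two_sided_p_value(2.5)))  # p is about 0.0124
print(decide(two_sided_p_value(1.0)))  # p is about 0.317
```

A z of 2.5 gives p ≈ 0.0124, below 0.05, so H₀ is rejected. A z of 1.0 gives p ≈ 0.317, so the result is "fail to reject H0". The second branch deliberately never returns "accept": a large p-value only says the data are compatible with H₀, not that H₀ is true.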