Model specification is the process of deciding which variables, transformations, and functional forms should be included in a regression model. This chapter introduces specification and F-tests using actual Milk Data model comparisons.
Applied Question
How do we decide whether additional variables belong in a model?
Model Comparison
| Model | R² |
|---|---|
| Volume only | 0.274 |
| Volume + Size + Pieces | 0.302 |
| Volume + Brand + Fat | 0.490 |
| Log-Log Model | 0.655 |
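Model selection should not rely on R² alone. R² never falls when a variable is added, so a higher value does not by itself show that the larger model is better. The log-log model's R² is also measured on the log of the dependent variable, so it is not directly comparable with the R² values of the models estimated in levels. Researchers must also consider theory, interpretation, and parsimony.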
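One common way to temper R² for model size is the adjusted R², which penalizes each extra regressor. The sketch below applies it to two R² values from the comparison table. The sample size `N_OBS` is a hypothetical placeholder (the chapter does not report it here), and the function name is illustrative only.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors plus an intercept."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# HYPOTHETICAL sample size: replace with the actual number of Milk Data rows.
N_OBS = 200

# R-squared values taken from the comparison table.
adj_volume_only = adjusted_r2(0.274, N_OBS, k=1)
adj_volume_size_pieces = adjusted_r2(0.302, N_OBS, k=3)

print("Volume only:", round(adj_volume_only, 4))
print("Volume + Size + Pieces:", round(adj_volume_size_pieces, 4))
```

Adjusted R² is always below the raw R², and the gap widens as regressors are added relative to the sample size. It is a descriptive aid, not a substitute for a formal test.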
Restricted and Unrestricted Models
An F-test compares a restricted model with an unrestricted model.
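The restricted model is the simpler one: it imposes the null hypothesis by leaving out the variables being tested, which is equivalent to setting their coefficients to zero. The unrestricted model contains those additional variables. The test asks whether the added variables jointly improve the fit by more than chance alone would explain.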
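The comparison can be written as a single statistic. The notation here is an assumption, not taken from the chapter: \(SSR_R\) and \(SSR_U\) are the sums of squared residuals of the restricted and unrestricted models, \(q\) is the number of restrictions, \(n\) is the number of observations, and \(k\) is the number of regressors in the unrestricted model (excluding the intercept).

\[
F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n - k - 1)} = \frac{(R^2_U - R^2_R)/q}{(1 - R^2_U)/(n - k - 1)}
\]

The second form holds only when both models have the same dependent variable and include an intercept. Under the null hypothesis and the classical regression assumptions, the statistic follows an \(F\) distribution with \(q\) and \(n - k - 1\) degrees of freedom, so large values count as evidence against the restricted model.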
Why We Need F-tests
A t-test evaluates a single coefficient. An F-test evaluates several restrictions simultaneously.
Example:
\[
H_0: \beta_{\text{Brand}} = 0 \; \text{and} \; \beta_{\text{Fat}} = 0
\]
The alternative is that at least one added coefficient is nonzero.
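As a rough illustration, the F-statistic for this hypothesis can be computed from the two R² values in the comparison table using the R² form of the statistic. This is a sketch under stated assumptions: the sample size `N_OBS` is hypothetical, Brand and Fat are each assumed to enter as a single regressor (so there are two restrictions), and the function name is illustrative.

```python
def f_statistic(r2_restricted, r2_unrestricted, q, n, k_unrestricted):
    """R-squared form of the F-statistic for q joint exclusion restrictions.

    F = ((R2_U - R2_R) / q) / ((1 - R2_U) / (n - k - 1))

    Valid only when both models share the same dependent variable
    and both include an intercept.
    """
    if r2_unrestricted < r2_restricted:
        raise ValueError("unrestricted R-squared cannot be below restricted")
    df_resid = n - k_unrestricted - 1
    if q <= 0 or df_resid <= 0:
        raise ValueError("q and residual degrees of freedom must be positive")
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / df_resid
    return numerator / denominator


# HYPOTHETICAL sample size: replace with the actual number of Milk Data rows.
N_OBS = 200

# Restricted: Volume only (R2 = 0.274).
# Unrestricted: Volume + Brand + Fat (R2 = 0.490), assumed to have 3 regressors.
f_brand_fat = f_statistic(0.274, 0.490, q=2, n=N_OBS, k_unrestricted=3)
print("F-statistic for H0 (Brand and Fat coefficients both zero):", f_brand_fat)
```

The result would then be compared with a critical value from the F distribution with 2 and `N_OBS - 4` degrees of freedom. Because the statistic scales with the residual degrees of freedom, the value printed here means nothing until the true sample size is supplied.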
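If important variables are omitted, the coefficients on the included variables may be biased. The Milk Data provide a clear example: the Volume coefficient falls from 417.0 in the simple model to 261.4 in the multiple regression with Size and Pieces, which suggests that the simple-model estimate was partly capturing the effects of the omitted variables.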
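The mechanism can be seen in a small simulation. The sketch below uses made-up data, not the Milk Data: `x2` is an omitted variable correlated with `x1`, and all coefficients (2.0, 3.0, 0.8) are arbitrary choices for illustration. It compares the slope on `x1` with and without `x2` in the model.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in the slope ratios)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def short_slope(y, x1):
    """Slope from regressing y on x1 alone (with intercept)."""
    return cov(x1, y) / cov(x1, x1)


def long_slopes(y, x1, x2):
    """Slopes from regressing y on x1 and x2 (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Simulated data: x2 is correlated with x1 and also affects y.
rng = random.Random(0)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + rng.gauss(0, 1) for a in x1]
y = [2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

b_short = short_slope(y, x1)
b1_long, b2_long = long_slopes(y, x1, x2)
delta = cov(x1, x2) / cov(x1, x1)  # slope from regressing x2 on x1

print("slope on x1, x2 omitted: ", b_short)
print("slope on x1, x2 included:", b1_long)
print("b1 + b2 * delta:         ", b1_long + b2_long * delta)
```

The last two printed lines illustrate the omitted variable identity: the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times its association with the included regressor. The bias disappears only if the omitted variable has no effect on the outcome or is uncorrelated with the included regressor.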
Irrelevant Variables
Including variables that do not belong in the model may increase complexity and reduce precision. Good models balance explanatory power with simplicity.
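The loss of precision can be made explicit with the standard variance expression for an OLS slope under homoskedasticity. The symbols are assumptions introduced here: \(\sigma^2\) is the error variance, \(SST_j\) is the total sample variation in regressor \(x_j\), and \(R_j^2\) is the R² from regressing \(x_j\) on all other regressors in the model.

\[
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \, (1 - R_j^2)}
\]

Adding an irrelevant variable that is correlated with \(x_j\) raises \(R_j^2\) without lowering \(\sigma^2\), so the variance of \(\hat{\beta}_j\) increases even though no bias is introduced.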
Parsimony
Prefer the simplest model that adequately explains the data. Each variable should have theoretical justification and empirical relevance.
What the Milk Data Teaches Us
Additional variables matter.
Brand and Fat significantly improve the model.
Size and Pieces provide additional explanatory power.
The relationship between Volume and Price is nonlinear.
F-tests provide a formal method for evaluating model improvements.
Common Mistakes
Warning: Common Mistake 1
Selecting variables solely based on statistical significance.
Warning: Common Mistake 2
Ignoring omitted variable bias.
Warning: Common Mistake 3
Using R² as the only model selection criterion.
Key Takeaways
Model specification is one of the most important tasks in econometrics.
Brand and Fat significantly improve the Milk Data model.
Size and Pieces provide additional explanatory power.
The quadratic term is statistically significant.
Good models balance theory, evidence, and simplicity.
Part III Summary
You can now estimate simple regression models, conduct hypothesis tests, evaluate prediction performance, estimate multiple regression models, interpret elasticity estimates, use dummy variables, and evaluate model specification using F-tests.