A practical comparison of econometric regression and machine learning, focusing on explanation, prediction, interpretation, and model choice.
Chapter purpose
Regression models and machine learning models are often presented as competing approaches. In practice, they are designed to answer different questions.
Econometric models are commonly used to understand relationships, test hypotheses, and evaluate economic mechanisms. Machine learning models are usually designed to maximize predictive accuracy.
This chapter compares the two approaches and explains when each is appropriate.
Applied question
If our goal is to understand milk prices, should we use regression or machine learning?
Key idea
The best model depends on the objective. If the goal is explanation, interpretation, and policy analysis, regression is often preferred. If the goal is accurate prediction, machine learning may perform better.
Neither approach is universally superior.
Minimal comparison
| Objective | Regression | Machine learning |
|---|---|---|
| Explain relationships | Strong | Limited |
| Test hypotheses | Strong | Rare |
| Estimate causal effects | Possible with design | Difficult by default |
| Predict new observations | Good | Often stronger |
| Interpret coefficients | Strong | Often weak |
| Handle complex patterns | Limited | Strong |
23.1 Why economists use regression
Regression models are closely connected to economic theory. Suppose we estimate:
\[ Price_i = \beta_0 + \beta_1 Volume_i + u_i \]
The coefficient \(\beta_1\) has a direct interpretation. For example, a one-liter increase in package volume may be associated with a specific change in price, on average.
Regression allows researchers to estimate economic relationships, test hypotheses, construct confidence intervals, and communicate findings clearly.
                            OLS Regression Results
==============================================================================
Dep. Variable:                  Price   R-squared:                       0.274
Model:                            OLS   Adj. R-squared:                  0.271
Method:                 Least Squares   F-statistic:                     96.56
Date:                Thu, 11 Jun 2026   Prob (F-statistic):           1.53e-19
Time:                        06:55:16   Log-Likelihood:                -2081.5
No. Observations:                 258   AIC:                             4167.
Df Residuals:                     256   BIC:                             4174.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        467.9297     74.683      6.266      0.000     320.858     615.002
Volume         0.4170      0.042      9.826      0.000       0.333       0.501
==============================================================================
Omnibus:                      221.760   Durbin-Watson:                   1.652
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2757.298
Skew:                           3.610   Prob(JB):                         0.00
Kurtosis:                      17.295   Cond. No.                     2.72e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.72e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Interpretation
The primary output of interest is usually the coefficient estimate. Researchers ask whether the coefficient is positive or negative, statistically significant, and economically meaningful.
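For the output above, all three questions can be answered from the Volume row. The arithmetic below is only a rough check: it uses the rounded values printed in the table, so it matches the reported figures approximately, and the critical value 1.97 is an approximation to the 97.5th percentile of a t distribution with 256 degrees of freedom.

```latex
\[
t = \frac{\hat\beta_1}{\mathrm{se}(\hat\beta_1)} = \frac{0.4170}{0.042} \approx 9.9
\qquad \text{(reported: 9.826, based on the unrounded standard error)}
\]
\[
\hat\beta_1 \pm 1.97 \times \mathrm{se}(\hat\beta_1)
= 0.4170 \pm 1.97 \times 0.042 \approx [0.334,\ 0.500]
\qquad \text{(reported: } [0.333,\ 0.501] \text{)}
\]
```

The estimate is positive, the interval excludes zero, and the p-value is reported as 0.000, so the association is statistically significant at conventional levels. Whether it is economically meaningful depends on the units of Price and Volume, which the output does not show.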
Prediction may be useful, but explanation is usually the main objective.
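The quantities discussed in this section (coefficient, standard error, t-statistic, confidence interval) can be computed by hand for a one-regressor model. The following is a minimal sketch using only the Python standard library. It is not the code that produced the output above: the data are synthetic, the function name `simple_ols` is hypothetical, and the interval uses the normal critical value 1.96 rather than the exact t critical value a statistical package would report.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (one regressor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_b1 = math.sqrt(sigma2 / s_xx)

    # 1.96 is the large-sample normal critical value (an approximation).
    ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1, "ci_b1": ci}


# Synthetic data (assumption): price = 450 + 0.4 * volume + noise.
rng = random.Random(0)
volume = [rng.uniform(200, 3000) for _ in range(500)]
price = [450 + 0.4 * v + rng.gauss(0, 300) for v in volume]

fit = simple_ols(volume, price)
print(f"slope = {fit['b1']:.3f}, se = {fit['se_b1']:.3f}, t = {fit['t_b1']:.1f}")
print(f"approx. 95% CI for slope: ({fit['ci_b1'][0]:.3f}, {fit['ci_b1'][1]:.3f})")
```

Because the true slope in the simulated data is 0.4, the estimate should land close to it, and the interval gives a sense of how close. With real data the true slope is unknown, which is why the standard error and interval matter.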
23.2 Why data scientists use machine learning
Machine learning begins with a different question:
Can we accurately predict future observations?
Instead of focusing on coefficients, machine learning focuses on predictive performance.
A retailer may want to predict milk prices next month. The retailer may not care whether volume or package type causes the price change. The retailer mainly cares whether the prediction is accurate.
| Model | RMSE |
|---|---|
| Linear Regression | 0.35 |
| Random Forest | 0.22 |
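RMSE (root mean squared error) measures the typical size of a prediction error, so lower is better. In this comparison the Random Forest has the lower RMSE (0.22 versus 0.35), so it predicts more accurately on the test sample.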
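As a reminder of what the RMSE figures measure, here is a minimal sketch of the calculation. The prices and predictions are made-up numbers for illustration, not the data behind the RMSE values reported above, and `model_a` and `model_b` are hypothetical labels.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square the errors, average them, take the root."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical test-sample prices and two sets of predictions (made-up numbers).
actual = [1.20, 1.50, 0.90, 2.10]
model_a = [1.00, 1.70, 1.20, 1.80]  # misses by about 0.2 to 0.3 each time
model_b = [1.10, 1.60, 0.90, 2.00]  # misses by about 0.1 or less each time

print(f"Model A RMSE: {rmse(actual, model_a):.3f}")
print(f"Model B RMSE: {rmse(actual, model_b):.3f}")
```

Squaring penalizes large misses more than small ones, so a model with a few big errors can have a higher RMSE than one with many small errors.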
23.3 Same data, different objectives
The same dataset can be used for different purposes.
A researcher may ask:
Does package volume affect milk prices?
This is an explanatory question.
A retailer may ask:
What price should we expect for this product?
This is a predictive question.
The same data can support both analyses, but the model choice depends on the question.
23.4 Interpretable models versus black boxes
Regression models are generally transparent. Coefficients can be interpreted in words. Machine learning models are often more complex. They may predict well but be harder to explain.
Regression
Advantages:
Easy to explain.
Coefficients have economic meaning.
Supports hypothesis testing.
Useful for policy analysis.
Disadvantages:
Requires functional form choices.
May miss nonlinear patterns.
Machine learning
Advantages:
Flexible.
Captures nonlinear relationships.
Often improves prediction.
Disadvantages:
Harder to interpret.
Coefficients may not exist.
Less useful for causal analysis by itself.
23.5 A practical comparison
A useful applied workflow is to estimate both a regression model and one or more machine learning models on the same training data, then compare their prediction errors on the same test sample.
If the Random Forest has lower test RMSE, it predicts more accurately. This does not mean it provides better economic insight. Prediction accuracy and economic explanation are different objectives.
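The workflow can be sketched as follows, under strong simplifying assumptions. The data are synthetic with a deliberately curved relationship, and a k-nearest-neighbor average stands in for the Random Forest so that the example needs only the standard library. It is a substitute flexible learner, not a Random Forest, and the helper names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


def knn_predict(x_train, y_train, x_new, k=5):
    """Predict by averaging y over the k training points nearest to x_new."""
    nearest = sorted(zip(x_train, y_train), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic data (assumption): a curved relationship plus noise.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(300)]
y = [(xi - 5) ** 2 + rng.gauss(0, 1) for xi in x]

# Hold out the last 100 observations; fit both models on the first 200 only.
x_train, y_train, x_test, y_test = x[:200], y[:200], x[200:], y[200:]

b0, b1 = fit_line(x_train, y_train)
linear_pred = [b0 + b1 * xi for xi in x_test]
knn_pred = [knn_predict(x_train, y_train, xi) for xi in x_test]

linear_rmse = rmse(y_test, linear_pred)
knn_rmse = rmse(y_test, knn_pred)
print(f"Linear RMSE: {linear_rmse:.2f}  (slope estimate: {b1:.2f})")
print(f"k-NN RMSE:   {knn_rmse:.2f}  (no coefficient to report)")
```

In this constructed example the flexible model wins on test RMSE because a straight line cannot follow the curve. The linear model still returns a slope that can be stated in words, while the flexible model returns only predictions.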
23.6 Why economists should learn both
Regression remains central for policy evaluation, impact assessment, academic research, and economic interpretation. Machine learning is increasingly useful for forecasting, large datasets, nonlinear relationships, and high-dimensional prediction problems.
The strongest applied analysts understand both traditions.
Warning: Common mistake
Do not assume that the model with the highest prediction accuracy provides the strongest economic evidence. A highly accurate predictive model may provide little insight into economic mechanisms.
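A small simulation can make this warning concrete. In the sketch below, everything is an assumption built for illustration: a hidden factor drives both a predictor `x` and an outcome `y`, while `x` itself has a true effect of zero on `y`. The variable and function names are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))


def outcome(hidden_value, x_value, noise):
    """Structural equation for y: x enters with a coefficient of zero."""
    return 2.0 * hidden_value + 0.0 * x_value + noise


rng = random.Random(42)
n = 5000
hidden = [rng.gauss(0, 1) for _ in range(n)]   # unobserved common cause
x = [h + rng.gauss(0, 1) for h in hidden]      # predictor driven by hidden
noise = [rng.gauss(0, 1) for _ in range(n)]
y = [outcome(h, xi, e) for h, xi, e in zip(hidden, x, noise)]

# Observational association: x predicts y because both track the hidden factor.
observed_slope = slope(x, y)

# Intervention: raise every x by one unit while holding everything else fixed.
y_after = [outcome(h, xi + 1.0, e) for h, xi, e in zip(hidden, x, noise)]
causal_effect = sum(b - a for a, b in zip(y, y_after)) / n

print(f"slope of y on x in observed data: {observed_slope:.2f}")
print(f"average change in y after raising x by 1: {causal_effect:.2f}")
```

Here `x` is a useful predictor of `y`, yet changing `x` changes nothing. Any model trained on the observed data, however accurate, would inherit the same association, which is why causal claims need a research design and economic reasoning rather than predictive accuracy alone.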
Key takeaway
Regression and machine learning serve different purposes.
Regression focuses on explanation and interpretation.
Machine learning focuses on prediction accuracy.
Better prediction does not imply better causal understanding.
Economic theory remains essential for empirical analysis.
Looking ahead
In the next chapter, we introduce decision trees and Random Forests, two machine learning methods that can capture nonlinear relationships and often improve predictive performance.