Chapter 13. Multiple Regression

Chapter Purpose

Simple regression showed that larger milk packages tend to have higher prices. However, price may also depend on package size and number of pieces. Multiple regression allows us to study several explanatory variables simultaneously.

The central idea is:

Estimate the effect of one variable while holding other variables constant.

Applied Question

Does volume still matter after accounting for package size and number of pieces?

The Multiple Regression Model

\[ Price_i = \beta_0 + \beta_1 Volume1000_i + \beta_2 Size1000_i + \beta_3 Pieces_i + u_i \]

where Volume1000 is total volume in thousands of milliliters, Size1000 is package size in thousands of milliliters, Pieces is the number of pieces in the package, and u_i is the error term.

Python Implementation

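import statsmodels.api as sm

# milk_data is assumed to be a pandas DataFrame already loaded with the Milk Data
X = milk_data[["Volume1000", "Size1000", "Pieces"]]
X = sm.add_constant(X)  # add the intercept column, reported as "const"
y = milk_data["Price"]
model_multi = sm.OLS(y, X).fit()  # ordinary least squares fit
print(model_multi.summary())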
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Price   R-squared:                       0.302
Model:                            OLS   Adj. R-squared:                  0.294
Method:                 Least Squares   F-statistic:                     36.59
Date:                Thu, 11 Jun 2026   Prob (F-statistic):           1.09e-19
Time:                        06:53:34   Log-Likelihood:                -2076.5
No. Observations:                 258   AIC:                             4161.
Df Residuals:                     254   BIC:                             4175.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        290.9660     92.492      3.146      0.002     108.818     473.114
Volume1000   261.3607     65.404      3.996      0.000     132.558     390.164
Size1000     270.1869     97.784      2.763      0.006      77.616     462.757
Pieces        61.6742     20.619      2.991      0.003      21.069     102.280
==============================================================================
Omnibus:                      229.990   Durbin-Watson:                   1.694
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             3185.494
Skew:                           3.757   Prob(JB):                         0.00
Kurtosis:                      18.488   Cond. No.                         11.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
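As a rough check on what sm.OLS computes, the sketch below solves the least-squares problem directly with NumPy. It uses hypothetical synthetic data: the sample size mirrors the table above, but the value ranges and "true" coefficients are made up for illustration and are not the Milk Data, so the estimates will not match the output above.

```python
import numpy as np

# Hypothetical synthetic data standing in for milk_data (made-up values)
rng = np.random.default_rng(0)
n = 258
volume = rng.uniform(0.2, 3.0, n)            # Volume1000, thousands of ml
size = rng.uniform(0.2, 1.5, n)              # Size1000, thousands of ml
pieces = rng.integers(1, 7, n)               # Pieces, 1 to 6
price = 300 + 250 * volume + 270 * size + 60 * pieces + rng.normal(0, 100, n)

# Design matrix with a leading column of ones, as sm.add_constant produces
X = np.column_stack([np.ones(n), volume, size, pieces])

# OLS estimates from the normal equations (X'X) b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ price)

# Same estimates from NumPy's least-squares solver (numerically stabler)
beta_lstsq, *_ = np.linalg.lstsq(X, price, rcond=None)

# R-squared = 1 - SSR / SST
resid = price - X @ beta_hat
r2 = 1 - (resid @ resid) / np.sum((price - price.mean()) ** 2)

print(np.round(beta_hat, 2))
print(round(r2, 3))
```

Both solvers should agree; in practice the least-squares routine (or statsmodels itself) is preferred over inverting X'X explicitly.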

Multiple Regression Results

Key results:

Variable      Coefficient   Std. err.   p-value
Volume1000    261.36        65.40       0.000
Size1000      270.19        97.78       0.006
Pieces        61.67         20.62       0.003

Model performance:

\[ R^2 = 0.302 \]

Compared with the simple model:

\[ R^2 = 0.274 \]

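R² rises after the additional variables are included. Because R² can never fall when regressors are added, the adjusted R² (0.294 for the multiple regression) is a fairer basis for comparing models with different numbers of variables.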

Why Did the Volume Coefficient Change?

Simple regression coefficient:

\[ 417.0 \]

Multiple regression coefficient:

\[ 261.4 \]

The coefficient decreases because part of the relationship previously attributed to volume is now explained by package size and number of pieces.

Interpreting the Volume Coefficient

Holding package size and number of pieces constant, a 1,000 ml increase in total volume is associated with an average increase of approximately 261 price units.
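The "holding constant" interpretation can be checked with the fitted equation. The sketch below copies the coefficients from the model_multi output and compares two hypothetical packages that share the same Size1000 and Pieces values but differ by 1 in Volume1000 (1,000 ml); the input values and the helper name predicted_price are illustrative only.

```python
# Coefficients copied from the model_multi summary output
b0, b_vol, b_size, b_pieces = 290.9660, 261.3607, 270.1869, 61.6742

def predicted_price(volume1000, size1000, pieces):
    """Hypothetical helper: fitted price from the estimated multiple regression."""
    return b0 + b_vol * volume1000 + b_size * size1000 + b_pieces * pieces

# Two hypothetical packages: same Size1000 and Pieces, Volume1000 differs by 1
base = predicted_price(2.0, 1.0, 2)
more_volume = predicted_price(3.0, 1.0, 2)

# The difference in fitted prices equals the Volume1000 coefficient
print(round(more_volume - base, 4))
```

Because Size1000 and Pieces enter both predictions identically, their terms cancel and only the volume coefficient remains.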

This phrase, “holding other variables constant,” is the defining feature of multiple regression.


Control Variables

Size1000 and Pieces are control variables. They help separate the relationship between volume and price from related package characteristics.
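One way to see how controls separate the volume-price relationship is the Frisch-Waugh-Lovell result, a standard OLS property that this chapter does not itself state: the multiple-regression coefficient on volume equals the slope from regressing the part of price not explained by the controls on the part of volume not explained by the controls. The sketch below checks this on hypothetical synthetic data (made-up coefficients, with volume deliberately correlated with the controls).

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical synthetic data: volume is correlated with size and pieces
rng = np.random.default_rng(1)
n = 258
size = rng.uniform(0.2, 1.5, n)
pieces = rng.integers(1, 7, n).astype(float)
volume = 0.8 * size + 0.3 * pieces + rng.normal(0, 0.3, n)
price = 300 + 250 * volume + 270 * size + 60 * pieces + rng.normal(0, 100, n)

ones = np.ones(n)
controls = np.column_stack([ones, size, pieces])

# Full multiple regression; the volume coefficient is element 1
b_full = ols(np.column_stack([ones, volume, size, pieces]), price)[1]

# Partial out the controls from volume and from price
vol_resid = volume - controls @ ols(controls, volume)
price_resid = price - controls @ ols(controls, price)

# Regress residualized price on residualized volume (no constant needed)
b_fwl = ols(vol_resid[:, None], price_resid)[0]

print(round(b_full, 4), round(b_fwl, 4))
```

The two numbers coincide: the volume coefficient uses only the variation in volume that is not shared with Size1000 and Pieces.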

Omitted Variable Bias

If package size and pieces affect price and are correlated with volume, omitting them may distort the volume coefficient. The reduction from 417.0 to 261.4 illustrates why coefficients often change when controls are added.
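The change in the volume coefficient can be decomposed exactly. For OLS estimates on the same sample, the simple-regression slope equals the multiple-regression slope plus a bias term for each omitted control:

\[ \tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \hat{\delta}_2 + \hat{\beta}_3 \hat{\delta}_3 \]

where \(\tilde{\beta}_1\) is the slope from regressing Price on Volume1000 alone, \(\hat{\beta}_j\) are the multiple-regression coefficients, and \(\hat{\delta}_j\) is the slope from regressing the j-th control (Size1000 or Pieces) on Volume1000 with a constant. The sketch below verifies the identity on hypothetical synthetic data; it does not use the Milk Data, so the numbers differ from 417.0 and 261.4.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical synthetic data: controls raise price and are correlated with volume
rng = np.random.default_rng(2)
n = 258
size = rng.uniform(0.2, 1.5, n)
pieces = rng.integers(1, 7, n).astype(float)
volume = 0.8 * size + 0.3 * pieces + rng.normal(0, 0.3, n)
price = 300 + 250 * volume + 270 * size + 60 * pieces + rng.normal(0, 100, n)

ones = np.ones(n)
short_X = np.column_stack([ones, volume])

# Simple regression: price on volume only
b_short = ols(short_X, price)[1]

# Multiple regression: price on volume and both controls
_, b_vol, b_size, b_pieces = ols(np.column_stack([ones, volume, size, pieces]), price)

# Auxiliary regressions: each omitted control on volume
d_size = ols(short_X, size)[1]
d_pieces = ols(short_X, pieces)[1]

# Omitted variable bias contributed by leaving out the controls
bias = b_size * d_size + b_pieces * d_pieces

print(round(b_short, 2), round(b_vol, 2), round(bias, 2))
```

When the controls have positive coefficients and are positively correlated with volume, the bias term is positive, so the simple slope exceeds the multiple-regression slope, the same direction as the drop from 417.0 to 261.4.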

Goodness of Fit

The improvement in R² is:

\[ 0.302 - 0.274 = 0.028 \]

The model still leaves most of the variation in price unexplained (1 - 0.302 = 0.698, or about 70 percent), which motivates adding categorical variables such as Brand and Fat in Chapter 15.
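Because R² cannot decrease when variables are added, a small rise such as 0.028 is partly mechanical. Adjusted R² penalizes extra regressors; the sketch below computes it from the reported figures. It assumes the simple model also used n = 258 observations, which this section does not state.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

multi = adjusted_r2(0.302, n=258, k=3)   # reproduces the reported Adj. R-squared
simple = adjusted_r2(0.274, n=258, k=1)  # assumes the simple model also had n = 258

print(round(multi, 3), round(simple, 3))  # 0.294 0.271
```

Under that assumption, adjusted R² also rises (from about 0.271 to 0.294), so the improvement is not only a consequence of adding variables.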

Key Takeaways

  • Most economic outcomes depend on multiple variables.
  • Multiple regression allows economists to hold other variables constant.
  • The Milk Data multiple regression increases R² from 0.274 to 0.302.
  • The volume coefficient decreases from 417.0 to 261.4 after controls are added.
  • Control variables help separate the relationship between volume and price from related package characteristics.