Part I. Data, Python, and Economic Questions

NREC4107 - Applied Econometrics

Purpose of Part I

Part I introduces the basic workflow of applied econometrics using the actual course milk dataset:

Milk_Data_S2025n.csv

The goal at this stage is not to estimate a model. It is to learn how to move from a real dataset to clear empirical questions, careful data inspection, reproducible Python work, and documented cleaning decisions.

The dataset contains 258 observations and 11 variables: four numeric variables (Price, Size, Pieces, and Volume) and seven categorical variables (Location, Type, Brand, Fat, Fresh, Package, and Flavor).

Why this part matters

Applied econometrics begins before regression. A researcher must first understand:

  • what each row represents
  • what each variable measures
  • which variables are numeric or categorical
  • whether values are missing
  • whether constructed variables are consistent
  • which question the data can reasonably help answer
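A first pass at these checks can be sketched in a few lines of pandas. The helper name summarize_columns and the toy rows below are illustrative assumptions, not part of the course materials; with the real data, the same function would be applied to the DataFrame loaded from Milk_Data_S2025n.csv.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, broad kind, missing count, distinct count."""
    kinds = pd.Series(
        [
            "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "categorical"
            for col in df.columns
        ],
        index=df.columns,
    )
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "kind": kinds,
            "n_missing": df.isna().sum(),   # missing values per column
            "n_unique": df.nunique(),       # distinct non-missing values per column
        }
    )


# Toy rows for illustration only; these are NOT values from the course dataset.
toy = pd.DataFrame(
    {
        "Brand": ["A", "B", None],
        "Fat": ["whole", "skim", "whole"],
        "Price": [3.5, None, 4.0],
        "Size": [1.0, 2.0, 1.0],
    }
)

summary = summarize_columns(toy)
print(summary)
```

The table this produces answers several of the questions above at a glance, but it cannot say what a row represents or which question the data can support. Those still require reading the data documentation and thinking about how the data were collected.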

The course uses Python in Google Colab. The website displays the code; students run it themselves in Colab.

Chapters in Part I

Chapter     Topic
Chapter 1   What Is Applied Econometrics?
Chapter 2   Economic Questions and Data Types
Chapter 3   Python, Google Colab, and Reproducible Analysis
Chapter 4   Cleaning and Preparing Agricultural Data

Dataset used in Part I

The running dataset is:

Milk_Data_S2025n.csv

The dataset has the following observed columns:

['Location', 'Type', 'Brand', 'Fat', 'Fresh', 'Price', 'Package', 'Size', 'Pieces', 'Flavor', 'Volume']

The code examples assume the file is saved in Google Drive at the following path, relative to the notebook's working directory:

../data/Milk_Data_S2025n.csv

Students should change the path if they save the file somewhere else.
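A minimal loading sketch, assuming pandas is available (as it normally is in Colab). The helper load_milk_data and the constants DATA_PATH and EXPECTED_COLUMNS are hypothetical names introduced here for illustration. The Drive-mounting lines are left as comments so the block also runs outside Colab.

```python
from pathlib import Path

import pandas as pd

# In Colab, Google Drive usually has to be mounted first (uncomment inside Colab):
# from google.colab import drive
# drive.mount("/content/drive")

# Path as given in the text; change it if the file is saved somewhere else.
DATA_PATH = Path("../data/Milk_Data_S2025n.csv")

# Column names as listed in this section.
EXPECTED_COLUMNS = [
    "Location", "Type", "Brand", "Fat", "Fresh", "Price",
    "Package", "Size", "Pieces", "Flavor", "Volume",
]


def load_milk_data(path=DATA_PATH) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and confirm the expected columns exist."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No file at {path.resolve()}; adjust the path.")
    df = pd.read_csv(path)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


if DATA_PATH.exists():
    milk = load_milk_data()
    # This section reports 258 observations and 11 variables.
    print(milk.shape)
else:
    print(f"File not found at {DATA_PATH}; edit DATA_PATH to match your Drive folder.")
```

Failing early with a clear message when the file or a column is missing makes path problems easier to diagnose than an error that appears several cells later.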

What students should be able to do after Part I

After completing Part I, students should be able to:

  • explain applied econometrics as a connection between questions, data, methods, and interpretation
  • identify numeric and categorical variables in the milk dataset
  • open Google Colab and load a dataset from Google Drive
  • inspect missing values, duplicated rows, data types, and constructed variables
  • explain why cleaning decisions must be documented
  • distinguish description, association, prediction, and causality
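The inspection and documentation skills in this list can be illustrated with a short sketch. It rests on one loud assumption that the text does not confirm: that Volume is the constructed variable and equals Size × Pieces. The helper names, the cleaning_log structure, and the toy rows are likewise hypothetical and are not taken from the course dataset.

```python
import numpy as np
import pandas as pd


def run_basic_checks(df: pd.DataFrame) -> dict:
    """Count exact duplicate rows and rows where Volume differs from Size * Pieces."""
    n_duplicates = int(df.duplicated().sum())
    # ASSUMPTION (not stated in the source): Volume is constructed as Size * Pieces.
    implied = df["Size"] * df["Pieces"]
    comparable = df["Volume"].notna() & implied.notna()
    mismatch = comparable & ~np.isclose(df["Volume"], implied)
    return {"n_duplicates": n_duplicates, "n_volume_mismatch": int(mismatch.sum())}


def drop_exact_duplicates(df: pd.DataFrame, log: list) -> pd.DataFrame:
    """Drop exact duplicate rows and record the decision in the cleaning log."""
    cleaned = df.drop_duplicates().reset_index(drop=True)
    log.append(
        {
            "step": "drop exact duplicate rows",
            "rows_before": len(df),
            "rows_after": len(cleaned),
            "reason": "treated as repeated entries of the same observation",
        }
    )
    return cleaned


# Toy rows for illustration only; these are NOT values from the course dataset.
toy = pd.DataFrame(
    {
        "Price": [3.5, 3.5, 4.0, 2.0],
        "Size": [1.0, 1.0, 0.5, 2.0],
        "Pieces": [1, 1, 6, 1],
        "Volume": [1.0, 1.0, 3.0, 5.0],
    }
)

cleaning_log = []
checks = run_basic_checks(toy)
cleaned = drop_exact_duplicates(toy, cleaning_log)
print(checks)
print(pd.DataFrame(cleaning_log))
```

Whether duplicated rows should be dropped at all is a judgment call, since two identical rows may describe two genuinely different products. Recording the step, the row counts, and the reason lets a reader see the decision and disagree with it if needed.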

Key message

Good applied econometrics starts with careful data understanding. Software can help us calculate results, but it cannot decide whether the question is meaningful or whether the interpretation is credible.