Tutorial 6: Autonomous replication of an IV paper
Arceo, Hanna, and Oliva (The Economic Journal, 2015)
Does air pollution affect child mortality? Is this relationship linear? Air pollution is an important subject and leads to various deseases. Most estimates from the literature are from developed countries, leading to low external validity. In this paper, the authors propose a novel estimation of the effect of air pollution on infant mortality, in a developing context, Mexico.
For this PC, you are asked to upload a .pdf
file at the end of the day. You can work in group. This is not graded but the output quality will be taken into account for the participation grade. Please upload the file on Moodle with the following naming convention: “PC6_GR1_NAME1_NAME2.pdf” or “PC6_GR2_NAME1_NAME2.pdf” (in alphabetical order).
Variable | Description |
---|---|
w_tmp_mean |
Average temperature |
w_precip |
Precipitation |
w_evap |
Evaporation |
w_invterm |
Thermal inversion |
rw_infant_1y |
Child mortality (aged 1) in Mexico |
grw_infant_1y |
Child mortality (aged 1) in Guadalajara |
pm10_max24hr |
PM10 pollution |
co_max8hr |
Co pollution |
so2_mean |
Sulfure dioxyde pollution |
o3_mean |
Ozone pollution |
m |
Municipal ID |
week , month , year |
Time ID |
Exercise 1: Intuition
Most answers are in the introduction of the paper.
- Why a simple OLS regression of child mortality on pollution would lead to a biased estimation?
- A common IV strategy for pollution is to use regulation, why would it lead to a weak first stage?
- The authors argue that the external validity of the results found in developed countries is low. Why? Would we over- or under-estimate the real effect if we were to use the coefficients find in developed/less polluted countries?
- Instead of running a simple OLS regression, the authors suggest to add month and month-area fixed effect. Why does it improve the quality of the estimation?
- The authors suggest using thermal inversion as an instrument for air pollution. Discuss the exogeneity and the relevance conditions of this instrument.
Exercise 2: Data cleaning and visual representation
- Open the raw data
- Control variables include
w_tmp_mean
,w_precip
,w_cloud
,w_evap
,w_invterm
. Keep only observations for which those controls are not missing. - Remove if
w_tmp_impute
is 1 (ie, if the temperature is imputed). - Compute the monthly average of termal inversion (
w_invterm
) and the monthly average temperature (w_tmp_mean
). - Replicate Figure 3 using
ggplot
. The mortality variables arerw_infant_1y
andgrw_infant_1y
. Export to your.tex
file.
Exercise 3: Empirical analysis
Pay close attention to the footnote of the tables you want to replicate. Notice that the authors drop the bottom 99 and top 1 percentile observations (to account for outliers). They also weight the regressions by the number of births. Pollution variables are pm10_max24hr
, co_max8hr
, so2_mean
, o3_mean
.
Notice you will need to create some variables (the polynomials and the fixed effects). The explained variables are scaled by 1000 in the paper.
- Replicate Table 2. Predict the value of pollution. Export. Interpret. Does the IV seem valid?
- Replicate Table 3 (Columns 1 to 4 only). Export. Interpret. Are you confident with these results?
- [Bonus] Rerun the analysis without dropping the outliers. Interpret.
- [Bonus] Plot the main point estimates and the confidence intervals. Which pollutant is the worst?