Recap
Last week, we discovered the OLS estimator and what it does. It allows us to assess, under certain conditions, the linear relation between an explained variable \(y\) and explanatory variables \(X\), using \(n\) individual observations.
Recall that the OLS estimator can be biased. It is biased when the exogeneity assumption fails, that is, when \(E(\epsilon|X) \neq 0\). Bias may arise in three main cases:
- Measurement error. If the real variable \(X\) is measured with error, we observe \(\tilde{X} = X + \mu\) instead of \(X\). The measurement error \(\mu\) then enters the error term, and the exogeneity assumption does not hold.
- Omitted variable bias. The true model is \(y_i = \alpha + \beta_1 x_i + \beta_2 z_i + \epsilon_i\). If we instead estimate \(y_i = \alpha + \beta_1 x_i + \epsilon_i\) while \(z\) is correlated with \(x\), we may under- or over-estimate the effect of \(x\) on \(y\).
- Reverse causality (or simultaneity). The true model is \[
\begin{align*}
y_i &= \alpha_0 + \alpha_1 x_i + \alpha_2 z_i + u_i \\
x_i &= \beta_0 + \beta_1 y_i + \beta_2 z_i + v_i
\end{align*}
\]
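Solving the system for \(y_i\) yields a reduced-form model \(y_i = \pi_0 + \pi_1 z_i + e_i\), where \(e_i\) contains both \(u_i\) and \(v_i\). In the same way, \(x_i\) depends on \(u_i\), so it is correlated with the error term of the first equation and the exogeneity assumption is violated.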
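The omitted-variable case listed above can be illustrated with a small simulation. This is only a sketch on made-up data: the coefficients (2 and 3), the strength of the link between \(x\) and \(z\) (0.8) and the sample size are arbitrary assumptions, not values from the course material.

```r
# Simulated illustration of omitted variable bias (all numbers are made up).
set.seed(1)
n <- 5000
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)             # x is correlated with the omitted variable z
y <- 1 + 2 * x + 3 * z + rnorm(n)   # true model: beta_1 = 2, beta_2 = 3

full  <- lm(y ~ x + z)  # correctly specified model
short <- lm(y ~ x)      # z is omitted and ends up in the error term

coef(full)["x"]   # close to the true value 2
coef(short)["x"]  # biased upward here, because cov(x, z) > 0 and beta_2 > 0

# Textbook OVB formula: bias = beta_2 * cov(x, z) / var(x)
bias_formula <- 3 * cov(x, z) / var(x)
c(observed_bias = unname(coef(short)["x"]) - 2, formula_bias = bias_formula)
```

The sign of the bias depends on the signs of \(\beta_2\) and of the covariance between \(x\) and \(z\), which is why the short regression can either under- or over-estimate the effect.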
Exercise 1: Solow model and OVB
The dataset MRW_QJE1992.xlsx can be downloaded from Moodle.
Baseline Solow model
- Open the dataset with the function read_xlsx from the package readxl
- Describe the dataset
- Using the ggplot2 package, make a graph with GDP growth on the \(x\) axis and log GDP in 1965 on the \(y\) axis. Export it to PDF.
- In the paper, different country groups are defined. Create the grouping variable, depending on country types. Hint: use ifelse(test, value if true, value if false). Notice that countries \(o\) are a subset of countries \(i\), which are a subset of countries \(n\).
- Estimate this model and store it in an object called reg0:
\[
\log (Y_i/L_i) = \beta_0 + \beta_1 \log s_i + \beta_2 \log(n + g + \delta) + \epsilon_i
\]
We define \(g + \delta = 0.05\).
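The grouping hint above can be sketched on a toy data frame. The column names `n`, `i`, `o` (taken here to be 0/1 indicators) and the group labels are assumptions made for illustration; check the actual column names in the dataset before adapting this.

```r
# Toy data: hypothetical 0/1 indicators for the three nested samples.
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  n = c(1, 1, 1, 0),
  i = c(0, 1, 1, 0),
  o = c(0, 0, 1, 0)
)

# Because o is a subset of i, which is a subset of n,
# the narrowest group has to be tested first.
toy$group <- ifelse(toy$o == 1, "oecd",
             ifelse(toy$i == 1, "intermediate",
             ifelse(toy$n == 1, "non-oil", "other")))

table(toy$group)
```

Such a label is mutually exclusive by construction. Since the samples are nested, a regression on the sample \(i\) still has to include the countries labelled as belonging to \(o\), for instance by filtering on the indicator itself.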
- Estimate the same model but for each country subgroup.
- Bonus: do the latter with a loop
- This is the result we find. Interpret it (notice the log-log specification)
Call:
lm(formula = log(rgdpw85) ~ 1 + log(i_y) + log(popgrowth + constant),
    data = dat)

Residuals:
     Min       1Q   Median       3Q      Max
-1.89396 -0.49251 -0.03161  0.52177  3.12361

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                 4.4330     0.4929   8.993 1.19e-14 ***
log(i_y)                    1.4083     0.1617   8.711 5.01e-14 ***
log(popgrowth + constant)  -0.2991     0.1431  -2.090   0.0391 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7799 on 104 degrees of freedom
  (14 observations deleted due to missingness)
Multiple R-squared:  0.4949,    Adjusted R-squared:  0.4851
F-statistic: 50.94 on 2 and 104 DF,  p-value: 3.782e-16
- Previous work estimated that the elasticity of production with respect to investment is 1/3. Is this verified here?
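The subgroup loop and the comparison with 1/3 can be sketched on simulated data. Two assumptions are made here and are not stated in the exercise: the data are simulated with hypothetical variable names (`s`, `popg`, `group`), and the slope on \(\log s\) is read through the textbook Solow mapping \(\beta_1 = \alpha/(1-\alpha)\), so that the implied capital share is \(\alpha = \beta_1/(1+\beta_1)\).

```r
# Simulated data (not the MRW dataset): the true capital share is set to 1/3.
set.seed(42)
n_obs <- 300
alpha <- 1 / 3
b     <- alpha / (1 - alpha)   # implied slope on log(s): 0.5

sim <- data.frame(
  s     = runif(n_obs, 0.05, 0.40),    # investment rate
  popg  = runif(n_obs, 0.005, 0.04),   # population growth
  group = sample(c("g1", "g2", "g3"), n_obs, replace = TRUE)
)
sim$log_y <- 8 + b * log(sim$s) - b * log(sim$popg + 0.05) +
  rnorm(n_obs, sd = 0.1)

# One regression per subgroup, storing the slope and the implied alpha.
results <- list()
for (g in unique(sim$group)) {
  fit <- lm(log_y ~ log(s) + log(popg + 0.05), data = sim[sim$group == g, ])
  b1  <- unname(coef(fit)["log(s)"])
  results[[g]] <- c(beta_1 = b1, implied_alpha = b1 / (1 + b1))
}
res <- do.call(rbind, results)
res
```

Under the same mapping, the estimate 1.4083 reported above would correspond to an implied share of about \(1.4083 / 2.4083 \approx 0.585\); whether this is the comparison intended by the exercise should be checked against the lecture notes.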
Adding school as omitted variable
In the extension of the Solow model, we saw that human capital has a role in explaining GDP per capita.
- Run the model again, adding the school variable. Interpret.
- Bonus: using linearHypothesis, test whether \(\beta_1\) and \(\beta_2\) are equal.
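A test of equality between two coefficients can be sketched as follows on simulated data. The variable names `x1` and `x2` are hypothetical, and the base R comparison of a restricted and an unrestricted model is shown as an equivalent route, not as the method expected in the solutions. The call to `linearHypothesis` assumes the `car` package is installed and is skipped otherwise.

```r
# Simulated data in which the two slopes are equal by construction.
set.seed(7)
n_obs <- 200
d <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs))
d$y <- 1 + 0.5 * d$x1 + 0.5 * d$x2 + rnorm(n_obs)

unrestricted <- lm(y ~ x1 + x2, data = d)
# Under H0: beta_1 = beta_2, the model collapses to y = b0 + b * (x1 + x2) + e
restricted   <- lm(y ~ I(x1 + x2), data = d)

# F-test of the single linear restriction
f_test <- anova(restricted, unrestricted)
f_test

# Same hypothesis with car::linearHypothesis, if car is available
if (requireNamespace("car", quietly = TRUE)) {
  print(car::linearHypothesis(unrestricted, "x1 = x2"))
}
```

A large p-value means that equality of the two coefficients cannot be rejected at the usual levels.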
Exercise 2: Acemoglu, Johnson, Robinson and instrumental variable
Recap
In this very influential paper, AJR estimate the effect of institutions on GDP growth. In particular, they test whether good institutions, which protect entrepreneurs, enhance GDP per capita growth in the African context.
However, there is a clear endogeneity issue. Can you see it?
Part 1
- Download the dataset and describe the data
- Create a scatter plot of mortality rate against GDP per capita in 1995, and a second scatter plot with the log mortality rate and log GDP per capita in 1995. Notice the difference.
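The difference between the two plots can be previewed on made-up data. In this sketch the variable names `mortality` and `gdp` and the simulated relationship are assumptions, not the AJR data; the plotting part runs only if `ggplot2` is installed.

```r
# Simulated, strongly right-skewed data (not the AJR dataset).
set.seed(3)
toy <- data.frame(mortality = exp(rnorm(60, mean = 4.5, sd = 1.2)))
toy$gdp <- exp(10 - 0.5 * log(toy$mortality) + rnorm(60, sd = 0.5))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # In levels, a few extreme values compress the rest of the cloud.
  p_levels <- ggplot(toy, aes(x = mortality, y = gdp)) + geom_point()
  # In logs, the relationship looks much closer to linear.
  p_logs <- ggplot(toy, aes(x = log(mortality), y = log(gdp))) + geom_point()
  print(p_levels)
  print(p_logs)
  # ggsave("scatter_logs.pdf", plot = p_logs, width = 6, height = 4)
}
```

The commented `ggsave` line shows one way to export a plot to PDF; the file name is a placeholder.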
Table 2 of Acemoglu et al. (2001) presents the results of an OLS regression of log GDP per capita in 1995 on average protection against expropriation and some covariates: \[
\log y_i = \mu + \alpha R_i + \mathbf{X}_i'\gamma + \epsilon_i
\]
- Identify the covariates in the results table.
- Reproduce the results for the columns (2), (5), and (6). Export them to your answer sheet. Interpret the results clearly.
- What is the effect on GDP of a one-unit increase on the risk scale?
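Because the dependent variable is in logs and \(R_i\) is in levels, the coefficient \(\alpha\) is a semi-elasticity. As a reminder, holding \(\mathbf{X}_i\) and \(\epsilon_i\) fixed, a one-unit increase in \(R_i\) changes \(\log y_i\) by \(\alpha\), so that \[
\frac{y_i(R_i + 1)}{y_i(R_i)} = e^{\alpha}, \qquad \%\Delta y_i = 100\,(e^{\alpha} - 1) \approx 100\,\alpha \quad \text{for small } \alpha .
\]
The approximation deteriorates as \(\alpha\) grows, in which case the exact expression \(100\,(e^{\alpha} - 1)\) should be preferred.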
Part 2
So far, we used OLS to estimate the effect of risk on GDP. However, risk is likely to be endogenous. Hence, we can instrument risk with mortality to alleviate this endogeneity concern. We proceed in two stages:
- Run the regression of risk on log mortality (using only latitude as a covariate).
- Run the regression of log GDP on predicted risk (using only latitude as a covariate). To do so, you need to compute the predicted risk from the previous regression using the predict function.
A good instrument must satisfy two assumptions. The first one is relevance, meaning that the instrument must be correlated with the instrumented variable. The second one is exogeneity, meaning that \(z\) must affect \(y\) only through the instrumented variable, so that \(z\) is uncorrelated with the error term. This cannot be tested directly.
- Does the instrument seem valid? Comment on the results.
- Explore the function ivreg and run the IV regression again. Do the results differ?
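The two-stage procedure described above can be sketched on simulated data, where the true effect is known. Everything in this example is an assumption made for illustration: the variable names (`z` for the instrument, `w` for the covariate, `u` for an unobserved confounder), the coefficients, and the use of `ivreg` from the `AER` package, which is called only if that package is installed.

```r
# Simulated data (not the AJR dataset): the true effect of x on y is 0.5.
set.seed(11)
n_obs <- 2000
z <- rnorm(n_obs)   # instrument
u <- rnorm(n_obs)   # unobserved confounder
w <- rnorm(n_obs)   # exogenous covariate
x <- 0.7 * z + 0.3 * w + u + rnorm(n_obs)        # endogenous regressor
y <- 1 + 0.5 * x + 0.2 * w - u + rnorm(n_obs)

ols <- lm(y ~ x + w)   # biased, because x is correlated with u

# Stage 1: regress the endogenous variable on the instrument and the covariate
first_stage <- lm(x ~ z + w)
x_hat <- predict(first_stage)
# Stage 2: replace x by its prediction
second_stage <- lm(y ~ x_hat + w)

b_ols  <- unname(coef(ols)["x"])
b_tsls <- unname(coef(second_stage)["x_hat"])
c(ols = b_ols, tsls = b_tsls)

# One-step estimation, if the AER package is available
if (requireNamespace("AER", quietly = TRUE)) {
  iv <- AER::ivreg(y ~ x + w | z + w)
  print(summary(iv))
}
```

The manual second stage and `ivreg` give the same point estimate, but the standard errors printed by the manual second stage are not valid, since they ignore that `x_hat` is itself estimated.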
Solutions are here.