Multicollinearity
It's good to have a relationship between the dependent
and independent variables, but it's bad to have a relationship among the
independent variables: the effect of each single variable becomes hard to measure.
Exact Multicollinearity (X3 = a + bX2):
Can NOT get unique equations for b2 and b3 (can NOT estimate the coefficients).
The two normal equations reduce to the same equation
(write out the two normal equations and substitute in the exact linear
relationship between X2 and X3).
Can still estimate a linear combination of the parameters.
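This can be seen numerically; a minimal sketch with made-up data, where X3 is an exact linear function of X2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = 2 + 3 * x2                  # exact multicollinearity: X3 = a + b*X2

X = np.column_stack([np.ones(n), x2, x3])   # design matrix [1, X2, X3]

# X has only 2 independent columns, so X'X is singular and the
# normal equations X'X b = X'y cannot pin down b2 and b3 separately.
print(np.linalg.matrix_rank(X))   # 2
```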
Near Multicollinearity
OLS is still BLUE because assumptions 3.2 - 3.7 still hold.
Can still do t-tests and other hypothesis tests (3.8 still holds).
OLS is still consistent (it is still the MLE, so still consistent).
Forecasts are unbiased.
Standard errors are higher -> t-stats are lower -> coefficients appear to be less significant.
Can NOT capture the separate effects of X2 and X3.
Note: X2 and X3 are what affect y, and b2 and b3 show how much.
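The standard-error inflation can be simulated; a sketch with made-up data (true b2 = 2, b3 = 3 are assumptions of the simulation, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)

def ols_se_b2(x3):
    """Fit y = b0 + b2*X2 + b3*X3 + e (true b2=2, b3=3) and
    return the estimate of b2 and its standard error."""
    y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)                  # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b[1], se[1]

# Near multicollinearity: X3 is X2 plus a little noise
_, se_near = ols_se_b2(x2 + 0.05 * rng.normal(size=n))
# No multicollinearity: X3 independent of X2
_, se_none = ols_se_b2(rng.normal(size=n))

print(se_near > se_none)   # True: SE on b2 inflates when X2, X3 move together
```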
Absence of Multicollinearity
y = b2 X2 + b3 X3 and y = b2 X2 will give the same estimate for b2.
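A numeric check of this claim; a sketch with made-up regressors forced to be exactly uncorrelated in-sample:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x2 = rng.normal(size=n); x2 -= x2.mean()
x3 = rng.normal(size=n); x3 -= x3.mean()
x3 -= (x3 @ x2 / (x2 @ x2)) * x2    # force X3 exactly orthogonal to X2

y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)

def fit(cols):
    """OLS coefficients (intercept first) for y on the given regressors."""
    X = np.column_stack([np.ones(n)] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = fit([x2, x3])    # y on X2 and X3
b_short = fit([x2])       # y on X2 alone

print(np.isclose(b_full[1], b_short[1]))   # True: identical estimate for b2
```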
Identifying Multicollinearity
- High R2 with low t-stats
- High pairwise correlation among explanatory variables
  (won't tell you if three or more are jointly correlated, though)
- Coefficients change when variables are added to or dropped from the model
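The pairwise-correlation check, plus a variance inflation factor (VIF) to catch joint correlation, can be sketched with made-up data (the VIF helper and the >10 rule of thumb are illustrative additions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)   # nearly collinear with X2
x4 = rng.normal(size=n)              # unrelated regressor

def vif(target, others):
    """Variance inflation factor: 1/(1 - R^2) from regressing one
    explanatory variable on the remaining explanatory variables."""
    X = np.column_stack([np.ones(len(target))] + others)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ b
    tss = (target - target.mean()) @ (target - target.mean())
    return tss / (resid @ resid)     # = 1 / (1 - R^2)

print(np.corrcoef(x2, x3)[0, 1])     # high pairwise correlation
print(vif(x2, [x3, x4]) > 10)        # True under a common rule of thumb
```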
Solutions
- Ignore it if forecasting, if theory dictates the variable
  should be there, or if the variable is already significant with the right sign.
- Drop an explanatory variable (the one with the lowest t-stat): SEs
  will fall, causing t-stats to increase and making the remaining variables
  appear more significant.
  Don't drop a theoretically justified variable if its t-stat is at least 1.0
  (or p-val < .25).
  Note: the remaining variable will capture the effect
  of itself as well as the dropped one.
- Reformulate the model: e.g., change to per capita measures
  and drop population.
  The multicollinearity is still there, just lessened.
  Note: Can NOT compare R2 of two models
  if the dependent variables are different.
- First difference (in time series): levels may be
  correlated (they tend to grow together).
- Use a linear combination of the independent variables (principal
  component analysis can aid in the weighting scheme).
- Increase the sample size: SEs will drop (estimates become more
  precise) and coefficients become more significant, since Sii (the variation
  in Xi) increases.
  This works as long as r2 (between the regressors) does not increase.
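The sample-size point can be simulated; a sketch with made-up data, holding the degree of collinearity fixed as n grows:

```python
import numpy as np

rng = np.random.default_rng(5)

def se_b2(n):
    """Standard error of b2 in y = b0 + b2*X2 + b3*X3 + e at sample size n."""
    x2 = rng.normal(size=n)
    x3 = x2 + 0.2 * rng.normal(size=n)   # same degree of collinearity at any n
    y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

se_small_n, se_big_n = se_b2(50), se_b2(5000)
print(se_big_n < se_small_n)   # True: more data shrinks the standard errors
```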
Example
- Estimate y = b0 + b2 X2 + b3 X3
- Then estimate X3 = a0 + a1 X2
- Then plug the second into the first: y = b0 + b2 X2 + b3 (a0 + a1 X2),
  to calculate the true (total) effect X2 has on y: (b2 + b3 a1)
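These steps can be run on made-up data; in-sample, the combined effect b2 + b3*a1 equals the slope from regressing y on X2 alone exactly (an OLS identity):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x2 = rng.normal(size=n)
x3 = 0.5 + 0.8 * x2 + 0.3 * rng.normal(size=n)   # X3 related to X2
y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)

def ols(dep, cols):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(dep))] + cols)
    return np.linalg.lstsq(X, dep, rcond=None)[0]

b0, b2, b3 = ols(y, [x2, x3])   # full model
a0, a1 = ols(x3, [x2])          # auxiliary regression: X3 on X2
g0, g2 = ols(y, [x2])           # short model: y on X2 alone

# Direct effect b2 plus indirect effect b3*a1 equals the slope
# from the short regression exactly.
print(np.isclose(b2 + b3 * a1, g2))   # True
```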
Computer Output Example
- An unexpected sign on a coefficient hints that multicollinearity
  is present.
- It's OK if you drop a variable, get the expected
  sign, but lower the R2.
- If you drop a variable because you think there is correlation,
  then run a regression of the dropped variable on the other variables in the
  dataset.
DISCLAIMER – DISCLAIMER -- DISCLAIMER
These are the notes that I took as I was reading
the chapter. They are not intended to be your sole source of information
about multicollinearity. Rather, they can be used to highlight and outline
the main topics presented in Ch. 5. If something doesn't make sense, be
sure to read about it in the chapter.