Multicollinearity

It's good to have a relationship between the dependent and the independent variables, but it's bad to have a relationship among the independent variables themselves. When the regressors move together, the effect of any single variable is hard to measure.

Exact Multicollinearity (e.g., X3 = a + b·X2):
Can NOT get unique estimates of β2 and β3 (can NOT estimate the individual coefficients).
The two normal equations reduce to the same equation (write out the two normal equations and substitute in the exact linear relationship between X2 and X3).
Can still estimate a linear combination of the parameters (here β2 + b·β3) and the fitted values.
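A minimal numpy sketch of the exact case (made-up data, not from the chapter): with X3 an exact linear function of X2, the X'X matrix is singular, so β2 and β3 have no unique estimates, but the combination β2 + b·β3 is still pinned down.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x2 = rng.normal(size=n)
    x3 = 2.0 + 3.0 * x2                     # exact relation X3 = a + b*X2 with a=2, b=3
    y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(scale=0.1, size=n)

    X = np.column_stack([np.ones(n), x2, x3])
    print(np.linalg.matrix_rank(X.T @ X))   # 2, not 3: the normal equations have no unique solution

    # lstsq still returns *a* (minimum-norm) solution, but b2 and b3 separately are arbitrary;
    # only the combination b2 + 3*b3 (and the fitted values) is pinned down, here about 0.5 + 3*0.5 = 2
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    print(b[1] + 3.0 * b[2])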

Near Multicollinearity
OLS is still BLUE because assumptions 3.2 - 3.7 still hold
Can still do t-tests and other hypothesis tests (3.8 still holds)
OLS is still consistent (it is still the MLE, so consistency carries over)
Forecasts are unbiased
Standard errors are higher → t-stats are lower → coefficients appear less significant.
Can NOT capture separate effects of X2 and X3
Note: X2 and X3 are what affect y; β2 and β3 measure by how much.
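A small simulation sketch of the "near" case (made-up numbers, using numpy and statsmodels): the coefficient estimates stay unbiased, but the standard errors rise and the t-stats fall as the correlation between X2 and X3 approaches 1.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200

    def fit(rho):
        # X2, X3 jointly normal with correlation rho; true model y = 1 + 2*X2 + 3*X3 + e
        x2, x3 = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
        y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)
        return sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()

    for rho in (0.0, 0.90, 0.99):
        res = fit(rho)
        print(f"rho={rho:.2f}  b2={res.params[1]:5.2f}  se(b2)={res.bse[1]:4.2f}  t(b2)={res.tvalues[1]:6.1f}")
    # b2 stays near 2 on average, but se(b2) grows and t(b2) shrinks as rho -> 1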

Absence of Multicollinearity
y = β2·X2 + β3·X3 and y = β2·X2 will give the same estimate for β2 (this holds when X2 and X3 are uncorrelated).
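A quick numerical check of this claim (made-up data): when X2 and X3 are exactly orthogonal, dropping X3 leaves the estimate of β2 unchanged.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100
    x2 = rng.normal(size=n)
    z = rng.normal(size=n)
    x3 = z - (z @ x2) / (x2 @ x2) * x2               # make X3 exactly orthogonal to X2
    y = 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)     # no intercept, matching the equation above

    b_full = np.linalg.lstsq(np.column_stack([x2, x3]), y, rcond=None)[0]
    b_short = np.linalg.lstsq(x2.reshape(-1, 1), y, rcond=None)[0]
    print(b_full[0], b_short[0])                     # the same estimate of beta2 (up to rounding)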
 

Identifying Multicollinearity

  1. High R² with low t-stats
  2. High pairwise correlation among the explanatory variables (this won't catch the case where three or more variables are jointly correlated; see the VIF sketch after this list).
  3. Coefficients change noticeably when variables are added to or dropped from the model.
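One diagnostic that does catch joint correlation among three or more regressors is the variance inflation factor, which regresses each regressor on all of the others. A sketch with made-up data, using statsmodels (not from the chapter):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    n = 200
    x2 = rng.normal(size=n)
    x3 = rng.normal(size=n)
    x4 = x2 + x3 + rng.normal(scale=0.05, size=n)    # nearly the sum of the other two

    Z = np.column_stack([x2, x3, x4])
    print(np.corrcoef(Z, rowvar=False).round(2))     # pairwise r's of ~0.7 look only moderate

    X = sm.add_constant(Z)
    for i, name in enumerate(["x2", "x3", "x4"], start=1):
        print(name, round(variance_inflation_factor(X, i), 1))
    # VIFs far above 10 flag the joint dependence that the pairwise correlations understate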
Solutions
  1. Ignore it: if you are forecasting, if theory dictates the variable should be there, or if the coefficient is already significant with the expected sign.
  2. Drop an explanatory variable (the one with the lowest t-stat): SEs will fall, so t-stats rise and the remaining coefficients appear more significant. Don't drop a theoretically important variable if its t-stat is at least about 1.0 (or its p-value < .25).
     Note: the remaining variable will capture its own effect as well as that of the dropped one.
  3. Reformulate the model: e.g., change to per capita measures and drop population. The multicollinearity is still there, just lessened.
     Note: can NOT compare the R² of two models whose dependent variables are different.
  4. First difference (in time series): the levels may be correlated because they tend to grow together, while the differences are not (see the sketch after this list).
  5. Use a linear combination of the independent variables (principal component analysis can supply the weighting scheme).
  6. Increase the sample size: SEs drop (estimates become more precise) and coefficients appear more significant. The sum of squared deviations S_ii in the variance formula increases, so this works provided the correlation r² between the regressors does not also increase.
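A sketch of the first-differencing idea (made-up trending series): two levels series that both grow over time are highly correlated, but their first differences need not be.

    import numpy as np

    rng = np.random.default_rng(4)
    t = np.arange(100)
    x2 = 0.5 * t + rng.normal(scale=2.0, size=t.size)   # both series trend upward,
    x3 = 0.3 * t + rng.normal(scale=2.0, size=t.size)   # so their levels move together

    print(round(np.corrcoef(x2, x3)[0, 1], 2))                     # near 1 in levels
    print(round(np.corrcoef(np.diff(x2), np.diff(x3))[0, 1], 2))   # near 0 in first differences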
Example
  1. Estimate y = β0 + β2·X2 + β3·X3
  2. Then estimate X3 = α0 + α1·X2
  3. Then substitute the second into the first: y = β0 + β2·X2 + β3(α0 + α1·X2), so the true (total) effect of X2 on y is β2 + β3·α1.
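A numeric sketch of that substitution (made-up coefficients, using statsmodels): β2 + β3·α1 from the two regressions matches the coefficient from regressing y on X2 alone.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 300
    x2 = rng.normal(size=n)
    x3 = 1.0 + 0.8 * x2 + rng.normal(scale=0.2, size=n)   # X3 = a0 + a1*X2 + noise
    y = 0.5 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

    full = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()   # y on X2 and X3
    aux = sm.OLS(x3, sm.add_constant(x2)).fit()                          # X3 on X2
    b0, b2, b3 = full.params
    a0, a1 = aux.params

    print(round(b2 + b3 * a1, 2))                    # total effect of X2, about 2 + 3*0.8 = 4.4
    short = sm.OLS(y, sm.add_constant(x2)).fit()     # same number from y on X2 alone
    print(round(short.params[1], 2))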
Computer Output Example
  1. An unexpected sign on a coefficient hints that multicollinearity is present.
  2. It's OK if dropping a variable restores the expected sign even though it lowers the R².
  3. If you drop a variable because you suspect it is correlated with the others, run a regression of the dropped variable on the remaining variables in the dataset to confirm the overlap (the auxiliary regression in the sketch above does exactly this).
 
DISCLAIMER – DISCLAIMER -- DISCLAIMER

These are the notes that I took as I was reading the chapter. They are not intended to be your sole source of information about multicollinearity. Rather, they can be used to highlight and outline the main topics presented in Ch. 5. If something doesn't make sense, be sure to read about it in the chapter.