Multicollinearity
It's good to have a relationship between the dependent
and independent variables, but it's bad to have a relationship among the
independent variables: the effect of each single variable becomes hard to measure.
Exact Multicollinearity (X3 = a + bX2):
Can NOT get unique equations for b2 and b3 (can NOT estimate the coefficients).
The two normal equations reduce to the same equation
(write out the two normal equations and substitute in the exact linear
relationship between X2 and X3).
Can still estimate a linear combination of the parameters.
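This can be seen numerically; a minimal sketch with made-up data, where X3 is an exact linear function of X2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = 2 + 3 * x2                  # exact multicollinearity: X3 = a + b*X2

X = np.column_stack([np.ones(n), x2, x3])   # design matrix [1, X2, X3]

# X has only 2 independent columns, so X'X is singular and the
# normal equations X'X b = X'y cannot pin down b2 and b3 separately.
print(np.linalg.matrix_rank(X))   # 2
```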
Near Multicollinearity
OLS is still BLUE because assumptions 3.2 - 3.7 still hold.
Can still do t-tests and other hypothesis tests (3.8 still holds).
OLS is still consistent (it is still the MLE, so still consistent).
Forecasts are unbiased.
Standard errors are higher -> t-stats are lower -> coefficients appear to be less significant.
Can NOT capture the separate effects of X2 and X3.
Note: X2 and X3 are what affect y, and b2 and b3 show how much.
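The standard-error inflation can be simulated; a sketch with made-up data (true b2 = 2, b3 = 3 are assumptions of the simulation, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)

def ols_se_b2(x3):
    """Fit y = b0 + b2*X2 + b3*X3 + e (true b2=2, b3=3) and
    return the estimate of b2 and its standard error."""
    y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)                  # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b[1], se[1]

# Near multicollinearity: X3 is X2 plus a little noise
_, se_near = ols_se_b2(x2 + 0.05 * rng.normal(size=n))
# No multicollinearity: X3 independent of X2
_, se_none = ols_se_b2(rng.normal(size=n))

print(se_near > se_none)   # True: SE on b2 inflates when X2, X3 move together
```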
Absence of Multicollinearity
y = b2 X2 + b3 X3 and y = b2 X2 will give the same estimate for b2.
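A numeric check of this claim; a sketch with made-up regressors forced to be exactly uncorrelated in-sample:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x2 = rng.normal(size=n); x2 -= x2.mean()
x3 = rng.normal(size=n); x3 -= x3.mean()
x3 -= (x3 @ x2 / (x2 @ x2)) * x2    # force X3 exactly orthogonal to X2

y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)

def fit(cols):
    """OLS coefficients (intercept first) for y on the given regressors."""
    X = np.column_stack([np.ones(n)] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = fit([x2, x3])    # y on X2 and X3
b_short = fit([x2])       # y on X2 alone

print(np.isclose(b_full[1], b_short[1]))   # True: identical estimate for b2
```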
Identifying Multicollinearity
- High R2 with low t-stats
- High pairwise correlation among explanatory variables
  (won't tell you if three or more are jointly correlated, though)
- Coefficients change when variables are added to or dropped from the model
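The pairwise-correlation check, plus a variance inflation factor (VIF) to catch joint correlation, can be sketched with made-up data (the VIF helper and the >10 rule of thumb are illustrative additions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)   # nearly collinear with X2
x4 = rng.normal(size=n)              # unrelated regressor

def vif(target, others):
    """Variance inflation factor: 1/(1 - R^2) from regressing one
    explanatory variable on the remaining explanatory variables."""
    X = np.column_stack([np.ones(len(target))] + others)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ b
    tss = (target - target.mean()) @ (target - target.mean())
    return tss / (resid @ resid)     # = 1 / (1 - R^2)

print(np.corrcoef(x2, x3)[0, 1])     # high pairwise correlation
print(vif(x2, [x3, x4]) > 10)        # True under a common rule of thumb
```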
Solutions
- Ignore it if forecasting, if theory dictates the variable
  should be there, or if the variable is already significant with the right sign.
- Drop an explanatory variable (the one with the lowest t-stat): SEs
  will fall, causing t-stats to increase and making the remaining variables
  appear more significant.
  Don't drop a theoretically justified variable if its t-stat is at least 1.0
  (or p-val < .25).
  Note: the remaining variable will capture the effect
  of itself as well as the dropped one.
- Reformulate the model: e.g., change to per capita measures
  and drop population.
  The multicollinearity is still there, just lessened.
  Note: Can NOT compare R2 of two models
  if the dependent variables are different.
- First difference (in time series): levels may be
  correlated (they tend to grow together).
- Use a linear combination of the independent variables (principal
  component analysis can aid in the weighting scheme).
- Increase the sample size: SEs will drop (estimates become more
  precise) and coefficients become more significant, since Sii (the variation
  in Xi) increases.
  This works as long as r2 (between the regressors) does not increase.
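The sample-size point can be simulated; a sketch with made-up data, holding the degree of collinearity fixed as n grows:

```python
import numpy as np

rng = np.random.default_rng(5)

def se_b2(n):
    """Standard error of b2 in y = b0 + b2*X2 + b3*X3 + e at sample size n."""
    x2 = rng.normal(size=n)
    x3 = x2 + 0.2 * rng.normal(size=n)   # same degree of collinearity at any n
    y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

se_small_n, se_big_n = se_b2(50), se_b2(5000)
print(se_big_n < se_small_n)   # True: more data shrinks the standard errors
```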
Example
- Estimate y = b0 + b2 X2 + b3 X3
- Then estimate X3 = a0 + a1 X2
- Then plug the second into the first: y = b0 + b2 X2 + b3 (a0 + a1 X2),
  to calculate the true (total) effect X2 has on y: (b2 + b3 a1)
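These steps can be run on made-up data; in-sample, the combined effect b2 + b3*a1 equals the slope from regressing y on X2 alone exactly (an OLS identity):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x2 = rng.normal(size=n)
x3 = 0.5 + 0.8 * x2 + 0.3 * rng.normal(size=n)   # X3 related to X2
y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)

def ols(dep, cols):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(dep))] + cols)
    return np.linalg.lstsq(X, dep, rcond=None)[0]

b0, b2, b3 = ols(y, [x2, x3])   # full model
a0, a1 = ols(x3, [x2])          # auxiliary regression: X3 on X2
g0, g2 = ols(y, [x2])           # short model: y on X2 alone

# Direct effect b2 plus indirect effect b3*a1 equals the slope
# from the short regression exactly.
print(np.isclose(b2 + b3 * a1, g2))   # True
```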
Computer Output Example
- An unexpected sign on a coefficient hints that multicollinearity
  is present.
- It's OK if you drop a variable, get the expected
  sign, but lower the R2.
- If you drop a variable because you think there is correlation,
  then run a regression of the dropped variable on the other variables in the
  dataset.
DISCLAIMER – DISCLAIMER -- DISCLAIMER
These are the notes that I took as I was reading
the chapter. They are not intended to be your sole source of information
about multicollinearity. Rather, they can be used to highlight and outline
the main topics presented in Ch. 5. If something doesn't make sense, be
sure to read about it in the chapter.