Draft answer sheet for homework one
Fall ECON 120 C
Part one
I.1
Reading header file data4-15.hdr
List of variables
0) const 1) Y 2) gnp 3) gdp 4) pop
5) urb 6) lit 7) edu 8) agr
period: 1, maxobs: 41, obs range: full 1-41, current 1-41
Reading datafile data4-15 BY OBSERVATIONS
?summary ;
SUMMARY STATISTICS (USING THE OBSERVATIONS 1-41)
(missing values denoted by -999 will be skipped)
Variable MEAN S.D. C.V. MIN
Y 0.38088 0.116678 0.306339 0.183
gnp 7024.878049 8142.01453 1.159026 200
gdp 3.397561 4.405082 1.296542 -8.9
pop 1.387805 0.97396 0.701799 0.1
urb 62.739024 22.712781 0.36202 10.9
lit 87.207317 17.169834 0.196885 29.2
edu 67.536585 26.689977 0.395193 12
agr 13.195122 12.34275 0.935402 0.7
Variable MEDIAN MAX SKEW EXCSKURT
Y 0.3732 0.596 0.221479 -1.065527
gnp 2590 25450 0.968924 -0.681408
gdp 2.9 12.7 -0.419326 0.869507
pop 1.4 3.3 0.362972 -1.19148
urb 67.1 100 -0.624772 -0.544407
lit 95 99 -1.737704 2.327503
edu 75 99 -0.528941 -1.027911
agr 9.6 56.5 1.785652 3.096763
?square gnp gdp pop urb lit edu agr ;
Created sq_gnp = gnp squared as var no. 9
Created sq_gdp = gdp squared as var no. 10
Created sq_pop = pop squared as var no. 11
Created sq_urb = urb squared as var no. 12
Created sq_lit = lit squared as var no. 13
Created sq_edu = edu squared as var no. 14
Created sq_agr = agr squared as var no. 15
List of variables
0) const 1) Y 2) gnp 3) gdp 4) pop
5) urb 6) lit 7) edu 8) agr 9) sq_gnp
10) sq_gdp 11) sq_pop 12) sq_urb 13) sq_lit 14) sq_edu
15) sq_agr
[Model 1A]
?ols Y const gnp gdp pop urb lit edu agr sq_gnp sq_gdp sq_pop sq_urb
sq_lit sq_edu sq_agr ;
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.182317 0.189029 -0.964492 0.343685
2) gnp -1.066273e-05 8.208931e-06 -1.298918 0.205374
3) gdp 0.004786 0.003182 1.5042 0.144582
4) pop 0.157835 0.050219 3.142899 0.004149 ***
5) urb 0.00409 0.004462 0.91654 0.367807
6) lit 0.012676 0.006189 2.048229 0.050761 *
7) edu -0.001908 0.003106 -0.614457 0.544254
8) agr 3.928915e-04 0.003183 0.123442 0.902706
9) sq_gnp 4.370790e-10 3.088420e-10 1.415219 0.168871
10) sq_gdp -3.617565e-04 3.872316e-04 -0.934212 0.35879
11) sq_pop -0.030838 0.015053 -2.048618 0.05072 *
12) sq_urb -1.432770e-05 3.581574e-05 -0.400039 0.692395
13) sq_lit -8.845356e-05 4.558500e-05 -1.94041 0.063247 *
14) sq_edu -9.977581e-07 2.364934e-05 -0.04219 0.96667
15) sq_agr -6.011323e-05 5.627975e-05 -1.068115 0.295286
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.097731 Std Err of Resid. (sgmahat) 0.06131
Unadjusted R-squared 0.821 Adjusted R-squared 0.724
F-statistic (14, 26) 8.490811 pvalue = Prob(F > 8.491) is < 0.0001
Durbin-Watson Stat. 2.279825 First-order auto corr coeff -0.219
MODEL SELECTION STATISTICS
SGMASQ 0.003759 AIC 0.004955 FPE 0.005134
HQ 0.006225 SCHWARZ 0.009275 SHIBATA 0.004128
GCV 0.005927 RICE 0.008885
Excluding the constant, p-value was highest for variable 14 (sq_edu)
I.2
?omit sq_edu ;
sq_edu is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
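The omission rule used throughout this part can be sketched in a few lines of Python. This is only an illustration, not part of the program session: the p-values are copied from the Model 1A output above, and the function name least_significant is a hypothetical helper.

```python
# p-values copied from the Model 1A output; the constant is left out on purpose,
# because the rule never considers dropping the constant term.
p_values_1a = {
    "gnp": 0.205374, "gdp": 0.144582, "pop": 0.004149, "urb": 0.367807,
    "lit": 0.050761, "edu": 0.544254, "agr": 0.902706,
    "sq_gnp": 0.168871, "sq_gdp": 0.35879, "sq_pop": 0.05072,
    "sq_urb": 0.692395, "sq_lit": 0.063247, "sq_edu": 0.96667,
    "sq_agr": 0.295286,
}


def least_significant(p_values):
    """Return the name of the regressor with the largest p-value."""
    return max(p_values, key=p_values.get)


to_omit = least_significant(p_values_1a)
print(to_omit)
```

Repeating this step on each re-estimated model, and stopping once every remaining p-value is below the chosen significance level, gives the sequence of models 1B through 1H.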
[Model 1B]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.182853 0.185083 -0.98795 0.331949
2) gnp -1.065729e-05 8.054762e-06 -1.323105 0.196897
3) gdp 0.004816 0.003043 1.582488 0.125183
4) pop 0.157458 0.048495 3.246878 0.003111 ***
5) urb 0.004123 0.004311 0.956392 0.34736
6) lit 0.01277 0.00566 2.256187 0.032364 **
7) edu -0.002034 9.067779e-04 -2.242585 0.033335 **
8) agr 4.139877e-04 0.003085 0.13421 0.894232
9) sq_gnp 4.354337e-10 3.006532e-10 1.448292 0.159049
10) sq_gdp -3.617811e-04 3.800056e-04 -0.952041 0.349522
11) sq_pop -0.03079 0.014731 -2.090173 0.046152 **
12) sq_urb -1.449289e-05 3.493677e-05 -0.414832 0.681543
13) sq_lit -8.915318e-05 4.166956e-05 -2.139528 0.041587 **
15) sq_agr -6.065308e-05 5.378316e-05 -1.127734 0.269355
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.097738 Std Err of Resid. (sgmahat) 0.060166
Unadjusted R-squared 0.821 Adjusted R-squared 0.734
F-statistic (13, 27) 9.494849 pvalue = Prob(F > 9.495) is < 0.0001
Durbin-Watson Stat. 2.279141 First-order auto corr coeff -0.219
MODEL SELECTION STATISTICS
SGMASQ 0.00362 AIC 0.004719 FPE 0.004856
HQ 0.00584 SCHWARZ 0.008472 SHIBATA 0.004012
GCV 0.005497 RICE 0.007518
Excluding the constant, p-value was highest for variable 8 (agr)
Model selection statistics have decreased (i.e. improved) for 8 criteria
?omit agr ;
agr is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
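The eight model selection statistics printed after each regression can be recomputed from ESS, the number of observations n, and the number of estimated coefficients k (including the constant). The formulas below are not shown in the output, so treat them as an assumption. They are the usual textbook definitions, and they appear to reproduce the values printed for Model 1B (ESS = 0.097738, n = 41, k = 14). The function name selection_stats is hypothetical.

```python
import math


def selection_stats(ess, n, k):
    """Model selection criteria; each is ESS/n times a penalty that grows with k."""
    base = ess / n
    return {
        "SGMASQ": ess / (n - k),
        "AIC": base * math.exp(2 * k / n),
        "FPE": base * (n + k) / (n - k),
        "HQ": base * math.log(n) ** (2 * k / n),
        "SCHWARZ": base * n ** (k / n),
        "SHIBATA": base * (n + 2 * k) / n,
        "GCV": base / (1 - k / n) ** 2,
        "RICE": base / (1 - 2 * k / n),
    }


# Model 1B: 13 regressors plus the constant, 41 observations.
stats_1b = selection_stats(ess=0.097738, n=41, k=14)
for name, value in stats_1b.items():
    print(f"{name:8s} {value:.6f}")
```

Because every criterion multiplies ESS/n by a penalty for the number of coefficients, dropping a variable lowers a criterion only when the rise in ESS is small relative to the saving in k.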
[Model 1C]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.173385 0.168084 -1.031536 0.311118
2) gnp -1.086977e-05 7.757925e-06 -1.401118 0.172166
3) gdp 0.004836 0.002986 1.619366 0.116579
4) pop 0.157497 0.047636 3.306237 0.002599 ***
5) urb 0.004089 0.004227 0.967238 0.341706
6) lit 0.012712 0.005544 2.293097 0.029559 **
7) edu -0.002046 8.861458e-04 -2.308723 0.028556 **
9) sq_gnp 4.412936e-10 2.922032e-10 1.510228 0.142189
10) sq_gdp -3.611284e-04 3.732520e-04 -0.967519 0.341568
11) sq_pop -0.030802 0.01447 -2.128673 0.042216 **
12) sq_urb -1.426842e-05 3.427932e-05 -0.41624 0.680408
13) sq_lit -8.891922e-05 4.089650e-05 -2.17425 0.038294 **
15) sq_agr -5.468481e-05 2.971522e-05 -1.840296 0.076347 *
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.097803 Std Err of Resid. (sgmahat) 0.059101
Unadjusted R-squared 0.820 Adjusted R-squared 0.743
F-statistic (12, 28) 10.658386 pvalue = Prob(F > 10.658) is < 0.0001
Durbin-Watson Stat. 2.288252 First-order auto corr coeff -0.227
MODEL SELECTION STATISTICS
SGMASQ 0.003493 AIC 0.004498 FPE 0.0046
HQ 0.005481 SCHWARZ 0.007744 SHIBATA 0.003898
GCV 0.005115 RICE 0.00652
Excluding the constant, p-value was highest for variable 12 (sq_urb)
Model selection statistics have decreased (i.e. improved) for 8 criteria
?omit sq_urb ;
sq_urb is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 1D]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.159301 0.16228 -0.981647 0.334393
2) gnp -1.233683e-05 6.811839e-06 -1.811087 0.080498 *
3) gdp 0.005028 0.002908 1.729026 0.094438 *
4) pop 0.149741 0.043212 3.465298 0.00167 ***
5) urb 0.002365 8.347003e-04 2.833244 0.008299 ***
6) lit 0.013662 0.00498 2.743428 0.010317 **
7) edu -0.001839 7.237923e-04 -2.541356 0.016649 **
9) sq_gnp 4.861328e-10 2.677216e-10 1.815815 0.079752 *
10) sq_gdp -4.333227e-04 3.257609e-04 -1.330186 0.193824
11) sq_pop -0.027966 0.012583 -2.222619 0.034198 **
13) sq_lit -9.575821e-05 3.691315e-05 -2.594149 0.014715 **
15) sq_agr -6.285884e-05 2.198119e-05 -2.859664 0.00778 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.098408 Std Err of Resid. (sgmahat) 0.058253
Unadjusted R-squared 0.819 Adjusted R-squared 0.751
F-statistic (11, 29) 11.952321 pvalue = Prob(F > 11.952) is < 0.0001
Durbin-Watson Stat. 2.295352 First-order auto corr coeff -0.228
MODEL SELECTION STATISTICS
SGMASQ 0.003393 AIC 0.00431 FPE 0.004387
HQ 0.005173 SCHWARZ 0.007117 SHIBATA 0.003805
GCV 0.004798 RICE 0.005789
Excluding the constant, p-value was highest for variable 10 (sq_gdp)
Model selection statistics have decreased (i.e. improved) for 8 criteria
?omit sq_gdp ;
sq_gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 1E]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.126216 0.162405 -0.777163 0.443148
2) gnp -1.014652e-05 6.694049e-06 -1.515753 0.140049
3) gdp 0.003015 0.002515 1.198806 0.239988
4) pop 0.137445 0.042749 3.215154 0.003114 ***
5) urb 0.002179 8.333881e-04 2.6145 0.013843 **
6) lit 0.012746 0.004995 2.55184 0.01605 **
7) edu -0.001796 7.322524e-04 -2.452037 0.020242 **
9) sq_gnp 4.196390e-10 2.663647e-10 1.57543 0.125645
11) sq_pop -0.024132 0.012404 -1.945501 0.061138 *
13) sq_lit -8.981328e-05 3.710849e-05 -2.420289 0.021773 **
15) sq_agr -6.125829e-05 2.222790e-05 -2.755919 0.009857 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.104412 Std Err of Resid. (sgmahat) 0.058995
Unadjusted R-squared 0.808 Adjusted R-squared 0.744
F-statistic (10, 30) 12.646281 pvalue = Prob(F > 12.646) is < 0.0001
Durbin-Watson Stat. 2.137629 First-order auto corr coeff -0.187
MODEL SELECTION STATISTICS
SGMASQ 0.00348 AIC 0.004355 FPE 0.004414
HQ 0.005149 SCHWARZ 0.006897 SHIBATA 0.003913
GCV 0.004757 RICE 0.005495
Excluding the constant, p-value was highest for variable 3 (gdp)
Model selection statistics have decreased (i.e. improved) for 4 criteria
?omit gdp ;
gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 1F]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.167886 0.159756 -1.050889 0.301433
2) gnp -8.909629e-06 6.660526e-06 -1.337676 0.190734
4) pop 0.154093 0.040715 3.784713 0.000662 ***
5) urb 0.001859 7.949185e-04 2.337979 0.026017 **
6) lit 0.01447 0.004817 3.003861 0.005237 ***
7) edu -0.001609 7.205727e-04 -2.233035 0.032904 **
9) sq_gnp 3.877973e-10 2.668992e-10 1.452973 0.15628
11) sq_pop -0.028615 0.01191 -2.40263 0.022453 **
13) sq_lit -1.031931e-04 3.563894e-05 -2.895516 0.006879 ***
15) sq_agr -6.333251e-05 2.231615e-05 -2.837967 0.007938 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.109414 Std Err of Resid. (sgmahat) 0.059409
Unadjusted R-squared 0.799 Adjusted R-squared 0.741
F-statistic (9, 31) 13.698576 pvalue = Prob(F > 13.699) is < 0.0001
Durbin-Watson Stat. 2.188829 First-order auto corr coeff -0.261
MODEL SELECTION STATISTICS
SGMASQ 0.003529 AIC 0.004347 FPE 0.00439
HQ 0.005061 SCHWARZ 0.006602 SHIBATA 0.00397
GCV 0.004668 RICE 0.00521
Excluding the constant, p-value was highest for variable 2 (gnp)
Model selection statistics have decreased (i.e. improved) for 6 criteria
?omit gnp ;
gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 1G]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.177032 0.161567 -1.095718 0.281381
4) pop 0.15611 0.041186 3.7904 0.000629 ***
5) urb 0.001313 6.905051e-04 1.90085 0.066361 *
6) lit 0.014976 0.004861 3.080763 0.004222 ***
7) edu -0.001564 7.285928e-04 -2.145997 0.039559 **
9) sq_gnp 4.154543e-11 6.586736e-11 0.630744 0.532687
11) sq_pop -0.02821 0.012052 -2.340684 0.025643 **
13) sq_lit -1.069878e-04 3.596136e-05 -2.975076 0.005535 ***
15) sq_agr -6.615711e-05 2.248837e-05 -2.941836 0.006023 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.11573 Std Err of Resid. (sgmahat) 0.060138
Unadjusted R-squared 0.787 Adjusted R-squared 0.734
F-statistic (8, 32) 14.821605 pvalue = Prob(F > 14.822) is < 0.0001
Durbin-Watson Stat. 2.318174 First-order auto corr coeff -0.301
MODEL SELECTION STATISTICS
SGMASQ 0.003617 AIC 0.004379 FPE 0.00441
HQ 0.005021 SCHWARZ 0.006378 SHIBATA 0.004062
GCV 0.004634 RICE 0.005032
Excluding the constant, p-value was highest for variable 9 (sq_gnp)
Model selection statistics have decreased (i.e. improved) for 4 criteria
?omit sq_gnp ;
sq_gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 1H]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.165729 0.159098 -1.041675 0.305134
4) pop 0.152028 0.040301 3.772298 0.000639 ***
5) urb 0.001393 6.725382e-04 2.070568 0.046297 **
6) lit 0.014475 0.004752 3.046213 0.004533 ***
7) edu -0.001439 6.948961e-04 -2.070826 0.046271 **
11) sq_pop -0.027168 0.011829 -2.296734 0.028117 **
13) sq_lit -1.037193e-04 3.525987e-05 -2.941567 0.005931 ***
15) sq_agr -6.449861e-05 2.212942e-05 -2.91461 0.006352 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.117168 Std Err of Resid. (sgmahat) 0.059587
Unadjusted R-squared 0.785 Adjusted R-squared 0.739
F-statistic (7, 33) 17.195923 pvalue = Prob(F > 17.196) is < 0.0001
Durbin-Watson Stat. 2.252419 First-order auto corr coeff -0.259
MODEL SELECTION STATISTICS
SGMASQ 0.003551 AIC 0.004222 FPE 0.004243
HQ 0.004769 SCHWARZ 0.005898 SHIBATA 0.003973
GCV 0.004411 RICE 0.004687
Model selection statistics have decreased (i.e. improved) for 8 criteria
?
I.3
Using the significance of coefficients as the criterion, we could conclude that the last model [Model 1H] is the best model. All the variables in the model (excluding the constant) are significant at the 5% level. It also has the lowest values for six of the eight model selection statistics among all the models estimated; SGMASQ and SHIBATA are lowest for Model 1D. However, some could argue that there exists omitted variable bias. If we compare the 4th model [Model 1D] with the 5th model [Model 1E], we can see that the omission of variable 10 (sq_gdp) has worsened the p-values of the remaining variables, and three of them (gnp, gdp, and sq_gnp) are no longer significant at the 10% level. The omission improves only 4 out of the 8 model selection statistics. Both R-squared and adjusted R-squared decrease as a result. Thus, some could conclude that the 4th model is the best model.
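The comparisons of model selection statistics in this answer can be checked by hand or with a short script. The sketch below is only an illustration: the values are copied from the Model 1D, 1E, and 1H outputs, and the helper name improved is hypothetical.

```python
# Model selection statistics copied from the Model 1D, 1E and 1H outputs.
model_1d = {"SGMASQ": 0.003393, "AIC": 0.00431, "FPE": 0.004387, "HQ": 0.005173,
            "SCHWARZ": 0.007117, "SHIBATA": 0.003805, "GCV": 0.004798, "RICE": 0.005789}
model_1e = {"SGMASQ": 0.00348, "AIC": 0.004355, "FPE": 0.004414, "HQ": 0.005149,
            "SCHWARZ": 0.006897, "SHIBATA": 0.003913, "GCV": 0.004757, "RICE": 0.005495}
model_1h = {"SGMASQ": 0.003551, "AIC": 0.004222, "FPE": 0.004243, "HQ": 0.004769,
            "SCHWARZ": 0.005898, "SHIBATA": 0.003973, "GCV": 0.004411, "RICE": 0.004687}


def improved(before, after):
    """List the criteria whose value is lower (better) in `after` than in `before`."""
    return [name for name in before if after[name] < before[name]]


improved_1e = improved(model_1d, model_1e)  # effect of omitting sq_gdp
h_beats_d = improved(model_1d, model_1h)    # final model against the 4th model
print(len(improved_1e), improved_1e)
print(len(h_beats_d), h_beats_d)
```

The criteria that penalize extra coefficients most heavily (HQ, SCHWARZ, GCV, RICE) favor the smaller model, which is why the two candidate "best" models are ranked differently by different statistics.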
Part two
II.1
Reading header file data4-15.hdr
List of variables
0) const 1) Y 2) gnp 3) gdp 4) pop
5) urb 6) lit 7) edu 8) agr
period: 1, maxobs: 41, obs range: full 1-41, current 1-41
Reading datafile data4-15 BY OBSERVATIONS
?square gnp gdp pop urb lit edu agr ;
Created sq_gnp = gnp squared as var no. 9
Created sq_gdp = gdp squared as var no. 10
Created sq_pop = pop squared as var no. 11
Created sq_urb = urb squared as var no. 12
Created sq_lit = lit squared as var no. 13
Created sq_edu = edu squared as var no. 14
Created sq_agr = agr squared as var no. 15
List of variables
0) const 1) Y 2) gnp 3) gdp 4) pop
5) urb 6) lit 7) edu 8) agr 9) sq_gnp
10) sq_gdp 11) sq_pop 12) sq_urb 13) sq_lit 14) sq_edu
15) sq_agr
?ols Y const gnp gdp pop urb lit edu agr ;
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant 0.27196 0.119047 2.284477 0.028907 **
2) gnp -1.913184e-06 1.809138e-06 -1.057511 0.297958
3) gdp 0.005726 0.002632 2.175465 0.03686 **
4) pop 0.077401 0.017472 4.429995 < 0.0001 ***
5) urb 0.00198 8.379782e-04 2.363365 0.024153 **
6) lit 6.131706e-04 0.001167 0.525254 0.602918
7) edu -0.002043 8.230743e-04 -2.482524 0.018306 **
8) agr -0.003353 0.001543 -2.172969 0.037063 **
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.150798 Std Err of Resid. (sgmahat) 0.067599
Unadjusted R-squared 0.723 Adjusted R-squared 0.664
F-statistic (7, 33) 12.309711 pvalue = Prob(F > 12.310) is < 0.0001
Durbin-Watson Stat. 2.027037 First-order auto corr coeff -0.095
MODEL SELECTION STATISTICS
SGMASQ 0.00457 AIC 0.005434 FPE 0.005461
HQ 0.006137 SCHWARZ 0.007591 SHIBATA 0.005113
GCV 0.005677 RICE 0.006032
Excluding the constant, p-value was highest for variable 6 (lit)
?genr ut = uhat
Generated var. no. 16 (ut)
II.2
?ols ut const gnp gdp pop urb lit edu agr sq_gnp sq_gdp sq_pop sq_urb
sq_lit sq_edu sq_agr ;
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - ut
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.454278 0.189029 -2.403213 0.023681 **
2) gnp -8.749547e-06 8.208931e-06 -1.065857 0.296286
3) gdp -9.398834e-04 0.003182 -0.295382 0.770048
4) pop 0.080434 0.050219 1.60164 0.121317
5) urb 0.002109 0.004462 0.472726 0.640353
6) lit 0.012063 0.006189 1.949149 0.062143 *
7) edu 1.348771e-04 0.003106 0.043426 0.965693
8) agr 0.003746 0.003183 1.176831 0.249925
9) sq_gnp 4.370790e-10 3.088420e-10 1.415219 0.168871
10) sq_gdp -3.617565e-04 3.872316e-04 -0.934212 0.35879
11) sq_pop -0.030838 0.015053 -2.048618 0.05072 *
12) sq_urb -1.432770e-05 3.581574e-05 -0.400039 0.692395
13) sq_lit -8.845356e-05 4.558500e-05 -1.94041 0.063247 *
14) sq_edu -9.977581e-07 2.364934e-05 -0.04219 0.96667
15) sq_agr -6.011323e-05 5.627975e-05 -1.068115 0.295286
Mean of dep. var. -1.356410e-17 S.D. of dep. variable 6.139990e-02
Error Sum of Sq (ESS) 0.097731 Std Err of Resid. (sgmahat) 0.06131
Unadjusted R-squared 0.352 Adjusted R-squared 0.003
F-statistic (14, 26) 1.008413 pvalue = Prob(F > 1.008) is 0.474137
Durbin-Watson Stat. 2.279825 First-order auto corr coeff -0.219
MODEL SELECTION STATISTICS
SGMASQ 0.003759 AIC 0.004955 FPE 0.005134
HQ 0.006225 SCHWARZ 0.009275 SHIBATA 0.004128
GCV 0.005927 RICE 0.008885
Excluding the constant, p-value was highest for variable 14 (sq_edu)
?genr lm = $nrsq
Generated var. no. 17 (lm)
?pvalue 3 7 lm
For Chi-square (7), area to the right of 14.428242 is 0.044069
II.3
The null hypothesis for the LM test is that the coefficients for the 7 added squared variables (variables 9 through 15) are all zero. The LM test statistic is distributed as chi-square with 7 degrees of freedom (the number of restrictions). The p-value of 0.044069 is below 0.05, so we are "safe" in rejecting the null hypothesis at the 5% level (the probability of making a Type I error is small). You can also get the critical value LM* for 5% and 7 d.f. by looking it up in the chi-square table (LM* = 14.0671). Since LM (14.4282) > LM*, we can reject the null at the 5% significance level. We conclude that at least one of the seven added squared terms belongs in the model.
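The LM statistic and its p-value can be reproduced outside the program. The sketch below is an illustration under stated assumptions: it takes LM = n times the R-squared of the auxiliary regression, recovers that R-squared from the two ESS values printed above (the residuals ut have mean zero, so their total sum of squares equals the ESS of the basic model), and uses a closed-form chi-square tail area that holds only for odd degrees of freedom. The function name chi2_sf_odd_df is hypothetical.

```python
import math


def chi2_sf_odd_df(x, df):
    """Area to the right of x under a chi-square density; odd df only."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    total = 0.0
    term = math.sqrt(x)  # first series term: x**(1/2) / 1
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)  # next term: x**(j + 1/2) / (1*3*...*(2j+1))
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


n = 41
ess_basic = 0.150798  # ESS of the basic model in II.1 (sum of squared ut)
ess_aux = 0.097731    # ESS of the auxiliary regression in II.2

r2_aux = 1 - ess_aux / ess_basic
lm = n * r2_aux
p_value = chi2_sf_odd_df(lm, 7)
print(f"LM = {lm:.4f}, p-value = {p_value:.6f}")
```

Small differences from the printed 14.428242 in the later decimals come from using the rounded ESS values.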
II.4
Using the above auxiliary regression, we can select new variables to be added to the basic model we estimated at the beginning of Part two. If we use a p-value of 0.5 as the cutoff, we would include the squared variables 9, 10, 11, 13, and 15 (sq_gnp, sq_gdp, sq_pop, sq_lit, and sq_agr). We will next re-estimate the model using Y (Gini, variable 1) as the dependent variable rather than ut (variable 16). The independent variables include those in the basic model (2 3 4 5 6 7 8) and those we decide to add (9 10 11 13 and 15).
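The cutoff rule can be written out as a simple filter. This is only a sketch: the p-values of the squared terms are copied from the auxiliary regression in II.2, and the names aux_p_values, CUTOFF, and added are hypothetical.

```python
# p-values of the squared terms in the auxiliary regression (dependent variable ut).
aux_p_values = {
    "sq_gnp": 0.168871, "sq_gdp": 0.35879, "sq_pop": 0.05072,
    "sq_urb": 0.692395, "sq_lit": 0.063247, "sq_edu": 0.96667,
    "sq_agr": 0.295286,
}

CUTOFF = 0.5  # the deliberately loose cutoff used in this answer

# Keep every squared term whose p-value falls below the cutoff.
added = [name for name, p in aux_p_values.items() if p < CUTOFF]
print(added)
```

A loose cutoff such as 0.5 keeps borderline candidates in the starting model, and the later omit steps then decide which of them stay.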
?ols Y const gnp gdp pop urb lit edu agr sq_gnp sq_gdp sq_pop sq_lit
sq_agr ;
[Model 2A]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.167179 0.178487 -0.936648 0.356949
2) gnp -1.217546e-05 7.068469e-06 -1.722504 0.096008 *
3) gdp 0.005014 0.002961 1.693201 0.101516
4) pop 0.149604 0.043982 3.401499 0.002035 ***
5) urb 0.002371 8.508392e-04 2.786557 0.009456 ***
6) lit 0.013724 0.005095 2.693606 0.011806 **
7) edu -0.001826 7.452389e-04 -2.450402 0.020778 **
8) agr 3.527296e-04 0.003035 0.116213 0.908313
9) sq_gnp 4.817411e-10 2.750036e-10 1.751763 0.090762 *
10) sq_gdp -4.348464e-04 3.317064e-04 -1.310938 0.200529
11) sq_pop -0.027918 0.012809 -2.179594 0.037856 **
13) sq_lit -9.604922e-05 3.764087e-05 -2.551727 0.016465 **
15) sq_agr -6.805355e-05 4.998254e-05 -1.361546 0.184197
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.09836 Std Err of Resid. (sgmahat) 0.05927
Unadjusted R-squared 0.819 Adjusted R-squared 0.742
F-statistic (12, 28) 10.584719 pvalue = Prob(F > 10.585) is < 0.0001
Durbin-Watson Stat. 2.289221 First-order auto corr coeff -0.223
MODEL SELECTION STATISTICS
SGMASQ 0.003513 AIC 0.004523 FPE 0.004627
HQ 0.005513 SCHWARZ 0.007788 SHIBATA 0.00392
GCV 0.005144 RICE 0.006557
Excluding the constant, p-value was highest for variable 8 (agr)
II.5
?omit agr ;
agr is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 2B]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.159301 0.16228 -0.981647 0.334393
2) gnp -1.233683e-05 6.811839e-06 -1.811087 0.080498 *
3) gdp 0.005028 0.002908 1.729026 0.094438 *
4) pop 0.149741 0.043212 3.465298 0.00167 ***
5) urb 0.002365 8.347003e-04 2.833244 0.008299 ***
6) lit 0.013662 0.00498 2.743428 0.010317 **
7) edu -0.001839 7.237923e-04 -2.541356 0.016649 **
9) sq_gnp 4.861328e-10 2.677216e-10 1.815815 0.079752 *
10) sq_gdp -4.333227e-04 3.257609e-04 -1.330186 0.193824
11) sq_pop -0.027966 0.012583 -2.222619 0.034198 **
13) sq_lit -9.575821e-05 3.691315e-05 -2.594149 0.014715 **
15) sq_agr -6.285884e-05 2.198119e-05 -2.859664 0.00778 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.098408 Std Err of Resid. (sgmahat) 0.058253
Unadjusted R-squared 0.819 Adjusted R-squared 0.751
F-statistic (11, 29) 11.952321 pvalue = Prob(F > 11.952) is < 0.0001
Durbin-Watson Stat. 2.295352 First-order auto corr coeff -0.228
MODEL SELECTION STATISTICS
SGMASQ 0.003393 AIC 0.00431 FPE 0.004387
HQ 0.005173 SCHWARZ 0.007117 SHIBATA 0.003805
GCV 0.004798 RICE 0.005789
Excluding the constant, p-value was highest for variable 10 (sq_gdp)
Model selection statistics have decreased (i.e. improved) for 8 criteria
?omit sq_gdp ;
sq_gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 2C]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.126216 0.162405 -0.777163 0.443148
2) gnp -1.014652e-05 6.694049e-06 -1.515753 0.140049
3) gdp 0.003015 0.002515 1.198806 0.239988
4) pop 0.137445 0.042749 3.215154 0.003114 ***
5) urb 0.002179 8.333881e-04 2.6145 0.013843 **
6) lit 0.012746 0.004995 2.55184 0.01605 **
7) edu -0.001796 7.322524e-04 -2.452037 0.020242 **
9) sq_gnp 4.196390e-10 2.663647e-10 1.57543 0.125645
11) sq_pop -0.024132 0.012404 -1.945501 0.061138 *
13) sq_lit -8.981328e-05 3.710849e-05 -2.420289 0.021773 **
15) sq_agr -6.125829e-05 2.222790e-05 -2.755919 0.009857 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.104412 Std Err of Resid. (sgmahat) 0.058995
Unadjusted R-squared 0.808 Adjusted R-squared 0.744
F-statistic (10, 30) 12.646281 pvalue = Prob(F > 12.646) is < 0.0001
Durbin-Watson Stat. 2.137629 First-order auto corr coeff -0.187
MODEL SELECTION STATISTICS
SGMASQ 0.00348 AIC 0.004355 FPE 0.004414
HQ 0.005149 SCHWARZ 0.006897 SHIBATA 0.003913
GCV 0.004757 RICE 0.005495
Excluding the constant, p-value was highest for variable 3 (gdp)
Model selection statistics have decreased (i.e. improved) for 4 criteria
?omit gdp ;
gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 2D]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.167886 0.159756 -1.050889 0.301433
2) gnp -8.909629e-06 6.660526e-06 -1.337676 0.190734
4) pop 0.154093 0.040715 3.784713 0.000662 ***
5) urb 0.001859 7.949185e-04 2.337979 0.026017 **
6) lit 0.01447 0.004817 3.003861 0.005237 ***
7) edu -0.001609 7.205727e-04 -2.233035 0.032904 **
9) sq_gnp 3.877973e-10 2.668992e-10 1.452973 0.15628
11) sq_pop -0.028615 0.01191 -2.40263 0.022453 **
13) sq_lit -1.031931e-04 3.563894e-05 -2.895516 0.006879 ***
15) sq_agr -6.333251e-05 2.231615e-05 -2.837967 0.007938 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.109414 Std Err of Resid. (sgmahat) 0.059409
Unadjusted R-squared 0.799 Adjusted R-squared 0.741
F-statistic (9, 31) 13.698576 pvalue = Prob(F > 13.699) is < 0.0001
Durbin-Watson Stat. 2.188829 First-order auto corr coeff -0.261
MODEL SELECTION STATISTICS
SGMASQ 0.003529 AIC 0.004347 FPE 0.00439
HQ 0.005061 SCHWARZ 0.006602 SHIBATA 0.00397
GCV 0.004668 RICE 0.00521
Excluding the constant, p-value was highest for variable 2 (gnp)
Model selection statistics have decreased (i.e. improved) for 6 criteria
?omit gnp ;
gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 2E]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.177032 0.161567 -1.095718 0.281381
4) pop 0.15611 0.041186 3.7904 0.000629 ***
5) urb 0.001313 6.905051e-04 1.90085 0.066361 *
6) lit 0.014976 0.004861 3.080763 0.004222 ***
7) edu -0.001564 7.285928e-04 -2.145997 0.039559 **
9) sq_gnp 4.154543e-11 6.586736e-11 0.630744 0.532687
11) sq_pop -0.02821 0.012052 -2.340684 0.025643 **
13) sq_lit -1.069878e-04 3.596136e-05 -2.975076 0.005535 ***
15) sq_agr -6.615711e-05 2.248837e-05 -2.941836 0.006023 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.11573 Std Err of Resid. (sgmahat) 0.060138
Unadjusted R-squared 0.787 Adjusted R-squared 0.734
F-statistic (8, 32) 14.821605 pvalue = Prob(F > 14.822) is < 0.0001
Durbin-Watson Stat. 2.318174 First-order auto corr coeff -0.301
MODEL SELECTION STATISTICS
SGMASQ 0.003617 AIC 0.004379 FPE 0.00441
HQ 0.005021 SCHWARZ 0.006378 SHIBATA 0.004062
GCV 0.004634 RICE 0.005032
Excluding the constant, p-value was highest for variable 9 (sq_gnp)
Model selection statistics have decreased (i.e. improved) for 4 criteria
?omit sq_gnp ;
sq_gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
[Model 2F]
OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41
Dependent variable - Y
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)
0) constant -0.165729 0.159098 -1.041675 0.305134
4) pop 0.152028 0.040301 3.772298 0.000639 ***
5) urb 0.001393 6.725382e-04 2.070568 0.046297 **
6) lit 0.014475 0.004752 3.046213 0.004533 ***
7) edu -0.001439 6.948961e-04 -2.070826 0.046271 **
11) sq_pop -0.027168 0.011829 -2.296734 0.028117 **
13) sq_lit -1.037193e-04 3.525987e-05 -2.941567 0.005931 ***
15) sq_agr -6.449861e-05 2.212942e-05 -2.91461 0.006352 ***
Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678
Error Sum of Sq (ESS) 0.117168 Std Err of Resid. (sgmahat) 0.059587
Unadjusted R-squared 0.785 Adjusted R-squared 0.739
F-statistic (7, 33) 17.195923 pvalue = Prob(F > 17.196) is < 0.0001
Durbin-Watson Stat. 2.252419 First-order auto corr coeff -0.259
MODEL SELECTION STATISTICS
SGMASQ 0.003551 AIC 0.004222 FPE 0.004243
HQ 0.004769 SCHWARZ 0.005898 SHIBATA 0.003973
GCV 0.004411 RICE 0.004687
Model selection statistics have decreased (i.e. improved) for 8 criteria
?
II.6
Notice that the second model [Model 2B] is the same as the 4th model [Model 1D] in part one. By using the same criterion to omit variables, we reach the same final model as we did in part one. Depending on their beliefs, some could conclude that the last model in both parts is the best model. Others may argue that the 4th model in part one (or equivalently the second model in part two) is the best model.