Draft answer sheet for homework one

Fall ECON 120 C

Part one

I.1

Reading header file data4-15.hdr

List of variables

0) const 1) Y 2) gnp 3) gdp 4) pop

5) urb 6) lit 7) edu 8) agr

period: 1, maxobs: 41, obs range: full 1-41, current 1-41

Reading datafile data4-15 BY OBSERVATIONS

?summary ;

SUMMARY STATISTICS (USING THE OBSERVATIONS 1-41)

(missing values denoted by -999 will be skipped)

 

Variable MEAN S.D. C.V. MIN

Y 0.38088 0.116678 0.306339 0.183

gnp 7024.878049 8142.01453 1.159026 200

gdp 3.397561 4.405082 1.296542 -8.9

pop 1.387805 0.97396 0.701799 0.1

urb 62.739024 22.712781 0.36202 10.9

lit 87.207317 17.169834 0.196885 29.2

edu 67.536585 26.689977 0.395193 12

agr 13.195122 12.34275 0.935402 0.7

 

Variable MEDIAN MAX SKEW EXCSKURT

Y 0.3732 0.596 0.221479 -1.065527

gnp 2590 25450 0.968924 -0.681408

gdp 2.9 12.7 -0.419326 0.869507

pop 1.4 3.3 0.362972 -1.19148

urb 67.1 100 -0.624772 -0.544407

lit 95 99 -1.737704 2.327503

edu 75 99 -0.528941 -1.027911

agr 9.6 56.5 1.785652 3.096763

?square gnp gdp pop urb lit edu agr ;

Created sq_gnp = gnp squared as var no. 9

Created sq_gdp = gdp squared as var no. 10

Created sq_pop = pop squared as var no. 11

Created sq_urb = urb squared as var no. 12

Created sq_lit = lit squared as var no. 13

Created sq_edu = edu squared as var no. 14

Created sq_agr = agr squared as var no. 15

List of variables

0) const 1) Y 2) gnp 3) gdp 4) pop

5) urb 6) lit 7) edu 8) agr 9) sq_gnp

10) sq_gdp 11) sq_pop 12) sq_urb 13) sq_lit 14) sq_edu

15) sq_agr

[Model 1A]

?ols Y const gnp gdp pop urb lit edu agr sq_gnp sq_gdp sq_pop sq_urb

sq_lit sq_edu sq_agr ;

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.182317 0.189029 -0.964492 0.343685

2) gnp -1.066273e-05 8.208931e-06 -1.298918 0.205374

3) gdp 0.004786 0.003182 1.5042 0.144582

4) pop 0.157835 0.050219 3.142899 0.004149 ***

5) urb 0.00409 0.004462 0.91654 0.367807

6) lit 0.012676 0.006189 2.048229 0.050761 *

7) edu -0.001908 0.003106 -0.614457 0.544254

8) agr 3.928915e-04 0.003183 0.123442 0.902706

9) sq_gnp 4.370790e-10 3.088420e-10 1.415219 0.168871

10) sq_gdp -3.617565e-04 3.872316e-04 -0.934212 0.35879

11) sq_pop -0.030838 0.015053 -2.048618 0.05072 *

12) sq_urb -1.432770e-05 3.581574e-05 -0.400039 0.692395

13) sq_lit -8.845356e-05 4.558500e-05 -1.94041 0.063247 *

14) sq_edu -9.977581e-07 2.364934e-05 -0.04219 0.96667

15) sq_agr -6.011323e-05 5.627975e-05 -1.068115 0.295286

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.097731 Std Err of Resid. (sgmahat) 0.06131

Unadjusted R-squared 0.821 Adjusted R-squared 0.724

F-statistic (14, 26) 8.490811 pvalue = Prob(F > 8.491) is < 0.0001

Durbin-Watson Stat. 2.279825 First-order auto corr coeff -0.219

MODEL SELECTION STATISTICS

SGMASQ 0.003759 AIC 0.004955 FPE 0.005134

HQ 0.006225 SCHWARZ 0.009275 SHIBATA 0.004128

GCV 0.005927 RICE 0.008885

Excluding the constant, p-value was highest for variable 14 (sq_edu)
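The eight model selection statistics printed above can be reproduced from ESS, the number of observations n, and the number of estimated coefficients k (including the constant). The sketch below is a minimal Python illustration, assuming the usual textbook definitions of these criteria, in which each one is ESS/n multiplied by a penalty factor that grows with k. The function name selection_stats is hypothetical and is not part of the program session.

```python
import math

def selection_stats(ess, n, k):
    """Model selection criteria from ESS, sample size n and k coefficients.

    Assumed definitions: each criterion equals ESS/n times a penalty for k.
    """
    base = ess / n
    return {
        "SGMASQ": ess / (n - k),
        "AIC": base * math.exp(2 * k / n),
        "FPE": base * (n + k) / (n - k),
        "HQ": base * math.log(n) ** (2 * k / n),
        "SCHWARZ": base * n ** (k / n),
        "SHIBATA": base * (n + 2 * k) / n,
        "GCV": base / (1 - k / n) ** 2,
        "RICE": base / (1 - 2 * k / n),
    }

# Model 1A: ESS = 0.097731, 41 observations, 15 coefficients (incl. constant)
stats_1a = selection_stats(0.097731, 41, 15)
for name, value in stats_1a.items():
    print(f"{name:8s} {value:.6f}")
```

Because every criterion penalizes additional coefficients, dropping a variable lowers a criterion only when the resulting rise in ESS is small relative to the saving in k.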

I.2

?omit sq_edu ;

sq_edu is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).
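The omission rule used at each step can be written as a small helper. This is only a sketch in Python: the function name least_significant and the stopping threshold alpha = 0.05 are assumptions made for illustration (the answer sheet stops once every remaining variable is significant at the 5% level). The p-values are those reported for Model 1A.

```python
def least_significant(pvalues, alpha=0.05, keep=("const",)):
    """Return the variable to omit next, or None if all are significant.

    pvalues maps variable names to two-sided p-values; names listed in
    `keep` (here the constant term) are never candidates for omission.
    """
    candidates = {v: p for v, p in pvalues.items() if v not in keep}
    worst = max(candidates, key=candidates.get)
    return worst if candidates[worst] > alpha else None

# Two-sided p-values reported for Model 1A
model_1a = {
    "const": 0.343685, "gnp": 0.205374, "gdp": 0.144582, "pop": 0.004149,
    "urb": 0.367807, "lit": 0.050761, "edu": 0.544254, "agr": 0.902706,
    "sq_gnp": 0.168871, "sq_gdp": 0.35879, "sq_pop": 0.05072,
    "sq_urb": 0.692395, "sq_lit": 0.063247, "sq_edu": 0.96667,
    "sq_agr": 0.295286,
}

print(least_significant(model_1a))  # sq_edu
```

After each omission the model must be re-estimated, because the p-values of the remaining variables change (compare edu in Model 1A and Model 1B).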

[Model 1B]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.182853 0.185083 -0.98795 0.331949

2) gnp -1.065729e-05 8.054762e-06 -1.323105 0.196897

3) gdp 0.004816 0.003043 1.582488 0.125183

4) pop 0.157458 0.048495 3.246878 0.003111 ***

5) urb 0.004123 0.004311 0.956392 0.34736

6) lit 0.01277 0.00566 2.256187 0.032364 **

7) edu -0.002034 9.067779e-04 -2.242585 0.033335 **

8) agr 4.139877e-04 0.003085 0.13421 0.894232

9) sq_gnp 4.354337e-10 3.006532e-10 1.448292 0.159049

10) sq_gdp -3.617811e-04 3.800056e-04 -0.952041 0.349522

11) sq_pop -0.03079 0.014731 -2.090173 0.046152 **

12) sq_urb -1.449289e-05 3.493677e-05 -0.414832 0.681543

13) sq_lit -8.915318e-05 4.166956e-05 -2.139528 0.041587 **

15) sq_agr -6.065308e-05 5.378316e-05 -1.127734 0.269355

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.097738 Std Err of Resid. (sgmahat) 0.060166

Unadjusted R-squared 0.821 Adjusted R-squared 0.734

F-statistic (13, 27) 9.494849 pvalue = Prob(F > 9.495) is < 0.0001

Durbin-Watson Stat. 2.279141 First-order auto corr coeff -0.219

MODEL SELECTION STATISTICS

SGMASQ 0.00362 AIC 0.004719 FPE 0.004856

HQ 0.00584 SCHWARZ 0.008472 SHIBATA 0.004012

GCV 0.005497 RICE 0.007518

Excluding the constant, p-value was highest for variable 8 (agr)

Model selection statistics have decreased (i.e. improved) for 8 criteria

?omit agr ;

agr is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 1C]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.173385 0.168084 -1.031536 0.311118

2) gnp -1.086977e-05 7.757925e-06 -1.401118 0.172166

3) gdp 0.004836 0.002986 1.619366 0.116579

4) pop 0.157497 0.047636 3.306237 0.002599 ***

5) urb 0.004089 0.004227 0.967238 0.341706

6) lit 0.012712 0.005544 2.293097 0.029559 **

7) edu -0.002046 8.861458e-04 -2.308723 0.028556 **

9) sq_gnp 4.412936e-10 2.922032e-10 1.510228 0.142189

10) sq_gdp -3.611284e-04 3.732520e-04 -0.967519 0.341568

11) sq_pop -0.030802 0.01447 -2.128673 0.042216 **

12) sq_urb -1.426842e-05 3.427932e-05 -0.41624 0.680408

13) sq_lit -8.891922e-05 4.089650e-05 -2.17425 0.038294 **

15) sq_agr -5.468481e-05 2.971522e-05 -1.840296 0.076347 *

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.097803 Std Err of Resid. (sgmahat) 0.059101

Unadjusted R-squared 0.820 Adjusted R-squared 0.743

F-statistic (12, 28) 10.658386 pvalue = Prob(F > 10.658) is < 0.0001

Durbin-Watson Stat. 2.288252 First-order auto corr coeff -0.227

MODEL SELECTION STATISTICS

SGMASQ 0.003493 AIC 0.004498 FPE 0.0046

HQ 0.005481 SCHWARZ 0.007744 SHIBATA 0.003898

GCV 0.005115 RICE 0.00652

Excluding the constant, p-value was highest for variable 12 (sq_urb)

Model selection statistics have decreased (i.e. improved) for 8 criteria

?omit sq_urb ;

sq_urb is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 1D]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.159301 0.16228 -0.981647 0.334393

2) gnp -1.233683e-05 6.811839e-06 -1.811087 0.080498 *

3) gdp 0.005028 0.002908 1.729026 0.094438 *

4) pop 0.149741 0.043212 3.465298 0.00167 ***

5) urb 0.002365 8.347003e-04 2.833244 0.008299 ***

6) lit 0.013662 0.00498 2.743428 0.010317 **

7) edu -0.001839 7.237923e-04 -2.541356 0.016649 **

9) sq_gnp 4.861328e-10 2.677216e-10 1.815815 0.079752 *

10) sq_gdp -4.333227e-04 3.257609e-04 -1.330186 0.193824

11) sq_pop -0.027966 0.012583 -2.222619 0.034198 **

13) sq_lit -9.575821e-05 3.691315e-05 -2.594149 0.014715 **

15) sq_agr -6.285884e-05 2.198119e-05 -2.859664 0.00778 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.098408 Std Err of Resid. (sgmahat) 0.058253

Unadjusted R-squared 0.819 Adjusted R-squared 0.751

F-statistic (11, 29) 11.952321 pvalue = Prob(F > 11.952) is < 0.0001

Durbin-Watson Stat. 2.295352 First-order auto corr coeff -0.228

MODEL SELECTION STATISTICS

SGMASQ 0.003393 AIC 0.00431 FPE 0.004387

HQ 0.005173 SCHWARZ 0.007117 SHIBATA 0.003805

GCV 0.004798 RICE 0.005789

Excluding the constant, p-value was highest for variable 10 (sq_gdp)

Model selection statistics have decreased (i.e. improved) for 8 criteria

?omit sq_gdp ;

sq_gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 1E]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.126216 0.162405 -0.777163 0.443148

2) gnp -1.014652e-05 6.694049e-06 -1.515753 0.140049

3) gdp 0.003015 0.002515 1.198806 0.239988

4) pop 0.137445 0.042749 3.215154 0.003114 ***

5) urb 0.002179 8.333881e-04 2.6145 0.013843 **

6) lit 0.012746 0.004995 2.55184 0.01605 **

7) edu -0.001796 7.322524e-04 -2.452037 0.020242 **

9) sq_gnp 4.196390e-10 2.663647e-10 1.57543 0.125645

11) sq_pop -0.024132 0.012404 -1.945501 0.061138 *

13) sq_lit -8.981328e-05 3.710849e-05 -2.420289 0.021773 **

15) sq_agr -6.125829e-05 2.222790e-05 -2.755919 0.009857 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.104412 Std Err of Resid. (sgmahat) 0.058995

Unadjusted R-squared 0.808 Adjusted R-squared 0.744

F-statistic (10, 30) 12.646281 pvalue = Prob(F > 12.646) is < 0.0001

Durbin-Watson Stat. 2.137629 First-order auto corr coeff -0.187

MODEL SELECTION STATISTICS

SGMASQ 0.00348 AIC 0.004355 FPE 0.004414

HQ 0.005149 SCHWARZ 0.006897 SHIBATA 0.003913

GCV 0.004757 RICE 0.005495

Excluding the constant, p-value was highest for variable 3 (gdp)

Model selection statistics have decreased (i.e. improved) for 4 criteria

?omit gdp ;

gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 1F]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.167886 0.159756 -1.050889 0.301433

2) gnp -8.909629e-06 6.660526e-06 -1.337676 0.190734

4) pop 0.154093 0.040715 3.784713 0.000662 ***

5) urb 0.001859 7.949185e-04 2.337979 0.026017 **

6) lit 0.01447 0.004817 3.003861 0.005237 ***

7) edu -0.001609 7.205727e-04 -2.233035 0.032904 **

9) sq_gnp 3.877973e-10 2.668992e-10 1.452973 0.15628

11) sq_pop -0.028615 0.01191 -2.40263 0.022453 **

13) sq_lit -1.031931e-04 3.563894e-05 -2.895516 0.006879 ***

15) sq_agr -6.333251e-05 2.231615e-05 -2.837967 0.007938 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.109414 Std Err of Resid. (sgmahat) 0.059409

Unadjusted R-squared 0.799 Adjusted R-squared 0.741

F-statistic (9, 31) 13.698576 pvalue = Prob(F > 13.699) is < 0.0001

Durbin-Watson Stat. 2.188829 First-order auto corr coeff -0.261

MODEL SELECTION STATISTICS

SGMASQ 0.003529 AIC 0.004347 FPE 0.00439

HQ 0.005061 SCHWARZ 0.006602 SHIBATA 0.00397

GCV 0.004668 RICE 0.00521

Excluding the constant, p-value was highest for variable 2 (gnp)

Model selection statistics have decreased (i.e. improved) for 6 criteria

?omit gnp ;

gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 1G]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.177032 0.161567 -1.095718 0.281381

4) pop 0.15611 0.041186 3.7904 0.000629 ***

5) urb 0.001313 6.905051e-04 1.90085 0.066361 *

6) lit 0.014976 0.004861 3.080763 0.004222 ***

7) edu -0.001564 7.285928e-04 -2.145997 0.039559 **

9) sq_gnp 4.154543e-11 6.586736e-11 0.630744 0.532687

11) sq_pop -0.02821 0.012052 -2.340684 0.025643 **

13) sq_lit -1.069878e-04 3.596136e-05 -2.975076 0.005535 ***

15) sq_agr -6.615711e-05 2.248837e-05 -2.941836 0.006023 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.11573 Std Err of Resid. (sgmahat) 0.060138

Unadjusted R-squared 0.787 Adjusted R-squared 0.734

F-statistic (8, 32) 14.821605 pvalue = Prob(F > 14.822) is < 0.0001

Durbin-Watson Stat. 2.318174 First-order auto corr coeff -0.301

MODEL SELECTION STATISTICS

SGMASQ 0.003617 AIC 0.004379 FPE 0.00441

HQ 0.005021 SCHWARZ 0.006378 SHIBATA 0.004062

GCV 0.004634 RICE 0.005032

Excluding the constant, p-value was highest for variable 9 (sq_gnp)

Model selection statistics have decreased (i.e. improved) for 4 criteria

?omit sq_gnp ;

sq_gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 1H]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.165729 0.159098 -1.041675 0.305134

4) pop 0.152028 0.040301 3.772298 0.000639 ***

5) urb 0.001393 6.725382e-04 2.070568 0.046297 **

6) lit 0.014475 0.004752 3.046213 0.004533 ***

7) edu -0.001439 6.948961e-04 -2.070826 0.046271 **

11) sq_pop -0.027168 0.011829 -2.296734 0.028117 **

13) sq_lit -1.037193e-04 3.525987e-05 -2.941567 0.005931 ***

15) sq_agr -6.449861e-05 2.212942e-05 -2.91461 0.006352 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.117168 Std Err of Resid. (sgmahat) 0.059587

Unadjusted R-squared 0.785 Adjusted R-squared 0.739

F-statistic (7, 33) 17.195923 pvalue = Prob(F > 17.196) is < 0.0001

Durbin-Watson Stat. 2.252419 First-order auto corr coeff -0.259

MODEL SELECTION STATISTICS

SGMASQ 0.003551 AIC 0.004222 FPE 0.004243

HQ 0.004769 SCHWARZ 0.005898 SHIBATA 0.003973

GCV 0.004411 RICE 0.004687

Model selection statistics have decreased (i.e. improved) for 8 criteria

?

I.3

Using the significance of coefficients as the criterion, we could conclude that the last model [Model 1H] is the best model. All the variables in the model (excluding the constant) are significant at the 5% level. It also has the lowest values for most of the model selection statistics (6 of the 8) among all the models estimated. However, some could argue that it suffers from omitted variable bias. If we compare the 4th model [Model 1D] with the 5th model [Model 1E], we can see that omitting variable 10 (sq_gdp) worsened the P-values of the other variables, and three of them (gnp, gdp and sq_gnp) are no longer significant at the 10% level. The omission improves only 4 of the 8 model selection statistics, and both R-squared and adjusted R-squared decrease as a result. Thus, some could conclude that the 4th model [Model 1D] is the best model.
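The comparison of Models 1D, 1E and 1H on the model selection statistics can be tabulated with a few lines of Python. This is a sketch only: the values are copied from the output above, and the helper name improved is hypothetical.

```python
CRITERIA = ("SGMASQ", "AIC", "FPE", "HQ", "SCHWARZ", "SHIBATA", "GCV", "RICE")

# Model selection statistics as printed for each model
model_1d = dict(zip(CRITERIA, (0.003393, 0.00431, 0.004387, 0.005173,
                               0.007117, 0.003805, 0.004798, 0.005789)))
model_1e = dict(zip(CRITERIA, (0.00348, 0.004355, 0.004414, 0.005149,
                               0.006897, 0.003913, 0.004757, 0.005495)))
model_1h = dict(zip(CRITERIA, (0.003551, 0.004222, 0.004243, 0.004769,
                               0.005898, 0.003973, 0.004411, 0.004687)))

def improved(old, new):
    """Criteria whose value is lower (better) in `new` than in `old`."""
    return [c for c in CRITERIA if new[c] < old[c]]

print("1D -> 1E:", improved(model_1d, model_1e))
print("1D -> 1H:", improved(model_1d, model_1h))
```

Model 1D remains better than Model 1H on SGMASQ and SHIBATA, which is why the choice between the two depends on which criteria one weights most.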

Part two

II.1

Reading header file data4-15.hdr

List of variables

0) const 1) Y 2) gnp 3) gdp 4) pop

5) urb 6) lit 7) edu 8) agr

period: 1, maxobs: 41, obs range: full 1-41, current 1-41

Reading datafile data4-15 BY OBSERVATIONS

?square gnp gdp pop urb lit edu agr ;

Created sq_gnp = gnp squared as var no. 9

Created sq_gdp = gdp squared as var no. 10

Created sq_pop = pop squared as var no. 11

Created sq_urb = urb squared as var no. 12

Created sq_lit = lit squared as var no. 13

Created sq_edu = edu squared as var no. 14

Created sq_agr = agr squared as var no. 15

List of variables

0) const 1) Y 2) gnp 3) gdp 4) pop

5) urb 6) lit 7) edu 8) agr 9) sq_gnp

10) sq_gdp 11) sq_pop 12) sq_urb 13) sq_lit 14) sq_edu

15) sq_agr

?ols Y const gnp gdp pop urb lit edu agr ;

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant 0.27196 0.119047 2.284477 0.028907 **

2) gnp -1.913184e-06 1.809138e-06 -1.057511 0.297958

3) gdp 0.005726 0.002632 2.175465 0.03686 **

4) pop 0.077401 0.017472 4.429995 < 0.0001 ***

5) urb 0.00198 8.379782e-04 2.363365 0.024153 **

6) lit 6.131706e-04 0.001167 0.525254 0.602918

7) edu -0.002043 8.230743e-04 -2.482524 0.018306 **

8) agr -0.003353 0.001543 -2.172969 0.037063 **

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.150798 Std Err of Resid. (sgmahat) 0.067599

Unadjusted R-squared 0.723 Adjusted R-squared 0.664

F-statistic (7, 33) 12.309711 pvalue = Prob(F > 12.310) is < 0.0001

Durbin-Watson Stat. 2.027037 First-order auto corr coeff -0.095

MODEL SELECTION STATISTICS

SGMASQ 0.00457 AIC 0.005434 FPE 0.005461

HQ 0.006137 SCHWARZ 0.007591 SHIBATA 0.005113

GCV 0.005677 RICE 0.006032

Excluding the constant, p-value was highest for variable 6 (lit)

?genr ut = uhat

Generated var. no. 16 (ut)

II.2

?ols ut const gnp gdp pop urb lit edu agr sq_gnp sq_gdp sq_pop sq_urb

sq_lit sq_edu sq_agr ;

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - ut

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.454278 0.189029 -2.403213 0.023681 **

2) gnp -8.749547e-06 8.208931e-06 -1.065857 0.296286

3) gdp -9.398834e-04 0.003182 -0.295382 0.770048

4) pop 0.080434 0.050219 1.60164 0.121317

5) urb 0.002109 0.004462 0.472726 0.640353

6) lit 0.012063 0.006189 1.949149 0.062143 *

7) edu 1.348771e-04 0.003106 0.043426 0.965693

8) agr 0.003746 0.003183 1.176831 0.249925

9) sq_gnp 4.370790e-10 3.088420e-10 1.415219 0.168871

10) sq_gdp -3.617565e-04 3.872316e-04 -0.934212 0.35879

11) sq_pop -0.030838 0.015053 -2.048618 0.05072 *

12) sq_urb -1.432770e-05 3.581574e-05 -0.400039 0.692395

13) sq_lit -8.845356e-05 4.558500e-05 -1.94041 0.063247 *

14) sq_edu -9.977581e-07 2.364934e-05 -0.04219 0.96667

15) sq_agr -6.011323e-05 5.627975e-05 -1.068115 0.295286

Mean of dep. var. -1.356410e-17 S.D. of dep. variable 6.139990e-02

Error Sum of Sq (ESS) 0.097731 Std Err of Resid. (sgmahat) 0.06131

Unadjusted R-squared 0.352 Adjusted R-squared 0.003

F-statistic (14, 26) 1.008413 pvalue = Prob(F > 1.008) is 0.474137

Durbin-Watson Stat. 2.279825 First-order auto corr coeff -0.219

MODEL SELECTION STATISTICS

SGMASQ 0.003759 AIC 0.004955 FPE 0.005134

HQ 0.006225 SCHWARZ 0.009275 SHIBATA 0.004128

GCV 0.005927 RICE 0.008885

Excluding the constant, p-value was highest for variable 14 (sq_edu)

?genr lm = $nrsq

Generated var. no. 17 (lm)

?pvalue 3 7 lm

For Chi-square (7), area to the right of 14.428242 is 0.044069

II.3

The null hypothesis for the LM test is that the coefficients on the 7 added squared variables (variables 9 through 15) are all zero. Under the null, the LM test statistic (the number of observations times the unadjusted R-squared of the auxiliary regression) is distributed as chi-square with 7 degrees of freedom, the number of restrictions. The p-value of 0.044069 is below 0.05, so we are "safe" in rejecting the null hypothesis at the 5% level (the probability of making a Type I error is small). You can also look up the critical value LM* for 5% and 7 d.f. in the Chi-square table (LM* = 14.0671). Since LM (14.4282) > LM*, we reject the null at the 5% level. We conclude that at least one of the seven added squared terms belongs in the model.
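The LM statistic and its p-value can be checked by hand from the two error sums of squares. The sketch below uses Python's standard library only; the helper chi2_sf is a hypothetical name, and it relies on the closed-form expressions for the chi-square upper tail with integer degrees of freedom. Because ut has mean zero, its total sum of squares equals the ESS of the basic model (0.150798), so the auxiliary R-squared is 1 - 0.097731/0.150798.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if df % 2 == 0:
        # even df: exp(-x/2) * sum over r < df/2 of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # odd df: normal tail plus a finite sum of odd powers of sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.erfc(chi / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

n = 41                    # observations
ess_basic = 0.150798      # ESS of the basic model in II.1 (= TSS of ut)
ess_aux = 0.097731        # ESS of the auxiliary regression in II.2

r2_aux = 1 - ess_aux / ess_basic
lm = n * r2_aux
p_value = chi2_sf(lm, 7)  # 7 restrictions: the seven squared terms

print(f"LM = {lm:.4f}, p-value = {p_value:.6f}")
```

The small difference from the program's 14.428242 in the later decimals comes from using the ESS values rounded to six places.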

II.4

Using the above auxiliary regression, we can select new variables to add to the basic model estimated at the beginning of Part two. If we use a p-value of 0.5 as the cutoff, we include variables 9, 10, 11, 13 and 15, whose p-values in the auxiliary regression are below 0.5; variables 12 (sq_urb) and 14 (sq_edu) are left out. We next re-estimate the model using Y (Gini, variable 1) as the dependent variable rather than ut (variable 16). The independent variables are those in the basic model (2 3 4 5 6 7 8) plus those we decided to add (9 10 11 13 and 15).
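The cutoff rule can be illustrated with the p-values of the squared terms in the auxiliary regression. This is a minimal Python sketch; the function name select_by_cutoff is hypothetical.

```python
def select_by_cutoff(pvalues, cutoff=0.5):
    """Keep the variables whose p-value is below the cutoff."""
    return [v for v, p in pvalues.items() if p < cutoff]

# p-values of the squared terms in the auxiliary regression (II.2)
aux_pvalues = {
    "sq_gnp": 0.168871, "sq_gdp": 0.35879, "sq_pop": 0.05072,
    "sq_urb": 0.692395, "sq_lit": 0.063247, "sq_edu": 0.96667,
    "sq_agr": 0.295286,
}

added = select_by_cutoff(aux_pvalues)
print(added)
```

A cutoff as generous as 0.5 is deliberately loose: it only screens out terms that are clearly irrelevant, leaving the finer selection to the omission steps that follow.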

?ols Y const gnp gdp pop urb lit edu agr sq_gnp sq_gdp sq_pop sq_lit

sq_agr ;

[Model 2A]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.167179 0.178487 -0.936648 0.356949

2) gnp -1.217546e-05 7.068469e-06 -1.722504 0.096008 *

3) gdp 0.005014 0.002961 1.693201 0.101516

4) pop 0.149604 0.043982 3.401499 0.002035 ***

5) urb 0.002371 8.508392e-04 2.786557 0.009456 ***

6) lit 0.013724 0.005095 2.693606 0.011806 **

7) edu -0.001826 7.452389e-04 -2.450402 0.020778 **

8) agr 3.527296e-04 0.003035 0.116213 0.908313

9) sq_gnp 4.817411e-10 2.750036e-10 1.751763 0.090762 *

10) sq_gdp -4.348464e-04 3.317064e-04 -1.310938 0.200529

11) sq_pop -0.027918 0.012809 -2.179594 0.037856 **

13) sq_lit -9.604922e-05 3.764087e-05 -2.551727 0.016465 **

15) sq_agr -6.805355e-05 4.998254e-05 -1.361546 0.184197

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.09836 Std Err of Resid. (sgmahat) 0.05927

Unadjusted R-squared 0.819 Adjusted R-squared 0.742

F-statistic (12, 28) 10.584719 pvalue = Prob(F > 10.585) is < 0.0001

Durbin-Watson Stat. 2.289221 First-order auto corr coeff -0.223

MODEL SELECTION STATISTICS

SGMASQ 0.003513 AIC 0.004523 FPE 0.004627

HQ 0.005513 SCHWARZ 0.007788 SHIBATA 0.00392

GCV 0.005144 RICE 0.006557

Excluding the constant, p-value was highest for variable 8 (agr)

II.5

?omit agr ;

agr is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 2B]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.159301 0.16228 -0.981647 0.334393

2) gnp -1.233683e-05 6.811839e-06 -1.811087 0.080498 *

3) gdp 0.005028 0.002908 1.729026 0.094438 *

4) pop 0.149741 0.043212 3.465298 0.00167 ***

5) urb 0.002365 8.347003e-04 2.833244 0.008299 ***

6) lit 0.013662 0.00498 2.743428 0.010317 **

7) edu -0.001839 7.237923e-04 -2.541356 0.016649 **

9) sq_gnp 4.861328e-10 2.677216e-10 1.815815 0.079752 *

10) sq_gdp -4.333227e-04 3.257609e-04 -1.330186 0.193824

11) sq_pop -0.027966 0.012583 -2.222619 0.034198 **

13) sq_lit -9.575821e-05 3.691315e-05 -2.594149 0.014715 **

15) sq_agr -6.285884e-05 2.198119e-05 -2.859664 0.00778 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.098408 Std Err of Resid. (sgmahat) 0.058253

Unadjusted R-squared 0.819 Adjusted R-squared 0.751

F-statistic (11, 29) 11.952321 pvalue = Prob(F > 11.952) is < 0.0001

Durbin-Watson Stat. 2.295352 First-order auto corr coeff -0.228

MODEL SELECTION STATISTICS

SGMASQ 0.003393 AIC 0.00431 FPE 0.004387

HQ 0.005173 SCHWARZ 0.007117 SHIBATA 0.003805

GCV 0.004798 RICE 0.005789

Excluding the constant, p-value was highest for variable 10 (sq_gdp)

Model selection statistics have decreased (i.e. improved) for 8 criteria

?omit sq_gdp ;

sq_gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 2C]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.126216 0.162405 -0.777163 0.443148

2) gnp -1.014652e-05 6.694049e-06 -1.515753 0.140049

3) gdp 0.003015 0.002515 1.198806 0.239988

4) pop 0.137445 0.042749 3.215154 0.003114 ***

5) urb 0.002179 8.333881e-04 2.6145 0.013843 **

6) lit 0.012746 0.004995 2.55184 0.01605 **

7) edu -0.001796 7.322524e-04 -2.452037 0.020242 **

9) sq_gnp 4.196390e-10 2.663647e-10 1.57543 0.125645

11) sq_pop -0.024132 0.012404 -1.945501 0.061138 *

13) sq_lit -8.981328e-05 3.710849e-05 -2.420289 0.021773 **

15) sq_agr -6.125829e-05 2.222790e-05 -2.755919 0.009857 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.104412 Std Err of Resid. (sgmahat) 0.058995

Unadjusted R-squared 0.808 Adjusted R-squared 0.744

F-statistic (10, 30) 12.646281 pvalue = Prob(F > 12.646) is < 0.0001

Durbin-Watson Stat. 2.137629 First-order auto corr coeff -0.187

MODEL SELECTION STATISTICS

SGMASQ 0.00348 AIC 0.004355 FPE 0.004414

HQ 0.005149 SCHWARZ 0.006897 SHIBATA 0.003913

GCV 0.004757 RICE 0.005495

Excluding the constant, p-value was highest for variable 3 (gdp)

Model selection statistics have decreased (i.e. improved) for 4 criteria

?omit gdp ;

gdp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 2D]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.167886 0.159756 -1.050889 0.301433

2) gnp -8.909629e-06 6.660526e-06 -1.337676 0.190734

4) pop 0.154093 0.040715 3.784713 0.000662 ***

5) urb 0.001859 7.949185e-04 2.337979 0.026017 **

6) lit 0.01447 0.004817 3.003861 0.005237 ***

7) edu -0.001609 7.205727e-04 -2.233035 0.032904 **

9) sq_gnp 3.877973e-10 2.668992e-10 1.452973 0.15628

11) sq_pop -0.028615 0.01191 -2.40263 0.022453 **

13) sq_lit -1.031931e-04 3.563894e-05 -2.895516 0.006879 ***

15) sq_agr -6.333251e-05 2.231615e-05 -2.837967 0.007938 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.109414 Std Err of Resid. (sgmahat) 0.059409

Unadjusted R-squared 0.799 Adjusted R-squared 0.741

F-statistic (9, 31) 13.698576 pvalue = Prob(F > 13.699) is < 0.0001

Durbin-Watson Stat. 2.188829 First-order auto corr coeff -0.261

MODEL SELECTION STATISTICS

SGMASQ 0.003529 AIC 0.004347 FPE 0.00439

HQ 0.005061 SCHWARZ 0.006602 SHIBATA 0.00397

GCV 0.004668 RICE 0.00521

Excluding the constant, p-value was highest for variable 2 (gnp)

Model selection statistics have decreased (i.e. improved) for 6 criteria

?omit gnp ;

gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 2E]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.177032 0.161567 -1.095718 0.281381

4) pop 0.15611 0.041186 3.7904 0.000629 ***

5) urb 0.001313 6.905051e-04 1.90085 0.066361 *

6) lit 0.014976 0.004861 3.080763 0.004222 ***

7) edu -0.001564 7.285928e-04 -2.145997 0.039559 **

9) sq_gnp 4.154543e-11 6.586736e-11 0.630744 0.532687

11) sq_pop -0.02821 0.012052 -2.340684 0.025643 **

13) sq_lit -1.069878e-04 3.596136e-05 -2.975076 0.005535 ***

15) sq_agr -6.615711e-05 2.248837e-05 -2.941836 0.006023 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.11573 Std Err of Resid. (sgmahat) 0.060138

Unadjusted R-squared 0.787 Adjusted R-squared 0.734

F-statistic (8, 32) 14.821605 pvalue = Prob(F > 14.822) is < 0.0001

Durbin-Watson Stat. 2.318174 First-order auto corr coeff -0.301

MODEL SELECTION STATISTICS

SGMASQ 0.003617 AIC 0.004379 FPE 0.00441

HQ 0.005021 SCHWARZ 0.006378 SHIBATA 0.004062

GCV 0.004634 RICE 0.005032

Excluding the constant, p-value was highest for variable 9 (sq_gnp)

Model selection statistics have decreased (i.e. improved) for 4 criteria

?omit sq_gnp ;

sq_gnp is omitted because it has the highest P-value. It is thus the least significant variable among all (always excluding the constant term).

[Model 2F]

OLS ESTIMATES USING THE 41 OBSERVATIONS 1-41

Dependent variable - Y

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) constant -0.165729 0.159098 -1.041675 0.305134

4) pop 0.152028 0.040301 3.772298 0.000639 ***

5) urb 0.001393 6.725382e-04 2.070568 0.046297 **

6) lit 0.014475 0.004752 3.046213 0.004533 ***

7) edu -0.001439 6.948961e-04 -2.070826 0.046271 **

11) sq_pop -0.027168 0.011829 -2.296734 0.028117 **

13) sq_lit -1.037193e-04 3.525987e-05 -2.941567 0.005931 ***

15) sq_agr -6.449861e-05 2.212942e-05 -2.91461 0.006352 ***

Mean of dep. var. 0.38088 S.D. of dep. variable 0.116678

Error Sum of Sq (ESS) 0.117168 Std Err of Resid. (sgmahat) 0.059587

Unadjusted R-squared 0.785 Adjusted R-squared 0.739

F-statistic (7, 33) 17.195923 pvalue = Prob(F > 17.196) is < 0.0001

Durbin-Watson Stat. 2.252419 First-order auto corr coeff -0.259

MODEL SELECTION STATISTICS

SGMASQ 0.003551 AIC 0.004222 FPE 0.004243

HQ 0.004769 SCHWARZ 0.005898 SHIBATA 0.003973

GCV 0.004411 RICE 0.004687

Model selection statistics have decreased (i.e. improved) for 8 criteria

?

II.6

Notice that the second model [Model 2B] is the same as the 4th model [Model 1D] in Part one. By using the same criterion to omit variables, we reach the same final model as we did in Part one. Depending on one's beliefs, some could conclude that the last model in both parts [Model 1H, identical to Model 2F] is the best model. Others may argue that the 4th model in Part one (or, equivalently, the second model in Part two) is the best model.