<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">
<TITLE>a. X values being closer to their mean implies that Sxx is smaller. From equations (3.18) and (3.19), we see that a smaller Sxx means a larger variance. Thus the estimates are less precisely estimated and the statement is FALSE.</TITLE>
<META NAME="Template" CONTENT="C:\Program Files\Microsoft Office\Office\html.dot">
</HEAD>
<BODY LINK="#0000ff" VLINK="#800080">

<P ALIGN="JUSTIFY">I. </P>
<P ALIGN="JUSTIFY">a. (4 points) X values being closer to their mean implies that S<SUB>xx</SUB> is smaller. From equations (3.18) <IMG SRC="Image1.gif" WIDTH=96 HEIGHT=24>and (3.19) <IMG SRC="Image2.gif" WIDTH=138 HEIGHT=22>, we see that a smaller S<SUB>xx</SUB> means a larger variance. Thus the estimates are less precisely estimated, and the statement is FALSE.</P>
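<P ALIGN="JUSTIFY">The effect in (a) can be sketched numerically. This is an illustrative calculation with made-up numbers, assuming the textbook formula Var(b<SUB>2</SUB>-hat) = <FONT FACE="Symbol">s</FONT><SUP>2</SUP>/S<SUB>xx</SUB>:</P>

```python
# Illustrative sketch (numbers assumed, not from the assignment):
# under Var(b2_hat) = sigma^2 / Sxx, X values closer to their mean
# give a smaller Sxx and hence a larger slope variance.

def sxx(xs):
    """Sum of squared deviations of xs about its own mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

sigma2 = 4.0                       # assumed error variance
spread = [1, 3, 5, 7, 9]           # X values far from their mean (5)
tight = [4, 4.5, 5, 5.5, 6]        # same mean, X values close to it

var_spread = sigma2 / sxx(spread)  # 4 / 40  = 0.1
var_tight = sigma2 / sxx(tight)    # 4 / 2.5 = 1.6
print(var_spread, var_tight)
```

<P ALIGN="JUSTIFY">The tighter X sample gives the same mean but a slope variance sixteen times larger, which is the point of the answer.</P>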
<P ALIGN="JUSTIFY">b. (4 points) FALSE because for unbiasedness we need Assumption 3.3 (each u is a random variable with E(u)=0) and Assumption 3.4 (each X<SUB>t</SUB> is given and not a random variable). Violation of Assumption 3.4 implies that unbiasedness no longer holds.</P>
<P ALIGN="JUSTIFY">c. (4 points) Assumption 3.8 (Each u<SUB>t</SUB> is distributed as N(0,<FONT FACE="Symbol">s</FONT> <SUP>2</SUP>)) is needed only for hypothesis testing. Thus BLUE still holds and the statement is FALSE.</P>
<P ALIGN="JUSTIFY">d. (4 points) TRUE because t- and F- distributions for the test statistics were derived from the assumption of normality which is a must for hypothesis testing.</P>
<P ALIGN="JUSTIFY">e. (4 points) TRUE because the width of a confidence interval directly depends on the standard error of an estimate.</P>
<P ALIGN="JUSTIFY">f. (4 points) TRUE because if Var(x) is large, then from equations (3.18) <IMG SRC="Image1.gif" WIDTH=96 HEIGHT=24>and (3.19) <IMG SRC="Image2.gif" WIDTH=138 HEIGHT=22>the variance of the estimate will be smaller and hence the confidence interval will be narrower.</P>
<P ALIGN="JUSTIFY">g. (4 points) FALSE because a high p-value means that rejecting H<SUB>0</SUB> would carry a high probability of Type I error. So we should not reject H<SUB>0</SUB>, and therefore we should not conclude that the coefficient is significant.</P>
<P ALIGN="JUSTIFY">h. (4 points) TRUE because a higher level of significance means a lower value for t* and hence actual |t<SUB>c</SUB>| is more likely to be to the right of t*. Also, a higher level of significance means a greater chance for p-value to be below it and hence more likely for the null hypothesis to be rejected, implying significance of a coefficient.</P>
<P ALIGN="JUSTIFY">i. (4 points) PARTLY TRUE. Violation of Assumption 3.5 (all the u's are identically distributed with the same variance <FONT FACE="Symbol">s</FONT> <SUP>2</SUP>) and Assumption 3.6 (the u's are independently distributed) only affects the BLUE property. Thus the estimators are still unbiased and consistent, but no longer BLUE.</P>
<P ALIGN="JUSTIFY">j. (4 points) FALSE. The null hypothesis is a statement about whether or not the parameter has a certain value. This is either true or not true and therefore it is meaningless to attribute a probability to whether H<SUB>0</SUB> is true or not. However, the rejection of a true hypothesis, which is Type I error, is a random event because it can change from trial to trial. The p-value is the probability of making this type of mistake.</P>
<P ALIGN="JUSTIFY">II. (20 points) Turn in printout.</P>
<P>Model A:<FONT SIZE=2> Dependent variable - ATTEND</FONT></P>
<PRE>VARIABLE&#9;COEFFICIENT&#9; STDERROR&#9; T STAT&#9;&#9; 2Prob(t &gt; |T|)
0) constant&#9;-861.272511&#9; 577.486631&#9; -1.491415&#9; 0.140282
2) POP&#9;&#9;0.231068&#9; 0.042602&#9; 5.42388&#9; &lt; 0.0001 ***
4) PRIORWIN&#9;16.617537&#9; 3.868392&#9; 4.295722&#9; &lt; 0.0001 ***
5) CURNTWIN&#9;16.003534&#9; 6.325326&#9; 2.530073&#9; 0.013624 **
10) G5&#9;&#9;-16.869136&#9; 6.698056&#9; -2.518512&#9; 0.01404 **
12) OTHER&#9;-524.333953&#9; 122.677053&#9; -4.2741&#9; &lt; 0.0001 ***
13) TEAMS&#9;-206.042514&#9; 59.757073&#9; -3.448002&#9; 0.000953 ***

Mean of dep. var.&#9; 1782.86491&#9; S.D. of dep. variable&#9; &#9;597.994624
Error Sum of Sq (ESS)&#9; 6.810621e+06&#9; Std Err of Resid. (sgmahat)&#9;309.716378
Unadjusted R-squared&#9; 0.753&#9;&#9; Adjusted R-squared&#9;&#9;0.732
F-statistic (6, 71)&#9; 36.008266&#9; pvalue = Prob(F &gt; 36.008) is&#9;&lt; 0.0001
Durbin-Watson Stat.&#9; 2.236561&#9; First-order auto corr coeff&#9;-0.130</PRE>
<P ALIGN="JUSTIFY">&nbsp;</P>
<P ALIGN="JUSTIFY">Model B: Dependent variable - ATTEND</P>
<PRE>VARIABLE&#9; COEFFICIENT&#9; STDERROR&#9; T STAT&#9;&#9; 2Prob(t &gt; |T|)
0) constant&#9; -886.421023&#9; 573.215343&#9; -1.546401&#9; 0.126517
2) POP&#9;&#9; 0.217937&#9; 0.043216&#9; 5.042996&#9; &lt; 0.0001 ***
4) PRIORWIN&#9; 17.706879&#9; 3.910038&#9; 4.528569&#9; &lt; 0.0001 ***
5) CURNTWIN&#9; 15.712971&#9; 6.278862&#9; 2.502519&#9; 0.014669 **
6) G1&#9;&#9; -22.263022&#9; 15.264011&#9; -1.45853&#9; 0.149167
10) G5&#9;&#9; -13.162167&#9; 7.114941&#9; -1.849933&#9; 0.068544 *
12) OTHER&#9; -509.632258&#9; 122.131256&#9; -4.172824&#9; &lt; 0.0001 ***
13) TEAMS&#9; -190.289705&#9; 60.263974&#9; -3.157603&#9; 0.002348 ***</PRE>
<FONT SIZE=2><P>&nbsp;</P></FONT>
<PRE>Mean of dep. var.&#9; 1782.86491&#9; S.D. of dep. variable&#9; &#9;597.994624
Error Sum of Sq (ESS)&#9; 6.609749e+06&#9; Std Err of Resid. (sgmahat)&#9;307.286498
Unadjusted R-squared&#9; 0.760&#9;&#9; Adjusted R-squared&#9;&#9;0.736
F-statistic (7, 70)&#9; 31.65818&#9; pvalue = Prob(F &gt; 31.658) is&#9;&lt; 0.0001
Durbin-Watson Stat.&#9; 2.18034&#9; First-order auto corr coeff&#9;-0.102</PRE>
<P ALIGN="JUSTIFY">&nbsp;</P>
<P ALIGN="JUSTIFY">1. (8 points) Using the data-based model reduction technique, we arrive at Model A (the last model): at each step we omitted the variable with the least significant coefficient (the highest p-value).</P>
<P ALIGN="JUSTIFY">2. (8 points) Each step reduces the model selection statistics. Note that Model B (the 2<SUP>nd</SUP>-to-last model) has the lowest model selection statistics, and all the coefficients except G1's are highly significant. Also, the coefficients for PRIORWIN, G5, and OTHER differ considerably between Models A and B, so the bias from omitting G1 might be serious. Based on 5 of the 8 model selection criteria, Model B appears to be &#145;best&#146; and is chosen as the final model for interpretation.</P>
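<P ALIGN="JUSTIFY">The fit statistics in the printouts can be reproduced from the reported ESS values. A minimal sketch, assuming n = 78 (implied by Model A's F(6, 71) statistic) and using the standard log-likelihood-based AIC up to an additive constant, which may differ from the textbook's exact criterion:</P>

```python
import math

# Reproduce R-squared, adjusted R-squared, and an AIC-style criterion
# for Models A and B from the printouts. n = 78 follows from F(6, 71).
n = 78
sd_y = 597.994624                    # S.D. of dependent variable ATTEND
tss = (n - 1) * sd_y ** 2            # total sum of squares

models = {"A": (7, 6.810621e6),      # (number of coefficients, ESS)
          "B": (8, 6.609749e6)}

results = {}
for name, (k, ess) in models.items():
    r2 = 1 - ess / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    aic = n * math.log(ess / n) + 2 * k   # AIC up to an additive constant
    results[name] = (round(r2, 3), round(adj_r2, 3), aic)

print(results)  # R-squared figures match the printout: A 0.753/0.732, B 0.760/0.736
```

<P ALIGN="JUSTIFY">Model B has the lower AIC despite the extra coefficient, consistent with its selection above.</P>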
<P ALIGN="JUSTIFY">3. (8 points) The population of a city, the capacity of the stadium, and the home team's wins in the previous and current years are likely to affect attendance at baseball games positively. The measures G1 through G5, GF, OTHER, and TEAMS are likely to have negative effects.</P>
<P ALIGN="JUSTIFY">In the regression, POP, PRIORWIN, G5, OTHER, and TEAMS had coefficients significant at levels less than 7 percent. All the coefficients had expected signs. </P>
<P ALIGN="JUSTIFY">4. (8 points) H<SUB>0</SUB>: <FONT FACE="Symbol">b</FONT> <SUB>3</SUB> = <FONT FACE="Symbol">b</FONT> <SUB>7</SUB> = <FONT FACE="Symbol">b</FONT> <SUB>8</SUB> = <FONT FACE="Symbol">b</FONT> <SUB>9</SUB> = <FONT FACE="Symbol">b</FONT> <SUB>11</SUB> = 0 H<SUB>1</SUB>: At least one of <FONT FACE="Symbol">b</FONT> <SUB>3</SUB>, <FONT FACE="Symbol">b</FONT> <SUB>7</SUB>, <FONT FACE="Symbol">b</FONT> <SUB>8</SUB>, <FONT FACE="Symbol">b</FONT> <SUB>9</SUB> or <FONT FACE="Symbol">b</FONT> <SUB>11</SUB> is not zero</P>
<P>(ESSR-ESSU)*DFU/(NR*ESSU) = (6.60e+06-6.26e+06)*65/(5*6.26e+06) = 0.71, F<SUB>5,65</SUB> <FONT FACE="Symbol">»</FONT> 1.95, so we cannot reject the null. The Wald test thus leads us to conclude that <FONT FACE="Symbol">b</FONT> <SUB>3</SUB>, <FONT FACE="Symbol">b</FONT> <SUB>7</SUB>, <FONT FACE="Symbol">b</FONT> <SUB>8</SUB>, <FONT FACE="Symbol">b</FONT> <SUB>9</SUB>, and <FONT FACE="Symbol">b</FONT> <SUB>11</SUB> are jointly insignificant.</P>
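<P ALIGN="JUSTIFY">The F-computation above can be checked in a few lines, using the rounded ESS values as quoted in the text:</P>

```python
# Wald F-test sketch using the rounded ESS values quoted above.
ess_r, ess_u = 6.60e6, 6.26e6   # restricted and unrestricted ESS
n_r, df_u = 5, 65               # number of restrictions, unrestricted d.f.

f_stat = ((ess_r - ess_u) / n_r) / (ess_u / df_u)
print(round(f_stat, 2))         # 0.71, below the quoted critical value of about 1.95
```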
<P>5. (8 points) The estimated value of each coefficient gives the marginal effect on ATTEND. For example, an increase in POP of 1,000 persons, holding all other variables constant, raises attendance by about 218 on average, which is a sensible value.</P></BODY>
</HTML>
