The optimal combination of forecasts, detailed in Bates and Granger (1969), has in practice often been set aside in favor of the simple average. Explanations of why averaging might work better in practice than constructing the optimal combination have centered on estimation error and the effects that variations in the data generating process have on this error. The flip side of this explanation is that the gains from optimal combination must be small enough to be outweighed by the estimation error. This paper examines the size of the theoretical gains from optimal combination, providing bounds on the gains over restricted parameter spaces as well as conditions under which averaging and optimal combination are equivalent. The paper also suggests a new method for selecting between models that appears to work well with SPF data.
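As a point of reference (our notation, not drawn from the paper), the textbook two-forecast case in Bates and Granger (1969) makes the trade-off concrete: for unbiased forecasts whose errors have variances \sigma_1^2, \sigma_2^2 and correlation \rho, the variance-minimizing weight on the first forecast is

\[
  w^{*} = \frac{\sigma_2^{2} - \rho\,\sigma_1\sigma_2}{\sigma_1^{2} + \sigma_2^{2} - 2\rho\,\sigma_1\sigma_2},
  \qquad
  w^{\text{avg}} = \tfrac{1}{2},
\]

so the optimal weights reduce to the simple average exactly when \sigma_1^2 = \sigma_2^2, and the theoretical gain from optimal combination shrinks as the two error variances approach each other.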
p-Hacking can undermine the validity of empirical studies. A flourishing empirical literature investigates the prevalence of p-hacking based on the empirical distribution of reported p-values across studies. Interpreting results in this literature requires a careful understanding of the power of the methods used to detect different types of p-hacking. We theoretically study the implications of likely forms of p-hacking for the distribution of reported p-values and the power of existing methods for detecting it. Power can be quite low, depending crucially on the particular p-hacking strategy and the distribution of actual effects tested by the studies. Publication bias can enhance the power for testing the joint null hypothesis of no p-hacking and no publication bias. We relate the power of the tests to the costs of p-hacking and show that power tends to be larger when p-hacking is very costly. Monte Carlo simulations support our theoretical results. (joint with N. Kudrin and K. Wüthrich) Replication Package
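A minimal simulation sketch of the kind of exercise described above (an illustrative assumption, not the paper's replication package): one common p-hacking strategy, reporting the smallest p-value across several independently tried specifications, shifts reported p-values downward while leaving the p-curve non-increasing, which is one reason tests built on that restriction can have low power against it.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_studies, n_tries = 200_000, 3                   # number of studies and specifications tried (assumed values)
effects = rng.choice([0.0, 1.0], size=n_studies)  # assumed mix of null and genuine effects

# No p-hacking: each study reports the p-value of its single one-sided z-test.
p_honest = norm.sf(effects + rng.standard_normal(n_studies))

# p-hacking: each study tries n_tries independent specifications and reports
# the smallest p-value (largest z-statistic).
z_tries = effects[:, None] + rng.standard_normal((n_studies, n_tries))
p_hacked = norm.sf(z_tries.max(axis=1))

# Compare the reported p-value distributions near conventional thresholds.
# This particular strategy piles mass on small p-values but keeps the density
# decreasing, so it leaves no tell-tale hump just below 0.05.
bins = np.linspace(0.0, 0.15, 16)
print("no p-hacking:", np.histogram(p_honest, bins=bins)[0])
print("p-hacking   :", np.histogram(p_hacked, bins=bins)[0])
```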