Graham Elliott

The Power of Tests for Detecting p-Hacking

p-Hacking can undermine the validity of empirical studies. A flourishing empirical literature investigates the prevalence of p-hacking based on the empirical distribution of reported p-values across studies. Interpreting results in this literature requires a careful understanding of the power of methods used to detect different types of p-hacking. We theoretically study the implications of likely forms of p-hacking on the distribution of reported p-values and the power of existing methods for detecting it. Power can be quite low, depending crucially on the particular p-hacking strategy and the distribution of actual effects tested by the studies. Publication bias can enhance the power for testing the joint null hypothesis of no p-hacking and no publication bias. We relate the power of the tests to the costs of p-hacking and show that power tends to be larger when p-hacking is very costly. Monte Carlo simulations support our theoretical results. (joint with N. Kudrin and K. Wüthrich) Replication Package
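
To make "existing methods for detecting p-hacking" concrete, here is a minimal Python sketch of one simple test of this kind: a one-sided binomial test for excess mass of reported p-values just below 0.05. It is an illustration only, not code from the replication package; the function name `binomial_bunching_test` and the default window [0.040, 0.050] are hypothetical choices. The sketch assumes that under the null the p-curve is non-increasing, so a p-value falling in a narrow window lands in its upper half with probability at most 1/2.

```python
import math
from typing import Sequence


def binomial_bunching_test(pvalues: Sequence[float], lower: float = 0.040,
                           upper: float = 0.050) -> tuple[int, int, float]:
    """One-sided binomial test for excess p-values just below `upper`.

    Hypothetical helper. Splits [lower, upper] into two equal-width bins.
    If the p-curve is non-increasing (assumed under the null), a p-value
    inside the window falls in the upper bin with probability <= 1/2.
    Returns (count in upper bin, count in window, one-sided test p-value).
    """
    mid = (lower + upper) / 2
    in_window = [p for p in pvalues if lower <= p <= upper]
    n = len(in_window)
    k = sum(1 for p in in_window if p > mid)
    # Tail probability P(X >= k) for X ~ Binomial(n, 1/2), computed exactly.
    tail = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return k, n, tail
```

A small tail probability signals more p-values just below 0.05 than a non-increasing p-curve would typically produce. As the abstract stresses, a failure to reject says little when the test's power against the relevant p-hacking strategy is low.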

Combining Forecasts - On Why Averaging Beats Optimal Linear Weights

A continuing puzzle in constructing a point forecast by combining individual forecasts is that simple averaging often beats estimating optimal weights (the forecast combination puzzle). Most researchers have focused on the size of estimation error and other difficulties in estimating the combination weights, despite this estimation procedure being a simple least squares regression. For this explanation to hold, the gains from using optimal weights must be small. This paper focuses on this complementary part of the argument: we ask how large the gains from optimal combination can be in empirically and theoretically reasonable situations. Under these restrictions we show that the gains can indeed be small, and that in situations where the gains are large, the best approach to forecast combination is to discard some of the forecasts and average over the remaining ones. (joint with Jie Liao) Replication Package
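
The contrast between estimated "optimal" weights and the simple average can be sketched for the two-forecast case. This is a minimal Python illustration under stated assumptions, not the paper's code: the weights are restricted to sum to one, there is no intercept, and the helper names `ls_combination_weight`, `combine` and `mse` are hypothetical. With these restrictions the least squares problem reduces to regressing (y - f2) on (f1 - f2).

```python
from typing import Sequence


def ls_combination_weight(y: Sequence[float], f1: Sequence[float],
                          f2: Sequence[float]) -> float:
    """Least squares weight on f1 when the two weights must sum to one.

    Minimizing sum((y - w*f1 - (1 - w)*f2)**2) over w is an OLS regression
    of (y - f2) on (f1 - f2) with no intercept.
    """
    target = [yi - b for yi, b in zip(y, f2)]  # errors of forecast 2
    diff = [a - b for a, b in zip(f1, f2)]     # f1 - f2
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("forecasts are identical; weight is not identified")
    return sum(d * t for d, t in zip(diff, target)) / denom


def combine(f1: Sequence[float], f2: Sequence[float], w: float) -> list[float]:
    """Linear combination w*f1 + (1 - w)*f2; w = 0.5 is the simple average."""
    return [w * a + (1 - w) * b for a, b in zip(f1, f2)]


def mse(y: Sequence[float], f: Sequence[float]) -> float:
    """Mean squared forecast error."""
    return sum((a - b) ** 2 for a, b in zip(y, f)) / len(y)
```

In a typical exercise one would estimate `w` on a training window and compare the hold-out `mse` of `combine(f1, f2, w)` with that of `combine(f1, f2, 0.5)`; the forecast combination puzzle is that the simple average often wins this comparison.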