The Power of Tests for Detecting p-Hacking

p-Hacking can undermine the validity of empirical studies. A flourishing empirical literature investigates the prevalence of p-hacking based on the empirical distribution of reported p-values across studies. Interpreting results in this literature requires a careful understanding of the power of methods used to detect different types of p-hacking. We theoretically study the implications of likely forms of p-hacking for the distribution of reported p-values and the power of existing methods for detecting it. Power can be quite low, depending crucially on the particular p-hacking strategy and the distribution of actual effects tested by the studies. Publication bias can enhance the power for testing the joint null hypothesis of no p-hacking and no publication bias. We relate the power of the tests to the costs of p-hacking and show that power tends to be larger when p-hacking is very costly. Monte Carlo simulations support our theoretical results.

(joint with N. Kudrin and K. Wüthrich)

Replication Package
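As a rough illustration of why detection power depends on the p-hacking strategy and the distribution of true effects, the sketch below simulates reported p-values and applies a simple Caliper-style binomial test around the 0.05 cutoff. This is not the paper's test; the simulation design (a 70/30 mixture of true nulls and real effects, a stop-at-first-significance hacking rule with up to 10 attempts, and the comparison window around 0.05) is assumed purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed study population (illustrative): 70% test a true null (mu = 0),
# 30% test a real effect (mu = 2, in z-statistic units).
n_studies = 5000
mu = rng.choice([0.0, 2.0], size=n_studies, p=[0.7, 0.3])

def report_p(mu, tries, rng):
    """Stylized hacking rule: rerun the test up to `tries` times, stopping
    at the first p < 0.05; otherwise report the smallest p-value found."""
    best = np.ones(mu.size)
    done = np.zeros(mu.size, dtype=bool)
    for _ in range(tries):
        z = mu + rng.standard_normal(mu.size)
        p = 2 * stats.norm.sf(np.abs(z))       # two-sided p-value of a z-test
        best = np.where(done, best, np.minimum(best, p))
        done |= best < 0.05
    return best

p_honest = report_p(mu, tries=1, rng=rng)   # no p-hacking
p_hacked = report_p(mu, tries=10, rng=rng)  # aggressive p-hacking

def caliper_pvalue(p, cut=0.05, h=0.01):
    """Caliper-style binomial test: under a smooth (unhacked) p-curve, the
    counts just below and just above the cutoff should be roughly equal."""
    below = int(np.sum((p >= cut - h) & (p < cut)))
    above = int(np.sum((p >= cut) & (p < cut + h)))
    return stats.binomtest(below, below + above, 0.5,
                           alternative="greater").pvalue

print("share significant, honest:", np.mean(p_honest < 0.05))
print("share significant, hacked:", np.mean(p_hacked < 0.05))
print("caliper p-value, honest:", caliper_pvalue(p_honest))
print("caliper p-value, hacked:", caliper_pvalue(p_hacked))
```

The hacking rule sharply inflates the share of significant results, yet the distortion it leaves near the 0.05 cutoff is smooth rather than a sharp discontinuity, so a cutoff-based test can have modest power against it, consistent with the abstract's point that power depends crucially on the hacking strategy.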