In 2018, De Los Reyes and Langer expanded the scope of the Evidence Base Updates series to include reviews of psychological assessment techniques. In keeping with the goal of offering clear "take-home messages" about the evidence underlying each technique, experts have proposed a rubric for evaluating the strength of its reliability and validity support. Changes in the research environment and pressures in the peer review process, along with a lack of familiarity with some statistical methods, have created a situation in which many findings that appear "excellent" under the rubric are likely to be "too good to be true," in the sense that they are unlikely to generalize to clinical settings or to be reproduced in independent samples. We describe several common scenarios in which published results are often too good to be true, involving internal consistency, inter-rater reliability, correlations, standardized mean differences, diagnostic accuracy, and global model fit statistics. Simple practices could go a long way toward improving the design, reporting, and interpretation of findings. When effect sizes fall in the "excellent" range for issues that have been challenging, scrutinize before celebrating. When benchmarks are available based on theory or meta-analyses, results that are moderately better than expected in the favorable direction (i.e., Cohen's q ≥ +.30) also invite critical appraisal and replication before application. If readers and reviewers pull for transparency and do not unduly penalize authors who provide it, then improvements in research quality will come faster, and both generalizability and reproducibility are likely to benefit.
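As a point of reference for the q ≥ +.30 criterion (notation and the illustrative numbers below are ours, not drawn from the rubric itself), Cohen's q is simply the difference between two correlations after Fisher's r-to-z transformation:

\[
q = z_1 - z_2, \qquad z_i = \tfrac{1}{2}\ln\!\left(\frac{1 + r_i}{1 - r_i}\right) = \operatorname{arctanh}(r_i).
\]

For example, with hypothetical values, an observed r = .80 compared against a meta-analytic benchmark of r = .55 yields q ≈ 1.10 − 0.62 = .48, comfortably exceeding the +.30 threshold and therefore inviting the kind of critical appraisal and replication described above.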
This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Clinical Child & Adolescent Psychology on 16/10/2019, available online: https://www.tandfonline.com/doi/full/10.1080/15374416.2019.1669158