diff --git a/rsAbsence.Rnw b/rsAbsence.Rnw
index 30ba2257c4554580780b773b197f6063e67dd4d7..2f2a1f7e8b766613a26cdf494a90f2ab3563a92b 100755
--- a/rsAbsence.Rnw
+++ b/rsAbsence.Rnw
@@ -196,7 +196,7 @@ equivalence testing or Bayes factors, should be used.
 The contextualization of null results becomes even more complicated in the
 setting of replication studies. In a replication study, researchers attempt to
 repeat an original study as closely as possible in order to assess whether
-similar results can be obtained with new data. There have been various
+similar results can be obtained with new data \citep{NSF2019}. There have been various
 large-scale replication projects in the biomedical and social sciences in the
 last decade
 \citep[among others]{Prinz2011,Begley2012,Klein2014,Opensc2015,Camerer2016,Camerer2018,Klein2018,Cova2018,Errington2021}.
@@ -423,7 +423,7 @@ with confidence intervals from two RPCB study pairs. Both are ``null results''
 and meet the non-significance criterion for replication success (the two-sided
 $p$-values are greater than 5\% in both the original and the replication study),
 but intuition would suggest that these two pairs are very much different.
-
+\todo[inline]{RH: this data is really a mess. turns out for Dawson n represents the group size (n = 6 in https://osf.io/8acw4) while in Goetz it is the sample size of the whole experiment (n = 34 and 61 in https://osf.io/acg8s).}
 \begin{figure}[ht]
 << "2-example-studies", fig.height = 3.25 >>=
 ## some evidence for absence of effect (when a really genereous margin Delta = 1
@@ -619,15 +619,15 @@ established treatment -- is practically equivalent to the established treatment
 whether an effect is practically equivalent to the value of an absent effect,
 usually zero. The main challenge is to specify the margin $\Delta > 0$ that
 defines an equivalence range $[-\Delta, +\Delta]$ in which an effect is
-considered as absent for practical purposes. The goal is then to reject the
-composite null hypothesis that the true effect is outside the equivalence range.
-To ensure that the null hypothesis is falsely rejected at most $\alpha \times
-100\%$ of the time, one either rejects it if the $(1-2\alpha)\times 100\%$
-confidence interval for the effect is contained within the equivalence range
-(for example, a 90\% confidence interval for $\alpha = 5\%$), or if two
-one-sided tests (TOST) for the effect being smaller/greater than $+\Delta$
-and $-\Delta$ are significant at level $\alpha$, respectively.
-A quantitative measure of evidence for the absence of an effect is then given
+considered as absent for practical purposes. The goal is then to reject the
+composite null hypothesis that the true effect is outside the equivalence range.
+To ensure that the null hypothesis is falsely rejected at most $\alpha \times
+100\%$ of the time, one either rejects it if the $(1-2\alpha)\times 100\%$
+confidence interval for the effect is contained within the equivalence range
+(for example, a 90\% confidence interval for $\alpha = 5\%$), or if two
+one-sided tests (TOST) for the effect being smaller/greater than $+\Delta$
+and $-\Delta$ are significant at level $\alpha$, respectively.
+A quantitative measure of evidence for the absence of an effect is then given
 by the maximum of the two one-sided $p$-values.
 
 \todo{CM: maybe more logical to first discuss margin and then mention the
@@ -762,7 +762,7 @@ If the goal of study is to find evidence for the absence of an effect, the
 replication sample size should also be determined so that the study has adequate
 power to make conclusive inferences regarding the absence of the effect.
 
-\todo{CM: mention that margin + prior distribution should be chosen
+\todo{CM: mention that margin + prior distribution should be chosen
 before first/second study is conducted?}
 
 \section*{Acknowledgements}
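
Note on the hunk at line 619: the TOST paragraph it touches describes the procedure only in words. A minimal sketch of the corresponding formulas is given here, assuming an approximately normal effect estimate. The symbols $\hat{\theta}$ (effect estimate), $\sigma$ (its standard error) and $\Phi$ (standard normal distribution function) are assumed notation, not taken from the diff.

```latex
% Assumed notation (not part of the diff): \hat{\theta} is the effect estimate,
% \sigma its standard error, \Phi the standard normal CDF, \Delta > 0 the margin.
\begin{align*}
  p_{>} &= \Phi\!\left(\frac{\hat{\theta} - \Delta}{\sigma}\right)
    && \text{(one-sided test of } H_0\colon \theta \geq +\Delta\text{)} \\
  p_{<} &= 1 - \Phi\!\left(\frac{\hat{\theta} + \Delta}{\sigma}\right)
    && \text{(one-sided test of } H_0\colon \theta \leq -\Delta\text{)} \\
  p_{\mathrm{TOST}} &= \max\{p_{>},\, p_{<}\}
\end{align*}
```

Under this normal approximation, $p_{\mathrm{TOST}} \leq \alpha$ holds exactly when the $(1-2\alpha)\times 100\%$ confidence interval $\hat{\theta} \pm z_{1-\alpha}\,\sigma$ lies within $[-\Delta, +\Delta]$, which is the equivalence of the two rejection rules stated in the paragraph.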