diff --git a/Dockerfile b/Dockerfile
index 7f468ee2a7f43f5f560ab7061c77d1a72917de56..533036ef96a9b1459c3c8b2e04698b7a43eeec1d 100755
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,5 +1,5 @@
 ## set R version (https://hub.docker.com/r/rocker/verse/tags)
-FROM rocker/verse:4.2
+FROM rocker/verse:4.2.3
 
 ## name of the manuscript (as in Makefile and paper/Makefile)
 ENV FILE=rsabsence
diff --git a/paper/bibliography.bib b/paper/bibliography.bib
index 06b1789f7b39251f3200f4f86a7894e632fbf316..3225f87d545042096b38c9123b1e453eca3ce935 100755
--- a/paper/bibliography.bib
+++ b/paper/bibliography.bib
@@ -1377,6 +1377,7 @@ Visualizing Intersecting Sets},
   journal = {Psychological Methods}
 }
 
+
 @article{Chalmers2014,
   doi = {10.1016/s0140-6736(13)62229-1},
   year = {2014},
diff --git a/paper/rsabsence.Rnw b/paper/rsabsence.Rnw
index 0eaa22d4b8f05e0ca31c429580ff276c002448c1..97dda7770f30d0d9eee516c6c3c78e21eec4f4ba 100755
--- a/paper/rsabsence.Rnw
+++ b/paper/rsabsence.Rnw
@@ -1,4 +1,4 @@
-\documentclass[9pt,lineno %, onehalfspacing
+\documentclass[9pt,%lineno %, onehalfspacing
 ]{elife}
 \usepackage[T1]{fontenc}
 \usepackage[utf8]{inputenc}
@@ -131,7 +131,7 @@ paper by Douglas Altman and Martin Bland has since become a mantra in the
 statistical and medical literature \citep{Altman1995}. Yet, the misconception
 that a statistically non-significant result indicates evidence for the absence
 of an effect is unfortunately still widespread \citep{Makin2019}. Such a ``null
-result'' -- typically characterized by a $p$-value of $p > 0.05$ for the null
+result'' -- typically characterized by a \textit{p}-value of $p > 0.05$ for the null
 hypothesis of an absent effect -- may also occur if an effect is actually
 present. For example, if the sample size of a study is chosen to detect an
 assumed effect with a power of 80\%, null results will incorrectly occur 20\% of
@@ -178,8 +178,8 @@ effect when analyzed with appropriate methods, so that the goal of the
 replication is clearer. However, the criterion does not distinguish between
 these two cases. Second, with this criterion researchers can virtually always
 achieve replication success by conducting two studies with very small sample
-sizes, such that the $p$-values are non-significant and the results are
-inconclusive. This is because the null hypothesis under which the $p$-values are
+sizes, such that the \textit{p}-values are non-significant and the results are
+inconclusive. This is because the null hypothesis under which the \textit{p}-values are
 computed is misaligned with the goal of inference, which is to quantify the
 evidence for the absence of an effect. We will discuss methods that are better
 aligned with this inferential goal. % in Section~\ref{sec:methods}.
@@ -189,7 +189,7 @@ replication success criterion of requiring significance from both studies
 \citep[also known as the two-trials rule, see chapter 12.2.8 in][]{Senn2008},
 which ensures that the error of falsely claiming the presence of an effect is
 controlled at a rate equal to the squared significance level (for example,
-$5\% \times 5\% = 0.25\%$ for a $5\%$ significance level). The non-significance
+5\% $\times$ 5\% = 0.25\% for a 5\% significance level). The non-significance
 criterion may be intended to complement the two-trials rule for null results,
 but it fails to do so in this respect, which may be important to regulators,
 funders, and researchers. We will now demonstrate these issues and potential
@@ -302,14 +302,14 @@ ggplot(data = plotDF1) +
   pairs which meet the non-significance replication success criterion from the
   Reproducibility Project: Cancer Biology \citep{Errington2021}. Shown are
   standardized mean difference effect estimates with \Sexpr{round(conflevel*100,
-    2)}\% confidence intervals, sample sizes, and two-sided $p$-values for the
-  null hypothesis that the standardized mean difference is zero.}
+    2)}\% confidence intervals, sample sizes, and two-sided \textit{p}-values
+  for the null hypothesis that the effect is absent.}
 \end{figure}
 
 Figure~\ref{fig:2examples} shows standardized mean difference effect estimates
 with \Sexpr{round(100*conflevel, 2)}\% confidence intervals from two RPCB study
 pairs. Both are ``null results'' and meet the non-significance criterion for
-replication success (the two-sided $p$-values are greater than 0.05 in both the
+replication success (the two-sided \textit{p}-values are greater than 0.05 in both the
 original and the replication study), but intuition would suggest that these two
 pairs are very much different.
 
@@ -401,20 +401,20 @@ the effect is zero, see Figure~\ref{fig:hypotheses} for an illustration.
 To ensure that the null hypothesis is falsely rejected at most
 $\alpha \times 100\%$ of the time, the standard approach is to declare
 equivalence if the $(1-2\alpha)\times 100\%$ confidence interval for the effect
-is contained within the equivalence range (for example, a 90\% confidence
-interval for $\alpha = 5\%$) \citep{Westlake1972}, which is equivalent to two
-one-sided tests (TOST) for the null hypotheses of the effect being
+is contained within the equivalence range, for example, a 90\% confidence
+interval for $\alpha = 5\%$ \citep{Westlake1972}. The procedure is equivalent to
+two one-sided tests (TOST) for the null hypotheses of the effect being
 greater/smaller than $+\Delta$ and $-\Delta$ being significant at level $\alpha$
 \citep{Schuirmann1987}. A quantitative measure of evidence for the absence of an
-effect is then given by the maximum of the two one-sided $p$-values (the TOST
-$p$-value). A reasonable replication success criterion for null results may
+effect is then given by the maximum of the two one-sided \textit{p}-values (the TOST
+\textit{p}-value). A reasonable replication success criterion for null results may
 therefore be to require that both the original and the replication TOST
-$p$-values be smaller than some level $\alpha$ (e.g., 0.05), or, equivalently,
+\textit{p}-values be smaller than some level $\alpha$ (e.g., 0.05), or, equivalently,
 that their $(1-2\alpha)\times 100\%$ confidence intervals are included in the
-equivalence region (e.g., 90\%). In contrast to the non-significance criterion,
-this criterion controls the error of falsely claiming replication success at
-level $\alpha^{2}$ when there is a true effect outside the equivalence margin,
-thus complementing the usual two-trials rule.
+equivalence region. In contrast to the non-significance criterion, this
+criterion controls the error of falsely claiming replication success at level
+$\alpha^{2}$ when there is a true effect outside the equivalence margin, thus
+complementing the usual two-trials rule.
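+
+% A minimal R sketch of the TOST procedure described above, under a normal
+% approximation (hypothetical inputs; eval = FALSE so the chunk is not run):
+<< "tost-sketch", eval = FALSE >>=
+## TOST p-value for an effect estimate est with standard error se and margin Delta
+pTOST <- function(est, se, Delta) {
+    pGreater <- pnorm((est - Delta)/se)  ## one-sided test of H0: effect >= +Delta
+    pLess <- 1 - pnorm((est + Delta)/se) ## one-sided test of H0: effect <= -Delta
+    max(pGreater, pLess)                 ## equivalence if below alpha
+}
+## equivalently, check whether the (1 - 2*alpha) CI lies within (-Delta, Delta)
+pTOST(est = 0.1, se = 0.3, Delta = 0.74) ## hypothetical numbers
+@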
 
 
 \begin{figure}
@@ -515,8 +515,8 @@ ggplot(data = rpcbNull) +
   indicated in the plot titles. The dashed gray line represents the value of no
   effect ($\text{SMD} = 0$), while the dotted red lines represent the
   equivalence range with a margin of $\Delta = \Sexpr{margin}$, classified as
-  ``liberal'' by \citet[Table 1.1]{Wellek2010}. The $p$-values $p_{\text{TOST}}$
-  are the maximum of the two one-sided $p$-values for the effect being less than
+  ``liberal'' by \citet[Table 1.1]{Wellek2010}. The \textit{p}-values $p_{\text{TOST}}$
+  are the maximum of the two one-sided \textit{p}-values for the effect being less than
   or greater than $+\Delta$ or $-\Delta$, respectively. The Bayes factors
   $\BF_{01}$ quantify the evidence for the null hypothesis
   $H_{0} \colon \text{SMD} = 0$ against the alternative
@@ -541,32 +541,39 @@ ptostr2 <- rpcbNull$ptostr[ind2]
 
 ## success BF criterion
 bfSuccesses <- sum(rpcbNull$BForig > 3 & rpcbNull$BFrep > 3)
+BForig1 <- rpcbNull$BForig[ind1]
+BFrep1 <- rpcbNull$BFrep[ind1]
+BForig2 <- rpcbNull$BForig[ind2]
+BFrep2 <- rpcbNull$BFrep[ind2]
 @
 
 Returning to the RPCB data, Figure~\ref{fig:nullfindings} shows the standardized
 mean difference effect estimates with \Sexpr{round(conflevel*100, 2)}\%
 confidence intervals for the 15 effects which were treated as quantitative null
 results by the RPCB.\footnote{There are four original studies with null effects
-  for which several internal replication studies were conducted, leading in
-  total to 20 replications of null effects. As in the RPCB main analysis
-  \citet{Errington2021}, we aggregated their SMD estimates into a single SMD
-  estimate with fixed-effect meta-analysis.} Most of them showed non-significant
-$p$-values ($p > 0.05$) in the original study, but there are two effects in
-paper 48 which the original authors regarded as null results despite their
-statistical significance. We see that there are \Sexpr{nullSuccesses}
-``success'' (with $p > 0.05$ in original and replication study) out of total
+  for which two or three ``internal'' replication studies were conducted,
+  leading in total to 20 replications of null effects. As in the RPCB main
+  analysis \citep{Errington2021}, we aggregated their SMD estimates into a
+  single SMD estimate with fixed-effect meta-analysis and recomputed the
+  replication \textit{p}-value based on a normal approximation. For the original
+  studies and single replication studies we report the \textit{p}-values as provided by
+  the RPCB.} Most of them showed non-significant \textit{p}-values ($p > 0.05$) in the
+original study, but there are two effects in paper 48 which the original authors
+regarded as null results despite their statistical significance. We see that
+there are \Sexpr{nullSuccesses} ``successes'' according to the non-significance
+criterion (with $p > 0.05$ in both the original and replication study) out of the
 \Sexpr{ntotal} null effects, as reported in Table 1 from~\citet{Errington2021}.
 % , and which were therefore treated as null results also by the RPCB.
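+
+% A minimal R sketch of the fixed-effect meta-analytic aggregation described
+% in the footnote above (inverse-variance weighting; hypothetical inputs):
+<< "fema-sketch", eval = FALSE >>=
+## pool hypothetical internal-replication SMD estimates with standard errors se
+smd <- c(0.12, -0.05, 0.20)
+se <- c(0.25, 0.30, 0.28)
+w <- 1/se^2                        ## inverse-variance weights
+smdPooled <- sum(w*smd)/sum(w)     ## fixed-effect pooled estimate
+sePooled <- 1/sqrt(sum(w))
+2*pnorm(-abs(smdPooled/sePooled))  ## two-sided normal-approximation p-value
+@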
 
 We will now apply equivalence testing to the RPCB data. The dotted red lines
 represent an equivalence range for the margin $\Delta =
-\Sexpr{margin}$, % , for which the shown TOST $p$-values are computed.
+\Sexpr{margin}$, % , for which the shown TOST \textit{p}-values are computed.
 which \citet[Table 1.1]{Wellek2010} classifies as ``liberal''. However, even
 with this generous margin, only \Sexpr{equivalenceSuccesses} of the
 \Sexpr{ntotal} study pairs are able to establish replication success at the 5\%
 level, in the sense that both the original and the replication 90\% confidence
 interval fall within the equivalence range (or, equivalently, that their TOST
-$p$-values are smaller than $0.05$). For the remaining \Sexpr{ntotal -
+\textit{p}-values are smaller than $0.05$). For the remaining \Sexpr{ntotal -
   equivalenceSuccesses} studies, the situation remains inconclusive and there is
 no evidence for the absence or the presence of the effect. For instance, the
 previously discussed example from \citet{Goetz2011} marginally fails the
@@ -581,28 +588,37 @@ $p_{\text{TOST}} = \Sexpr{formatPval(ptostr2)}$ in the replication).
 % We chose the margin $\Delta = \Sexpr{margin}$ primarily for illustrative
 % purposes and because effect sizes in preclinical research are typically much
 % larger than in clinical research.
-The post-hoc determination of the equivalence margin is debateable. Ideally, the
-margin should be determined on a case-by-case basis before the studies are
-conducted by researchers familiar with the subject matter. One could also argue
-that the chosen margin $\Delta = \Sexpr{margin}$ is too lax compared to margins
-typically used in clinical research; for instance, in oncology, a margin of
-$\Delta = \log(1.3)$ is commonly used for log odds/hazard ratios, whereas in
-bioequivalence studies a margin of $\Delta =
-\log(1.25) % = \Sexpr{round(log(1.25), 2)}
-$ is the convention, which translates to $\Delta = % \log(1.3)\sqrt{3}/\pi =
+The post-hoc determination of the equivalence margin is controversial. Ideally,
+the margin should be determined on a case-by-case basis before the studies are
+conducted by researchers familiar with the subject matter. In the social and
+medical sciences, the conventions of \citet{Cohen1992} are typically used to
+classify SMD effect sizes ($\text{SMD} = 0.2$ small, $\text{SMD} = 0.5$ medium,
+$\text{SMD} = 0.8$ large). While effect sizes are typically larger in
+preclinical research, it seems unrealistic to specify margins larger than 1 to
+represent effect sizes that are absent for practical purposes. It could also be
+argued that the chosen margin $\Delta = \Sexpr{margin}$ is too lax compared to
+margins commonly used in clinical research; for instance, in oncology, a margin
+of $\Delta = \log(1.3)$ is commonly used for log odds/hazard ratios, whereas in
+bioequivalence studies a margin of \mbox{$\Delta =
+  \log(1.25) % = \Sexpr{round(log(1.25), 2)}
+  $} is the convention. These would translate into much more stringent
+margins of $\Delta = % \log(1.3)\sqrt{3}/\pi =
 \Sexpr{round(log(1.3)*sqrt(3)/pi, 2)}$ and $\Delta = % \log(1.25)\sqrt{3}/\pi =
 \Sexpr{round(log(1.25)*sqrt(3)/pi, 2)}$ on the SMD scale, respectively, using
 the $\text{SMD} = (\surd{3} / \pi) \log\text{OR}$ conversion \citep[p.
 233]{Cooper2019}. Therefore, we report a sensitivity analysis in
 Figure~\ref{fig:sensitivity}. The top plot shows the number of successful
 replications as a function of the margin $\Delta$ and for different TOST
-$p$-value thresholds. Such an ``equivalence curve'' approach was first proposed
-by \citet{Hauck1986}, see also \citet{Campbell2021} for alternative approaches
-to post-hoc equivalence margin specification. We see that for realistic margins
-between 0 and 1, the proportion of replication successes remains below 50\%. To
-achieve a success rate of 11 of the 15 studies, as with the RCPB
-non-significance criterion, unrealistic margins of $\Delta > 2$ are required,
-which illustrates the paucity of evidence provided by these studies.
+\textit{p}-value thresholds. Such an ``equivalence curve'' approach was first proposed
+by \citet{Hauck1986}.
+% see also \citet{Campbell2021} for alternative approaches to post-hoc
+% equivalence margin specification.
+We see that for realistic margins between 0 and 1, the proportion of replication
+successes remains below 50\%. To achieve a success rate of
+11/15 = \Sexpr{round(11/15*100, 1)}\%, as with the non-significance criterion,
+unrealistic margins of $\Delta > 2$ are required, highlighting the paucity of
+evidence provided by these studies.
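+
+% A minimal R sketch of the log odds/hazard ratio to SMD margin conversion
+% used above ($\text{SMD} = (\surd{3}/\pi)\log\text{OR}$; eval = FALSE):
+<< "margin-conversion-sketch", eval = FALSE >>=
+## convert conventional logOR margins to the SMD scale
+logOR2SMD <- function(logOR) sqrt(3)/pi*logOR
+logOR2SMD(log(1.3))   ## oncology margin
+logOR2SMD(log(1.25))  ## bioequivalence margin
+@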
+
 
 
 \begin{figure}[!htb]
@@ -675,7 +691,7 @@ plotB <- ggplot(data = bfDF,
                 aes(x = priorsd, y = successes, color = factor(thresh, ordered = TRUE))) +
     facet_wrap(~ '"BF"["01"] >= gamma ~ "in original and replication study"',
                labeller = label_parsed) +
-    geom_vline(xintercept = 4, lty = 2, alpha = 0.4) +
+    geom_vline(xintercept = 2, lty = 2, alpha = 0.4) +
     geom_step(alpha = 0.8, linewidth = 0.8) +
     scale_y_continuous(breaks = bks, labels = labs, limits = c(0, nmax)) +
     ## scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
@@ -695,13 +711,13 @@ plotB <- ggplot(data = bfDF,
 grid.arrange(plotA, plotB, ncol = 1)
 @
 
-\caption{Number of successful replications of original null results in
-  the RPCB as a function of the margin $\Delta$ of the equivalence test
+\caption{Number of successful replications of original null results in the RPCB
+  as a function of the margin $\Delta$ of the equivalence test
   ($p_{\text{TOST}} \leq \alpha$ in both studies) or the standard deviation of
-  the normal prior distribution for the effect under the alternative $H_{1}$ of
-  the Bayes factor test ($\BF_{01} \geq \gamma$ in both studies). The dashed
-  gray lines represent the parameters used in the main analysis shown in
-  Figure~\ref{fig:nullfindings}.}
+  the normal prior distribution for the SMD effect size under the alternative
+  $H_{1}$ of the Bayes factor test ($\BF_{01} \geq \gamma$ in both studies). The
+  dashed gray lines represent the margin and standard deviation used in the main
+  analysis shown in Figure~\ref{fig:nullfindings}.}
 \label{fig:sensitivity}
 \end{figure}
 
@@ -727,7 +743,13 @@ Bayes factor greater than one (\mbox{$\BF_{01} > 1$}) indicates evidence for the
 absence of the effect and a Bayes factor smaller than one indicates evidence for
 the presence of the effect (\mbox{$\BF_{01} < 1$}), whereas a Bayes factor not
 much different from one indicates absence of evidence for either hypothesis
-(\mbox{$\BF_{01} \approx 1$}).
+(\mbox{$\BF_{01} \approx 1$}). A reasonable criterion for successful replication
+of a null result may hence be to require a Bayes factor larger than some level
+$\gamma > 1$ from both studies, for example, $\gamma = 3$ or $\gamma = 10$, which
+are conventional levels for ``substantial'' and ``strong'' evidence,
+respectively \citep{Jeffreys1961}. In contrast to the non-significance
+criterion, this criterion provides a genuine measure of evidence that can
+distinguish absence of evidence from evidence of absence.
 
 When the observed data are dichotomized into positive (\mbox{$p < 0.05$}) or null
 results (\mbox{$p > 0.05$}), the Bayes factor based on a null result is the
@@ -757,36 +779,63 @@ The Bayes factors $\BF_{01}$ shown in Figure~\ref{fig:nullfindings} then
 quantify the evidence for the null hypothesis of no effect
 ($H_{0} \colon \text{SMD} = 0$) against the alternative hypothesis that there is
 an effect ($H_{1} \colon \text{SMD} \neq 0$) using a normal ``unit-information''
-prior distribution \citep{Kass1995b} for the effect size under the alternative
-$H_{1}$. There are several more advanced prior distributions that could be used
-here, and they should ideally be specified for each effect individually based on
-domain knowledge. The normal unit-information prior (with a standard deviation
-of 2 for SMDs) is only a reasonable default choice, as it implies that small to
-large effects are plausible under the alternative. We see that in most cases
-there is no substantial evidence for either the absence or the presence of an
-effect, as with the equivalence tests. The Bayes factors for the two previously
-discussed examples from \citet{Goetz2011} and \citet{Dawson2011} are consistent
-with our intuitions -- there is indeed some evidence for the absence of an
-effect in \citet{Goetz2011}, while there is even slightly more evidence for the
-presence of an effect in \citet{Dawson2011}, though the Bayes factor is very
-close to one due to the small sample sizes. With a lenient Bayes factor
-threshold of $\BF_{01} > 3$ to define evidence for the absence of the effect,
-only \Sexpr{bfSuccesses} of the \Sexpr{ntotal} study pairs meets this criterion
-in both the original and replication study.
-
-The sensitivity of the Bayes factor choice of the of the prior may again be
-assessed visually, as shown in the bottom plot of Figure~\ref{fig:sensitivity}.
-We see ....
+prior distribution\footnote{For SMD effect sizes, a normal unit-information
+  prior is a normal distribution centered around the null value with a standard
+  deviation corresponding to one observation. Assuming that the group means are
+  normally distributed \mbox{$\bar{X}_{1} \sim \Nor(\theta_{1}, 2\sigma^{2}/n)$}
+  and \mbox{$\bar{X}_{2} \sim \Nor(\theta_{2}, 2\sigma^{2}/n)$} with $n$ the
+  total sample size and $\sigma$ the known data standard deviation, the
+  distribution of the SMD is
+  \mbox{$\text{SMD} = (\bar{X}_{1} - \bar{X}_{2})/\sigma \sim \Nor((\theta_{1} - \theta_{2})/\sigma, 4/n)$}.
+  The standard deviation of the SMD based on one unit ($n = 1$) is hence 2, the
+  same as the unit standard deviation for log hazard/odds/rate ratio effect sizes
+  \citep[Section 2.4]{Spiegelhalter2004}.} \citep{Kass1995b} for the effect size
+under the alternative $H_{1}$. We see that in most cases there is no substantial
+evidence for either the absence or the presence of an effect, as with the
+equivalence tests. For instance, with a lenient Bayes factor threshold of 3,
+only \Sexpr{bfSuccesses} of the \Sexpr{ntotal} replications are successful, in
+the sense of having $\BF_{01} > 3$ in both the original and the replication
+study. The Bayes factors for the two previously discussed examples are
+consistent with our intuitions -- in the \citet{Goetz2011} example there is
+indeed substantial evidence for the absence of an effect
+($\BF_{01} = \Sexpr{formatBF(BForig1)}$ in the original study and
+$\BF_{01} = \Sexpr{formatBF(BFrep1)}$ in the replication), while in the
+\citet{Dawson2011} example there is even weak evidence for the \emph{presence}
+of an effect, though the Bayes factors are very close to one due to the small
+sample sizes ($\BF_{01} = \Sexpr{formatBF(BForig2)}$ in the original study and
+$\BF_{01} = \Sexpr{formatBF(BFrep2)}$ in the replication).
+
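+% A minimal R sketch of the Bayes factor under the normal unit-information
+% prior described above (normal approximation; hypothetical inputs):
+<< "bf-sketch", eval = FALSE >>=
+## BF01 for H0: SMD = 0 vs. H1: SMD ~ N(0, sd0^2), given an estimate est with
+## standard error se; the unit-information prior has sd0 = 2 on the SMD scale
+BF01 <- function(est, se, sd0 = 2) {
+    dnorm(est, mean = 0, sd = se)/dnorm(est, mean = 0, sd = sqrt(se^2 + sd0^2))
+}
+BF01(est = 0.2, se = 0.4)  ## hypothetical numbers
+@
+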
+As with the equivalence margin, the choice of the prior distribution for the SMD
+under the alternative $H_{1}$ is debatable. The normal unit-information prior
+seems to be a reasonable default choice, as it implies that small to large
+effects are plausible under the alternative, but other normal priors with
+smaller/larger standard deviations could have been considered to make the test
+more sensitive to smaller/larger true effect sizes.
+% There are also several more advanced prior distributions that could be used
+% here \citep{Johnson2010,Morey2011}, and any prior distribution should ideally
+% be specified for each effect individually based on domain knowledge.
+We therefore report a sensitivity analysis with respect to the choice of the
+prior standard deviation in the bottom plot of Figure~\ref{fig:sensitivity}. It
+is uncommon to specify prior standard deviations larger than the
+unit-information standard deviation of 2, as this corresponds to the assumption
+of very large effect sizes under the alternative. However, to achieve
+replication success for a larger proportion of replications than the observed
+\Sexpr{bfSuccesses}/\Sexpr{ntotal} = \Sexpr{round(bfSuccesses/ntotal*100, 1)}\%,
+unreasonably large prior standard deviations have to be specified. For instance,
+a standard deviation of roughly 5 is required to achieve replication success in
+50\% of the replications, and it needs to be almost 20 to match the success
+rate of 11/15 = \Sexpr{round(11/15*100, 1)}\% achieved with the
+non-significance criterion.
+
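+% A minimal sketch of the prior-sd sensitivity logic, reusing the hypothetical
+% BF01() helper from above on illustrative estimates (eval = FALSE):
+<< "bf-sensitivity-sketch", eval = FALSE >>=
+## count replications with BF01 > 3 in both studies over a grid of prior sds
+origEst <- c(0.10, 0.35); origSE <- c(0.30, 0.45)  ## hypothetical data
+repEst <- c(0.05, 0.25); repSE <- c(0.20, 0.35)
+sd0grid <- seq(0.5, 20, 0.5)
+successes <- sapply(sd0grid, function(sd0) {
+    sum(BF01(origEst, origSE, sd0) > 3 & BF01(repEst, repSE, sd0) > 3)
+})
+@
+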
 
 << >>=
 studyInteresting <- filter(rpcbNull, id == "(48, 2, 4)")
 noInteresting <- studyInteresting$no
 nrInteresting <- studyInteresting$nr
-## write.csv(rpcbNull, "rpcb-Null.csv", row.names = FALSE)
 @
 
-Among the \Sexpr{ntotal} RPCB null results, there are three interesting cases
-(the three effects from paper 48) where the Bayes factor is qualitatively
+Of note, among the \Sexpr{ntotal} RPCB null results, there are three interesting
+cases (the three effects from paper 48) where the Bayes factor is qualitatively
 different from the equivalence test, revealing a fundamental difference between
 the two approaches. The Bayes factor is concerned with testing whether the
 effect is \emph{exactly zero}, whereas the equivalence test is concerned with
@@ -794,7 +843,7 @@ whether the effect is within an \emph{interval around zero}. Due to the very
 large sample size in the original study ($n = \Sexpr{noInteresting}$) and the
 replication ($n = \Sexpr{nrInteresting}$), the data are incompatible with an
 exactly zero effect, but compatible with effects within the equivalence range.
-Apart from this example, however, the approaches lead to the same qualitative
+Apart from this example, however, both approaches lead to the same qualitative
 conclusion -- most RPCB null results are highly ambiguous.
 
 \section{Conclusions}
@@ -814,8 +863,8 @@ studies are conducted. Typically, however, the original studies were designed to
 find evidence for the presence of an effect, and the goal of replicating the
 ``null result'' was formulated only after failure to do so. It is therefore
 important that margins and prior distributions are motivated from historical
-data and/or field conventions, and that sensitivity analyses regarding their
-choice are reported \citet{Campbell2021}.
+data and/or field conventions \citep{Campbell2021}, and that sensitivity
+analyses regarding their choice are reported.
 
 While the equivalence test and the Bayes factor are two principled methods for
 analyzing original and replication studies with null results, they are not the
@@ -860,23 +909,10 @@ preparation, dynamic reporting, and formatting, respectively. The data from the
 RPCB were obtained by downloading the files from
 \url{https://github.com/mayamathur/rpcb} (commit a1e0c63) and extracting the
 relevant variables as indicated in the R script \texttt{preprocess-rpcb-data.R}
-which is available in our git repository.% The effect estimates and standard
-% errors on SMD scale provided in this data set differ in some cases from those in
-% the data set available at \url{https://doi.org/10.17605/osf.io/e5nvr}, which is
-% cited in \citet{Errington2021}. We used this particular version of the data set
-% because it was recommended to us by the RPCB statistician (Maya Mathur) upon
-% request.
-% For the \citet{Dawson2011} example study and its replication \citep{Shan2017},
-% the sample sizes $n = 3$ in th data set seem to correspond to the group sample
-% sizes, see Figure 5A in the replication study
-% (\url{https://doi.org/10.7554/eLife.25306.012}), which is why we report the
-% total sample sizes of $n = 6$ in Figure~\ref{fig:2examples}.
-
+which is available in our git repository.
 
 \bibliography{bibliography}
 
-
-
 << "sessionInfo1", eval = Reproducibility, results = "asis" >>=
 ## print R sessionInfo to see system information and package versions
 ## used to compile the manuscript (set Reproducibility = FALSE, to not do that)
diff --git a/rsabsence.pdf b/rsabsence.pdf
index de9bec8c3944376ab5d7ca9084c3162661273ed1..07fb43fbfabee556dd8bbb4515280b1bbe8cc8d9 100644
Binary files a/rsabsence.pdf and b/rsabsence.pdf differ