Replication of null results - Absence of evidence or evidence of absence
Commit 5701c9f2
authored 2 years ago by Rachel Heyard

    polishing intro (far from done/happy)

parent 68fffbee
Showing 2 changed files with 176 additions and 105 deletions:
  bibliography.bib (+23, −0)
  rsAbsence.Rnw (+153, −105)
bibliography.bib +23 −0

@@ -96,6 +96,20 @@
 journal = {BMJ}
 }
+@article{Goodman2008,
+  doi = {10.1053/j.seminhematol.2008.04.003},
+  url = {https://doi.org/10.1053/j.seminhematol.2008.04.003},
+  year = {2008},
+  month = jul,
+  publisher = {Elsevier {BV}},
+  volume = {45},
+  number = {3},
+  pages = {135--140},
+  author = {Steven Goodman},
+  title = {A Dirty Dozen: Twelve P-Value Misconceptions},
+  journal = {Seminars in Hematology}
+}
 @Article{Bayarri2003,
 doi = {10.1016/s0378-3758(02)00282-3},
 year = {2003},
...
@@ -830,6 +844,15 @@ url = {www.fda.gov/regulatory-information/search-fda-guidance-documents/providi
 title = {New preprint server for medical research},
 journal = {{BMJ}}
 }
+@book{NSF2019,
+  doi = {10.17226/25303},
+  url = {https://doi.org/10.17226/25303},
+  year = {2019},
+  month = sep,
+  publisher = {National Academies Press},
+  author = {{National Academies of Sciences, Engineering, and Medicine}},
+  title = {Reproducibility and Replicability in Science}
+}
 @Manual{Gehlenborg2019,
 title = {UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for
...
rsAbsence.Rnw +153 −105

@@ -24,7 +24,7 @@
 bottom=25mm,
 }
-\title{\bf Meta-research: Replication studies and absence of evidence}
+\title{\bf Meta-research: Replication studies and the ``absence of evidence''}
 \author{{\bf Rachel Heyard, Charlotte Micheloud, Samuel Pawel, Leonhard Held} \\
 Epidemiology, Biostatistics and Prevention Institute \\
 Center for Reproducible Science \\
...
@@ -122,9 +122,9 @@ formatBF <- Vectorize(FUN = formatBF.)
 replicating or even proving a null-effect. Methods to adequately summarize
 the evidence for the null have been proposed. With this paper we want to
 highlight the consequences of the ``absence of evidence'' fallacy in the
-replication setting and want to guide the readers and hopefully future
-authors of replication studies to the correct methods to design and
-analyse their replication attempts.
+replication setting and want to guide the reader and future
+author of replication studies to existing methods to appropriately
+design and analyse replication attempts of non-significant findings.
 }\\
 \rule{\textwidth}{0.5pt}
 \emph{Keywords}: Bayesian hypothesis testing,
 equivalence test, non-inferiority test, null hypothesis, replication
...
@@ -142,25 +142,41 @@ participants, n) is used to achieve an 80-90\% power of correctly rejecting the
 null hypothesis. This leaves us with a 10-20\% chance of a false negative.
 Somehow this fact from ``Hypothesis Testing 101'' is all too often forgotten and
 studies showing an effect with a p-value larger than the conventionally used
-significance level of $\alpha = 0.05$ is doomed to be a ``negative study'' or showing a
+significance level of $\alpha = 0.05$ are doomed to be a ``negative study'' or showing a
 ``null effect''. Some have even called to abolish the term ``negative
-study'' altogether, as every well-designed and conducted study is a ``positive
+study'' altogether, as every well-designed and well-conducted study is a ``positive
 contribution to knowledge'', regardless it’s results \citep{Chalmers1002}. Others
 suggest to shift away from significance testing because of the many misconceptions
-of $p$-values and significance \citep{Berner2022}.
+of $p$-values and significance \citep{Goodman2008, Berner2022}.
-More specifically, turning to the replication context, ``the absence of evidence'' fallacy
-appeared in the definitions of replication success in some of the large-scale
-replication projects. The Replication Project Cancer Biology \citep[RPCB]{Errington2021}
-and the RP in Experimental Philosophy \citep[RPEP]{Cova2018} explicitly define a
-replication of a non-significant original effect as successful if the effect in the
-replication study is also non-significant. While the authors of the RPEP warn
-the reader that the use of p-values as criterion for success is problematic when
-applied to replications of original non-significant findings, the authors of the
-RPCB do not. The RP in Psychological Science \citep{Opensc2015}, on the other hand,
-excluded the ``original nulls'' when deciding replication success based on significance and
-the Social Science RP \citep{Camerer2018} as well as the RP in Experimental Economics
-\cite{Camerer2016} did not include original studies without a significant finding.
+Turning to the replication context, replicability has been
+defined as ``obtaining consistent results across studies aimed at answering the
+same scientific question, each of which has obtained its own data'' \citep{NSF2019}.
+Hence, a replication of an original finding attempts to find consistent results while
+applying the same methods and protocol as published in the original study on newly collected data.
+In the past decade, big collaborations of researcher and research groups conducted
+large-scale replication projects (RP) to estimate the replicability of their respective research
+field. In these projects, a set of high impact and influential original studies were
+selected to be replicated as close as possible to the original methodology. The
+results and conclusions of the RPs showed alarmingly low levels of replicability in most fields.
+The Replication Project Cancer Biology \citep[RPCB]{Errington2021}, the RP in
+Experimental Philosophy \citep[RPEP]{Cova2018} and the RP in Psychological
+Science \citep[RPP]{Opensc2015} also attempted to replicate original studies with
+non-significant effects. The authors of those RPs unfortunately fell into the
+``absence of evidence''-fallacy trap when defining successful replications.
+More specifically, the RPCB and RPEP explicitly define a replication of a non-significant
+original effect as successful if the effect in the replication study is also non-significant.
+While the authors of the RPEP warn the reader that the use of $p$-values as criterion
+for success is problematic when applied to replications of original non-significant findings,
+the authors of the RPCB do not. In the RPP, on the other hand, ``original nulls''
+were excluded when assessing replication success based on significance.
+% In general, using the significance criterion as definition of replication success
+% arises from a false interpretation of the failure to find evidence against the null
+% hypothesis as evidence for the null. Non-significant original finding does not
+% mean that the underlying true effect is zero nor that it does not exist. This is
+% especially true if the original study is under-powered.
 \textbf{To replicate or not to replicate an original ``null'' finding?}
 Because of the previously presented fallacy, original studies with
...
@@ -174,41 +190,28 @@ successful replication we need a ``significant result in the same direction in
 both the original and the replication study'' (i.e. the two-trials rule, \cite{Senn2008}),
 replicating a non-significant original result does indeed not make any sense.
 However, the use of significance as sole criterion for replication success has
-its shortcomings.
-\citet{Anderson2016} summarized the goals of replications and recommended analyses and
-success criterion. Interestingly they recommended using the two-trials rule only if
-the goal is to infer the \textit{existence and direction} of a statistical significant
-effect, while the replicating researchers are not interested in the size of this effect.
-A successful replication attempt would result in a small $p$-value, while a large $p$-value
-in the replication would only mean that the
-On the contrary, if the goal is to infer a null effect \cite{Anderson2016} write that,
-in this case, evidence for the null hypothesis has to be provided. To achieve this
-goal equivalence tests or Bayesian methods to quantify the evidence for the null
-hypothesis can be used. In the following, we will illustrate how to accurately
-interpret the potential replication of original non-significant results in the
-Cancer Biology Replication Project.
+its shortcomings and other definitions for replication success have been proposed
+\cite{Simonsohn2015, Ly2018, Hedges2019, Held2020}. Additionally, replication
+studies have to be well-design too in order to ensure high enough replication power
+\cite{Anderson2017, Micheloud2020}.
+According to \citet{Anderson2016}, if the goal of a replications is to infer a null effect
+evidence for the null hypothesis has to be provided. To achieve this they recommend to use
+equivalence tests or Bayesian methods to quantify the evidence for the null hypothesis.
+In the following, we will illustrate how to accurately interpret the potential
+replication of original non-significant results in the Replication Project Cancer Biology.
 % \todo[inline]{SP: look and discuss the papers from \citet{Anderson2016, Anderson2017}}
+\todo[inline]{RH: Note sure what to cite from \citet{Anderson2017}}
+In general a non-significant original finding does not mean that the underlying
+true effect is zero nor that it does not exist. This is especially true if the
+original study is under-powered.
+\todo[inline]{RH: for myself, more blabla on under-powered original studies}
 \section{Example: ``Null findings'' from the Replication Project Cancer
 Biology}
 Of the 158 effects presented in 23 original studies that were repeated in the
-cancer biology RP \citep{Errington2021} 14\% (22) were interpreted as ``null
+RPCB \citep{Errington2021} 14\% (22) were interpreted as ``null
 effects''.
-% One of those repeated effects with a non-significant original finding was
-% presented in Lu et al. (2014) and replicated by Richarson et al (2016).
-% Note that the attempt to replicate all the experiments from the original study
-% was not completed because of some unforeseen issues in the implementation (see
-% \cite{Errington2021b} for more details on the unfinished registered reports in
-% the RPCB).
-Figure~\ref{fig:nullfindings} shows effect estimates with confidence
-intervals for these original ``null findings'' (with $p_{o} > 0.05$) and their
+Note that the attempt to replicate all the experiments from the original study
+was not completed because of some unforeseen issues in the implementation (see
+\cite{Errington2021b} for more details on the unfinished registered reports in
+the RPCB). Figure~\ref{fig:nullfindings} shows effect estimates with confidence
+intervals for the original ``null findings'' (with $p_{o} > 0.05$) and their
 replication studies from the project.
 % The replication of our example effect (Paper \# 47, Experiment \# 1, Effect \#
 % 5) was however completed. The authors of the original study declared that
@@ -223,16 +226,6 @@ replication studies from the project.
 % effect sizes together with their 95\% confidence intervals and respective
 % two-sided p-values.
-\todo[inline]{SP: I have used the original $p$-values as reported in the data
-  set to select the studies in the figure. I think in this way we have the data
-  correctly identified as the RPCP paper reports that there are 20 null findings
-  in the ``All outcomes'' category. I wonder how they go from the all outcomes
-  category to the ``effects'' category (15 null findings), perhaps pool the
-  internal replications by meta-analysis? I think it would be better to stay in
-  the all outcomes category, but of course it needs to be discussed. Also some
-  of the $p$-values were probably computed in a different way than under
-  normality (e.g., the $p$-value from (47, 1, 6, 1) under normality is clearly
-  significant).}
 << "data" >>=
 ## data
@@ -282,53 +275,31 @@ rpcbNull <- rpcb %>%
 @
-\begin{figure}[!htb]
-<< "plot-p-values", fig.height = 3.5 >>=
-## check discrepancy between reported and recomputed p-values for null results
-pbreaks <- c(0.005, 0.02, 0.05, 0.15, 0.4)
-ggplot(data = rpcbNull, aes(x = po, y = po2)) +
-    geom_abline(intercept = 0, slope = 1, alpha = 0.2) +
-    geom_vline(xintercept = 0.05, alpha = 0.2, lty = 2) +
-    geom_hline(yintercept = 0.05, alpha = 0.2, lty = 2) +
-    geom_point(alpha = 0.8, shape = 21, fill = "darkgrey") +
-    geom_label_repel(data = filter(rpcbNull, po2 < 0.05),
-                     aes(x = po, y = po2, label = id), alpha = 0.8, size = 3,
-                     min.segment.length = 0, box.padding = 0.7) +
-    labs(x = bquote(italic(p["o"]) ~ "(reported)"),
-         y = bquote(italic(p["o"]) ~ "(recomputed under normality)")) +
-    scale_x_log10(breaks = pbreaks, label = scales::percent) +
-    scale_y_log10(breaks = pbreaks, labels = scales::percent) +
-    coord_fixed(xlim = c(min(c(rpcbNull$po2, rpcbNull$po)), 1),
-                ylim = c(min(c(rpcbNull$po2, rpcbNull$po)), 1)) +
-    theme_bw() +
-    theme(panel.grid.minor = element_blank())
-@
-\caption{Reported versus recomputed under normality two-sided $p$-values from
-  original studies declared as ``null findings'' ($p_{o} > 0.05$) in
-  Reproducibility Project: Cancer Biology \citep{Errington2021}.}
-\end{figure}
 \begin{figure}[!htb]
<< "plot
-
null
-
findings
-
rpcb", fig.height
=
8
.
5
>>
=
<< "plot
-
null
-
findings
-
rpcb", fig.height
=
8
.
5
>>
=
ggplot
(
data
=
rpcbNull
)
+
ggplot
(
data
=
rpcbNull
)
+
facet
_
wrap
(
~ id, scales
=
"free", ncol
=
4
)
+
facet
_
wrap
(
~ id, scales
=
"free", ncol
=
4
)
+
geom
_
hline
(
yintercept
=
0
, lty
=
2
, alpha
=
0
.
5
)
+
geom
_
hline
(
yintercept
=
0
, lty
=
2
, alpha
=
0
.
5
)
+
geom
_
pointrange
(
aes
(
x
=
"Original", y
=
smdo, ymin
=
smdo
-
2
*
so,
geom
_
pointrange
(
aes
(
x
=
"Original", y
=
smdo, ymin
=
smdo
-
2
*
so,
ymax
=
smdo
+
2
*
so
))
+
ymax
=
smdo
+
2
*
so
))
+
geom
_
pointrange
(
aes
(
x
=
"Replication", y
=
smdr, ymin
=
smdr
-
2
*
sr,
geom
_
pointrange
(
aes
(
x
=
"Replication", y
=
smdr, ymin
=
smdr
-
2
*
sr,
ymax
=
smdr
+
2
*
sr
))
+
ymax
=
smdr
+
2
*
sr
))
+
geom
_
text
(
aes
(
x
=
"Replication", y
=
pmax
(
smdr
+
2
.
1
*
sr, smdo
+
2
.
1
*
so
)
,
labs
(
x
=
"", y
=
"Standardized mean difference
(
SMD
)
"
)
+
label
=
paste
(
"'BF'
[
'
01
'
]
",
geom
_
text
(
aes
(
x
=
1
.
4
, y
=
smdo, #pmin
(
smdr
-
2
.
2
*
sr, smdo
-
2
.
2
*
so
)
,
ifelse
(
BFrformat
==
"<
1
/
1000
", "", "
==
"
)
,
label
=
paste
(
"n
[
o
]==
", no
))
, col
=
"darkblue",
BFrformat
))
,
parse
=
TRUE, size
=
2
.
5
,
parse
=
TRUE, size
=
3
,
nudge
_
x
=
-
.
05
)
+
nudge
_
y
=
-
0
.
5
)
+
geom
_
text
(
aes
(
x
=
2
.
4
, y
=
smdr, #pmin
(
smdr
-
2
.
2
*
sr, smdo
-
2
.
2
*
so
)
,
labs
(
x
=
"", y
=
"Standardized mean difference
(
SMD
)
"
)
+
label
=
paste
(
"n
[
r
]==
", nr
))
, col
=
"darkblue",
theme
_
bw
()
+
parse
=
TRUE, size
=
2
.
5
,
theme
(
panel.grid.minor
=
element
_
blank
()
,
nudge
_
x
=
-
.
05
)
+
panel.grid.major.x
=
element
_
blank
())
theme
_
bw
()
+
theme
(
panel.grid.minor
=
element
_
blank
()
,
panel.grid.major.x
=
element
_
blank
())
# TODO: one replication is missing, id
==
"
(
37
,
2
,
2
,
1
)
"
# what should we do with it?
@
@
\caption
{
Standardized mean difference effect estimates with
95
\%
confidence
\caption
{
Standardized mean difference effect estimates with
95
\%
confidence
...
@@ -338,12 +309,14 @@ ggplot(data = rpcbNull) +
 number, Effect number, Internal replication number). The data were downloaded
 from \url{https://doi.org/10.17605/osf.io/e5nvr}. The relevant variables were
 extracted from the file ``\texttt{RP\_CB Final Analysis - Effect level
-data.csv}''.}
+data.csv}''. Additionally the original ($n_o$) and replication sample sizes
+($n_r$) are indicated in each plot.}
 \label{fig:nullfindings}
 \end{figure}
 \section{Dealing with original non-significant findings in replication projects}
 \subsection{Equivalence Design}
 For many years, equivalence designs have been used in clinical trials to
 understand whether a new drug, which might be cheaper or have less side effects
...
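The equivalence design mentioned in this hunk makes "no relevant effect" a testable claim: the effect must lie inside a pre-specified margin, assessed with two one-sided tests (TOST). As a minimal sketch of that logic, assuming a normally distributed effect estimate and a purely illustrative margin (the manuscript's own analyses are in R; this Python fragment only mirrors the computation):

```python
from scipy.stats import norm

def tost_equivalence(est, se, margin, alpha=0.05):
    """Two one-sided tests (TOST): equivalence (|theta| < margin) is
    declared only if BOTH one-sided nulls are rejected, i.e. if the
    larger of the two one-sided p-values is at most alpha."""
    p_lower = norm.sf((est + margin) / se)   # H0: theta <= -margin
    p_upper = norm.cdf((est - margin) / se)  # H0: theta >= +margin
    p = max(p_lower, p_upper)
    return p, p <= alpha

# hypothetical SMD estimate, standard error, and margin (not RPCB values)
p, equivalent = tost_equivalence(est=0.1, se=0.2, margin=0.5)
print(round(p, 4), bool(equivalent))  # driven by the margin closest to the estimate
```

Note that a non-significant ordinary test plus a TOST that also fails to reject corresponds exactly to the "absence of evidence" situation discussed in the text.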
@@ -384,10 +357,85 @@ absence of evidence for either hypothesis ($\BF_{01} \approx 1$).
 % the replication Bayes factor \citep{Verhagen2014}.
+\begin{figure}[!htb]
+<< "plot-null-findings-rpcb-br", fig.height = 8.5 >>=
+ggplot(data = rpcbNull) +
+    facet_wrap(~ id, scales = "free", ncol = 4) +
+    geom_hline(yintercept = 0, lty = 2, alpha = 0.5) +
+    geom_pointrange(aes(x = "Original", y = smdo, ymin = smdo - 2*so,
+                        ymax = smdo + 2*so)) +
+    geom_pointrange(aes(x = "Replication", y = smdr, ymin = smdr - 2*sr,
+                        ymax = smdr + 2*sr)) +
+    geom_text(aes(x = "Replication", y = pmax(smdr + 2.1*sr, smdo + 2.1*so),
+                  label = paste("'BF'['01']",
+                                ifelse(BFrformat == "< 1/1000", "", "=="),
+                                BFrformat)),
+              parse = TRUE, size = 3,
+              nudge_y = -0.5) +
+    labs(x = "", y = "Standardized mean difference (SMD)") +
+    theme_bw() +
+    theme(panel.grid.minor = element_blank(),
+          panel.grid.major.x = element_blank())
+@
+\caption{Standardized mean difference effect estimates with 95\% confidence
+  interval for the ``null findings'' (with $p_{o} > 0.05$) and their replication
+  studies from the Reproducibility Project: Cancer Biology \citep{Errington2021}.
+  The identifier above each plot indicates (Original paper number, Experiment
+  number, Effect number, Internal replication number). The data were downloaded
+  from \url{https://doi.org/10.17605/osf.io/e5nvr}. The relevant variables were
+  extracted from the file ``\texttt{RP\_CB Final Analysis - Effect level
+  data.csv}''.}
+\label{fig:nullfindings}
+\end{figure}
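The $\BF_{01}$ labels added in this figure quantify evidence for the null relative to the alternative; $\BF_{01} \approx 1$ is the "absence of evidence" case mentioned in the hunk header. Under a simple normal model, with estimate $\hat\theta \sim \mathrm{N}(\theta, \sigma^2)$ and prior $\theta \sim \mathrm{N}(0, \tau^2)$ under $H_1$, the Bayes factor is a ratio of two normal densities. A sketch (the prior scale `tau` is our illustrative assumption, not the paper's choice; the paper's code is R):

```python
from math import sqrt
from scipy.stats import norm

def bf01(est, se, tau=1.0):
    """BF_01 for H0: theta = 0 versus H1: theta ~ N(0, tau^2), given a
    normally distributed estimate with standard error se.
    BF_01 > 1 favours the null; BF_01 close to 1 is absence of evidence."""
    f0 = norm.pdf(est, loc=0.0, scale=se)                    # marginal likelihood under H0
    f1 = norm.pdf(est, loc=0.0, scale=sqrt(se**2 + tau**2))  # marginal likelihood under H1
    return f0 / f1

# an estimate close to zero yields evidence FOR the null ...
print(bf01(est=0.05, se=0.3) > 1)  # True
# ... while an estimate far from zero yields evidence against it
print(bf01(est=2.0, se=0.3) < 1)   # True
```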
 \bibliographystyle{apalikedoiurl}
 \bibliography{bibliography}
+\appendix
+\section{Note on $p$-values}
+\todo[inline]{SP: I have used the original $p$-values as reported in the data
+  set to select the studies in the figure. I think in this way we have the data
+  correctly identified as the RPCP paper reports that there are 20 null findings
+  in the ``All outcomes'' category. I wonder how they go from the all outcomes
+  category to the ``effects'' category (15 null findings), perhaps pool the
+  internal replications by meta-analysis? I think it would be better to stay in
+  the all outcomes category, but of course it needs to be discussed. Also some
+  of the $p$-values were probably computed in a different way than under
+  normality (e.g., the $p$-value from (47, 1, 6, 1) under normality is clearly
+  significant).}
+\begin{figure}[!htb]
+<< "plot-p-values", fig.height = 3.5 >>=
+## check discrepancy between reported and recomputed p-values for null results
+pbreaks <- c(0.005, 0.02, 0.05, 0.15, 0.4)
+ggplot(data = rpcbNull, aes(x = po, y = po2)) +
+    geom_abline(intercept = 0, slope = 1, alpha = 0.2) +
+    geom_vline(xintercept = 0.05, alpha = 0.2, lty = 2) +
+    geom_hline(yintercept = 0.05, alpha = 0.2, lty = 2) +
+    geom_point(alpha = 0.8, shape = 21, fill = "darkgrey") +
+    geom_label_repel(data = filter(rpcbNull, po2 < 0.05),
+                     aes(x = po, y = po2, label = id), alpha = 0.8, size = 3,
+                     min.segment.length = 0, box.padding = 0.7) +
+    labs(x = bquote(italic(p["o"]) ~ "(reported)"),
+         y = bquote(italic(p["o"]) ~ "(recomputed under normality)")) +
+    scale_x_log10(breaks = pbreaks, label = scales::percent) +
+    scale_y_log10(breaks = pbreaks, labels = scales::percent) +
+    coord_fixed(xlim = c(min(c(rpcbNull$po2, rpcbNull$po)), 1),
+                ylim = c(min(c(rpcbNull$po2, rpcbNull$po)), 1)) +
+    theme_bw() +
+    theme(panel.grid.minor = element_blank())
+@
+\caption{Reported versus recomputed under normality two-sided $p$-values from
+  original studies declared as ``null findings'' ($p_{o} > 0.05$) in
+  Reproducibility Project: Cancer Biology \citep{Errington2021}.}
+\end{figure}
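The "recomputed under normality" axis in the appendix figure amounts to treating the standardized mean difference and its standard error as a normal $z$-statistic. A sketch of that two-sided $p$-value (variable names are ours, not those of the RPCB data set; the paper's computation is in R):

```python
from scipy.stats import norm

def p_normal(smd, se):
    """Two-sided p-value assuming smd/se is standard normal under the null,
    mirroring the 'recomputed under normality' axis of the figure."""
    z = smd / se
    return 2 * norm.sf(abs(z))  # sf = 1 - cdf, numerically stable in the tail

# hypothetical values: z = 0.392 / 0.2 = 1.96, so p lands near 0.05
print(round(p_normal(smd=0.392, se=0.2), 4))
```

Discrepancies between this recomputed value and the reported one (as for effect (47, 1, 6, 1) in the todo note) indicate that the original $p$-value was obtained by a non-normal method.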
<< "sessionInfo
1
", eval
=
Reproducibility, results
=
"asis" >>
=
<< "sessionInfo
1
", eval
=
Reproducibility, results
=
"asis" >>
=
## print R sessionInfo to see system information and package versions
## print R sessionInfo to see system information and package versions
...
...