Skip to content
GitLab
Explore
Sign in
Primary navigation
Search or go to…
Project
R
Replication of null results - Absence of evidence or evidence of absence
Manage
Activity
Members
Labels
Plan
Issues
Issue boards
Milestones
Wiki
Code
Merge requests
Repository
Branches
Commits
Tags
Repository graph
Compare revisions
Snippets
Build
Pipelines
Jobs
Pipeline schedules
Artifacts
Deploy
Releases
Package Registry
Container Registry
Model registry
Operate
Environments
Terraform modules
Monitor
Incidents
Service Desk
Analyze
Value stream analytics
Contributor analytics
CI/CD analytics
Repository analytics
Issue analytics
Model experiments
Help
Help
Support
GitLab documentation
Compare GitLab plans
Community forum
Contribute to GitLab
Provide feedback
Keyboard shortcuts
?
Snippets
Groups
Projects
Show more breadcrumbs
Samuel Pawel
Replication of null results - Absence of evidence or evidence of absence
Commits
b78884b1
Commit
b78884b1
authored
1 year ago
by
SamCH93
Browse files
Options
Downloads
Patches
Plain Diff
some of Leo's suggestions
parent
3166d0b3
No related branches found
Branches containing commit
No related tags found
Tags containing commit
No related merge requests found
Changes
2
Hide whitespace changes
Inline
Side-by-side
Showing
2 changed files
paper/bibliography.bib
+1
-1
1 addition, 1 deletion
paper/bibliography.bib
paper/rsabsence.Rnw
+58
-26
58 additions, 26 deletions
paper/rsabsence.Rnw
with
59 additions
and
27 deletions
paper/bibliography.bib
+
1
−
1
View file @
b78884b1
...
...
@@ -3158,7 +3158,7 @@ Discrimination},
@book
{
Senn2008
,
title
=
{Statistical
i
ssues in
d
rug
d
evelopment}
,
title
=
{Statistical
I
ssues in
D
rug
D
evelopment}
,
author
=
{Stephen Senn}
,
volume
=
{69}
,
year
=
{2008}
,
...
...
This diff is collapsed.
Click to expand it.
paper/rsabsence.Rnw
+
58
−
26
View file @
b78884b1
...
...
@@ -25,7 +25,8 @@
}
\title
{
\vspace
{
-4em
}
\textbf
{
Meta-research:
\\
Replication studies of original ``null results'' --
\\
Absence of evidence or evidence of absence?
}}
\textbf
{
% Meta-research:\\
Replication of ``null results'' -- Absence of evidence or evidence of absence?
}}
\author
{{
\bf
Samuel Pawel
\textsuperscript
{
*
}
,
Rachel Heyard
\textsuperscript
{
*
}
,
Charlotte Micheloud,
...
...
@@ -135,9 +136,9 @@ BF01 <- function(estimate, se, null = 0, unitvar = 4) {
the context of replication studies. In several large
-
scale replication
projects, non
-
significant results in both the original and the
replication study have been interpreted as a ``replication success''.
Here we discuss the logical problems with this approach.
It does not
ensure that
th
e
studies
provide evidence for the absence of an effect
and
Here we discuss the logical problems with this approach.
Non
-
significance in bo
th studies
does not ensure that the studies
provide evidence for the absence of an effect
and
% Because the null hypothesis of the statistical tests in both studies
% is misaligned,
``replication success'' can virtually always be achieved if the sample
...
...
@@ -153,7 +154,7 @@ BF01 <- function(estimate, se, null = 0, unitvar = 4) {
appropriately.
}
\\
\rule
{
\textwidth
}{
0
.
5
pt
}
\emph
{
Keywords
}
: Bayesian hypothesis testing,
equivalence testing, null hypothesis, replication success
}
equivalence testing,
meta
-
reasearch,
null hypothesis, replication success
}
\end
{
minipage
}
\end
{
center
}
...
...
@@ -178,7 +179,7 @@ adequately powered and under what assumed effect the power was calculated
\citep
{
Hoenig
2001
, Greenland
2012
}
. However, if the goal of a study is to
explicitly quantify the evidence for the absence of an effect, more appropriate
methods designed for this task, such as equivalence testing or Bayes factors,
should
ideally
be used from the outset.
should be used from the outset.
% two systematic reviews that I found which show that animal studies are very
% much underpowered on average \citep{Jennions2003,Carneiro2018}
...
...
@@ -190,7 +191,7 @@ similar results can be obtained with new data \citep{NSF2019}. There have been
various large
-
scale replication projects in the biomedical and social sciences
in the last decade
\citep
[
among
others
]
{
Prinz
2011
,Begley
2012
,Klein
2014
,Opensc
2015
,Camerer
2016
,Camerer
2018
,Klein
2018
,Cova
2018
,Errington
2021
}
.
Most of these projects
sugges
ted alarmingly low replicability rates across a
Most of these projects
repor
ted alarmingly low replicability rates across a
broad spectrum of criteria for quantifying replicability. While most of these
projects restricted their focus on original studies with statistically
significant results
(
``positive results''
)
, the
\emph
{
Reproducibility Project:
...
...
@@ -239,6 +240,7 @@ solutions using the null results from the RPCB.
rpcbRaw <
-
read.csv
(
file
=
"..
/
data
/
prepped
_
outcome
_
level
_
data.csv"
)
rpcb <
-
rpcbRaw
%>%
select
(
osf
=
OSF.project.link,
paper
=
pID,
experiment
=
eID,
effect
=
oID,
...
...
@@ -305,6 +307,21 @@ rpcbNull <- rpcb %>%
## mutate
(
success
=
sign
(
smdo
)
==
sign
(
smdr
)
&
pr >
=
0
.
05
)
%>%
## summarise
(
sum
(
success
))
## ### noooo :
)
## check the sample sizes
## paper
5
(
https:
//
osf.io
/
q
96
yj
)
-
1
Cohen's d
-
sample size correspond to forest plot
## paper
9
(
https:
//
osf.io
/
yhq
4
n
)
-
3
Cohen's w
-
sample size do not correspond at all
## paper
15
(
https:
//
osf.io
/
ytrx
5
)
-
1
r
-
sample size correspond to forest plot
## paper
19
(
https:
//
osf.io
/
465
r
3
)
-
2
Cohen's dz
-
sample size correspond to forest plot
## paper
20
(
https:
//
osf.io
/
acg
8
s
)
-
1
r and
1
Cliff's delta
-
sample size correspond to forest plot
## paper
21
(
https:
//
osf.io
/
ycq
5
g
)
-
1
Cohen's d
-
sample size correspond to forest plot
## paper
24
(
https:
//
osf.io
/
pcuhs
)
-
2
Cohen's d
-
sample size correspond to forest plot
## paper
28
(
https:
//
osf.io
/
gb
7
sr
/)
-
3
Cohen's d
-
sample size correspond to forest plot
## paper
29
(
https:
//
osf.io
/
8
acw
4
)
-
1
Cohen's d
-
sample size do not correspond, seem to be double
## paper
41
(
https:
//
osf.io
/
qnpxv
)
-
1
Hazard ratio
-
sample size correspond to forest plot
## paper
47
(
https:
//
osf.io
/
jhp
8
z
)
-
2
r
-
sample size correspond to forest plot
## paper
48
(
https:
//
osf.io
/
zewrd
)
-
1
r
-
sample size do not correspond to forest plot for original study
@
...
...
@@ -328,13 +345,13 @@ plotDF1 <- rpcbNull %>%
mutate
(
label
=
ifelse
(
id
==
study
1
,
"Goetz et al.
(
2011
)
\nEvidence
of absence",
"Dawson et al.
(
2011
)
\nAbsence
of evidence"
))
## RH: this data is really a mess. turns out for Dawson n represents the group
## size
(
n
=
6
in https:
//
osf.io
/
8
acw
4
)
while in Goetz it is the sample size of
## the whole experiment
(
n
=
34
and
61
in https:
//
osf.io
/
acg
8
s
)
. in study
2
the
## so multiply by
2
to have the total sample size, see Figure
5
A
## https:
//
doi.org
/
10
.
7554
/
eLife.
25306
.
012
plotDF
1
$
no[plotDF1
$
id
==
study
2
]
<
-
plotDF
1
$
no[plotDF1
$
id
==
study
2
]*
2
plotDF
1
$
nr[plotDF1
$
id
==
study
2
]
<
-
plotDF
1
$
nr[plotDF1
$
id
==
study
2
]*
2
##
## RH: this data is really a mess. turns out for Dawson n represents the group
##
## size
(
n
=
6
in https:
//
osf.io
/
8
acw
4
)
while in Goetz it is the sample size of
##
## the whole experiment
(
n
=
34
and
61
in https:
//
osf.io
/
acg
8
s
)
. in study
2
the
##
## so multiply by
2
to have the total sample size, see Figure
5
A
##
## https:
//
doi.org
/
10
.
7554
/
eLife.
25306
.
012
##
plotDF
1
$
no[plotDF1
$
id
==
study
2
]
<
-
plotDF
1
$
no[plotDF1
$
id
==
study
2
]*
2
##
plotDF
1
$
nr[plotDF1
$
id
==
study
2
]
<
-
plotDF
1
$
nr[plotDF1
$
id
==
study
2
]*
2
## create plot showing two example study pairs with null results
conflevel <
-
0
.
95
ggplot
(
data
=
plotDF
1
)
+
...
...
@@ -370,8 +387,8 @@ ggplot(data = plotDF1) +
pairs which meet the non
-
significance replication success criterion from the
Reproducibility Project: Cancer Biology
\citep
{
Errington
2021
}
. Shown are
standardized mean difference effect estimates with
\Sexpr
{
round
(
conflevel
*
100
,
2
)
}
\%
confidence intervals,
total
sample size, and
$
p
$
-
values for the
null
hypothesis that the standardized mean difference is zero.
}
2
)
}
\%
confidence intervals, sample size
s
, and
two
-
sided
$
p
$
-
values for the
null
hypothesis that the standardized mean difference is zero.
}
\end
{
figure
}
The original study from
\citet
{
Dawson
2011
}
and its replication both show large
...
...
@@ -434,11 +451,17 @@ ggplot(data = rpcbNull) +
geom
_
text
(
aes
(
x
=
0
.
54
, y
=
pmax
(
smdo
+
2
.
5
*
so, smdr
+
2
.
5
*
sr,
1
.
1
*
margin
)
,
label
=
paste
(
"BF
[
'
01
'
]
", ifelse
(
BForig <
=
1
/
1000
, "", "
==
"
)
,
BForigformat
))
, col
=
"darkblue",
parse
=
TRUE, size
=
2
.
3
, vjust
=
1
.
7
, hjust
=
0
,
)
+
parse
=
TRUE, size
=
2
.
3
, vjust
=
1
.
7
, hjust
=
0
)
+
geom
_
text
(
aes
(
x
=
1
.
59
, y
=
pmax
(
smdo
+
2
.
5
*
so, smdr
+
2
.
5
*
sr,
1
.
1
*
margin
)
,
label
=
paste
(
"BF
[
'
01
'
]
", ifelse
(
BFrep <
=
1
/
1000
, "", "
==
"
)
,
BFrepformat
))
, col
=
"darkblue",
parse
=
TRUE, size
=
2
.
3
, vjust
=
1
.
7
, hjust
=
0
,
)
+
parse
=
TRUE, size
=
2
.
3
, vjust
=
1
.
7
, hjust
=
0
)
+
geom
_
text
(
aes
(
x
=
1
.
05
, y
=
pmin
(-
0
.
9
*
margin, smdr
-
1
.
6
*
sr, smdo
-
1
.
6
*
so
)
,
label
=
paste
(
"italic
(
n
)
==
", no
))
, col
=
"darkblue",
parse
=
TRUE, size
=
2
.
3
, hjust
=
0
)
+
geom
_
text
(
aes
(
x
=
2
.
05
, y
=
pmin
(-
0
.
9
*
margin, smdr
-
1
.
6
*
sr, smdo
-
1
.
6
*
so
)
,
label
=
paste
(
"italic
(
n
)
==
", nr
))
, col
=
"darkblue",
parse
=
TRUE, size
=
2
.
3
, hjust
=
0
)
+
theme
_
bw
()
+
theme
(
panel.grid.minor
=
element
_
blank
()
,
panel.grid.major
=
element
_
blank
()
,
...
...
@@ -578,18 +601,27 @@ discussed examples from \citet{Goetz2011} and \citet{Dawson2011} are consistent
with our intuititons
--
there is indeed some evidence for the absence of an
effect in
\citet
{
Goetz
2011
}
, while there is even slightly more evidence for the
presence of an effect in
\citet
{
Dawson
2011
}
, though the Bayes factor is very
close to one due to the small sample sizes.
If we use
a lenient Bayes factor
close to one due to the small sample sizes.
With
a lenient Bayes factor
threshold of
$
\BF
_{
01
}
> 3
$
to define evidence for the absence of the effect,
only one of the twenty study pairs meets this crite
i
ron in both the original and
only one of the twenty study pairs meets this criter
i
on in both the original and
replication study.
<< >>
=
studyInteresting <
-
filter
(
rpcbNull, id
==
"
(
48
,
2
,
4
,
1
)
"
)
noInteresting <
-
studyInteresting
$
no
nrInteresting <- studyInteresting
$
nr
write.csv
(
rpcbNull, "rpcb
-
Null.csv", row.names
=
FALSE
)
@
Among the twenty RPCB null results, there is one interesting case
(
the rightmost
plot in the fourth row
(
48
,
2
,
4
,
1
))
where the Bayes factor is qualitatively
different from the equivalence test, revealing a fundamental difference between
the two approaches. The Bayes factor is concerned with testing whether the
effect is
\emph
{
exactly zero
}
, whereas the equivalence test is concerned with
whether the effect is within an
\emph
{
interval around zero
}
. Due to the very
large sample size in this replication study, the data are incompatible with an
large sample size in the original study
(
$
n =
\Sexpr
{
noInteresting
}$
)
and the
replication
(
$
n =
\Sexpr
{
nrInteresting
}$
)
, the data are incompatible with an
exactly zero effect, but compatible with effects within the equivalence range.
Apart from this example, however, the approaches lead to the same qualitative
conclusion
--
most RPCB null results are highly ambiguous.
...
...
@@ -664,11 +696,11 @@ data set differ in some cases from those in the data set available at
\url
{
https:
//
doi.org
/
10
.
17605
/
osf.io
/
e
5
nvr
}
, which is cited in
\citet
{
Errington
2021
}
. We used this particular version of the data set because
it was recommended to us by the RPCB statistician
(
Maya Mathur
)
upon request.
For the
\citet
{
Dawson
2011
}
example study and its replication
\citep
{
Shan
2017
}
,
the sample sizes
$
n = 3
$
in th data set seem to correspond to the group sample
sizes, see Figure
5
A in the replication study
(
\url
{
https:
//
doi.org
/
10
.
7554
/
eLife.
25306
.
012
}
)
, which is why we report the
total sample sizes of
$
n = 6
$
in Figure~
\ref
{
fig:
2
examples
}
.
%
For the \citet{Dawson2011} example study and its replication \citep{Shan2017},
%
the sample sizes $n = 3$ in th data set seem to correspond to the group sample
%
sizes, see Figure 5A in the replication study
%
(\url{https://doi.org/10.7554/eLife.25306.012}), which is why we report the
%
total sample sizes of $n = 6$ in Figure~\ref{fig:2examples}.
The code and data to reproduce our analyses is openly available at
\url
{
https:
//
gitlab.uzh.ch
/
samuel.pawel
/
rsAbsence
}
. A snapshot of the repository
...
...
This diff is collapsed.
Click to expand it.
Preview
0%
Loading
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Save comment
Cancel
Please
register
or
sign in
to comment