Samuel Pawel / Replication of null results - Absence of evidence or evidence of absence
Commit cd9d0fb4, authored 2 years ago by SamCH93
added the replication BF, polished plot
parent 8fb62cf0
Showing 2 changed files with 130 additions and 51 deletions
.gitignore: 3 additions, 0 deletions
rsAbsence.Rnw: 127 additions, 51 deletions
.gitignore (+3 -0)
@@ -40,3 +40,6 @@ vignettes/*.pdf
 # Emacs LaTeX files
 .auctex-auto/
+
+# output pdf
+rsAbsence.pdf
rsAbsence.Rnw (+127 -51)
@@ -20,8 +20,8 @@
 total={170mm,257mm},
 left=25mm,
 right=25mm,
-top=20mm,
-bottom=20mm,
+top=30mm,
+bottom=25mm,
 }

 \title{\bf Replication studies and absence of evidence}
@@ -72,6 +72,36 @@ Reproducibility <- TRUE
 library(ggplot2) # plotting
 library(dplyr) # data manipulation

+## the replication Bayes factor under normality
+BFr <- function(to, tr, so, sr) {
+    bf <- dnorm(x = tr, mean = 0, sd = so) /
+        dnorm(x = tr, mean = to, sd = sqrt(so^2 + sr^2))
+    return(bf)
+}
+formatBF. <- function(BF) {
+    if (is.na(BF)) {
+        BFform <- NA
+    } else if (BF > 1) {
+        if (BF > 1000) {
+            BFform <- "> 1000"
+        } else {
+            BFform <- as.character(signif(BF, 2))
+        }
+    } else {
+        if (BF < 1/1000) {
+            BFform <- "< 1/1000"
+        } else {
+            BFform <- paste0("1/", signif(1/BF, 2))
+        }
+    }
+    if (!is.na(BFform) && BFform == "1/1") {
+        return("1")
+    } else {
+        return(BFform)
+    }
+}
+formatBF <- Vectorize(FUN = formatBF.)
+
 ## data
 rpcbRaw <- read.csv(file = "data/RP_CB Final Analysis - Effect level data.csv")
 rpcb <- rpcbRaw %>%
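To make the added `BFr` helper concrete, here is a minimal sketch on toy numbers. It assumes the usual normal model in which the replication estimate `tr` follows N(0, sr^2) under the null and N(to, so^2 + sr^2) under the alternative; note that this scales the null density by `sr`, whereas the committed function passes `sd = so` in the numerator. The name `bf01_replication` and all input values are hypothetical.

```r
## Sketch of a replication Bayes factor under normality (toy numbers only).
## Assumption: H0 gives tr ~ N(0, sr^2); H1 uses the original study's
## posterior N(to, so^2) for the effect, so that tr ~ N(to, so^2 + sr^2).
## bf01_replication is a hypothetical name, not part of the commit.
bf01_replication <- function(to, tr, so, sr) {
    dnorm(x = tr, mean = 0, sd = sr) /
        dnorm(x = tr, mean = to, sd = sqrt(so^2 + sr^2))
}

## replication estimate at zero although the original estimate was large
bf_null <- bf01_replication(to = 1, tr = 0, so = 0.2, sr = 0.3)
## replication estimate identical to the original estimate
bf_alt <- bf01_replication(to = 1, tr = 1, so = 0.2, sr = 0.3)

print(c(bf_null = bf_null, bf_alt = bf_alt))
```

With these toy inputs the first value should exceed one (evidence for the null) and the second should fall below one (evidence for an effect).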
@@ -107,10 +137,12 @@ rpcb <- rpcbRaw %>%
     smdm = (smdo/so^2 + smdr/sr^2)*sm^2, # fixed effect estimate
     pm2 = 2*(1 - pnorm(q = abs(smdm/sm))), # two-sided fixed effect p-value
     Q = (smdo - smdr)^2/(so^2 + sr^2), # Q-statistic
-    pQ = pchisq(q = Q, df = 1, lower.tail = FALSE) # p-value from Q-test
+    pQ = pchisq(q = Q, df = 1, lower.tail = FALSE), # p-value from Q-test
+    BFr = BFr(to = smdo, tr = smdr, so = so, sr = sr), # replication BF
+    BFrformat = formatBF(BF = BFr)
     )

-## TODO identify correct "null" effects as in paper
+## TODO identify correct "null" findings as in paper
 rpcbNull <- rpcb %>%
     ## filter(po1 > 0.025) #?
     filter(po > 0.05) #?
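The fixed effect estimate and Q-test in this hunk can be traced by hand on a single toy pair of studies. The sketch below is illustrative only: the numbers are made up, and the definition of `sm` as the standard error of the pooled estimate is an assumption, since it is computed outside the lines shown here.

```r
## Toy estimates and standard errors (illustrative, not RPCB data)
smdo <- 0.8; so <- 0.4   # original study
smdr <- 0.1; sr <- 0.2   # replication study

## Assumption: sm is the standard error of the fixed effect estimate
sm <- 1/sqrt(1/so^2 + 1/sr^2)

smdm <- (smdo/so^2 + smdr/sr^2)*sm^2             # inverse-variance weighted mean
pm2 <- 2*(1 - pnorm(q = abs(smdm/sm)))           # two-sided fixed effect p-value
Q <- (smdo - smdr)^2/(so^2 + sr^2)               # Q-statistic for two studies
pQ <- pchisq(q = Q, df = 1, lower.tail = FALSE)  # p-value from Q-test

print(c(smdm = smdm, pm2 = pm2, Q = Q, pQ = pQ))
```

The pooled estimate lies between the two study estimates and closer to the more precise one, and with one degree of freedom the Q-test p-value coincides with the two-sided p-value of the z-statistic for the difference between the estimates.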
@@ -130,7 +162,8 @@ rpcbNull <- rpcb %>%
   interpretation of replication projects and alike.} \\
   \rule{\textwidth}{0.5pt} \emph{Keywords}: Bayesian hypothesis testing,
-  equivalence test, non-inferiority, null hypothesis, replication success}
+  equivalence test, non-inferiority test, null hypothesis, replication
+  success}
 \end{minipage}
 \end{center}
@@ -139,19 +172,18 @@ rpcbNull <- rpcb %>%
-The general misconception that statistical non-significance indicates the
-absence of an effect is unfortunately widespread \citep{Altman1995}. A
+The general misconception that statistical non-significance indicates evidence
+for the absence of an effect is unfortunately widespread \citep{Altman1995}. A
 well-designed study is constructed in a way that a large enough sample (of
 participants, n) is used to achieve an 80-90\% power of correctly rejecting the
 null hypothesis. This leaves us with a 10-20\% chance of a false negative.
-Somehow this fact from “Hypothesis Testing 101” is all too often forgotten and
+Somehow this fact from ``Hypothesis Testing 101'' is all too often forgotten and
 studies showing an effect with a p-value larger than the conventionally used
-significance level of 0.05 is doomed to be “negative study” or showing a “null
-effect”. Some have even pleaded for abolishing the term “negative study”, as
-every well-designed and conducted study is a “positive contribution to
-knowledge”, regardless it’s results [REF].
-\todo[inline]{Some more from
-  https://onlinelibrary.wiley.com/doi/full/10.1111/jeb.14009}
+significance level of 0.05 is doomed to be ``negative study'' or showing a
+``null effect''. Some have even pleaded for abolishing the term ``negative
+study'', as every well-designed and conducted study is a ``positive contribution
+to knowledge'', regardless it’s results [REF]. \todo[inline]{Some more from
+  https://onlinelibrary.wiley.com/doi/full/10.1111/jeb.14009}

 More specifically, turning to the replication context, the misconception
 appeared in the definitions of replication success in some of the large-scale
@@ -162,11 +194,11 @@ replication study is also non-significant. While the authors of the RPEP warn
 the reader that the use of p-values as criterion for success is problematic when
 applied to replications of original non-significant findings, the authors of the
 RPCB do not. The RP in Psychological Science [REF], on the other hand, excluded
-the “original nulls” when deciding replication success based on significance and
+the ``original nulls'' when deciding replication success based on significance and
 the Social Science RP [REF] as well as the RP in Experimental Economics [REF]
 did not include original studies without a significant finding.

-\section{To replicate or not to replicate (a “null”)?}
+\section{To replicate or not to replicate (a ``null'')?}
 Because of the previously presented fallacy, original studies with
 non-significant effects are seldom replicated. Given the cost of replication
 studies, it is also unwise to advise replicating a study that has low changes of
@@ -174,35 +206,52 @@ successful replication. To help deciding what studies are worth repeating,
 efforts to predict which studies have a higher chance to replicate successfully
 emerged [REF]. Of note is that the chance of a successful replication
 intrinsically depends on the definition of replication success. If for a
-successful replication we need a “significant result in the same direction in
-both the original and the replication study” (i.e. the two-trials rule),
+successful replication we need a ``significant result in the same direction in
+both the original and the replication study'' (i.e. the two-trials rule),
 replicating a non-significant original result does indeed not make any sense.
 However, the use of significance as sole criterion for replication success has
 its shortcomings .....
 \todo[inline]{SP: look and discuss the papers from \citet{Anderson2016,
     Anderson2017}}

-\section{Example - “Proving the null-hypothesis” in the cancer biology
-  replication project}
+\section{Example: ``Null findings'' from the Reproducibility Project: Cancer
+  Biology}

 Of the 158 effects presented in 23 original studies that were repeated in the
-cancer biology RP \citep{Errington2021} 14\% (22) were interpreted as “null
-effects”. One of those repeated effects with a non-significant original finding
-was presented in Lu et al. (2014) and replicated by Richarson et al (2016). Note
-that the attempt to replicate all the experiments from the original study was
-not completed because of some unforeseen issues in the implementation (see
+cancer biology RP \citep{Errington2021} 14\% (22) were interpreted as ``null
+effects''.
+% One of those repeated effects with a non-significant original finding was
+% presented in Lu et al. (2014) and replicated by Richarson et al (2016).
+Note that the attempt to replicate all the experiments from the original study
+was not completed because of some unforeseen issues in the implementation (see
 Errington et al (2021) for more details on the unfinished registered reports in
-the RPCB). The replication of our example effect (Paper \# 47, Experiment \# 1,
-Effect \# 5) was however completed. The authors of the original study declared
-that there was no statistically significant difference in the level of
-trimethylation of H3K36me3 in tumor cells with or without specific mutations
-(two-sided p-value of 0.16). The replication authors also found a
-non-significant effect with a two-sided p-value of 0.38 and thus, according to
-Errington et al., the replication of this effect was consistent with the
-original findings. The effect sized found in the public data (downloaded from
-osf.io/39s7j) are correlation coefficients, which were transformed to a Fisher-z
-scale (using arctanh). Figure X shows the original and replication effect sizes
-together with their 95\% confidence intervals and respective two-sided p-values.
+the RPCB). Figure~\ref{fig:nullfindings} shows effect estimates with confidence
+intervals for the original ``null findings'' (with $p_{o} > 0.05$) and their
+replication studies from the project.
+% The replication of our example effect (Paper \# 47, Experiment \# 1, Effect \#
+% 5) was however completed. The authors of the original study declared that
+% there was no statistically significant difference in the level of
+% trimethylation of H3K36me3 in tumor cells with or without specific mutations
+% (two-sided p-value of 0.16). The replication authors also found a
+% non-significant effect with a two-sided p-value of 0.38 and thus, according to
+% Errington et al., the replication of this effect was consistent with the
+% original findings. The effect sized found in the public data (downloaded from
+% osf.io/39s7j) are correlation coefficients, which were transformed to a
+% Fisher-z scale (using arctanh). Figure X shows the original and replication
+% effect sizes together with their 95\% confidence intervals and respective
+% two-sided p-values.
+
+\todo[inline]{SP: I have used the original $p$-values as reported in the data
+  set to select the studies in the figure . I think in this way we have the data
+  correctly identified as the RPCP paper reports that there are 20 null findings
+  in the ``All outcomes'' category. I wonder how they go from the all outcomes
+  category to the ``effects'' category (15 null findings), perhaps pool the
+  internal replications by meta-analysis? I think it would be better to stay in
+  the all outcomes category, but of course it needs to be discussed. Also some
+  of the $p$-values were probably computed in a different way than under
+  normality (e.g., the $p$-value from (47, 1, 6, 1) under normality is clearly
+  significant).}

 \begin{figure}[!htb]
-<< "plot-null-findings-rpcb", fig.height = 9 >>=
+<< "plot-null-findings-rpcb", fig.height = 8.5 >>=
 ggplot(data = rpcbNull) +
     facet_wrap(~ id, scales = "free", ncol = 4) +
     geom_hline(yintercept = 0, lty = 2, alpha = 0.5) +
@@ -210,14 +259,27 @@ ggplot(data = rpcbNull) +
                         ymax = smdo + 2*so)) +
     geom_pointrange(aes(x = "Replication", y = smdr, ymin = smdr - 2*sr,
                         ymax = smdr + 2*sr)) +
+    geom_text(aes(x = "Replication", y = pmax(smdr + 2.1*sr, smdo + 2.1*so),
+                  label = paste("'BF'['01']",
+                                ifelse(BFrformat == "< 1/1000", "", "=="),
+                                BFrformat)),
+              parse = TRUE, size = 3,
+              nudge_y = -0.5) +
     labs(x = "", y = "Standardized mean difference (SMD)") +
     theme_bw() +
     theme(panel.grid.minor = element_blank(),
           panel.grid.major.x = element_blank())
 @
-\caption{Null findings from the Reproducibility Project: Cancer Biology
-  \citep{Errington2021}.}
+\caption{Standardized mean difference effect estimates with 95\% confidence
+  interval for the ``null findings'' (with $p_{o} > 0.05$) and their replication
+  studies from the Reproducibility Project: Cancer Biology \citep{Errington2021}.
+  The identifier above each plot indicates (Original paper number, Experiment
+  number, Effect number, Internal replication number). The data were downloaded
+  from \url{https://doi.org/10.17605/osf.io/e5nvr}. The relevant variables were
+  extracted from the file ``\texttt{RP\_CB Final Analysis - Effect level
+    data.csv}''.}
+\label{fig:nullfindings}
 \end{figure}

 \section{Equivalence Design}
@@ -226,8 +288,9 @@ understand whether a new drug, which might be cheaper or have less side effects
 is equivalent to a drug already on the market [some general REF]. Essentially,
 this type of design tests whether the difference between the effects of both
 treatments or interventions is smaller than a predefined margin/threshold.
-Turning back to the replication contexts and our example .... \todo[inline]{fix margin:
-  to 0.25??}
+Turning back to the replication contexts and our example ....
+% \todo[inline]{fix margin:
+%   to 0.25??}

 \section{Bayesian Hypothesis Testing}
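The equivalence idea described in this hunk can be sketched with two one-sided tests (TOST) under a normal approximation. This is only an illustration: the function name `tost_p`, the toy estimates, and the margin of 0.25 (borrowed from the open `\todo`) are assumptions, not the authors' analysis.

```r
## Two one-sided tests (TOST) for equivalence under normality.
## tost_p is a hypothetical helper; margin = 0.25 is an assumed value.
tost_p <- function(estimate, se, margin) {
    ## H0: effect >= margin, rejected when the estimate is well below margin
    p_upper <- pnorm(q = (estimate - margin)/se)
    ## H0: effect <= -margin, rejected when the estimate is well above -margin
    p_lower <- pnorm(q = (estimate + margin)/se, lower.tail = FALSE)
    ## equivalence is declared only if both one-sided tests reject
    max(p_upper, p_lower)
}

## same small estimate, once with a small and once with a large standard error
p_precise <- tost_p(estimate = 0.05, se = 0.1, margin = 0.25)
p_imprecise <- tost_p(estimate = 0.05, se = 0.3, margin = 0.25)

print(c(p_precise = p_precise, p_imprecise = p_imprecise))
```

The same point estimate supports equivalence at the 5\% level only when it is precise enough for the confidence interval to fit inside the margin, which is exactly the distinction between evidence of absence and absence of evidence.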
@@ -235,16 +298,29 @@ Bayesian hypothesis testing is a hypothesis testing framework in which the
 distinction between absence of evidence and evidence of absence is more natural.
 The central quantity is the Bayes factor \citep{Jeffreys1961, Good1958,
   Kass1995}, that is, the updating factor of the prior odds to the corresponding
-posterior odds of the null hypothesis H0 versus the alternative hypothesis H1
-[insert BF01 equation]. The Bayes factor is a measure of evidence which is
-inferentially relevant to the researchers as it directly measures how much the
-data have increased (BF01 > 1) or decreased (BF01 < 1) the odds of H0 relative
-to H1. Bayes factors are symmetric (BF10 = 1/BF01), so if a Bayes factor is
-oriented in BF10, it can easily be transformed to a Bayes factor BF01
-orientation, and vice versa.
-
-Bayes factor have also been proposed for the replication setting. Specifically,
-the replication Bayes factor \citep{Verhagen2014}.
+posterior odds of the null hypothesis $H_{0}$ versus the alternative hypothesis
+$H_{1}$
+\begin{align*}
+  \underbrace{\frac{\Pr(H_{0} \given \mathrm{data})}{\Pr(H_{1} \given \mathrm{data})}}_{\mathrm{Posterior~odds}}
+  = \underbrace{\frac{\Pr(H_{0})}{\Pr(H_{1})}}_{\mathrm{Prior~odds}}
+  \times \underbrace{\frac{f(\mathrm{data} \given H_{0})}{f(\mathrm{data} \given H_{1})}}_{\mathrm{Bayes~factor}~\BF_{01}}.
+\end{align*}
+As such, the Bayes factor is an evidence measure which is inferentially relevant
+to researchers as it quantifies how much the data have increased
+($\BF_{01} > 1$) or decreased ($\BF_{01} < 1$) the odds of the null hypothesis
+$H_{0}$ relative to the alternative $H_{1}$. Bayes factors are symmetric
+($\BF_{01} = 1/\BF_{10}$), so if a Bayes factor is oriented toward the null
+hypothesis ($\BF_{01}$), it can easily be transformed to a Bayes factor oriented
+toward the alternative ($\BF_{10}$), and vice versa.
+The data thus provide evidence for the null hypothesis if the Bayes factor is
+larger than one ($\BF_{01} > 1$), whereas a Bayes factor around one indicates
+absence of evidence for either hypothesis ($\BF_{01} \approx 1$).
+
+% Bayes factor have also been proposed for the replication setting. Specifically,
+% the replication Bayes factor \citep{Verhagen2014}.
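The odds update in the added equation can be checked numerically with a small sketch. The helper name `posterior_prob_h0`, the Bayes factor of 5, and the equal prior odds are hypothetical values chosen only for illustration.

```r
## Posterior odds = prior odds x Bayes factor (toy values, assumed here).
## posterior_prob_h0 is a hypothetical helper, not part of the commit.
posterior_prob_h0 <- function(bf01, prior_odds = 1) {
    posterior_odds <- prior_odds * bf01   # odds of H0 versus H1 after the data
    posterior_odds/(1 + posterior_odds)   # convert odds to a probability
}

bf01 <- 5                                 # data favour H0 five to one
bf10 <- 1/bf01                            # same evidence, oriented toward H1
prob_h0 <- posterior_prob_h0(bf01 = bf01)
prob_h0_ambiguous <- posterior_prob_h0(bf01 = 1)

print(c(bf10 = bf10, prob_h0 = prob_h0, prob_h0_ambiguous = prob_h0_ambiguous))
```

A Bayes factor of one leaves the prior odds unchanged, which is the numerical counterpart of absence of evidence for either hypothesis.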