This repository contains the protocol for the simulation study which can be
found [here](https://gitlab.switch.ch/felix.hofmann2/hmean_switch).
@article{mawd:etal:17,
author={Mawdsley, David and Higgins, Julian P. T. and Sutton, Alex J. and Abrams, Keith R.},
title={Accounting for heterogeneity in meta-analysis using a multiplicative model---an empirical study},
journal={Research Synthesis Methods},
volume={8},
number={1},
pages={43--52},
doi={10.1002/jrsm.1216},
year={2017}
}
@article{morr:etal:19,
author={Morris, Tim P. and White, Ian R. and Crowther, Michael J.},
title={Using simulation studies to evaluate statistical methods},
journal={Statistics in Medicine},
volume={38},
number={11},
pages={2074--2102},
doi={10.1002/sim.8086},
year={2019}
}
@article{Gnei:Raft:07,
author={Tilmann Gneiting and Adrian E Raftery},
title={Strictly Proper Scoring Rules, Prediction, and Estimation},
journal={Journal of the American Statistical Association},
volume={102},
number={477},
pages={359--378},
year={2007},
publisher={Taylor \& Francis},
doi={10.1198/016214506000001437},
}
@Article{ruck:etal:08,
author={R{\"u}cker, Gerta and Schwarzer, Guido and Carpenter, James R. and Schumacher, Martin},
title={Undue reliance on $I^2$ in assessing heterogeneity may mislead},
journal={BMC Medical Research Methodology},
year={2008},
volume={8},
number={1},
pages={79},
issn={1471-2288},
doi={10.1186/1471-2288-8-79},
url={https://doi.org/10.1186/1471-2288-8-79}
}
@article{burt:etal:06,
title={The design of simulation studies in medical statistics},
author={Burton, Andrea and Altman, Douglas G and Royston, Patrick and Holder, Roger L},
journal={Statistics in Medicine},
volume={25},
number={24},
pages={4279--4292},
year={2006},
publisher={Wiley Online Library}
}
@book{Altman,
author={Altman, Douglas G and Machin, David and Bryant, Trevor N and Gardner, Martin J},
title={{Statistics with Confidence}},
publisher={BMJ Books},
year={2000},
edition={Second}
}
@article{GneitingBracher,
year={2021},
volume={17},
number={2},
author={Johannes Bracher and Evan L Ray and Tilmann Gneiting and Nicholas G Reich},
title={{Evaluating epidemic forecasts in an interval format}},
journal={PLoS Computational Biology}
}
@book{HeldSabanesBove,
author={Leonhard Held and Saban\'{e}s Bov\'{e}, Daniel},
title={{Likelihood and Bayesian Inference - With Applications in Biology and Medicine}},
publisher={Springer},
year={2020},
edition={Second}
}
@article{Held2020a,
year={2020},
volume={183},
number={2},
pages={431--448},
author={Leonhard Held},
title={{A new standard for the analysis and design of replication studies}},
journal={Journal of the Royal Statistical Society Series A}
}
@article{Held2020b,
year={2020},
volume={69},
number={3},
pages={697--708},
author={Leonhard Held},
title={{The harmonic mean $\chi^2$-test to substantiate scientific findings}},
journal={Journal of the Royal Statistical Society Series C}
}
@article{IntHoutIoannidis,
year={2014},
volume={14},
number={25},
author={Joanna IntHout and John PA Ioannidis and George F Borm},
title={{The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method}},
journal={BMC Medical Research Methodology}
}
@article{Newcombe1998,
year={1998},
volume={17},
number={8},
pages={857--872},
author={Newcombe, Robert G},
title={{Two-sided confidence intervals for the single proportion: comparison of seven methods}},
journal={Statistics in Medicine}
}
@article{Newcombe2011,
year={2011},
volume={40},
number={10},
pages={1743--1767},
author={Newcombe, Robert G},
title={{Measures of Location for Confidence Intervals for Proportions}},
journal={Communications in Statistics -- Theory and Methods}
}
@book{Newcombe2013,
author={Newcombe, Robert G},
title={{Confidence Intervals for Proportions and Related Measures of Effect Size}}
}
\begin{center}
{\noindent\LARGE Comparison of confidence intervals summarizing the\\[2mm]
uncertainty of the combined estimate of a meta-analysis}\\
\bigskip
{\noindent\Large Florian Gerber, Leonhard Held, Lisa Hofer, Felix Hofmann, Philip Heesen}
\end{center}
\bigskip
\vspace*{.5cm}
The present protocol is inspired by \citet{burt:etal:06} and \citet{morr:etal:19}.
The simulation is implemented in \texttt{simulate\_all.R}.
\tableofcontents
\newpage
\section{Aims and objectives}\label{ref:aims}
The aim of this simulation study is the comparison of confidence intervals (CIs) summarizing the uncertainty of the combined estimate of a meta-analysis. Specifically, we focus on CIs constructed using the harmonic mean method, which is described in \citet{Held2020b}, and the $k$-trials rule, which is defined in Subsection~\ref{sec:ktrial}. The underlying data sets are simulated as described in Section~\ref{sec:simproc} and Section~\ref{sec:scenario}. The resulting intervals are then compared to CIs constructed using the other methods listed in Section~\ref{sec:method} using the measures defined in Section~\ref{sec:meas}.
\section{Simulation of the data sets}\label{sec:simproc}
\subsection{Allowance for failures}
We expect no failures, \ie, for all simulated data sets, all CI methods should yield a valid CI and all valid CIs should yield valid evaluation criteria.
If a failure occurs, we stop the simulation and investigate the reason for the failure.
\subsection{Software to perform simulations}
The simulation study is performed using the statistical software R \citep{R}.
We save the output of \texttt{sessionInfo()} giving information on the used version of R, packages, and platform with the simulation results.
\subsection{Random number generator}
We use the package \pkg{doRNG} with its default random number generator to ensure that random numbers generated inside parallel for loops are independent and reproducible.
\subsection{Scenarios to be investigated}\label{sec:scenario}
The $720$ simulated scenarios consist of all combinations of the following parameters:
\begin{itemize}
\item Heterogeneity model $\in\{\text{'additive'}, \text{'multiplicative'}\}$.
\item Number of studies summarized by the meta-analysis $k \in\{3, 5, 10, 20, 50\}$.
\item Publication bias $\in\{\text{'none'}, \text{'moderate'}, \text{'strong'}\}$, following the terminology of \citet{henm:copa:10}.
The average study effect also influences the publication bias, and we set it to $\theta=0.2$ to obtain a similar scenario as used in \citet{henm:copa:10}.
\item The distribution to draw the true study values $\delta_i$ is either 'Gaussian' or 't' with 4 degrees of freedom. The latter still has finite mean and variance, but leads to more 'outliers'.
\item The sample size $n_i$ of the $i$-th study (number of patients per study) is $n_i =50$ (small study) except for 0, 1, or 2 studies where $n_i=500$ (large study).
\end{itemize}
Note that \citet{IntHoutIoannidis} use a similar setup.
\subsection{Simulation details}
For the \textbf{Additive heterogeneity model without publication bias}, the simulation of one meta-analysis dataset is performed as follows:
\begin{enumerate}
\item Compute the within-study variance $\epsilon^2=\frac{2}{k}\sum\limits_{i=1}^k \frac{1}{n_i}$.
\item Compute the between-study variance
\begin{equation}\label{eq:eq1}
\tau^2 = \epsilon^2 \frac{I^2}{1-I^2}.
\end{equation}
\item For a trial $i$ of the meta-analysis with $k$ trials, $i =1, \dots, k$:
\begin{enumerate}
\item Simulate the true effect size using the Gaussian model: $\delta_i \sim\N(\theta, \tau^2)$ or using a Student-$t$ distribution with 4 degrees of freedom such that the samples have mean $\theta$ and variance $\tau^2$.
\item Simulate the effect estimates of each trial $y_i \sim\N(\delta_i, \frac{2}{n_i})$.
\item Simulate the standard errors of the trial outcomes: $\text{se}_i \sim\sqrt{\frac{\chi^2(2n_i-2)}{(n_i-1)n_i}}$.
\end{enumerate}
\paragraph{Note: The marginal variance}\mbox{}\\
The marginal variance of this simulation procedure is
$\tau^2+2/n$, so follows the additive model as intended.
\end{enumerate}
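The steps above can be sketched as follows. The study itself is implemented in R (\texttt{simulate\_all.R}); this is an illustrative Python translation whose function name and argument layout are our own, not part of the protocol.

```python
import numpy as np

def simulate_additive(k, n, I2, theta, dist="gaussian", seed=None):
    """Sketch: one meta-analysis dataset under the additive
    heterogeneity model without publication bias."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n, dtype=float)                 # sample sizes n_i
    eps2 = (2.0 / k) * np.sum(1.0 / n)             # within-study variance
    tau2 = eps2 * I2 / (1.0 - I2)                  # between-study variance
    if dist == "gaussian":
        delta = rng.normal(theta, np.sqrt(tau2), size=k)
    else:
        # t_4 has variance 4/(4-2) = 2; rescale to mean theta, variance tau2
        delta = theta + rng.standard_t(4, size=k) * np.sqrt(tau2 / 2.0)
    y = rng.normal(delta, np.sqrt(2.0 / n))        # effect estimates y_i
    se = np.sqrt(rng.chisquare(2 * n - 2) / ((n - 1) * n))  # standard errors
    return y, se
```

The marginal variance of $y_i$ under this sketch is $\tau^2 + 2/n_i$, matching the note above.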
For the \textbf{Multiplicative model without publication bias}, the simulation of one meta-analysis dataset is performed as follows:
\begin{enumerate}
\item Compute the within-study variance $\epsilon^2=\frac{2}{k}\sum\limits_{i=1}^k \frac{1}{n_i}$.
\item Compute the multiplicative heterogeneity factor $\phi=\frac{1}{1-I^2}$. Compute the corresponding
\begin{equation}\label{eq:eq2}
\tau^2 = \epsilon^2 \, (\phi-1) .
\end{equation}
\item For a trial $i$ of the meta-analysis with $k$ trials, $i =1, \dots, k$:
\begin{enumerate}
\item Simulate the true effect size using the Gaussian model: $\delta_i \sim\N(\theta, \tau^2)$ or using a Student-$t$ distribution with 4 degrees of freedom such that the samples have mean $\theta$ and variance $\tau^2$.
\item Simulate the effect estimates of each trial $y_i \sim\N(\delta_i, \frac{2}{n_i})$.
\item Simulate the standard errors of the trial outcomes: $\text{se}_i \sim\sqrt{\frac{\chi^2(2n_i-2)}{(n_i-1)n_i}}$.
\end{enumerate}
\end{enumerate}
\paragraph{Note: The marginal variance}\mbox{}\\
The marginal variance of this simulation procedure is
$\frac{2}{n}\,(\phi-1)+\frac{2}{n}=\frac{2\phi}{n}=\phi\,\epsilon^2$, so follows the multiplicative
model as intended.
\paragraph{Note: Publication bias}\mbox{}\\
To simulate studies under \textbf{publication bias}, we follow the suggestion of \citet{henm:copa:10} and accept each simulated study with probability
$$\exp(-4\,\Phi(-y_i /\text{se}_i)^\gamma),$$
where $\gamma=3$ and $\gamma=1.5$ correspond to \emph{moderate} and \emph{strong} publication bias, respectively.
That is, accepted studies are kept, while for a rejected study we replace $y_i$ and $\text{se}_i$ by newly simulated values, which are again accepted with the probability given above.
This procedure is repeated until the required number of studies is simulated.
The mean study effect $\theta$ and the sample size $n_i$ influence the acceptance probability.
To obtain a scenario similar to \citet{henm:copa:10}, we set $\theta=0.2$.
However, we assume that only small studies with $n_i =50$ are subject to publication bias; larger studies with $n_i =500$ are always accepted.
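The accept-reject step can be sketched as follows; \texttt{draw\_study} is a hypothetical callback standing in for one draw of $(y_i, \text{se}_i)$ from the heterogeneity model, and the function name is ours.

```python
import random
from math import erfc, exp, sqrt

def std_normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2.0))

def sample_with_bias(draw_study, k, gamma, seed=None):
    """Draw studies until k are accepted; each draw (y, se) is kept
    with probability exp(-4 * Phi(-y/se)**gamma), and rejected draws
    are replaced by fresh simulations (sketch of the procedure above)."""
    rng = random.Random(seed)
    studies = []
    while len(studies) < k:
        y, se = draw_study(rng)
        p_accept = exp(-4.0 * std_normal_cdf(-y / se) ** gamma)
        if rng.random() < p_accept:
            studies.append((y, se))
    return studies
```

With $\gamma=3$ (moderate) the acceptance probability decays more slowly in small $z$-values than with $\gamma=1.5$ (strong); studies with large positive $z$-values are accepted with probability close to one.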
\subsection{Simulation procedure}
For each scenario in Section~\ref{sec:scenario} we
\begin{enumerate}
\item simulate 10'000 meta-analysis datasets,
\item compute the CIs listed in Section~\ref{sec:method} for each meta-analysis, and
\item summarize the performance of the CIs by the criteria listed in Section~\ref{sec:meas}.
\end{enumerate}
\section{Analysis of the confidence intervals}
This section contains an overview over the construction methods for CIs that we consider in this simulation. Moreover, we explain what measures we use in order to compare the different CIs with each other.
\subsection{Construction methods for confidence intervals}\label{sec:method}
For this project, we will calculate 95\% CIs according to the following methods:
\begin{enumerate}
\item Random effects model (with REML estimate of the heterogeneity variance).
\item Henmi and Copas (HC) \citep{henm:copa:10}.
\item Harmonic mean analysis with alternative \texttt{none} \citep{Held2020b} and without variance adjustment.
\item Harmonic mean analysis with alternative \texttt{none}, additive variance adjustment with $\hat\tau^2$; an extension of the idea in \citet{Held2020b}.
\item Harmonic mean analysis with alternative \texttt{none}, multiplicative variance adjustment \citep{mawd:etal:17}.
\item $k$-trials rule with alternative \texttt{none} and without variance adjustment.
\item $k$-trials rule with alternative \texttt{none}, additive variance adjustment with $\hat\tau^2$.
\item $k$-trials rule with alternative \texttt{none}, multiplicative variance adjustment.
\end{enumerate}
\subsection{Definition of the $k$-trials rule}\label{sec:ktrial}
Similar to the harmonic mean method, the $k$-trials rule takes a mean value under the null hypothesis $\mu_{0}$ as well as effect estimates $\hat{\theta_{i}}, i =1, \dots, k$ and the corresponding standard errors $\text{se}(\hat{\theta_i})$ from $k$ different studies as input and calculates the resulting $p$-value according to Equation~\ref{f:ktrial}.
As the effect estimates $\hat{\theta_i}$ and the corresponding standard errors $\text{se}(\hat{\theta_i})$ are usually given in the context of meta-analyses, the above $p$-value function only depends on $\mu_{0}$. Therefore, CI limits are computed by searching for those values of $\mu_0$ for which $p(\mu_0)=0.05$. This may result in confidence sets containing more than one confidence interval.
In case of variance adjustments, the term $\text{se}(\hat{\theta_i})$ in Equation~\ref{f:ktrial} is replaced with $\text{se}_{\text{adj}}(\hat{\theta_i})$, which is defined in Subsection~\ref{sec:varadj}.
\subsection{Definition of the variance adjustments}\label{sec:varadj}
As stated in Subsection~\ref{sec:method}, the harmonic mean and $k$-trials methods can be extended such that heterogeneity between the individual studies is taken into account. In scenarios where the additive variance adjustment is used, we estimate the between-study variance $\tau^2$ using the REML method implemented in the function \texttt{metagen} of the \texttt{R} package \pkg{meta} and adjust the study-specific standard errors such that $\text{se}_{\text{adj}}(\hat{\theta_i})=\sqrt{\text{se}(\hat{\theta_i})^2+\hat\tau^2}$.
In case of the multiplicative variance adjustment, we estimate the multiplicative parameter $\phi$ as described in \citet{mawd:etal:17} and adjust the study-specific standard errors such that $\text{se}_{\text{adj}}(\hat{\theta_i})=\text{se}(\hat{\theta_i})\cdot\sqrt{\phi}$.
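The two adjustments can be sketched as below (the function name is ours; the study obtains $\hat\tau^2$ from REML via \texttt{metagen} and $\hat\phi$ as in \citet{mawd:etal:17}, neither of which is reimplemented here):

```python
from math import sqrt

def adjust_se(se, tau2=None, phi=None):
    """Adjusted standard errors: the additive adjustment uses
    sqrt(se_i^2 + tau2), the multiplicative one uses se_i * sqrt(phi);
    with neither given, the standard errors are returned unchanged."""
    if tau2 is not None:
        return [sqrt(s * s + tau2) for s in se]
    if phi is not None:
        return [s * sqrt(phi) for s in se]
    return list(se)
```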
\subsection{Measures considered}\label{sec:meas}
We assess the CIs using the following criteria:
\begin{enumerate}
\item CI coverage of combined effect, \ie, the proportion of intervals containing the true effect % coverage_true
\item CI coverage of study effects, \ie, the proportion of intervals containing the true study-specific effects % coverage_effects
\item CI coverage of all study effects, \ie, whether or not the CI covers all of the study effects %coverage_all
\item CI coverage of at least one of the study effects, \ie, whether or not the CI covers at least one of the study effects % coverage_effects_min1
\item Prediction Interval (PI) coverage, \ie, the proportion of intervals containing the treatment effect of a newly simulated study. The newly simulated study has $n =50$ and is not subject to publication bias. All other simulation parameters stay the same as for the simulation of the original studies (only for Harmonic mean, $k$-trials, REML, and HK methods) % coverage_prediction
\item CI width (for a confidence set with more than one interval, the sum of the widths of the individual intervals) %width
\item Interval score \citep{Gnei:Raft:07}% score
\item Number of CIs (only for Harmonic mean and $k$-trials methods). % n
\end{enumerate}
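The interval score of \citet{Gnei:Raft:07} for a central $(1-\alpha)$ interval $[l, u]$ and realization $y$ can be computed as below; the sketch covers a single interval only, since the exact aggregation over confidence sets with several intervals is not restated here.

```python
def interval_score(l, u, y, alpha=0.05):
    """Interval score (Gneiting & Raftery, 2007): the interval width
    plus a penalty of 2/alpha per unit by which y falls outside [l, u];
    lower scores are better."""
    score = u - l
    if y < l:
        score += (2.0 / alpha) * (l - y)
    elif y > u:
        score += (2.0 / alpha) * (y - u)
    return score
```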
\vspace*{.5cm}
For the Harmonic mean and $k$-trials methods, we also investigate the distribution of the lowest value of the $p$-value function between the lowest and the highest treatment effect of the simulated studies. In order to do so, we calculate the following measures:
\begin{itemize}
\item Minimum
\item First quartile
\item Mean
\item Median
\item Third quartile
\item Maximum
\end{itemize}
\vspace*{.5cm}
As both methods, harmonic mean and $k$-trials, can result in more than one CI for a given meta-analysis, we record the relative frequency of the number of intervals $m$ over the 10'000 iterations for each of the scenarios described in Section~\ref{sec:scenario}. However, we truncate the distribution by summarizing all events where the number of intervals is $> 9$ in a single category.
\section{Estimates to be stored for each simulation and summary measures to be calculated over all simulations}
For each simulated meta-analysis we construct CIs according to all methods (Section~\ref{sec:method}) and calculate all available assessments (Section~\ref{sec:meas}) for the respective method. For assessments 1--8 in Subsection~\ref{sec:meas} we only store the mean value over all 10'000 iterations of a specific scenario. Regarding the distribution of the lowest value of the $p$-value function, we store the summary measures mentioned in the respective paragraph of Subsection~\ref{sec:meas}. We calculate the relative frequencies of the number of intervals $m=1, 2, \ldots, 9, >9$ in each confidence set over the 10'000 iterations of the same scenario.
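The truncated frequency tabulation of the number of intervals can be sketched as follows (names ours, not from the study code):

```python
from collections import Counter

def interval_count_freqs(counts, cap=9):
    """Relative frequencies of the number of intervals m = 1, ..., cap,
    with all events m > cap collected in a single '>cap' bin."""
    total = len(counts)
    tally = Counter(min(c, cap + 1) for c in counts)
    freqs = {m: tally.get(m, 0) / total for m in range(1, cap + 1)}
    freqs[f">{cap}"] = tally.get(cap + 1, 0) / total
    return freqs
```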
\section{Presentation of the simulation results}
For each of the performance measures 1--8 in Subsection~\ref{sec:meas} we construct plots with
\begin{itemize}
\item the number of studies $k$ on the $x$-axis
\item the performance measure on the $y$-axis
\item one connecting line and color for each value of $I^2$
\item one panel for each CI method
\end{itemize}
Regarding the distribution of the $p$-value function for the harmonic mean and $k$-trials methods, we will create plots that contain
\begin{itemize}
\item the number of studies $k$ on the $x$-axis
\item the value of the summary statistic on the $y$-axis
\item one connecting line and color for each summary statistic
\item one panel for each CI method
\end{itemize}
The plots for the relative frequencies of the number of intervals have
\begin{itemize}
\item the category ($1$ to $9$ and $>9$) indicating the number of intervals $m$ on the $x$-axis
\item the relative frequency on the $y$-axis
\item a bar for each category indicating the relative frequency for the respective category