In a paired comparative diagnostic accuracy trial, i.e. a trial in which
the diagnostic accuracy of two testing procedures is compared relative
to a gold standard, the sample size required to power the trial to a
nominal level depends on the correlation between the two diagnostic tests
being compared. The lower the correlation between the tests, the
higher the sample size required; the higher the correlation, the lower
the sample size required. A priori, we usually do not
know the correlation between the two tests and thus cannot determine
the exact sample size. Furthermore, the correlation between two tests
is a quantity for which 1) it is difficult to make an accurate intuitive
estimate and 2) estimates are unlikely to exist in the literature,
particularly if one of the tests is new, as is very likely to be the case.
One option, suggested in the literature, is to use the sample size
implied by the maximal negative correlation between the two tests,
which gives the largest possible sample size. However, this overly
conservative technique is highly likely to waste resources and to be
unnecessarily burdensome on trial participants, as the trial is
likely to be overpowered and to recruit many more participants than
needed. A more accurate estimate of the sample size can be determined
at a planned interim analysis at which the sample size is
re-estimated. This incorporates an internal pilot study into the
trial design, with the intention of producing an accurate estimate of
the correlation between the tests.
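To illustrate how strongly the required sample size can depend on the correlation, the following is a minimal sketch and not necessarily the calculation used in this paper. It assumes that the comparison is of two sensitivities among diseased participants, that the test is McNemar's, and that the sample size comes from a commonly used normal-approximation formula (often attributed to Connor, 1987). The function names, the default alpha and power, and the example sensitivities are all hypothetical.

```python
import math
from statistics import NormalDist


def discordant_probability(p1, p2, rho):
    """Probability that two binary tests disagree, given their positive
    rates p1 and p2 and the correlation rho between their results."""
    # Joint probability that both tests are positive, implied by rho.
    p11 = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # Not every rho is attainable for given margins, so clip p11 to
    # its feasible range [max(0, p1 + p2 - 1), min(p1, p2)].
    p11 = min(max(p11, max(0.0, p1 + p2 - 1.0)), min(p1, p2))
    return p1 + p2 - 2 * p11


def paired_sample_size(p1, p2, rho, alpha=0.05, power=0.9):
    """Approximate number of diseased participants needed to detect
    p1 != p2 with a two-sided McNemar test (assumed formula)."""
    psi = discordant_probability(p1, p2, rho)  # discordance probability
    delta = p2 - p1                            # difference to detect
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = (z_a * math.sqrt(psi) + z_b * math.sqrt(psi - delta ** 2)) ** 2
    return math.ceil(n / delta ** 2)


if __name__ == "__main__":
    # rho = -1 is clipped to the most negative feasible correlation,
    # which corresponds to the conservative "largest sample size" option.
    for rho in (-1.0, 0.0, 0.5):
        print(rho, paired_sample_size(0.8, 0.9, rho))
```

Under these assumptions the required number of participants falls as the correlation rises, so designing for the maximal negative correlation gives the upper bound on the sample size.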
Methods
This paper discusses a sample size estimation and re-estimation
method based on maximum likelihood estimates, under an implied
multinomial model, of the correlation between the two tests and,
if required, of the prevalence, computed from the data observed at a
planned interim analysis. The method is illustrated by comparing the accuracy of two
procedures for the detection of pancreatic cancer, one procedure
using the standard battery of tests, and the other using the standard
battery with the addition of a PET/CT scan, all relative to the gold
standard of a cell biopsy. Simulations of the proposed method are
also conducted to determine robustness in various conditions.
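As a rough illustration of the interim estimation step, the sketch below computes plug-in estimates from interim counts. It assumes that paired test results for participants of one disease status are cross-classified in a 2x2 table and treated as a single multinomial sample, so that the observed cell proportions are the maximum likelihood estimates and the correlation estimate follows by substitution. The function names and the counts in the comments are hypothetical, and the estimator used in the paper itself may differ in detail.

```python
import math


def mle_correlation(n11, n10, n01, n00):
    """Plug-in estimate of the correlation between two binary tests.

    n11: both tests positive, n10: only test A positive,
    n01: only test B positive, n00: both tests negative.
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        raise ValueError("no observations at the interim")
    # Under a multinomial model, observed proportions are the MLEs
    # of the cell probabilities.
    p11, p10, p01 = n11 / n, n10 / n, n01 / n
    pa = p11 + p10  # estimated positive rate of test A
    pb = p11 + p01  # estimated positive rate of test B
    denom = math.sqrt(pa * (1 - pa) * pb * (1 - pb))
    if denom == 0:
        raise ValueError("correlation undefined: a test shows no variation")
    # By invariance of the MLE, substituting the estimated cell
    # probabilities gives the MLE of the correlation.
    return (p11 - pa * pb) / denom


def mle_prevalence(n_diseased, n_total):
    """Observed proportion of diseased participants at the interim."""
    return n_diseased / n_total
```

The estimated correlation, and the estimated prevalence where needed, could then be substituted into the chosen sample size formula to obtain the re-estimated sample size.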
Results
The results show that the type I error rate of the overall experiment
is stable using our suggested method and that the type II error rate
is close to or above nominal. Furthermore, the instances in which the
type II error rate is above nominal are in the situations where the
lowest sample size is required, meaning a lower impact on the actual
number of participants recruited.
Conclusion
We recommend a paired comparative diagnostic accuracy trial design that
uses an internal pilot study to re-estimate the sample size at the interim.
This design would use a maximum likelihood estimate, under a
multinomial model, of the correlation between the two tests being
compared for diagnostic accuracy, in order to more effectively estimate
the number of participants required to power the trial to at least the
nominal level.