James M Brophy, Andrew Gelman, 2025-06-19 23:05:00
- James M Brophy, professor of medicine and epidemiology
- Andrew Gelman, professor of statistics and political science
- Correspondence to: J M Brophy james.brophy{at}mcgill.ca
A BMJ investigation by Doshi (doi:10.1136/bmj.r1201)1 raises concerns about the integrity of two randomised trials with surrogate endpoints that supported the regulatory approval of ticagrelor.23 If data integrity is compromised, inferences from these studies are unreliable. Additional concerns are that the study populations had stable coronary disease rather than the acute coronary syndromes that ticagrelor is mainly used to treat,23 and that the clinical value of the surrogate marker, inhibition of platelet aggregation, remains uncertain.
Historical examples show that surrogate endpoints, even without data integrity issues, can be misleading. For example, reductions in premature ventricular contractions and glycated haemoglobin levels have both been paradoxically associated with increased mortality in clinical trials.45 Regulatory agencies do not usually accept surrogate endpoint trials alone for approval, but these studies may still influence decision making by providing plausible mechanisms that make clinical endpoint results seem more credible. Ensuring data integrity in these studies is therefore crucial.
Data quality is vital in multinational randomised controlled trials of clinical endpoints used for drug approval as the decision is often based on a single pivotal phase 3 trial.6 A controversial example is the PLATO clinical endpoint trial, which was the pivotal study for ticagrelor approval.7 The trial recruited 18 624 patients from 862 centres in 43 countries, and data integrity issues, including selection bias in dropouts and possible biased adjudication, have been discussed in The BMJ previously.8
Influence of statistical models
The operational difficulties in performing global randomised trials are mammoth. What is less obvious is the important role that statistical models have in the integrity of study inferences. Statistical models codify scientific decision making and motivate more careful experimental design and data collection. However, although statistical methods are often presented as neutral tools to summarise data, they quietly encode strong—often unstated, unrealised, and unreasonable—assumptions that can decisively shape conclusions. Again, PLATO is a case in point. Ignoring for the moment any uncertainties regarding data quality, the estimated hazard ratio of 0.84 (P<0.001) that has been universally interpreted by regulators and clinicians as conclusive evidence of ticagrelor’s superiority is highly dependent on the hidden assumptions of the chosen statistical model.
PLATO was analysed under a pooled model that assumed constant baseline risks and constant treatment effects across centres, countries, and regions, an implausible assumption given that ancillary care after myocardial infarction certainly varies across the 43 participating countries. Ignoring systematic centre level and country level factors leads to falsely precise confidence intervals and an increased risk of spurious claims of benefit. On the other hand, an unpooled model that treats each regional subgroup as a separate entity, for example by analysing the US subgroup alone (HR=1.27, 95% CI 0.92 to 1.75), is wasteful because it excludes most of the randomised patient data. This is the difficulty that bedevilled the FDA, which initially rejected and then approved ticagrelor without any change in the database. What role the previously mentioned surrogate studies may have had in subtly influencing the regulators to change their initial opposition is unknown.
However, a third statistical model offers a way out of this quandary of being a “lumper” or a “splitter.” Hierarchical regression is a compromise between unpooled and completely pooled models and indeed includes those two alternatives as special cases, with the amount of pooling estimated from the data in the multicentre trial.9
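One common way to write this compromise down is the normal-normal hierarchical model. It is offered here as an illustrative sketch, not necessarily the exact model used in the cited analyses. The symbols are assumptions introduced for illustration: $y_j$ is the estimated log hazard ratio in region $j$, $s_j$ its standard error, $\theta_j$ the true regional effect, $\mu$ the average effect, and $\tau$ the between region standard deviation.

```latex
\begin{align*}
y_j \mid \theta_j &\sim \mathcal{N}(\theta_j,\; s_j^2), \\
\theta_j \mid \mu, \tau &\sim \mathcal{N}(\mu,\; \tau^2), \\[4pt]
\mathbb{E}[\theta_j \mid y_j, \mu, \tau]
  &= (1 - B_j)\, y_j + B_j\, \mu,
  \qquad B_j = \frac{s_j^2}{s_j^2 + \tau^2}.
\end{align*}
```

As $\tau \to 0$ the pooling factor $B_j \to 1$ and every region is assigned the common mean (complete pooling). As $\tau \to \infty$, $B_j \to 0$ and each region keeps its own estimate (no pooling). For intermediate values, $\tau$ is estimated from the data and determines how far each regional estimate is pulled towards $\mu$.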
In the bayesian hierarchical meta-analysis with partial pooling, estimated effects for individual studies are shrunk towards the global mean or, more generally, towards a prediction based on centre and country level characteristics, with improved precision. However, the overall summary estimate for the average hazard ratio, or the hazard ratio for a predicted new hypothetical study, will have a wider uncertainty interval than under the pooled model, reflecting the now acknowledged additional variation between regions or countries.1011
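The two consequences of partial pooling described above, shrinkage of regional estimates and a wider interval for the average effect, can be illustrated with a minimal sketch. It makes several assumptions that should be stated plainly: the regional log hazard ratios and standard errors are hypothetical numbers invented for illustration (they are not the PLATO data), and the between region variance is estimated with the DerSimonian-Laird method of moments, a simple empirical Bayes stand-in for a full bayesian fit rather than the method used in the cited analyses.

```python
import math

# Hypothetical regional log hazard ratios and standard errors.
# Illustrative values only; these are NOT the PLATO results.
log_hr = [-0.25, -0.20, -0.10, 0.24]
se = [0.08, 0.10, 0.12, 0.16]


def pooled_fixed(y, s):
    """Complete pooling: inverse-variance weighted mean and its standard error."""
    w = [1.0 / si**2 for si in s]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, math.sqrt(1.0 / sum(w))


def dersimonian_laird_tau2(y, s):
    """Method-of-moments estimate of the between region variance tau^2."""
    w = [1.0 / si**2 for si in s]
    mu, _ = pooled_fixed(y, s)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)


def partial_pooling(y, s):
    """Random-effects mean, its standard error, tau^2, and shrunken regional estimates."""
    tau2 = dersimonian_laird_tau2(y, s)
    w = [1.0 / (si**2 + tau2) for si in s]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    mu_se = math.sqrt(1.0 / sum(w))
    # Each regional estimate is a weighted average of its own value and mu;
    # noisier regions (larger s_j) are pulled further towards mu.
    shrunk = [(si**2 * mu + tau2 * yi) / (si**2 + tau2) for yi, si in zip(y, s)]
    return mu, mu_se, tau2, shrunk


mu_fixed, se_fixed = pooled_fixed(log_hr, se)
mu_re, se_re, tau2, shrunk = partial_pooling(log_hr, se)

print(f"pooled HR:          {math.exp(mu_fixed):.2f} (SE of log HR {se_fixed:.3f})")
print(f"partial pooling HR: {math.exp(mu_re):.2f} (SE of log HR {se_re:.3f})")
for raw, est in zip(log_hr, shrunk):
    print(f"regional HR {math.exp(raw):.2f} -> shrunk to {math.exp(est):.2f}")
```

In this sketch the standard error of the average effect can only grow when the estimated between region variance is positive, which is the sense in which the pooled model's precision is overstated when regions genuinely differ.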
The PLATO hierarchical meta-analysis yields a summary estimate (HR=0.90, 0.72 to 1.10) with enough uncertainty that a reasonable interpretation is that the ticagrelor signal needs confirmation in further studies, especially for US patients. A second US study has not been done. Moreover, later randomised ticagrelor trials have shown no benefit.121314 This newer evidence provides more support for the cautious interpretation of the PLATO hierarchical model and less support for the confident claim of ticagrelor superiority from the pooled model.
The statistical model is not a passive, trivial entity but rather an important element that mediates how data are transformed into regulatory approval, guideline endorsement, and clinical decisions. We believe hierarchical regressions offer more robust estimates and provide a more complete and open accounting of uncertainty. Hence, we recommend they become standard in multicentre randomised trials, especially when data are aggregated across heterogeneous healthcare systems. Although researchers may choose to reject a particular hierarchical model for a multicountry trial, there should be a consensus that when conclusions differ drastically according to reasonable choices of statistical model, the data can hardly be considered robust and the results demand replication.
In an era when global trials shape global practice, ensuring that both the data and their accompanying statistical models are trustworthy is essential. Statistical models must be examined as critically and thoroughly as the data they seek to summarise.