


Using machine learning to predict anticoagulation control in atrial fibrillation: A UK Clinical Practice Research Datalink study

Research output: Contribution to Journal/Magazine › Journal article › peer-review

  • M. Norman
  • T. Mason
  • C. Dickerson
  • B. Sandler
  • K.G. Pollock
  • U. Farooqui
  • L. Groves
  • C. Tsang
  • D. Clifton
  • A. Bakhai
  • N.R. Hill
Article number: 100688
Journal publication date: 31/12/2021
Journal: Informatics in Medicine Unlocked
Number of pages: 10
Publication status: Published
Early online date: 5/08/2021
Original language: English


To investigate the predictive performance of machine learning (ML) algorithms for estimating anticoagulation control in patients with atrial fibrillation (AF) who are treated with warfarin.

This was a retrospective cohort study of adult patients (≥18 years) between 2007 and 2016 using linked primary and secondary care data (Clinical Practice Research Datalink GOLD and Hospital Episode Statistics). Various ML techniques were explored to predict suboptimal anticoagulation control, defined as time in therapeutic range (TTR) < 70% based on an International Normalised Ratio (INR) of 2.0–3.0. Models were applied to baseline data (linear and non-linear support vector machines, random forests, stochastic gradient boosting [XGBoost], and neural networks [NN]) and to time-varying data in 6-week intervals up to 30 weeks (long short-term memory [LSTM] NN). Patient records depicting unique lines of warfarin therapy (LOT) were separated into training (70%) and holdout (30%) sets for model training and testing, respectively.
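The TTR outcome above can be made concrete. The study does not state which TTR calculation it used, so the standard Rosendaal linear-interpolation method here is an assumption; the sketch flags a line of therapy whose TTR falls below the 70% threshold:

```python
from datetime import date

def rosendaal_ttr(readings, low=2.0, high=3.0):
    """Estimate time in therapeutic range (TTR) by linearly interpolating
    the INR between consecutive readings (Rosendaal method).

    readings: list of (date, inr) tuples sorted by date.
    Returns the fraction of elapsed days with interpolated INR in [low, high].
    """
    in_range_days = 0.0
    total_days = 0.0
    for (d0, inr0), (d1, inr1) in zip(readings, readings[1:]):
        span = (d1 - d0).days
        if span <= 0:
            continue  # skip same-day or out-of-order readings
        total_days += span
        # Walk each day of the interval on the interpolated INR line.
        for day in range(span):
            inr = inr0 + (inr1 - inr0) * (day / span)
            if low <= inr <= high:
                in_range_days += 1
    return in_range_days / total_days if total_days else 0.0

# Hypothetical patient: below range, in range, then above range.
readings = [
    (date(2020, 1, 1), 1.5),
    (date(2020, 1, 11), 2.5),
    (date(2020, 1, 21), 3.5),
]
ttr = rosendaal_ttr(readings)        # 11 of 20 days in range -> 0.55
suboptimal = ttr < 0.70              # the study's suboptimal-control label
```

A TTR below 0.70 would label this LOT as suboptimal control, the positive class the models are trained to predict.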

35,479 patients were eligible for inclusion, of whom 24,684 and 10,795 were assigned to the training set (32,683 unique LOTs) and the holdout set (14,218 unique LOTs), respectively. Across all models, depression (a diagnosis and/or a prescription of antidepressant medication) was a significant driver in predicting anticoagulation control. At baseline, XGBoost was the best-performing model (area under the curve [AUC]: 0.624), owing to its ability to identify non-linear associations such as those with age and weight (greater probability of suboptimal control: 80 years and …).
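For context on the reported AUC of 0.624: AUC is the probability that a randomly chosen suboptimally controlled patient is ranked above a randomly chosen well-controlled one, so 0.5 is chance. A minimal, dependency-free sketch of that rank-based (Mann–Whitney) computation; the labels and scores are illustrative, not data from the study:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney identity: the probability that a random
    positive case scores higher than a random negative case, counting
    ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative only: 1 = suboptimal control (TTR < 70%), 0 = adequate control.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.5, 0.4, 0.2]  # hypothetical model risk scores
value = auc(labels, scores)              # 8 of 9 pairs ranked correctly
```

Here 8 of the 9 positive/negative pairs are ordered correctly, giving an AUC of about 0.89.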
The ML algorithms displayed a clinically useful ability to identify patients at greater risk of suboptimal control. The addition of time-varying data to the algorithm, especially prior INR measurements, improved predictive performance. These algorithms provide improved predictive tools for identifying patients who may benefit from more frequent INR monitoring or from switching to alternative therapies.
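One plausible preprocessing step behind the time-varying inputs described above is binning each patient's INR history into five 6-week intervals (30 weeks), giving one fixed-length sequence per LOT for the LSTM to consume. The function name and feature layout below are illustrative assumptions, not the study's code:

```python
def bin_inr_history(measurements, n_bins=5, bin_weeks=6):
    """Aggregate (week_offset, INR) readings into fixed 6-week bins
    spanning 30 weeks: mean INR per bin, None where no reading fell.
    An LSTM would consume one such fixed-length sequence per LOT."""
    bins = [[] for _ in range(n_bins)]
    for week, inr in measurements:
        idx = int(week // bin_weeks)
        if 0 <= idx < n_bins:       # ignore readings beyond week 30
            bins[idx].append(inr)
    return [sum(vals) / len(vals) if vals else None for vals in bins]

# Hypothetical patient: sparse readings in the 30 weeks after initiation.
history = [(1, 2.1), (3, 2.4), (14, 3.1), (29, 2.0)]
sequence = bin_inr_history(history)
# sequence -> [2.25, None, 3.1, None, 2.0]
```

The `None` entries mark intervals with no INR reading; in practice these gaps would be imputed or masked before being fed to the network.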