
Protocol
Artificial intelligence (AI) for paediatric fracture detection: a multireader multicase (MRMC) study protocol
  1. Susan C Shelmerdine1,2,3,
  2. Cato Pauling2,
  3. Emma Allan1,
  4. Dean Langan2,4,
  5. Emily Ashworth1,
  6. Ka-Wai Yung5,
  7. Joy Barber6,
  8. Saira Haque7,
  9. David Rosewarne8,
  10. Nick Woznitza9,10,
  11. Sarim Ather11,
  12. Alex Novak12,
  13. Kanthan Theivendran13,
  14. Owen J Arthurs1,2,3
  1. 1Clinical Radiology, Great Ormond Street Hospital for Children, London, UK
  2. 2UCL Great Ormond Street Institute of Child Health, London, UK
  3. 3Great Ormond Street Hospital NIHR Biomedical Research Centre, London, UK
  4. 4Centre of Applied Statistics Courses, University College London, London, UK
  5. 5Wellcome/ EPSRC Centre for Interventional and Surgical Sciences, London, UK
  6. 6Clinical Radiology, St George's Healthcare NHS Trust, London, UK
  7. 7Clinical Radiology, Kings College Hospital NHS Foundation Trust, London, UK
  8. 8Clinical Radiology, Royal Wolverhampton Hospitals NHS Trust, Wolverhampton, UK
  9. 9School of Allied Health Professions, Faculty of Medicine, Health and Social Care, Canterbury Christ Church University, Canterbury, UK
  10. 10Clinical Radiology, University College London Hospitals NHS Foundation Trust, London, UK
  11. 11Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  12. 12Emergency Medicine Research Oxford, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  13. 13Orthopaedic Surgery, Sandwell and West Birmingham Hospitals NHS Trust, Birmingham, UK
  1. Correspondence to Dr Susan C Shelmerdine; susan.shelmerdine{at}gosh.nhs.uk

Abstract

Introduction Paediatric fractures are common but can be easily missed on radiography, leading to potentially serious implications including long-term pain, disability and missed opportunities for safeguarding in cases of inflicted injury. Artificial intelligence (AI) tools to assist fracture detection in adult patients exist, although their efficacy in children is less well known. This study aims to evaluate whether a commercially available AI tool (certified for paediatric use) improves healthcare professionals’ (HCPs’) detection of fractures, and how this may impact patient care, using a retrospective simulated study design.

Methods and analysis Using a multicentric dataset of 500 paediatric radiographs across four body parts, the diagnostic performance of HCPs will be evaluated across two stages: first without and then with the assistance of an AI tool (BoneView, Gleamer), separated by a 4-week washout period. The dataset will contain a mixture of normal and abnormal cases. HCPs will be recruited across radiology, orthopaedics and emergency medicine. We will aim for 40 readers, with approximately 14 in each subspecialty, half being experienced consultants. For each radiograph, HCPs will record the presence of a fracture, their confidence level and a suitable simulated management plan. Diagnostic accuracy will be judged against a consensus interpretation by an expert panel of two paediatric radiologists (ground truth). Multilevel logistic modelling techniques will be used to analyse and report diagnostic accuracy outcome measures for fracture detection. Descriptive statistics will be used to evaluate changes in simulated patient management.

Ethics and dissemination This study was granted approval by National Health Service Health Research Authority and Health and Care Research Wales (REC Reference: 22/PR/0334). IRAS Project ID is 274 278. Funding has been provided by the National Institute for Health and Care Research (NIHR) (Grant ID: NIHR-301322). Findings from this study will be disseminated through peer-reviewed publications, conferences and non-peer-reviewed media and social media outlets.

Trial registration number ISRCTN12921105.

  • Diagnostic Imaging
  • Paediatric orthopaedics
  • ACCIDENT & EMERGENCY MEDICINE

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • Performing a large multireader study evaluating paediatric fracture detection with and without artificial intelligence (AI) assistance across different medical subspecialties and experience levels will better evaluate which healthcare professionals benefit most from AI assistance.

  • Our imaging dataset represents a range of children’s ages and body parts across multiple National Health Service trusts to comprehensively evaluate performance of a commercially available AI algorithm.

  • The UK-based population used in this study, the lack of patient history and the use of predefined simulated clinical management choices mean that the study may not exactly mimic real-world practices and outcomes.

  • Replicating societal and ethical biases, together with a comprehensive health economic evaluation of providing AI assistance for fracture detection, is difficult to achieve, but our study will provide a guide for future studies.

Introduction

Approximately half of the 12 million children (<16 years) in the UK will fracture a bone during childhood.1 2 Radiographic imaging is the first-line imaging tool for assessing the presence and extent of injury. Unfortunately, the interpretation of paediatric fractures is challenging for many healthcare professionals (HCPs): children sustain different types of injury to adults, some of which can be subtle to identify (eg, buckle or Salter Harris fractures); normal appearances vary widely across different ages; and normal physes can sometimes be mistaken for injuries. Furthermore, the level of experience of HCPs varies and, while patients should under best practice principles not be discharged from hospital without a radiology report, in reality this is not usually available.

In one study, researchers found that misdiagnoses were made in 10% of children’s fractures by emergency doctors.3–6 Unfortunately, due to workforce pressures and staff shortages,7 doctors with subspecialist skills in imaging or musculoskeletal injuries are not readily available 24/7 in a busy emergency department. This leads to potential delays in recognising mistakes,8–13 long-term pain and discomfort for the child and, in some situations, missed opportunities for safeguarding (as fractures can be the first sign of inflicted injury).14

Recently, many artificial intelligence (AI) tools have been developed and have demonstrated high diagnostic accuracy rates for the detection of fractures on imaging, in some cases with accuracy equal to or higher than that of a radiologist.15 16 Many of these tools, however, have been specifically designed for adults, although encouraging results have been demonstrated when these tools have been applied to children.17 18 Within the last year, one AI tool has specifically been approved by the US Food and Drug Administration (FDA) for use in children over the age of 2 years. If implemented clinically, it could potentially improve the quality of paediatric care, streamline orthopaedic clinic referrals and reduce the likelihood of medical litigation.19

Nonetheless, widespread adoption of AI within the National Health Service (NHS) is still nascent, with various barriers to adoption identified, of which lack of sufficient evidence is one major concern.20 21 There is, therefore, a pressing need to evaluate how such a tool may help (or hinder) different members of staff in this clinical care pathway and whether the use of such a tool would make any difference to patient management.

Objectives

In this study, the aim is to evaluate whether using a commercially available AI tool certified for paediatric use could help HCPs make better decisions about patient care.

Primary objective:

  • Determine differences in diagnostic accuracy rates of HCPs for paediatric fracture detection, before and after using the AI tool.

Secondary objectives:

  • Determine whether differences in accuracy rates or the effect of adding AI vary according to job role and experience.

  • Determine whether user confidence in fracture diagnosis changes when using the AI tool.

  • Determine whether management plans are altered following the use of the AI tool.

Methods and analysis

Study design

The overall study design is a cross-over multireader multicase (MRMC) study, where each reader (an HCP) will act as their own control across two imaging interpretation stages, with an interval washout period of at least 4 weeks’ duration (figure 1). Differences in diagnostic accuracy rates between stages, changes in readers’ confidence rates and simulated patient management will be compared. Subgroup analyses will be conducted according to reader specialty area, experience and body part interpreted.

Figure 1

A diagrammatic flow chart of the multireader multicase (MRMC) study outline. AI, artificial intelligence.

At the first interpretation stage (1 September 2024–31 October 2024), each reader will review a dataset of 500 paediatric limb radiographs (some normal and some abnormal) without AI assistance. After the washout period (1 November 2024–30 November 2024), they will proceed to the second interpretation stage (1 December 2024–31 January 2025), where they will each read the same dataset with AI assistance. The order of the radiographs (and thus of the imaged body parts and of the cases with and without abnormalities) will be randomised for every reader at each interpretation stage to further reduce recall bias.
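The per-reader, per-stage randomisation described above could be sketched as follows. This is a minimal illustration only: the function name `reading_order`, the case and reader identifiers and the seeding scheme are assumptions made for the example, not details of the study's reading platform.

```python
import random


def reading_order(case_ids, reader_id, stage, base_seed=2024):
    """Return a shuffled copy of case_ids specific to one reader and one stage.

    The seed string (hypothetical scheme) makes the order reproducible for
    audit while differing between readers and between stages.
    """
    rng = random.Random(f"{base_seed}-{reader_id}-{stage}")
    order = list(case_ids)  # copy so the master list is left untouched
    rng.shuffle(order)
    return order


# Hypothetical identifiers for the 500 cases in the dataset
cases = [f"case_{i:03d}" for i in range(1, 501)]
stage1 = reading_order(cases, reader_id="R01", stage=1)  # without AI
stage2 = reading_order(cases, reader_id="R01", stage=2)  # with AI
```

Every reader still sees all 500 cases at both stages; only the sequence changes, which is what limits recall of case position between stages.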

Readers will complete the imaging interpretations online, via a password-protected and secure imaging platform (details below). They will be given a 2-month period to complete the exercise and will be instructed not to seek help with the imaging interpretation from anyone else. Clinical information associated with each radiograph will include the age and gender of the patient. Mechanism of injury, pain location and history will not be provided. The reader will be asked to assume that there is generalised pain at the joint in question, and no significant medical history (ie, no genetic or metabolic bone disorder or known malignancy).

Feedback requested from each reader for each radiograph will include:

  1. Marking the site of a fracture on each image (or selecting ‘no fracture’).

  2. Confidence in their decision using a 5-point Likert scale (1=not confident; 5=absolutely certain).

  3. Selecting the most likely management for the patient. This will be done by providing each reader with a drop-down menu of seven predefined plans, tailored to each subspecialty, with the option for free-text comment. Examples of the different options are provided in figure 2.

Figure 2

Simulated patient management options provided to different subspecialty readers recruited in this study. For every case reviewed, the reader will be asked to select the single best treatment plan. They will be presented with slightly different options depending on their area of medical expertise. ED, emergency department.

Inclusion and exclusion criteria (imaging cases)

500 anonymised paediatric radiographic examinations (‘cases’) will be derived from a larger 5-year retrospective dataset of appendicular radiographs acquired in children from two NHS trusts (King’s College Hospital NHS Foundation Trust and St George’s University Hospitals NHS Foundation Trust). Both trusts are major trauma centres located in London, UK serving adult and paediatric cases. All radiographic imaging was acquired as part of routine clinical care. No change in patient management will occur as a result of this study, nor was any change in the usual imaging protocol required for this retrospective data collection.

We will include a mixture of normal and abnormal radiographic examinations according to the minimum ratio determined by our sample size calculation below. Four body parts will be used: wrist, elbow, knee and ankle. These were chosen because limbs account for 81.5% of all paediatric injuries,22 with those occurring at the knee, wrist and elbow being most commonly missed.23 Wrist and elbow fractures encompass 20.5% of all paediatric fractures and missed fractures in these areas are one of the most common reasons for litigation in children’s orthopaedic care.24 Although ankle fractures are less common, they account for up to 25% of all growth plate injuries25 and thus carry a high potential for long-term growth disorders if misdiagnosed.

Our inclusion criteria for all radiographic imaging in the subset of 500 cases include:

  • Children aged between 2 and 18 years old (as the intended commercial AI tool is not regulated for children under 2 years of age).

In order to effectively evaluate how well the AI tool could help HCPs identify easily missed fractures, we will include abnormal radiographs that do not contain ‘obvious’ fractures. Obvious fractures will be defined as any fracture that meets at least one of the following criteria and therefore excluded:

  • Any imaging with a ‘Red Dot’ annotation on the radiograph that cannot be removed (denoting the presence of either a true or false abnormality identified by the radiographer).

  • Any fracture which is angulated by more than 5°.

  • Any fracture which is displaced (>5 mm) or comminuted (multifragmented).

  • Any fracture impacted/shortened by >5 mm.

  • Any fracture with obvious callus formation/sclerosis.

As we will be evaluating the performance of acute fracture detection in an emergency setting, we will exclude healing fractures. Incidental bone lesions will also be excluded from our study, however, normal anatomical variants will be included.
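The case selection rules above can be expressed as a simple filter. The sketch below is illustrative only: the `CandidateCase` record and its field names are hypothetical, and treating the age limits as inclusive is an assumption, since the protocol does not state how boundary ages are handled.

```python
from dataclasses import dataclass


@dataclass
class CandidateCase:
    """Hypothetical record of one candidate radiographic examination."""
    age_years: float
    red_dot_fixed: bool         # 'Red Dot' annotation that cannot be removed
    angulation_deg: float = 0.0
    displacement_mm: float = 0.0
    comminuted: bool = False
    impaction_mm: float = 0.0
    callus_or_sclerosis: bool = False
    healing_fracture: bool = False
    incidental_bone_lesion: bool = False


def is_obvious(case):
    """True if the case meets at least one 'obvious fracture' criterion."""
    return (
        case.red_dot_fixed
        or case.angulation_deg > 5
        or case.displacement_mm > 5
        or case.comminuted
        or case.impaction_mm > 5
        or case.callus_or_sclerosis
    )


def is_eligible(case):
    """Apply the age criterion and all exclusions (inclusive ages assumed)."""
    return (
        2 <= case.age_years <= 18
        and not is_obvious(case)
        and not case.healing_fracture
        and not case.incidental_bone_lesion
    )
```

For example, `is_eligible(CandidateCase(age_years=9, red_dot_fixed=False, angulation_deg=3))` returns `True`, whereas the same case with `displacement_mm=6` would be excluded as obvious.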

Study cohort characteristics (imaging cases)

Cohort demographic characteristics for the whole population and the abnormal dataset for this study are available in online supplemental tables 1–3, with demographic data on the normal dataset in online supplemental table 4. Overall, 500 different radiographic examinations across 500 different paediatric patients (none included more than once) will be used (comprising 183 fractures in 181 patients). The dataset will consist of 256 boys (97 with fractures) and 245 girls (84 with fractures), with mean age of 10 years (range 2–17 years).

The most commonly fractured bone in our dataset is the distal radius (41/181, 22.4% fractures), and the three most common fracture types are Salter Harris type 2 injury (41/183, 22.4%), buckle fracture (39/183, 21.3%) and transverse fracture (28/183, 15.3%).

We will invite readers from different specialties and experience levels to participate. These will include HCPs working in radiology (including doctors and reporting radiographers), the emergency medicine department (including doctors and senior triage nurses) and orthopaedic surgeons. All experience levels will be welcome to participate, including those with a subspecialty interest within their field (eg, paediatric orthopaedic surgery, paediatric radiology). Readers will, however, be excluded if they do not routinely review paediatric radiographs as part of their expected job role.

Radiologists and reporting radiographers will be recruited voluntarily through society newsletter announcements via the European Society of Paediatric Radiology (ESPR), European Society of Skeletal Radiology (ESSR) and the British Societies of Paediatric Radiology and Skeletal Radiology (BSPR, BSSR) as well as Society of Radiographers (SoR). Emergency medical staff will be recruited through existing local collaborations, via the Royal College of Emergency Medicine (RCEM) and Association of Paediatric Emergency Medicine (APEM). Orthopaedic surgical colleagues will be recruited also via local collaborations, and via the British Society for Children’s Orthopaedic Surgery (BSCOS). All announcements and invitations to participate will be also posted via study collaborators through their personal social media channels.

We anticipate a minimum of 40 readers (approximately 14 readers from each specialty), equally split between trainee-level and consultant-level experience. Each reader will be provided with instructions on how to participate in the study and will complete a consent form online asking about their demographic details, job role/specialty and experience level. Prior to any interpretation, a short video and instruction sheet on how to use the online reporting platform will be provided. Reporters will be asked to replicate their usual reporting practices as far as possible (eg, use of a suitable monitor, dim lighting) and will be able to use reference tools (eg, textbooks or websites) that they would normally consult for this task, but will be asked not to consult other reporters.

Intervention (AI tool)

Our ‘intervention’ will be the use of a commercially available AI tool called ‘BoneView’ (V.2.3.0) produced by a French AI vendor called Gleamer (Paris, France; https://www.gleamer.ai/). In March 2022, the tool received conformity (CE Class 2A EU MDR and FDA) approval for fracture detection on full-resolution Digital Imaging and Communications in Medicine (DICOM) images in adults and children (aged >2 years old).26 This product was chosen as it was the first to achieve FDA approval for use in children and had the greatest evidence base among all commercial products for fracture detection on radiographs.19 The full details of how the deep learning algorithm was developed and tested have been described in the existing literature;17 27 28 therefore, only a brief overview of how the product was developed is provided here.

The algorithm is a deep convolutional neural network based on the object detection framework ‘Detectron 2’, written and further engineered in PyTorch (V.1.3). It was developed using a dataset of 312 602 radiographs from patients across over 60 radiology departments collected from January 2011 to May 2021; 30% of the radiographs included in the dataset were paediatric (<21 years). When the algorithm confidence surpasses a predefined threshold set during algorithm development, the AI tool (BoneView) will create an output of a duplicate radiograph in the imaging examination with one of the following: a region of interest marked by a white square box stating presence of a fracture; a region of interest marked by a dashed white square box stating ‘indeterminate’ fracture; or no overlay, with a note below the image stating no fracture. The dataset of the present study does not overlap with any examinations in the development dataset used to create the AI tool, nor will any data in this study be used to further train the commercial AI tool.

There have been at least three prior publications evaluating the performance of this AI tool within French, American and Swiss paediatric datasets across a similar range of body parts to those proposed in this protocol.17 18 29 Two of these studies included a smaller dataset than this planned study (n=300)17 18 with an equal split of normal and abnormal cases (not reflective of clinical practice). One was an MRMC study design using eight radiologist readers.18 None of the prior studies included a simulated patient management plan component or a multidisciplinary team of readers, as this study plans to.

Gleamer has provided its AI tool free of charge for evaluation in this trial but has no involvement in the study design, data analysis, reader recruitment or the decision to publish the final results.

Reference (ground truth) standard

We will use a consensus interpretation by an expert panel of two paediatric radiologists, both with at least 5 years of subspecialist radiology experience as the reference standard (so-called ‘ground truth’) for this dataset. A bounding box around the entire area of bone fracture on each image (if present) will be assigned where the examination is abnormal. Reference radiologists will have access to the radiographic imaging and original imaging report when setting the ground truth bounding box, as well as any follow-up imaging available; none of this will be available to recruited readers. Disagreements will be resolved by a third musculoskeletal radiologist (with similar experience level).

Data deidentification and secure storage

Scans selected for the study will be deidentified using XNAT V.3.2.4,30 an open source research platform for image-based biomedical research, before being uploaded to a secure image viewing platform for reader interpretation. Access to the platform will be controlled via separate user accounts and passwords for each recruited reader.

All study data generated by the readers’ interpretations will be entered into a password-protected and secure database. Individual reader accuracy scores will be anonymised, and the research team will not have access to the identifying link between the participants’ personal details and the data. Data about the readers’ experience level and subspecialty will be retained to allow group comparisons.

All research staff will comply with the requirements of the Data Protection Act 201831 with regard to the collection, storage, processing and disclosure of personal information and will uphold the Act’s core principles. Data will be collected and maintained according to Good Clinical Practice standards.32

Statistical methods/data analysis plan

The STARD-AI (Standards for Reporting of Diagnostic Accuracy - Artificial Intelligence) and CLAIM (Checklist for AI in Medical Imaging) guidelines will be adhered to in the reporting of this study.33 34 Diagnostic accuracy of the readers (with and without AI assistance) for each body part will be derived (ie, sensitivity, specificity, positive predictive value and negative predictive value). A true positive result will be counted if a mark made by a reader at the site of a suspected fracture falls within the area of the predefined ‘ground truth bounding box’ area set by the reference radiologists. If the mark made by a reader falls outside a bounding box, it will be counted as a false positive. Where a bounding box was set by the ground truth, but no mark made on the image then a false negative result will be assigned to the reader.
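The scoring rules above can be illustrated with a short sketch. It is a simplification under stated assumptions: each case has at most one reader mark and one ground truth bounding box, coordinates are in pixels, and a case with neither a mark nor a box is counted as a true negative (implied, but not spelt out, in the text). The function names are hypothetical.

```python
from collections import Counter


def score_case(mark, box):
    """Classify one reading as TP, FP, FN or TN.

    mark: (x, y) reader mark, or None if 'no fracture' was selected.
    box:  (x_min, y_min, x_max, y_max) ground truth box, or None if normal.
    """
    if mark is None:
        return "FN" if box is not None else "TN"
    x, y = mark
    if box is not None and box[0] <= x <= box[2] and box[1] <= y <= box[3]:
        return "TP"
    return "FP"  # mark placed outside any ground truth bounding box


def accuracy_summary(outcomes):
    """Sensitivity, specificity, PPV and NPV from a list of case outcomes."""
    n = Counter(outcomes)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(n["TP"], n["TP"] + n["FN"]),
        "specificity": ratio(n["TN"], n["TN"] + n["FP"]),
        "ppv": ratio(n["TP"], n["TP"] + n["FP"]),
        "npv": ratio(n["TN"], n["TN"] + n["FN"]),
    }
```

These raw proportions are only a descriptive starting point; the protocol derives its reported estimates from multilevel models that account for repeated readings by the same reader.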

Estimates of these accuracy statistics will be derived through multilevel logistic regression models, with the reader included as a random intercept. From these models, we will report 95% CIs and p values (significance level set at 5%). Independent variables will be added, including the reader’s job role and experience, to assess their relationship with diagnostic accuracy. We will additionally present these same statistics to the reader, as derived through the random effects of the models, to explore the relationship between sensitivity and specificity. Intraobserver variability for the diagnostic accuracy of fracture detection before and after the use of AI assistance will also be evaluated, with subanalysis conducted to account for different reader medical specialty subgroups.
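One way to write such a random-intercept logistic model is given below. The notation is an assumption introduced for illustration and is not taken from the study's analysis plan: Y_{ijs} equals 1 when reader j interprets case i correctly at stage s, AI_s indicates the AI-assisted stage, x_j holds reader-level covariates (job role, experience) and u_j is the reader's random intercept. Fitting the model to ground truth abnormal cases only would give sensitivity, and to normal cases only would give specificity.

```latex
% Requires amsmath for \operatorname
\[
\operatorname{logit}\Pr\left(Y_{ijs} = 1\right)
  = \beta_0 + \beta_1\,\mathrm{AI}_s
  + \boldsymbol{\beta}_2^{\top}\mathbf{x}_j + u_j,
\qquad
u_j \sim \mathcal{N}\left(0, \sigma_u^2\right)
\]
```

Under this formulation, exp(β₁) is the odds ratio for a correct interpretation with AI assistance versus without, conditional on the reader.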

We will assess for differences in confidence scores for correctly identified images between HCPs with and without AI guidance, and also whether there were significant differences between readers across specialties and experience levels. We will also evaluate how an indeterminate AI reading affects reader decisions.

Changes in clinical management will be compared using descriptive statistics (ie, frequency and percentages) to determine, for example, how many children would be discharged with a missed fracture, or unnecessary second opinions/additional imaging sought for cases with and without AI assistance. This could provide information to help estimate potential future benefit and cost savings to the NHS at a future clinical implementation stage, where appropriate.
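A frequency and percentage summary of this kind could be produced along the following lines. The condition labels and management plan names are placeholders invented for the example; the actual options are those shown in figure 2.

```python
from collections import Counter


def management_table(records):
    """Summarise management choices as (count, percentage) per condition.

    records: iterable of (condition, plan) pairs, for example
             ("without_ai", "discharge"). Labels here are hypothetical.
    """
    by_condition = {}
    for condition, plan in records:
        by_condition.setdefault(condition, Counter())[plan] += 1

    table = {}
    for condition, counts in by_condition.items():
        total = sum(counts.values())
        table[condition] = {
            plan: (n, round(100 * n / total, 1)) for plan, n in counts.items()
        }
    return table


# Illustrative input only, not study data
example = [
    ("without_ai", "discharge"), ("without_ai", "discharge"),
    ("without_ai", "refer"),
    ("with_ai", "refer"), ("with_ai", "refer"),
    ("with_ai", "refer"), ("with_ai", "discharge"),
]
summary = management_table(example)
```

Restricting the input to cases with a ground truth fracture would give, for instance, the proportion of children discharged with a missed fracture under each condition.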

Sample size and power calculation

Using the sample size tables published by Obuchowski et al for ‘Receiver Operating Characteristic Studies’,35 the study has been powered to detect a small difference in the AUC (area under the receiver operating characteristic curve) of 0.05 between reader and AI algorithm performance, with power of 80% and a type 1 error rate of 5%. Assuming that the dataset will be representative of clinical practice with at least 20% abnormal cases, our sample size would need to be at least 112 examinations for at least 10 readers, per body part.

In order to ensure better representation of different abnormal findings, we have increased the number of examinations to 125 per body part, with approximately one-third of the cases being abnormal (ie, between 44 and 46 abnormal cases per body part, see online supplemental tables 1–3).
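The arithmetic behind these figures can be checked in a few lines. The variable names are illustrative; the numbers are those quoted in this section.

```python
body_parts = ["wrist", "elbow", "knee", "ankle"]

min_cases_per_part = 112   # minimum from the sample size tables
cases_per_part = 125       # number actually planned per body part
abnormal_range = (44, 46)  # abnormal cases per body part

total_cases = cases_per_part * len(body_parts)
margin_per_part = cases_per_part - min_cases_per_part

# Percentage of abnormal cases per body part at each end of the range
abnormal_pct = tuple(100 * n / cases_per_part for n in abnormal_range)

print(total_cases)      # total examinations across the four body parts
print(margin_per_part)  # examinations above the minimum, per body part
print(abnormal_pct)     # abnormal percentage range per body part
```

The planned design therefore exceeds the minimum by 13 examinations per body part, and its abnormal proportion (35.2% to 36.8%) is above the 20% assumed in the power calculation.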

Patient and public involvement

In designing this research protocol and in the application for the funding, the NIHR GOSH Biomedical Research Centre Patient and Public Advisory Groups for research were consulted which included ‘The Young Persons Advisory Group (YPAG)’ (comprising 24 young people, aged 11–21 years) and ‘The Parent and Carer Advisory Group (PCAG)’ (comprising 5 parent representatives).36 Many of the children were familiar with the concept of AI,37 and of these, four YPAG and three PCAG members volunteered to form the ‘FRACTURE Patient and Public Involvement & Engagement (PPIE) Steering Committee’ for this project and related works.38 Their input has confirmed to us that patients prefer to see how doctors can be helped (rather than replaced) by AI, and therefore, this study aims to understand if AI can enhance current clinical practices and the impact this could have on patient care.

Ethics and dissemination

Human research ethics committee approval

This study was granted approval by NHS Health Research Authority (HRA) and Health and Care Research Wales (HCRW) (REC Reference: 22/PR/0334). IRAS Project ID is 274 278. Informed consent was not required for the use of fully anonymised, retrospective imaging data for this study. Written consent will be obtained from all readers within this study prior to the interpretation exercises.

Intended publications and research dissemination

Datasets generated and/or analysed during the current study are not publicly available due to data confidentiality agreements with data custodians. Results generated by the research will be made publicly available at the summary level. Manuscripts addressing the study aims will be published in peer-reviewed journals and will also be presented at relevant national and international conferences. Findings will also be disseminated via social media and online blogs.

Study outcomes will be disseminated to all relevant clinical and non-clinical stakeholders which include our FRACTURE PPIE Steering Group, the wider Great Ormond Street Hospital YPAG and PCAG members, members of the ESPR, ESSR, BSPR, BSSR, SoR, RCEM, APEM, BSCOS, members of the NIHR Imaging Science Working Group and also the Clinical AI interest group of the Alan Turing Institute. The findings and awareness raised by the study and its dissemination will help inform future AI evaluation for paediatric healthcare, policy decisions and raise awareness of AI training needs for various multidisciplinary subspecialties and HCPs who may encounter such tools as part of their role.

Acknowledgments

We would like to acknowledge Ms Deirdre Leyden, Patient and Public Involvement/Engagement (PPIE) Lead for research at Great Ormond Street Hospital for Children NHS Foundation Trust, and the following persons, who form part of the FRACTURE (Fast Reporting using Artificial intelligence for Children’s TraUmatic Radiology Examinations) Study Patient and Public Involvement & Engagement (PPIE) Steering Committee, for their help in providing feedback for this study: Lauren Lee, Laila Xu, Oceiah Annesley, Maryam Lyden, Becky Harmston, Paul Musticone and Viki Ainsworth. We also acknowledge the kind help from the PACS teams at both St. George’s Hospital NHS Foundation Trust and King’s College Hospital NHS Foundation Trust, in particular Mr Mukisa Scarinzi and Mr Gregory Stansil. Finally, we acknowledge the assistance on this protocol from Ms Jeanne Ventre and Mr Daniel Jones from Gleamer.

Footnotes

  • X @SusieShels, @KTheivendran

  • Contributors SS conceived the idea, planned and designed the study protocol. KT, DL, AN, SA and NW contributed to the development of the protocol, study design and methods. CP, EAllan and SS acquired imaging data and checked data for quality control. DL and SS contributed to the data and statistical analysis plan. SS wrote the first draft. All authors (SS, CP, EAllan, DL, EAshworth, K-WY, JB, SH, DR, SA, AN, KT and OJA) critically revised the draft and checked the content for important intellectual content. All authors (SS, CP, EAllan, DL, EAshworth, K-WY, JB, SH, DR, SA, AN, KT and OJA) approved the final written manuscript. The guarantor responsible for the overall content of the article is SS.

  • Funding SS is funded by a National Institute for Health Research (NIHR) Advanced Fellowship Award (NIHR-301332). CP is funded by the Great Ormond Street Hospital Children’s Charity (GOSHCC) (Award Number: VS0618). K-WY is funded by grants from the Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) (Award Number: 203145Z/16/Z, NS/A000050/1). NW is funded by research grants awarded by NHSx Accelerated Access Collaborative (AAC) award, SBRI Healthcare and Health Education England (HEE) (Award number not applicable). SA is funded by an NIHR Research for Patient Benefit Grant (NIHR-204982). OJA is funded by a National Institute for Health Research (NIHR) Career Development Fellowship (NIHR-CDF-2017-10-037).

  • Disclaimer This article presents independently funded research—the views expressed are those of the author(s), and not necessarily those of the National Health Service (NHS), the Department of Health or any of the aforementioned funding bodies.

  • Competing interests NW reports the following competing interests: consultancy fees from InHealth Reporting, SM Radiology; speaker honoraria fees from AstraZeneca and travel grant award from Qure.ai Technologies. All other authors (SS, CP, EAllan, DL, EAshworth, K-WY, JB, SH, DR, SA, AN, KT and OJA) declare no competing interests.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.