Original research
Minimum clinically important difference in Quantitative Lung Fibrosis score associated with all-cause mortality in idiopathic pulmonary fibrosis: subanalysis from two phase II trials of pamrevlumab
  1. Grace Hyun Kim1,
  2. Xueping Zhang2,
  3. Matthew S Brown1,
  4. Lona Poole2,
  5. Jonathan Goldin1
  1. 1University of California Los Angeles David Geffen School of Medicine, Los Angeles, California, USA
  2. 2FibroGen Inc, San Francisco, California, USA
  1. Correspondence to Dr Grace Hyun Kim; GraceKim{at}mednet.ucla.edu

Abstract

Objectives Idiopathic pulmonary fibrosis (IPF) is a progressive interstitial lung disease. Chest high-resolution CT (HRCT) is instrumental in IPF management, and the Quantitative Lung Fibrosis (QLF) score is a computer-assisted metric for quantifying lung disease using HRCT. This study aimed to assess the change in QLF score associated with a minimum clinically important difference (MCID) of IPF symptoms and physiological lung function, and also determine the MCID of QLF change associated with all-cause mortality to serve as an imaging biomarker to confirm disease progression and response to therapy.

Design and study setting We conducted post hoc analyses of prospective data from two IPF phase II studies of pamrevlumab, a fully human monoclonal antibody that binds to and inhibits connective tissue growth factor activity.

Participants Overall, 152 patients with follow-up visits after week 24.

Methods We used the anchor-based Jaeschke’s method to estimate the MCID of the QLF score that corresponded with the already established MCID of St. George’s Respiratory Questionnaire (SGRQ) and percent-predicted forced vital capacity (ppFVC). We also conducted a Cox regression analysis to establish a sensitive and robust MCID of the QLF score in predicting all-cause mortality.

Results QLF changes of 4.4% and 3.6% corresponded to the established MCID of a 5-point increase in SGRQ and a 3.4% reduction in ppFVC, respectively. QLF changes of 1% (HR=4.98, p=0.05), 2% (HR=4.04, p=0.041), 20 mL (HR=6.37, p=0.024) and 22 mL (HR=6.38, p=0.024) predicted mortality.

Conclusion A conservative metric of 2% can be used as the MCID of QLF for predicting all-cause mortality. This may be considered in IPF trials in which the degree of structural fibrosis assessed via HRCT is an endpoint. The MCID of SGRQ and FVC corresponds with a greater amount of QLF and may reflect that a greater amount of change in fibrosis is required before there is functional change.

Trial registration number NCT01262001, NCT01890265.

  • Computed tomography
  • Interstitial lung disease
  • Chest imaging
  • QUALITATIVE RESEARCH
  • Patient Reported Outcome Measures
  • Pulmonary Disease

Data availability statement

Data are available upon reasonable request. FibroGen, Inc., is committed to data sharing and to furthering medical research and patient care. Based on scientific merit, requests from qualified external researchers for anonymised patient-level and study-level clinical trial data (including redacted clinical study reports) for medicines and indications approved in the USA and Europe will be considered after the respective primary study is accepted for publication. All data provided are anonymised to respect the privacy of patients who have participated in the trial in line with applicable laws and regulations.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • This study demonstrates the utilisation of an anchor-based approach and an early prediction of mortality in estimating a minimum clinically important difference (MCID).

  • This study estimates the MCID for extensive pulmonary fibrosis in idiopathic pulmonary fibrosis using high-resolution CT (HRCT) as an imaging biomarker based on two clinical trials, in which subjects underwent HRCT scans according to clinical protocols—reducing the potential bias compared with observational data.

  • A limitation of this study is that the MCID estimation was based on a post hoc analysis of existing clinical trial data.

Introduction

Idiopathic pulmonary fibrosis (IPF) is a rare, progressive interstitial lung disease characterised by cough, worsening dyspnoea and progressive lung injury and scarring. Together, these features limit physical activity and reduce patient health-related quality of life (HRQOL).1–3 There is no cure for IPF,3 and its prognosis is very poor. Median survival is estimated to be no more than 2–5 years after diagnosis.4 Two approved antifibrotic drugs (pirfenidone and nintedanib) significantly reduce the rate of lung-function decline in IPF.5–7 However, individual responses to treatment are variable and unpredictable, and HRQOL does not improve.6 7 Identifying small, detectable and clinically meaningful changes at the level of the individual patient will help both physicians and patients make informed decisions about available antifibrotic treatments and will support the ongoing discovery of novel therapeutics in clinical trials.

Pamrevlumab is a fully human monoclonal antibody that binds to and inhibits the activity of connective tissue growth factor.8–10 Two phase II studies of intravenously administered pamrevlumab, one open-label and the other placebo-controlled, demonstrated slowing of the rate of lung-function decline and of the progression of lung fibrosis evident on CT, and a trend towards improved HRQOL. Adverse events were generally mild.9 10 However, a recent phase III trial of pamrevlumab for IPF (ZEPHYRUS-1) did not meet its primary endpoint of absolute change in forced vital capacity (FVC) from baseline to week 48.11 12 Its companion study (ZEPHYRUS-2) was terminated.13

IPF treatment options are limited, and improved monitoring and a sensitive metric for assessing therapeutic efficacy are needed. Radiologically detected lung fibrosis correlates with physiological lung function and symptomatic changes in IPF, and early and sensitive imaging biomarkers are needed to confirm disease progression or worsening of FVC and response to therapy as quickly as possible to optimise drug development and patient care.2 3 14–17

FVC is the most common measure for assessing treatment efficacy in IPF.18 19 HRQOL and other patient-reported outcome (PRO) measures are also important endpoints for evaluating disease progression and treatment efficacy including St. George’s Respiratory Questionnaire (SGRQ).3 20 Use of chest high-resolution CT (HRCT) is expanding and is instrumental in the diagnosis and management of IPF.1 Computer-assisted methods for quantifying lung disease on HRCT calculate textural features derived from image data and classify different patterns of interstitial lung diseases based on machine learning algorithms.21–23 Computational quantitative scoring systems that analyse HRCT images have been used as imaging biomarkers in IPF clinical trials to assess the degree and progression of structural lung fibrosis. Of these, the Quantitative Lung Fibrosis (QLF) score has demonstrated high reproducibility22–24 (figure 1). QLF is associated with more prospective validation than other quantitative CT techniques and has been used in recent clinical trials of IPF.21 25 In two phase II IPF trials of pamrevlumab, significant correlations were observed between QLF changes and the changes in percent-predicted FVC (ppFVC) (ranging from −0.51 to −0.64), as well as with changes in PRO, SGRQ (ranging from 0.27 to 0.30).9 10 26 Furthermore, QLF changes of <2% were associated with better long-term survival than changes ≥2% for patients with interstitial lung disease in scleroderma.27

Figure 1

Use of HRCT to calculate QLF. QLF is a specific method (UCLA patent) that uses image normalisation (denoising) to minimise cross-site variability within images, resulting in decomposed CT images prior to texture calculation. HRCT, high-resolution CT; LF, lung fibrosis; QLF, Quantitative Lung Fibrosis; UCLA, University of California at Los Angeles.

The minimum clinically important difference (MCID) is an important standard for determining meaningful changes related to a clinical intervention or measurement tool28 and represents the smallest detectable and beneficial change.29 Both distribution-based methods (using variations from repeated measures) and anchor-based approaches (relying on established MCIDs from other relevant clinical variables) are used to determine MCIDs.30 The MCID of QLF changes in an IPF cohort has not been evaluated. For clinically meaningful validation, an MCID of the QLF threshold should provide a tool for both identifying an effective treatment and detecting a difference in mortality over time. This is especially important in IPF, which is a progressive disease with a considerably shorter median survival than other chronic lung diseases.31

Our aim is to assess the change in QLF score associated with MCID of a PRO measure, SGRQ, and a key measure in lung function physiology, FVC, using the anchor-based Jaeschke’s method, and to determine the MCID of QLF change based on its association with all-cause mortality through a post hoc analysis of prospective data from two phase II studies of pamrevlumab, exploring the potential of QLF as an imaging marker to confirm disease progression and response to therapy.

Methods

Patients

This was a secondary analysis of two studies: Study 049 and the phase II PRAISE study.9 10 The two study populations were pooled to include a total of 190 patients with IPF. Eligibility criteria for the two studies were similar.9 10 Study 049 (NCT01262001), conducted between March 2011 and December 2012 at 18 centres in the USA, was a single-arm, open-label study.9 Pamrevlumab was administered every 3 weeks for 45 weeks: cohort 1 received 15 mg/kg and cohort 2 received 30 mg/kg.9 PRAISE (Study 067 (NCT01890265)), conducted between August 2013 and July 2017 at 39 centres throughout North America, Australia, Africa and Europe, was a double-blind, placebo-controlled study.10 Patients were randomised to receive placebo or pamrevlumab 30 mg/kg every 3 weeks for 45 weeks.10

Local ethics committees/institutional review boards (ECs/IRBs) approved the protocol for each site, and all patients provided written informed consent before enrolment (Study 049: Aspire IRB00004587; Study 067 (PRAISE): Quorum (now Advarra) 00023875; both studies WIRB (now WCG IRB) IRB00000533).

Of the 190 patients, 155 had follow-up visits after week 24 and data from week 48 visits, including the primary outcome measure of FVC. For both studies, pulmonary function tests, including spirometry, were performed at baseline and every 12 weeks thereafter, and HRCT was performed at baseline and every 24 weeks. SGRQ was completed at baseline and weeks 24 and 48. Mortality data were collected for the lengths of the respective studies.

Patient and public involvement

No patients or members of the public were involved in the design, conduct, reporting or dissemination plans of this study.

Outcomes

QLF scores were estimated from standardised non-contrast thin-section volumetric HRCT scans using an established radiomic texture-based quantification algorithm. QLF uses image normalisation (denoising) to minimise cross-site variability within images prior to texture calculation.22 QLF was measured as extent (%) and volume (mL). Online supplemental figure 1 provides an example of QLF extent (%) and volume (mL) on HRCT and overlaid images for a patient with IPF. QLF measures the amount of reticulation with architectural distortion in the lung. Scores range from 0% to 100% for extent of fibrosis and from 0 mL to total lung capacity for volume of fibrosis. Greater scores represent increased fibrosis.10 21 For this analysis, we considered 24-week or 48-week changes in QLF in the whole lung, which were calculated from the QLF scores of baseline HRCT.

Estimation of an MCID

We used the anchor-based Jaeschke’s method with predefined criteria for establishing the MCID of QLF that corresponded with the established MCIDs of SGRQ and ppFVC. We used a landmark Cox proportional hazards regression analysis using all-cause mortality as an anchor by applying several thresholds of 24-week QLF changes. Patients did not have follow-up visits if they died or received a lung transplant.
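The landmark step described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Patient` record, its field names, the 168-day landmark (week 24 expressed in days) and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LANDMARK_DAY = 24 * 7  # week 24 expressed in days (assumed convention)


@dataclass
class Patient:
    patient_id: str
    followup_days: int                 # days from baseline to death or censoring
    died: bool                         # True if death was observed
    qlf_change_wk24: Optional[float]   # week-24 minus baseline QLF extent (%); None if no scan


@dataclass
class LandmarkRecord:
    patient_id: str
    days_from_landmark: int            # survival time re-based to the landmark
    died: bool
    qlf_change_wk24: float


def build_landmark_set(patients: List[Patient],
                       landmark_day: int = LANDMARK_DAY) -> List[LandmarkRecord]:
    """Keep patients still under follow-up at the landmark who have a week-24 QLF change."""
    records = []
    for p in patients:
        if p.followup_days <= landmark_day:
            continue  # died or left the study on or before the landmark
        if p.qlf_change_wk24 is None:
            continue  # no week-24 scan, so no exposure value
        records.append(LandmarkRecord(p.patient_id,
                                      p.followup_days - landmark_day,
                                      p.died,
                                      p.qlf_change_wk24))
    return records


# Hypothetical patients, not trial data.
example = [
    Patient("A", 120, True, None),    # died before week 24: excluded
    Patient("B", 400, False, 0.8),
    Patient("C", 300, True, 3.1),
    Patient("D", 500, False, None),   # no week-24 scan: excluded
]
landmark = build_landmark_set(example)
```

Re-basing the clock at the landmark means that the week-24 QLF change is known for every patient at the start of the analysed follow-up, which is the design choice that defines a landmark analysis.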

The SGRQ is a self-administered questionnaire that assesses HRQOL in respiratory diseases. The SGRQ total score ranges from 0 to 100, with greater scores indicating worse HRQOL. The MCID of the SGRQ was assumed to be ±5 points.32 Changes in FVC are often used as primary endpoints in trials of respiratory diseases. The ppFVC is an estimate of lung function, with greater percentages indicating better function. The MCID of ppFVC was assumed to be ±3.4%.33 In this study, SGRQ and ppFVC were used to represent the severity of symptoms and lung function, respectively.

Changes in longitudinal QLF scores were initially correlated with established MCID changes in SGRQ and ppFVC. The anchor-based Jaeschke’s method was used to estimate the MCID of QLF scores from these changes in SGRQ and ppFVC from baseline at weeks 24 and 48. Jaeschke’s method describes the mean change in the measurement of interest for patients who experience a change in an anchor.31 Multiple anchors were chosen to obtain robust, unbiased estimates of the MCID.34 In addition, an anchor-based Cox proportional hazards regression was used for all-cause mortality, in which duration of survival or time to death was used as an anchor. A preliminary threshold was derived from a previous reproducibility study35 and the 6-month change observed in a clinical trial.21 The reproducibility coefficient of the QLF score was estimated to be approximately 0.4% (≈2.77×0.14, where 2.77≈1.96×√2),35 36 and the mean 6-month change was 0.98% for extent QLF and 21.7 mL for volume QLF in a nintedanib arm.21 Thresholds were increased incrementally: extent changes of 1%, 2%, 3% and 4% and volume changes of 20, 22, 24 and 26 mL.21 35 The regression analysis was adjusted for the covariates of age and ppFVC at baseline. (Of note, the covariates of sex and percent-predicted diffusing capacity of lungs for carbon monoxide (ppDLCO) were not used in Cox regression due to the imbalanced distribution of sex in the multiple thresholds and the collinearity among ppDLCO, ppFVC and QLF.) QLF change on a continuous scale and at multiple thresholds was evaluated to test for differences in mortality risk. In addition, the MCID from each anchor (SGRQ and ppFVC) was tested in a Cox regression model as a threshold.
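The anchor-based calculation described above can be illustrated with a short sketch: each patient is labelled as worsened, stable or improved on the anchor using its MCID, and the mean QLF change is then taken within each group. The function names, the inclusive cut-offs and the example values are assumptions made for illustration; they are not taken from the trial data or from the authors' code.

```python
from statistics import mean
from typing import Dict, List, Tuple

SGRQ_MCID = 5.0    # points; a rise in SGRQ means worse symptoms
PPFVC_MCID = 3.4   # percentage points; a fall in ppFVC means worse function


def anchor_status(change: float, mcid: float, higher_is_worse: bool) -> str:
    """Label an anchor change as worsened, stable or improved (inclusive cut-offs assumed)."""
    signed = change if higher_is_worse else -change
    if signed >= mcid:
        return "worsened"
    if signed <= -mcid:
        return "improved"
    return "stable"


def mean_qlf_change_by_status(pairs: List[Tuple[float, float]],
                              mcid: float,
                              higher_is_worse: bool) -> Dict[str, float]:
    """Mean QLF change within each anchor-defined group (empty groups are omitted)."""
    groups: Dict[str, List[float]] = {"worsened": [], "stable": [], "improved": []}
    for anchor_change, qlf_change in pairs:
        groups[anchor_status(anchor_change, mcid, higher_is_worse)].append(qlf_change)
    return {status: mean(values) for status, values in groups.items() if values}


# Hypothetical (SGRQ change, QLF extent change in %) pairs, not trial data.
sgrq_pairs = [(8.0, 5.0), (6.0, 3.0), (1.0, 0.5), (-2.0, 0.1), (-7.0, -1.0)]
by_sgrq = mean_qlf_change_by_status(sgrq_pairs, SGRQ_MCID, higher_is_worse=True)

# Hypothetical (ppFVC change, QLF extent change in %) pairs, not trial data.
ppfvc_pairs = [(-4.0, 4.0), (-3.4, 3.0), (0.5, 0.5), (5.0, -2.0)]
by_ppfvc = mean_qlf_change_by_status(ppfvc_pairs, PPFVC_MCID, higher_is_worse=False)
```

In this sketch, the mean QLF change in the "worsened" group plays the role of the QLF threshold that corresponds to the anchor's MCID.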

Summary statistics are reported for demographics and clinical variables. Continuous variables are reported as mean and SD, and categorical variables are reported as frequencies and percentages.

Results

There were no notable differences in demographics or baseline characteristics between the cohorts (table 1, online supplemental table 1). The median (±IQR) length of the follow-up period was 337 (±504) days. Because changes in QLF outcomes for all-cause mortality were derived from week 24 data and the screening HRCT scan, the median observed survival was relatively short. Of the 185 patients with available screening HRCT scans, 33 were not included in survival analyses after week 24 because they discontinued prior to the week 24 visit (n=19), died prior to week 24 (n=13) or did not undergo a scan (n=1) (online supplemental figure 2).

Table 1

Demographics and baseline clinical characteristics

Anchor-based analyses assessed the relationship between QLF and the established MCIDs for SGRQ, ppFVC (table 2) or both (online supplemental table 2). When applying Jaeschke’s method at week 24, the thresholds of QLF changes (extent and volume, respectively) were 4.4% and 91 mL for symptomatic worsening using SGRQ as an anchor, and 3.6% and 65 mL for worsening lung function using ppFVC as an anchor. For improvement in symptoms (SGRQ) or lung function (ppFVC), the thresholds of QLF changes were ≤0.5% and 10 mL, and –2.0% and –56 mL, respectively. At week 48, the thresholds of QLF changes for worsening were 2.3% and 81 mL by SGRQ, and 4.3% and 108 mL by ppFVC; for improvement, the QLF changes were –2.8% and –54 mL by SGRQ, and –2.1% and –65 mL by ppFVC.

Table 2

Relationship of QLF extent (%) and volume (mL) with the anchor-based MCID of SGRQ and FVC using Jaeschke’s method

Agreement between the two components of symptoms and lung function and their corresponding changes in QLF score are reported in online supplemental table 2. Similar percentages of patients experienced concordance in changes at weeks 24 and 48. If changes in SGRQ and FVC were both considered worsened, the mean (SD) changes in extent of QLF were 8.1% (8.27) at week 24 and 7.9% (7.37) at week 48. If changes were both considered stable, the mean (SD) changes in QLF were 0.5% (3.75) at week 24 and 2.5% (4.31) at week 48. If changes were both considered improved, the mean (SD) changes in QLF were –1.6% (3.71) at week 24 and –3.4% (6.50) at week 48. Overall, the concordant changes for worsening, stability and improvement followed expected directional changes in QLF scores.
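The concordance grouping described above can be sketched as a joint classification of the two anchors. This is a hedged illustration: the function names, the inclusive cut-offs and the example visit values are assumptions, with the ±5-point SGRQ and ±3.4% ppFVC MCIDs taken from the Methods.

```python
from collections import Counter
from typing import List, Tuple


def status(change: float, mcid: float, higher_is_worse: bool) -> str:
    """Label a change as worsened, stable or improved (inclusive cut-offs assumed)."""
    signed = change if higher_is_worse else -change
    if signed >= mcid:
        return "worsened"
    if signed <= -mcid:
        return "improved"
    return "stable"


def joint_status(sgrq_change: float, ppfvc_change: float) -> str:
    """Return the shared label when both anchors agree, otherwise 'discordant'."""
    symptoms = status(sgrq_change, 5.0, higher_is_worse=True)
    function = status(ppfvc_change, 3.4, higher_is_worse=False)
    return symptoms if symptoms == function else "discordant"


# Hypothetical (SGRQ change, ppFVC change) pairs, not trial data.
visits: List[Tuple[float, float]] = [
    (7.0, -5.0),   # both worsened
    (0.0, 1.0),    # both stable
    (-6.0, 4.0),   # both improved
    (6.0, 0.0),    # symptoms worsened, function stable
    (2.0, -4.0),   # symptoms stable, function worsened
]
counts = Counter(joint_status(s, f) for s, f in visits)
```

Grouping QLF changes by these joint labels would then give the concordant and discordant means of the kind reported in online supplemental table 2.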

IPF is a complex disease that includes dynamic changes in lung symptoms, function and structure over time. These changes do not always progress at the same rate, and most patients experienced discordance in symptoms and lung function. The discordant changes when one parameter worsened were associated with the mean QLF changes of approximately 2%; discordant changes with one parameter improved were associated with QLF changes of approximately within ±1% (online supplemental table 2).

The relationship of QLF changes to each of the MCIDs of SGRQ and ppFVC is provided in online supplemental table 3. Unadjusted analyses demonstrated that mean changes in MCIDs were greater (eg, 4.88%, 95% CI (3.46%, 6.30%) vs 4.39%, 95% CI (2.97%, 5.81%) when applying SGRQ as an anchor at week 24) compared with the stable group (eg, 0.49%, 95% CI (−0.31%, 2.32%) vs 0.00%, 95% CI (−1.15%, 1.15%) when applying SGRQ as an anchor) and trended in the correct direction. The effect sizes of QLF changes (both extent and volume) were approximately 1 or slightly greater for patient groups with worsened SGRQ scores and worsened ppFVC (ie, 1.08 for extent and 1.03 for volume when applying SGRQ as an anchor, 1.02 for extent and 0.93 for volume when applying ppFVC as an anchor, respectively), indicating a strong relationship between QLF and the anchor parameter. The effect sizes for worsened conditions were greater than those observed for patients with not-worsened conditions (ie, 0.93, 0.33, –0.17 for worsening, stable and better in the effect size of QLF volume change, respectively, when applying ppFVC as an anchor).

A Cox proportional hazards regression model of QLF changes is presented in table 3. Changes ranging from 1% to 4% and from 20 to 26 mL were associated with statistically significant differences in all-cause mortality. A minimum threshold of a 1% change in QLF at week 24 was associated with an increased risk of death. A twofold to fivefold increased risk of death was observed for patients with sizeable changes in QLF (ie, changes greater than the thresholds established by the single-variable anchor-based analyses). Online supplemental figure 3A,B presents patients with IPF with QLF changes of 1% (25 mL) and 2% (46 mL), respectively, at week 24. We also tested the MCIDs derived from the increase in SGRQ and the decrease in ppFVC, both of which indicate worsening in IPF, from table 2. When these thresholds were applied to QLF changes at week 24 in a Cox proportional hazards model of all-cause mortality, patients with sizeable changes in QLF had a fourfold to ninefold increased risk of death (table 4). Significant differences were observed from 1% to 4% changes (20–26 mL for QLF volume) (table 3), whereas the thresholds of 4.4% or 3.6% (91 or 65 mL for QLF volume) were derived from the QLF changes corresponding to the anchors of SGRQ and ppFVC, respectively (table 4).
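The threshold analysis above can be illustrated by dichotomising the week-24 QLF change at a cut-off and estimating a hazard ratio. The sketch below is a simplified stand-in, not the model reported in table 3: it fits an unadjusted Cox model with a single binary covariate by Newton-Raphson on the partial likelihood (Breslow handling of ties assumed), whereas the study adjusted for age and baseline ppFVC. The function name and all data values are hypothetical.

```python
import math
from typing import Dict, Sequence


def cox_hr_binary(times: Sequence[float], events: Sequence[bool],
                  exposed: Sequence[bool], max_iter: int = 50,
                  tol: float = 1e-10) -> float:
    """Unadjusted Cox hazard ratio for one binary covariate (Breslow ties)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        w = math.exp(beta)
        for t in event_times:
            # Risk set at time t, split by exposure group.
            n1 = sum(1 for ti, x in zip(times, exposed) if ti >= t and x)
            n0 = sum(1 for ti, x in zip(times, exposed) if ti >= t and not x)
            # Deaths at time t, overall and in the exposed group.
            d = sum(1 for ti, e in zip(times, events) if ti == t and e)
            d1 = sum(1 for ti, e, x in zip(times, events, exposed)
                     if ti == t and e and x)
            p = n1 * w / (n0 + n1 * w)   # expected exposed share of deaths
            score += d1 - d * p          # first derivative of the log partial likelihood
            info += d * p * (1.0 - p)    # negative second derivative
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Hypothetical landmark data (days after week 24), not trial data.
times = [60, 90, 120, 150, 200, 220, 260, 300, 330, 360]
died = [True, True, False, True, True, False, False, True, False, False]
qlf_change = [3.5, 2.4, 2.1, 0.4, 4.0, 0.9, 1.5, -0.5, 0.2, 2.8]  # extent, %

hr_by_threshold: Dict[float, float] = {
    thr: cox_hr_binary(times, died, [c >= thr for c in qlf_change])
    for thr in (1.0, 2.0)
}
```

A hazard ratio above 1 at a given cut-off means that, in these made-up data, patients at or above that QLF change died at a higher rate after week 24 than patients below it. A real analysis would use an established survival package and include the adjustment covariates.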

Table 3

Cox proportional hazards model using all-cause mortality as an anchor with QLF with cut-off of week 24 changes

Table 4

Anchor-based Cox proportional hazards model with an MCID derived from SGRQ and ppFVC as anchors

Discussion

This study established the MCID for change in QLF score in the setting of IPF as it relates to all-cause mortality. A minimum threshold of change in QLF of 1% or 20 mL at week 24 was associated with an increased risk of death for patients with IPF. To include changes observed with SGRQ and FVC, a conservative estimate of 2% can be adopted as the MCID of QLF, based on the week 24 mean QLF changes when patients experienced worsening of either SGRQ or ppFVC. An increased risk of death was also associated with sizeable QLF changes using Jaeschke’s method with anchors of IPF symptoms (SGRQ) and lung function (ppFVC) to determine MCID.

Changes in QLF were consistent with changes in SGRQ and ppFVC, and mean QLF changes coincided with both symptom and lung-function changes. Changes in QLF and SGRQ were positively correlated, and changes in QLF and ppFVC were inversely correlated.9 10 26 This indicates that, generally, a responder in an IPF clinical trial showed a reduction in QLF, a reduction in SGRQ and an increase in ppFVC. The threshold of QLF change for improved symptoms was close to zero (ie, 0.5%) at week 24 compared with the reduction (−2.0%) for improved lung function, but at week 48 the thresholds were further reduced for both symptomatic changes (ie, −2.8%) and functional changes (−2.1%) (see table 2 for the details). In contrast, the threshold of QLF change for worsened symptoms had a greater magnitude at week 24 (ie, 4.4%) than the threshold for worsened lung function (ie, 3.6%), but at week 48, functional changes (ie, 4.3%) were greater than symptomatic changes (ie, 2.3%) (see table 2 for the details). This suggests that symptoms (measured by SGRQ) improve more slowly or less consistently than lung function (FVC) but worsen faster. This is likely a result, in part, of the effects of limited symptom recall and the subjective nature of HRQOL.

Quantitative HRCT tools for measuring pulmonary fibrosis are critical in therapeutic development in ILD to confirm efficacy or evaluate the safety of an experimental drug.18 21 25 Assessments of QLF change have mostly served as secondary or exploratory quantitative imaging outcomes to estimate changes in lung fibrosis in clinical trials.11 13 25 The QLF score based on HRCT images is traceable and can visualise regions of fibrosis. The incremental changes in QLF extent (%) or volume (mL) highlight the structural worsening in IPF that is associated with decreased FVC.10 The incremental worsening in ppFVC from week 24 to week 48 confirms that FVC is a reasonably reliable assessment in IPF and supports its use as the primary efficacy endpoint in clinical trials.19 In the future, the role of quantitative tools could be expanded into patient care using a digital AI platform, once a trial reports a positive outcome on a primary endpoint or on a secondary imaging endpoint.

In this post hoc analysis of prospective IPF clinical trial data, an anchor-based method was used to estimate the threshold of QLF change associated with established MCIDs of SGRQ and ppFVC, and a sensitivity-based method was used to establish the MCID of QLF from all-cause mortality. Sensitivity-based methods for estimating MCID ideally require a baseline variable or characteristic to reliably quantify disease severity. For HRCT-based QLF, this requires repeat HRCT scans in a 'coffee break' type of experiment (ie, approximately 15 min)37 38 for patients with IPF, but this method poses ethical challenges because of the unnecessary risk of radiation exposure. The variability can be estimated statistically,25 but we recognise this as a limitation of the analysis. Because repeated HRCT scans were not usually available, an anchor-based approach, which relies on the variability of anchored measurements, was selected in this study. Anchor-based methods require measurements of longitudinal change for the tool of interest and other anchor measurements that already have an established MCID. The estimates of MCIDs from other anchored measurements are likely overestimated because of the nature of additive variabilities. MCID estimates are approximations, and the recommended approach is to use multiple anchors to define a range of MCID estimates.39 Both function and PROs are relevant when interpreting the outcomes of clinical trials.

The MCIDs of the relationships between QLF and SGRQ (PRO) or FVC (lung function) in this study were less sensitive than the MCID of QLF using all-cause mortality as an anchor. The degree of QLF change needed to attain a meaningful, absolute change was 0.5–4.4% for SGRQ as an anchor, 2.0–4.3% for FVC as an anchor and −1.6%–8.1% for concordance of SGRQ and FVC in non-stable change. This suggests that, for a meaningful change in PROs and function derived from Jaeschke’s method, a greater amount of QLF change is needed in the evaluation of QLF with SGRQ or FVC than when using all-cause mortality as an anchor with multiple QLF thresholds. This reflects the progressive nature of IPF, which contrasts with other chronic lung diseases, and the fact that the observations from the follow-up visits are based on stable or worsening disease. This is similar to findings by Kon et al, who reported that the MCID of an assessment tool for chronic obstructive pulmonary disease from a receiver operating characteristic analysis was smaller than the MCID from an anchor-based approach.40 Further, multivariable regression modelling, such as the anchor-based Jaeschke’s method, that combines both clinical and subjective (eg, PRO) parameters to quantify changes in the outcome of interest (eg, QLF) offers less-biased estimates of MCIDs than distribution-based methods, which only assess the statistical significance of a change.41

The QLF changes of 1% and 2% at week 24 presented in online supplemental figure 3A,B, respectively, correspond to 25 and 46 mL changes. The structural changes are visualised both in the right side (online supplemental figure 3A) and left side of the lung (online supplemental figure 3B). These QLF changes are similar to the absolute mean changes of 2% and 56 mL (table 2) when FVC was used as an anchor in our analysis. When retesting the MCID of the QLF score derived from the MCID using SGRQ and ppFVC as anchors, the HR ranged from 4.33 to 8.89, which is similar to the HRs of 4.98 and 4.04 for the relatively smaller MCIDs of 1% and 2%, respectively. The QLF scores associated with the MCIDs of SGRQ and ppFVC were greater than the QLF score associated with survival, which suggests that these biomarkers require greater structural disease progression before they can detect meaningful change. This also suggests that QLF may be more sensitive than either ppFVC or SGRQ as a trial endpoint.42

There were 11 deaths in this study that occurred after week 24. Because QLF changes were derived from week 24 and the screening HRCT scans, the median survivals were relatively short. Week 48 data were omitted from the survival analysis because most changes in QLF were observed within 24 weeks. In addition, including data beyond week 24 had the potential to skew the results by including patients with mild to moderate IPF who were more likely to remain alive at 1 year. QLF change as a volume is a suitable clinical trial endpoint, as noted by the high HR in table 4. Change in QLF extent can provide a normalised measurement regardless of the volume differences between patients of different sex or height. Finally, the MCID of QLF is an early biomarker of change in lung fibrosis, so 48 weeks of data, which is often used for the primary endpoints in IPF clinical trials, are not needed to determine the value of the MCID of the QLF score and its clinical applicability for predicting early change.

Multiple thresholds of QLF change have also been reported in other studies. In an independent cohort of patients who received nintedanib (n=42), an approved antifibrotic drug, mean absolute changes in QLF were a 0.98% and 21.7 mL increase at month 6, and a 1.4% and 27.6 mL increase at month 12. In the placebo arm of the same trial, the changes were 1.33% and 37.3 mL at month 6 and 2.2% and 67.0 mL at month 12. Negative correlations were observed between change in QLF score and change in FVC at month 6, supporting the findings of the QLF score.21 In a retrospective analysis of approximately 200 patients with IPF, a 4% change in QLF score for the most severe lobe and for the whole lung at 6 months was associated with a threefold to fivefold increased risk of clinical progression.43 Further, a placebo-controlled phase II trial of 137 patients with IPF reported significant correlations between QLF and ppFVC changes, as well as other symptoms of IPF, where most subjects in the placebo arm were within ±2% changes at week 24.25 Overall, QLF measured on HRCT, for which most 6-month mean changes range from 1% to 4%, has proven to be useful as an efficacy endpoint in clinical trial settings.

This study has several limitations. First, it was a post hoc rather than an a priori analysis of data from two phase II clinical trials. Thus, due to the nature of phase II studies, mortality was based on a short follow-up period. In addition, the allocation of treatment arms and study locations was different between the studies. Specifically, Study 049 was a single-arm study, whereas Study 067 was a randomised study with one-to-one allocation of placebo and pamrevlumab. Further, Study 049 took place only in the USA, and Study 067 involved patients worldwide. Subcohorts of patients who received pamrevlumab or other treatments were not analysed separately here for purposes of simplification. Additionally, this study analysed the usefulness of QLF change for predicting mortality risk only over a short period of time. Second, we used an MCID derived from a distribution-based approach for ppFVC and a symmetric change of ±5 points for SGRQ, which is close to the 4.9-point change reported by Prior et al for deterioration, in a setting where most subjects with IPF report worsening or stable symptoms. We used ±5 because the subject-level change in SGRQ is an integer, and the MCID of SGRQ is considered to be around 4–5 points.32 44 45 Third, we did not estimate the MCID using a distribution-based approach because the extra radiation exposure required for patients was not well-justified. Fourth, a single quantitative HRCT score for IPF was applied. The estimated MCID may not be generalisable to other available quantitative scores. Fifth, caution is needed when applying the estimated MCID to observational or registry studies, in which HRCT scans are not performed routinely. In this study, HRCT scans were scheduled and performed as part of clinical trials. Lastly, phase III studies did not show the efficacy of pamrevlumab.11–13 This study used survival as the primary endpoint to assess MCID.

We believe our analyses begin the evidence-generation process of using multiple thresholds for validation of a biomarker.46 The MCID of QLF in IPF has demonstrated clinical validity. The estimated MCID of 2% may be considered for associating QLF changes with mortality, lung function and patient symptoms in ongoing and future trials of IPF, where the metric can be normalised to the volume of QLF changes for both sexes. The greater MCID of QLF obtained when anchoring to the MCIDs of SGRQ and ppFVC may suggest that structural changes precede functional changes. The change in QLF volume is a sensitive measurement that can be considered as a potential imaging efficacy endpoint when the extent of structural fibrosis is assessed via HRCT.

Data availability statement

Data are available upon reasonable request. FibroGen, Inc., is committed to data sharing and to furthering medical research and patient care. Based on scientific merit, requests from qualified external researchers for anonymised patient-level and study-level clinical trial data (including redacted clinical study reports) for medicines and indications approved in the USA and Europe will be considered after the respective primary study is accepted for publication. All data provided are anonymised to respect the privacy of patients who have participated in the trial in line with applicable laws and regulations.

Ethics statements

Patient consent for publication

Ethics approval

For both of the studies referenced in the manuscript, FGCL-3019-049 and FGCL-3019-067, the designs were published previously: (1) FGCL-3019-049 was conducted at 18 sites in the United States from March 2011 to June 2017 (NCT01262001). (2) PRAISE (FGCL-3019-067) was conducted at 39 centres throughout North America, Australia, Africa, and Europe, from March 2011 to June 2017 (NCT01890265). Local ethics committees/institutional review boards (ECs/IRBs) approved the protocol for each site, and all patients provided written informed consent before enrolment (Study 049: Aspire IRB00004587; Study 067 (PRAISE): Quorum (now Advarra) 00023875; both studies WIRB (now WCG IRB) IRB00000533). Safety results were reviewed by a Data and Safety Monitoring Board (DSMB). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors would like to express their gratitude to all the patients and their families for participating in the study. The authors thank Dr Mark D. Eisner, former chief medical officer of FibroGen, Inc. (San Francisco, California, USA), for his critical review of the manuscript. The authors considered his comments based on merit. Medical writing support was provided by Jennifer L. Gibson, PharmD, of Kay Square Scientific (Newtown Square, Pennsylvania, USA) and Michael Nissen, ELS, formerly of FibroGen, Inc. (San Francisco, California, USA). This support was funded by FibroGen, Inc.

References

Footnotes

  • Contributors GK, MSB and JG contributed to the concept and design of the work. LP contributed to the acquisition of data for the work. XZ contributed to the analysis of the data. All authors contributed to the interpretation of the data for the work, drafted the work or reviewed it critically for important intellectual content, contributed to the development of the manuscript, and reviewed and approved the final manuscript for submission. All authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. GK is responsible for the overall content as guarantor.

  • Funding Funding for this analysis was provided by FibroGen, Inc. (San Francisco, California, USA). The sponsor also contributed to data collection, data analysis, data interpretation and manuscript development. The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests LP was an employee of FibroGen, Inc., at the time of the study, but is now an employee of and holds stock options in Pliant Therapeutics, Inc. XZ was an employee of FibroGen, Inc., at the time of the study, but is now an employee of and holds stock options in Neumora Therapeutics, Inc. JG and MSB are former founders of MedQIA and board members of Voiant. GK is a former consultant to MedQIA and current consultant to Voiant. GK, MSB and JG are patent developers of the issued patent: UC-2013-078-2-LA-EP.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.