This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.
The concept of the minimal clinically important difference (MCID) emerged from the recognition that statistical significance alone is not enough to determine the clinical relevance of treatment effects in clinical research. In many cases, statistically significant changes in outcomes may not be meaningful to patients or may not result in any tangible improvements in their health. This has led to a growing emphasis on the importance of measuring patient-reported outcome measures (PROMs) in clinical trials and other research studies, in order to capture the patient perspective on treatment effectiveness. MCID is defined as the smallest change in scores that is considered meaningful or important to patients. MCID is particularly important in fields such as neurology, where many of the outcomes of interest are subjective or based on patient-reported symptoms. This review discusses the challenges associated with interpreting outcomes of clinical trials based solely on statistical significance, highlighting the importance of considering clinical relevance and patient perception of change. There are two main approaches to estimating MCID: anchor-based and distribution-based. Anchor-based approaches compare change scores using an external anchor, while distribution-based approaches estimate MCID values based on statistical characteristics of scores within a sample. MCID is dynamic and context-specific, and there is no single ‘gold standard’ method for estimating it. A range of MCID thresholds should be defined using multiple methods for a disease under targeted intervention, rather than relying on a single absolute value. The use of MCID thresholds can be an important tool for researchers, neurophysicians and patients in evaluating the effectiveness of treatments and interventions, and in making informed decisions about care.
Keywords: Anchor-based methods, clinical relevance, distribution-based methods, minimal clinical important difference (MCID), minimal clinically important change, neurology, patient-reported outcome measures (PROMs), Rasch model
In clinical practice, we often interpret the outcomes of a trial on the basis of significant differences. In statistics, a significant difference is simply a difference that is unlikely to have arisen by chance, a claim backed by mathematical theory. Moreover, whether a difference in a given variable reaches statistical significance often depends on the sample size of the trial, such that a seemingly unimportant difference may become statistically significant.[1] Randomized controlled trials (RCTs) with large sample sizes can demonstrate statistically significant differences that may not be clinically meaningful. On the other hand, some trials may not reach statistical significance, and so be of uncertain relevance, yet the change may be perceived as an improvement by the patient. Also, in clinical practice, a discrepancy may exist between the objective and subjective assessment of this change.[2] A statistically significant change may not be clinically relevant if the change is not perceived by the patient. Also, clinical significance is often confused with statistical significance. Various patient-reported outcomes (PROs) are used in neurology research and are well established in the literature. However, their clinical interpretability remains a challenge.[3] The minimal clinically important difference (MCID) is a crucial component that assigns a threshold of difference with clinical relevance. In the vast field of neurology, a working knowledge of MCID is vital, as the thoughtless application of an arbitrary MCID estimate will do more harm than good. Most of the frequently used PROs lack clarity on their MCID thresholds. Through this article, we aim to review the definition, methods, MCID thresholds and their limitations (if any) in various neurological conditions.
We conducted a comprehensive search of electronic databases, including PubMed, Embase and Cochrane Library, to identify relevant articles published until February 2023. We used a combination of medical subject heading (MeSH) terms and keywords related to MCID, patient-reported outcome measures (PROMs), and neurological conditions. Search terms included ‘MCID’, ‘MID’, ‘minimal important difference (MID)’, ‘minimal clinically important change (MCIC)’, ‘clinically important change’, ‘minimal clinical important difference’, ‘clinical important difference (CID)’, ‘meaningful change’, ‘stroke’, ‘parkinsonism’, ‘dystonia’, ‘essential tremor’, ‘tardive dyskinesia’, ‘ataxia’, ‘multiple sclerosis’, ‘Neuromyelitis Optica’, ‘Neuropathy’, ‘Myopathy’, ‘Headache’, ‘Dementia’, ‘Cognitive Impairment’, ‘myelopathy’, ‘seizure’, ‘epilepsy’ and ‘Myasthenia’. We also reviewed the reference lists of relevant articles to identify additional studies. The final reference list was generated on the basis of relevance and originality with regard to the topics covered in this review.
MCID is defined as ‘the smallest change in the outcome measure that patients perceive as beneficial’.[4] Jaeschke et al.[5] first defined MCID in the year 1989 as the ‘smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patients’ management’. The nomenclature for MCID is inconsistent, and alternative terms include MID, MCIC, CID and meaningful change.[6] MCID values are important in interpreting the clinical relevance of observed changes, at both individual and group levels, and can be patient or clinician centred. For instance, consider two patients (A and B) who are both bedridden due to a stroke. Patient A is a young adult who considers ‘being able to attend a job and do physical work’ as a clinically relevant change. Patient B is an elderly patient who considers ‘being able to walk with aid’ as a clinically relevant improvement. Although both patients are affected by a similar disease and are functioning at a similar level, they interpret the term ‘clinical relevance’ differently and will have different goals of treatment. Similarly, from a patient’s viewpoint, a meaningful change in the domain of interest may be one that reflects a reduction in symptoms or an improvement in function; however, from the clinician’s viewpoint, a meaningful change may be one that indicates a change in the treatment or in the prognosis of the disease.
Initially, MCID was developed as a tool for PROs, particularly quality of life measures.[7] However, over the years, the concept of MCID has been applied to a greater diversity of measures, including physical performance.[8,9] Evaluation of MCIDs for different outcome measures is important for clinical decision-making and for study design, in particular for calculating the sample sizes of trials and surveys.[10] The United States Food and Drug Administration (US FDA) has also recognized the need to determine MCID on measures used to support the labelling claims of medical products.[11] Although there are several methods to determine MCID, two approaches, anchor-based and distribution-based, are most commonly used.[12] While the anchor-based approach estimates MCID by comparing change scores using an external anchor, the distribution-based approach estimates MCID values based on the statistical characteristics of scores within a sample. There is no ‘gold standard’ approach for determining MCID, and both methods have their merits and limitations. Calculation of MCID through different approaches is presented in Figure 1.
Figure 1: Different approaches used to estimate MCID
The anchor-based methods estimate MCID values by comparing the change in PRO score with some other measure of change, known as an anchor or external criterion.[12] The anchor is used to assign subjects into groups of no change, improvement or worsening. The anchor can be objective or subjective. Some commonly used anchors are the Patient global impression of change (PGIC), Clinician global impression of change (CGIC) and Global rating of change (GROC). Although any instrument can be chosen as an anchor, there needs to be an established association (correlation coefficient ≥0.3) between the anchor and the PRO measurement to make any meaningful inference about the PRO scores. While use of an anchor is the common characteristic of this approach, many variations may be identified among the anchor-based approaches.
‘Within-patients’ score change
In this type of anchor-based approach, patients are asked to rate their change after a targeted intervention on a scale, for example, a 7-point Likert scale (1 = ‘very bad’ to 4 = ‘same as before’ to 7 = ‘very good’).[13] The next step is to identify the group of patients who rated their change as ‘good’ or ‘very good’. The mean or median change in the instrument score in this group is then calculated and is often taken as the MCID that corresponds to clinical improvement.
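The within-patients calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `within_patient_mcid`, the mapping of ‘good’ and ‘very good’ to ratings 6 and 7, and the data are all hypothetical and are not taken from any cited study.

```python
from statistics import mean, median

def within_patient_mcid(change_scores, anchor_ratings, improved_levels=(6, 7)):
    """Return (mean, median) PRO change among patients rated as improved.

    Assumption: on the 7-point anchor, 6 = 'good' and 7 = 'very good'.
    """
    improved = [c for c, r in zip(change_scores, anchor_ratings)
                if r in improved_levels]
    if not improved:
        raise ValueError("no patient rated the change as 'good' or 'very good'")
    return mean(improved), median(improved)

# Hypothetical data: change in PRO score and anchor rating for eight patients.
changes = [2, 8, 5, 1, 9, 6, 0, 7]
ratings = [4, 7, 6, 4, 7, 6, 3, 5]

mcid_mean, mcid_median = within_patient_mcid(changes, ratings)
print(f"MCID (mean change): {mcid_mean}, MCID (median change): {mcid_median}")
```

Whether the mean or the median is reported is a choice for the investigator; the median is less sensitive to a few patients with very large changes.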
‘Between-patients’ score change
In this approach, the PRO scores or changes in PRO scores of a group of patients are compared with two adjacent levels on a global assessment scale (anchor).[14,15] This approach is used when groups of patients are compared to each other. Examples include studies in which quality of life scores are compared between an active and a control group.
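One possible reading of this comparison is the difference in mean change score between patients at two adjacent anchor levels. The Python sketch below follows that reading; the function name `between_patients_mcid`, the level coding (4 = ‘same as before’, 5 = the next level up) and the data are hypothetical.

```python
from statistics import mean

def between_patients_mcid(change_scores, anchor_levels, lower_level, upper_level):
    """Difference in mean PRO change between two adjacent anchor levels."""
    lower = [c for c, a in zip(change_scores, anchor_levels) if a == lower_level]
    upper = [c for c, a in zip(change_scores, anchor_levels) if a == upper_level]
    if not lower or not upper:
        raise ValueError("both anchor levels need at least one patient")
    return mean(upper) - mean(lower)

# Hypothetical data: seven patients, anchor levels on a 7-point scale.
changes = [1, 0, 2, 6, 5, 7, 12]
levels = [4, 4, 4, 5, 5, 5, 6]

mcid_between = between_patients_mcid(changes, levels, lower_level=4, upper_level=5)
print(f"Between-patients MCID estimate: {mcid_between}")
```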
Sensitivity and specificity-based approach
In this method, patients are asked to rate their change on a scale. Receiver operating characteristic (ROC) curves are used to determine the score with the best trade-off between sensitivity and specificity to discriminate between ‘improved’ and ‘unchanged’ patients.[16] In addition, the area under the curve (AUC) of an ROC curve represents the probability that scores will correctly discriminate between improved and unchanged patients. Areas of 0.7 to 0.8, 0.8 to 0.9 and more than 0.9 are considered to indicate acceptable, excellent and outstanding discrimination, respectively.[17]
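The ROC-based procedure can be illustrated with a short Python sketch. Two assumptions are made here that the text does not specify: the ‘best trade-off’ is taken to be the cut-off that maximizes the Youden index (sensitivity + specificity - 1), and the AUC is computed as the proportion of improved/unchanged patient pairs that are correctly ordered. The function name `roc_mcid` and the data are hypothetical.

```python
def roc_mcid(change_scores, improved_flags):
    """Return (cut-off, sensitivity, specificity, AUC) for an anchor-defined split.

    A patient is classified as improved when the change score is >= cut-off.
    Ties in the Youden index are resolved in favour of the smallest cut-off.
    """
    pos = [c for c, f in zip(change_scores, improved_flags) if f]
    neg = [c for c, f in zip(change_scores, improved_flags) if not f]
    if not pos or not neg:
        raise ValueError("need both improved and unchanged patients")

    best = None
    for cut in sorted(set(change_scores)):
        sens = sum(c >= cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        youden = sens + spec - 1
        if best is None or youden > best[0]:
            best = (youden, cut, sens, spec)

    # AUC: probability that an improved patient has a larger change than an
    # unchanged patient (ties count as one half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    _, cut, sens, spec = best
    return cut, sens, spec, auc

# Hypothetical data: change scores and whether the anchor says 'improved'.
changes = [1, 2, 3, 4, 5, 6, 7, 8]
improved = [False, False, False, True, False, True, True, True]

cutoff, sensitivity, specificity, auc = roc_mcid(changes, improved)
print(f"MCID cut-off: {cutoff}, sensitivity: {sensitivity}, "
      f"specificity: {specificity}, AUC: {auc}")
```

Other criteria for the optimal point, such as the point closest to the top-left corner of the ROC plot, can give a different cut-off on the same data.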
The distribution-based approach estimates MCID values based on the statistical characteristics of change scores within a sample.[12] This method compares the change in PRO scores to a pre-defined measure of variability such as the effect size (ES), standard deviation (SD) or standard error of measurement (SEM). The SD measures the variability among the observations about their mean. Various studies have used a value of half of the SD as the MCID for the disease condition/disorder under study.[7] The ES is defined as the change in mean scores divided by the SD of the scores at baseline. Cohen suggested that score differences of 0.2 SD units correspond to small but clinically important differences and 0.5 SD to substantial change.[18] SEM is a measure of the variability in scores attributable to the imprecision of the scale or measure used.[12] SEM is promising for MCID-related research as it takes into account the amount of error specific to the instrument and the amount of random variation to be expected in repeated administrations.[19] Another advantage of SEM is that it is sample independent.[19] The reliable change index (RCI) is often used to investigate the change that has taken place during the course of an intervention. RCI, introduced by Jacobson and Truax (1984), is obtained by dividing the difference between pre- and post-test scores by the standard error of the difference in measurements.[20] RCI should not be used alone and should always be used in conjunction with other methods while estimating MCID.[20]
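The distribution-based quantities described above can be computed as in the following Python sketch. It relies on commonly used formulas that the text does not spell out and that should be read as assumptions: SEM = SD at baseline × √(1 − reliability), and the standard error of the difference = √2 × SEM. The reliability coefficient (for example, a test-retest value), the function name `distribution_based` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def distribution_based(baseline, follow_up, reliability):
    """Distribution-based summaries for paired baseline/follow-up scores."""
    sd_base = stdev(baseline)                      # sample SD at baseline
    changes = [f - b for b, f in zip(baseline, follow_up)]
    sem = sd_base * sqrt(1 - reliability)          # standard error of measurement
    s_diff = sqrt(2) * sem                         # standard error of the difference
    return {
        "half_sd": 0.5 * sd_base,                  # 0.5 SD criterion
        "effect_size": mean(changes) / sd_base,    # mean change / baseline SD
        "sem": sem,
        "rci": [c / s_diff for c in changes],      # one RCI per patient
    }

# Hypothetical data for five patients; reliability of 0.91 is assumed.
baseline = [40, 50, 60, 70, 80]
follow_up = [45, 58, 66, 80, 91]

result = distribution_based(baseline, follow_up, reliability=0.91)
print({k: v for k, v in result.items() if k != "rci"})
print("Patients with |RCI| > 1.96:", sum(abs(r) > 1.96 for r in result["rci"]))
```

An absolute RCI above 1.96 is conventionally read as a change unlikely to be explained by measurement error alone; it says nothing about whether the patient regards the change as important, which is why the RCI is paired with other methods.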
A lesser-used approach to estimate MCID is the Delphi method.[21] This method is particularly used for acute onset diseases such as stroke, especially for a technical efficacy outcome such as reperfusion, which requires an expert, rather than the patient, to appreciate the difference. A survey of various experts is conducted, and the MCID for the outcome variable is estimated from their responses. The Rasch model is a technique that has been increasingly used in medical research over the past decades.[22] The Rasch model states that a person with a higher ability (less disability) has a higher chance of getting a higher score. It is especially useful when an ordinal scale does not behave in a linear fashion (for example, the modified Rankin Scale in stroke).
Determining the MCID requires balancing the implications of competing errors. A threshold that is too high increases the chance of a Type-II error, which occurs when a treatment is declared non-efficacious although it does have a clinically meaningful effect. Similarly, a threshold that is too low increases the chance of a Type-I error, leading to the conclusion that a treatment is efficacious when in reality it does not have a clinically meaningful effect. A researcher might be more concerned about the Type-II error as it indicates an insensitive tool or an inefficacious treatment. On the other hand, readers might be more concerned about a Type-I error, as a poor tool might be declared to have greater sensitivity than it has in reality. The inverse relationship between MCID and sample size is also important to note: the larger the MCID, the smaller the sample size required. A poorly considered MCID may distort the sample size and hence expose the study to ethical, scientific and economic concerns.
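The inverse relationship between MCID and sample size can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where δ is the MCID. The Python sketch below assumes a two-arm parallel trial with equal allocation and a common SD; the function name `n_per_group` and the numbers are illustrative only.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mcid, sd, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a mean difference equal to the MCID.

    Normal approximation, two-sided alpha, equal allocation, common SD.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / mcid) ** 2)

# Hypothetical scale with an SD of 10 points.
n_small_mcid = n_per_group(mcid=5, sd=10)
n_large_mcid = n_per_group(mcid=10, sd=10)
print(f"MCID 5 points: {n_small_mcid} per group; "
      f"MCID 10 points: {n_large_mcid} per group")
```

In this example, doubling the MCID cuts the required sample size to roughly a quarter, which shows how strongly the choice of threshold drives trial size.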
While estimating the MCID value, it is important to appreciate that there is no single ‘true’ MCID value for a given measure. MCID values are dynamic and context-specific.[23] Different MCID values may exist for detecting improvement and deterioration on the same outcome measure. Application of different methods even on the same sample can yield different MCID values. Also within each method, MCID value may change depending on the variable or anchor used. Similarly, usage of the same outcome measures and methods on different study populations can yield different MCID thresholds. Hence, it is recommended that multiple approaches should be used while estimating MCID thresholds.[6]
Both the anchor-based and distribution-based methods have inherent advantages and limitations. Anchor-based methods take into account the patients’ perception of change in score, whereas distribution-based methods do not, which is their main limitation.[12] Therefore, it has been suggested that both approaches be used to triangulate MCID thresholds.[10] Secondly, MCID does not take into account the cost of the intervention.[24] For example, a patient might appreciate the change after a treatment but may not consider the benefit gained to be worth the cost. Lastly, MCID considers a change in scores from baseline, and because not all PRO scales are true interval scales (for example, the visual analogue scale), the amount and quality of change is likely to be different for improvement and worsening.[25]
MCID thresholds for commonly used scales in neurology
Disease | Scale | MCID threshold |
---|---|---|
Migraine[28,29,30,31] | HIT-6 | 2.5 to 6 |
Primary dystonia[32,33] | BFMD-RS | Improvement >16.6%; worsening >21.5% |
Epilepsy[34,35,36,37,38,39] | QOLIE-31 | 12 to 16 points |
Alzheimer’s dementia[40,41,42] | MMSE | 1 to 3 points |
Parkinson’s disease[26,27,43,44,45,46,47,48,49,50,51] | UPDRS | 4 to 8 points |
Multiple Sclerosis[52,53,54,55] | T25FW | 1.3 to 12.6 seconds |
Multiple Sclerosis[52,54] | 6MWT | 76.2 to 88 metres |
Stroke[56] | Barthel Index | 1.45 to 1.85 |
Abbreviations: HIT-6: Headache Impact Test-6; BFMD-RS: Burke-Fahn-Marsden Dystonia Rating Scale; QOLIE-31: Quality of Life in Epilepsy Inventory-31; MMSE: Mini-Mental State Examination; UPDRS: Unified Parkinson’s Disease Rating Scale; T25FW: Timed 25-Foot Walk test; 6MWT: 6-minute walk test
General characteristics of studies used for calculating MCID in various neurological disorders
Author, Year Reference | Study design | Number of patients | Mean age (SD) | Disease (specific treatment) | Scale | |
---|---|---|---|---|---|---|
Speck et al., 2021[57] | RCT | 2850 | 41.1 (11.6) | Episodic/chronic migraine (Galcanezumab) | Migraine specific QoL questionnaire v2.1 | |
Houts et al., 2020[28] | RCT | 1072 | 40.5 (11.2) | Chronic migraine (Eptinezumab) | HIT-6 | |
Smelt et al., 2014[29] | RCT | 490 | 47.9 (10.1) | Episodic migraine (Proactive approach vs usual care) | HIT-6 | |
Pintér et al., 2020[32] | Prospective cohort study | 198 | 46.1 (16.2) | Primary dystonia (medical, DBS) | Burke-Fahn-Marsden Dystonia – Rating scale (BFMD-RS), Burke-Fahn-Marsden Dystonia – Disability scale (BFMD-DS) SF-36 | |
Katzberg et al., 2014[58] | RCT | 51 | 56 | Myasthenia (IVIg) | QMGS | |
Rider et al., 2004[59] | Prospective observational study | 29 experts | - | Adult/juvenile myositis | MD global activity, Patient/parent global activity, muscle strength (MMT), physical function (HAQ/C-HAQ, CMAS), muscle enzymes, extra-muscular activity | |
Merkies et al., 2017[60] | Prospective multicentric phase 3 trial | 28 | 58 (22-79) | CIDP (IVIg) | INCAT, MRC, maximum grip strength | |
Qiu et al., 2019[61] | Prospective cohort study | 50 | - | Refractory temporal lobe epilepsy (Amygdalohippocampectomy) | QOLIE-31 score | |
Cramer et al., 2014[62] | RCT | 776 | - | Drug resistant focal epilepsy (Lacosamide) | SSQ | |
Andrews et al., 2019[40] | Retrospective cohort study | 19566 | 73.05 (9.8) | Alzheimer’s disease | MMSE, CDS sum of boxes, FAQ | |
Makkos et al., 2019[63] | Prospective cohort study | 436 | - | Parkinson’s disease | UDysRS, MDS-UPDRS Part IV | |
Makkos et al., 2018[44] | Prospective cohort study | 452 | - | Parkinson’s disease | MDS-UPDRS scale | |
Schrag et al., 2006[27] | Pooled analysis of two RCTs | 603 | - | Parkinson’s disease (Ropinirole, Bromocriptine, Levodopa) | UPDRS | |
Negahban et al., 2018[64] | Prospective cohort study | 38 | 36 (8) | Multiple sclerosis (Balance rehabilitation) | ABC, BBS, FGA, 2MW, 10MTW, TUG, C2MW, CTUG | |
Lin et al., 2020[65] | Survey | 58 experts | - | Stroke (Endovascular thrombectomy) | Substantial reperfusion (TICI 2b-3) | |
Chen et al., 2019[66] | Prospective observational study | 115 | 54.2 (11.1) | Stroke (Rehabilitation) | MAS | |
Wu et al., 2019[67] | Prospective observational study | 65 | 53.5 (11.7) | Stroke (Rehabilitation) | MoCA | |
Fulk et al., 2018[68] | RCT | 265 | 61.3 (12.8) | Stroke (Rehabilitation) | 6MWT | |
Kim et al., 2015[69] | Prospective observational study | 487 | 68.3 (8.1) | Stroke | EQ-5D, SF-6D | |
Fulk et al., 2010[70] | Prospective cohort study | 36 | 60.9 (15.6) | Stroke (Rehabilitation) | SIS-16 | |
Schurch et al., 2007[71] | RCT | 59 | 41.2 | Neurogenic incontinence (Botulinum toxin A) | I-QOL questionnaire | |
Perera et al., 2006[72] | Pooled data of three clinical studies | 692 | - | Elderly individuals with mobility disabilities (Physical rehabilitation) | Gait speed, SPPB, 6MWD | |
Pintér et al., 2019[73] | Prospective cohort study | 248 | 58.7 (16.7) | Essential tremor | QUEST | |
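
General characteristics of studies used for calculating MCID in various neurological disorders (continued)
Author, Year Reference | Subscale/dimensions | Generic | Specific | Anchor-based | Distribution-based | MCID threshold |
---|---|---|---|---|---|---|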
Speck et al., 2021[57] | Role function restrictive (RFR), Role function preventive (RFP), Emotional function (EF) | No | Yes | Yes | No | RFP: 20; RFR: 25.71; EF: 26.67 |
Houts et al., 2020[28] | - | No | Yes | Yes | Yes | -6 |
Smelt et al., 2014[29] | - | No | Yes | Yes | No | -2.5 to -6 |
Pintér et al., 2020[32] | - | Yes | Yes | Yes | Yes | BFMD-RS: improvement >16.6%, worsening >21.5%; BFMD-DS: improvement or worsening >0.5 points; SF-36: improvement >7.5 points, worsening >8.5 points |
Katzberg et al., 2014[58] | - | No | Yes | Yes | Yes | 3 |
Rider et al., 2004[59] | - | Yes | Yes | Delphi method | Delphi method | Improvement by ≥20% in 3/6 core set measures, no more than worse by ≥25% (which could not include MMT to assess strength) |
Merkies et al., 2017[60] | - | Yes | Yes | Yes | Yes | INCAT >4; MRC >4; Grip strength >8 kPa |
Qiu et al., 2019[61] | - | No | Yes | No | Yes | 16.47 points |
Cramer et al., 2014[62] | - | No | Yes | Yes | No | 0.48 |
Andrews et al., 2019[40] | - | Yes | Yes | Yes | Yes | MMSE: 1–3 points decrease; CDS sum of boxes: 1–2 points increase; FAQ: 3–5 points increase |
Makkos et al., 2019[63] | - | No | Yes | Yes | No | UDysRS Part I: improvement >2.1, worsening >1.8; Part 2: improvement >1.8, worsening >1.7; MDS-UPDRS Part IV: improvement 0.9, worsening 0.8 |
Makkos et al., 2018[44] | MDS-UPDRS – Total, Part II+III, Part I+II+III | No | Yes | Yes | No | MDS-UPDRS Total: improvement 7.1, worsening 6.3; Part II+III: improvement 4.9, worsening 4.2; Part I+II+III: improvement 6.7, worsening 5.2 |
Schrag et al., 2006[27] | Total, Part II, Part III | No | Yes | Yes | No | UPDRS Total: 8; Motor: 5; ADL: 2–3 |
Negahban et al., 2018[64] | - | Yes | Yes | Yes | No | ABC: 4.5 points; BBS: 3 points; FGA: 4.5 points; 2MW: 7.5 metres; C2MW: 14 metres; 10MTW: 0.11 m/sec; C10MTW: -0.04 m/sec; TUG: 0.77 sec; CTUG: -1.0 sec |
Lin et al., 2020[65] | - | No | Yes | Delphi method | Delphi method | Median: 3.1–5% |
Chen et al., 2019[66] | - | No | Yes | No | Yes | Upper extremity: 0.45–0.48 Lower extremity: 0.45–0.73 |
Wu et al., 2019[67] | - | Yes | No | Yes | Yes | 1.22–2.15 |
Fulk et al., 2018[68] | - | Yes | No | Yes | No | 71 metres |
Kim et al., 2015[69] | - | Yes | No | Yes | No | EQ-5D: 0.08–0.12; SF-6D: 0.04–0.14 |
Fulk et al., 2010[70] | - | No | Yes | Yes | No | 9.4–14.1 |
Schurch et al., 2007[71] | - | No | Yes | No | Yes | 4–11 points |
Perera et al., 2006[72] | - | Yes | Yes | Yes | Yes | Gait speed: 0.05 m/sec; SPPB: 0.5 points; 6MWD: 19–22 meters |
Pintér et al., 2019[73] | - | No | Yes | Yes | Yes | Improvement >4.47; Worsening >4.98 |
Abbreviations: RCT: Randomized controlled trial; HIT-6: Headache Impact Test-6; DBS: Deep brain stimulation; SF-36: Short Form Health Survey-36; MMT: Manual muscle testing; HAQ: Health Assessment Questionnaire; C-HAQ: Childhood Health Assessment Questionnaire; CMAS: Childhood Myositis Assessment Scale; CIDP: Chronic inflammatory demyelinating polyradiculoneuropathy; INCAT: Inflammatory Neuropathy Cause and Treatment; MRC: Medical Research Council; QOLIE-31: Quality of Life in Epilepsy Inventory-31; SSQ: Seizure Severity Questionnaire; MMSE: Mini-Mental State Examination; CDS: Clinical Dementia Rating; FAQ: Functional Activities Questionnaire; UDysRS: Unified Dyskinesia Rating Scale; MDS-UPDRS: Movement Disorder Society Unified Parkinson’s Disease Rating Scale; ABC: Activities-specific Balance Confidence scale; BBS: Berg Balance Scale; FGA: Functional Gait Assessment; 2MW: 2 Minute Walk test; 10MTW: 10 Metre Timed Walk; TUG: Timed Up and Go; C2MW: Cognitive 2 Minute Walk; CTUG: Cognitive Timed Up and Go; TICI: Thrombolysis in Cerebral Infarction; MoCA: Montreal Cognitive Assessment; MAS: Modified Ashworth Scale; 6MWT: 6-minute walk test; EQ-5D: EuroQol 5-Dimension health status index; SF-6D: Short Form Health Survey 6-Dimension; SIS-16: Stroke Impact Scale-16; I-QOL: Incontinence Quality of Life questionnaire; SPPB: Short Physical Performance Battery; 6MWD: 6-minute walk distance; QUEST: Quality of Life in Essential Tremor
MCID estimation methods: anchors and statistical methods
Author, Year Reference | Anchor-based: number of anchors | Anchor-based: anchor | Anchor-based: viewpoint | Anchor-based: cut-offs used | Anchor-based: statistical methods | Distribution-based: number of criteria | Distribution-based: criteria |
---|---|---|---|---|---|---|---|
Speck et al., 2021[57] | 2 | PGI-S, PGI-I | Patient | PGI-S: ≥ 1 improvement; PGI-I: ≥ 1 improvement | Mean change | - | - |
Houts et al., 2020[28] | 3 | PGIC, MMDs, Change in EQ-5D-5L-VAS | Patient | PGIC: Improved or not improved; MMDs: ≥75% reduction; EQ-5D-5L-VAS: > 10-point increase | Mean change | 2 | 0.5 SD, SEM |
Smelt et al., 2014[29] | 2 | Headache condition, limitation of ADL | Patient | Response on a scale | Mean change, ROC | No | No |
Pintér et al., 2020[32] | 1 | PGI-I | Patient | Response on a scale (7-point Likert scale) | Mean change, Regression analysis | 1 | ES |
Katzberg et al., 2014[58] | 1 | Patients perception of overall improvement on VAS | Patient | Response on a scale (7-point Likert scale) | Mean change, ROC | 3 | 0.5 SD, SEM, ES |
Rider et al., 2004[59] | - | - | - | - | - | - | - |
Merkies et al., 2017[60] | 1 | SF-36 (question 2) | Patient | Response on a scale | Mean change | 1 | 0.5 SD |
Qiu et al., 2019[61] | - | - | - | - | - | 1 | 1 SD |
Cramer et al., 2014[62] | 1 | PGIC | Patient | Response on a scale (7-point Likert scale) | Mean change | - | - |
Andrews et al., 2019[40] | 1 | Clinician’s assessment of meaningful decline | Clinician | - | Mean change | 3 | ES, SRM, 0.5 SD |
Makkos et al., 2019[63] | 2 | PGI-I | Patient | Response on a scale | ROC | - | - |
Makkos et al., 2018[44] | 1 | PGI-I | Patient | Response on a scale | Regression analysis | - | - |
Schrag et al., 2006[27] | 1 | CGI-I | Clinician | Response on a scale (7-point Likert scale) | Mean change | - | - |
Negahban et al., 2018[64] | 1 | Global rating scale | Patient | Response on a scale (7-point Likert scale) | ROC | - | - |
Lin et al., 2020[65] | - | - | - | - | - | - | - |
Chen et al., 2019[66] | - | - | - | - | - | 2 | 0.5 SD, 0.8 SD |
Wu et al., 2019[67] | 1 | Perceived recovery score of the SIS 3.0 | Patient | 10–15% change | Mean change | 1 | 0.5 SD |
Fulk et al., 2018[68] | 2 | mRS, SIS | Patient, clinician | Improvement in mRS ≥1; Increase in SIS by 10% | ROC | - | - |
Kim et al., 2015[69] | 2 | mRS, Barthel Index (BI) | Patient | mRS: Response on a 5-Likert scale: BI: Difference of at least 4 points | Mean change | - | - |
Fulk et al., 2010[70] | 2 | GROC | Patient, clinician | Response on a scale (15-point Likert scale) | ROC | - | - |
Schurch et al., 2007[71] | - | - | - | - | - | 3 | 0.2 SD, 0.5 SD, SEM |
Perera et al., 2006[72] | 2 | Two items from SF-36 GMCR flight of stair | Patient | SF-36: Limited a lot/limited a little/not limited at all; GMCR: Response on a 15-point Likert scale | Mean change | 2 | ES, SEM |
Pintér et al., 2019[73] | 1 | PGI-I | Patient | Improvement or worsening | Mean change, ROC | 1 | ES |
Abbreviations: PGI-S: Patient Global Impression of Severity; PGI-I: Patient Global Impression of Improvement; MMDs: Monthly migraine days; PGIC: Patient global impression of change; EQ-5D-5L-VAS: EuroQol 5-dimension 5-level visual analogue scale; ADL: Activities of daily living; ROC: Receiver operating characteristic; VAS: Visual analogue scale; SF-36: Short Form Health Survey 36 questionnaire; CGI-I: Clinician Global Impression of Improvement; mRS: Modified Rankin Scale; SIS: Stroke Impact Scale; SD: Standard deviation; ES: Effect size; SEM: Standard error of measurement; SRM: Standardized response mean; GMCR: Global Mobility Change Rating
The importance of correlating statistically significant results with clinical significance in clinical trials is being increasingly recognized to prevent misinterpretation of study outcomes and avoid subjecting patients to unnecessary treatments.[74,75] In 2003, the Task Force on Rating Scales for Parkinson’s Disease of the Movement Disorder Society emphasized the significance of establishing MCID thresholds for UPDRS (Unified Parkinson’s Disease Rating Scale) and urged researchers to determine MCID thresholds for the same.[76] Not only from a patient perspective, MCID is also gaining momentum from a regulatory perspective. The US Food and Drug Administration device branch has, in some instances, required sponsors of pivotal acute ischemic stroke trials to specify the MCID in advance and has established a framework to evaluate trial success. This framework includes not only the usual attainment of statistically significant differences between treatment arms but also a requirement that the point estimates for superiority exceed the pre-specified MCID. This indicates probable rather than possible clinical importance.[77]
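The two-part success criterion described above (a statistically significant difference and a point estimate that exceeds the pre-specified MCID) can be sketched in Python as follows. This is not the regulatory procedure itself: it assumes a continuous outcome where higher scores are better, uses a large-sample normal approximation for the difference in means, and the function name `trial_success` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def trial_success(treatment, control, mcid, alpha=0.05):
    """Check statistical significance and whether the estimate exceeds the MCID."""
    diff = mean(treatment) - mean(control)          # point estimate
    se = sqrt(stdev(treatment) ** 2 / len(treatment)
              + stdev(control) ** 2 / len(control))
    # Two-sided p-value from the normal approximation (assumes large samples).
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    significant = p_value < alpha
    exceeds_mcid = diff > mcid
    return {
        "difference": diff,
        "p_value": p_value,
        "significant": significant,
        "exceeds_mcid": exceeds_mcid,
        "success": significant and exceeds_mcid,
    }

# Hypothetical outcome scores in two arms.
treatment_arm = [12, 15, 11, 14, 13, 16, 12, 15]
control_arm = [8, 9, 7, 10, 9, 8, 11, 10]

with_mcid_3 = trial_success(treatment_arm, control_arm, mcid=3)
with_mcid_5 = trial_success(treatment_arm, control_arm, mcid=5)
print(with_mcid_3["success"], with_mcid_5["success"])
```

With the same data, the trial counts as a success against an MCID of 3 points but not against an MCID of 5 points, even though the difference is statistically significant in both cases.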
To estimate the MCID for a specific PRO measure, it is suggested to use multiple approaches and triangulate methods. Anchor-based methods that involve various patient-rated, clinician-rated and disease-specific variables can provide primary and meaningful estimates of an instrument’s MCID. In addition, results from previous clinical trials, including the PRO measures, can offer insight into observed effects based on treatment comparisons and aid in determining the MCID. In situations where anchor-based estimates are not available, distribution-based methods can provide supportive information and help interpret estimates from anchor-based approaches. If the scale is ordinal, the raw score difference may not accurately represent the clinical significance of the change. In such cases, the Rasch model can be used to transform the ordinal scale into an interval-based scale, providing a more accurate determination of the MCID.[78] It is recommended that the MCID primarily rely on relevant patient-based and clinical anchors, while clinical trial experience is utilized to enhance the understanding of the MCID.[25,79] Using a combination of methods will generate a range of values that require decision-making guidance to choose a single value or a narrow range of MCID values. This can be facilitated by further analyzing the MCID in the context of the population and intervention to arrive at a more precise range of MCID. However, if it is still difficult to select the best MCID or uncertainty exists, researchers may consider using a systematic review and evaluation process, such as a modified Delphi method, to determine the final selection of MCID values.[80]
The reluctance of researchers to use or report the MCID may stem from various reasons, but one significant factor is the vagueness of the concept and the lack of clear guidelines on how to define it. This is further complicated by practical and financial limitations that may determine the trial’s sample size instead of the MCID. In certain situations, researchers may establish the estimated treatment effect and MCID after determining the maximum feasible sample size, which can result in biased assumptions. While there is no universal solution for determining the MCID at present, implementing processes such as a multidisciplinary committee could aid researchers in planning and executing their studies.[81]
Inconsistent MCID thresholds can be a drawback in clinical trials because they can lead to variability in how the treatment effect is interpreted. When there are inconsistent MCID thresholds, it can be difficult to compare the results of different studies. For example, one study might use an MCID threshold of 5 points on a particular scale, while another study might use a threshold of 10 points on the same scale. This can make it challenging to determine whether the treatment effect is clinically significant across studies. Inconsistent MCID thresholds can also affect clinical decision-making. If the threshold for clinical significance varies widely between studies, it can be difficult for clinicians to determine the appropriate course of action for a given patient. To address this issue, it is important for researchers to clearly define and justify the MCID threshold used in their study. This can help to ensure that the treatment effect is interpreted consistently and that the findings are comparable across studies.[82,83,84,85]
Despite the challenges and controversies surrounding the determination of MCID, it remains a crucial concept in modern clinical trials. Moreover, there is a growing appreciation for the need to incorporate the patient’s viewpoint into trials. Trial sponsors who demonstrate a comprehensive understanding of patients’ conditions and the obstacles they face in participating in the study can significantly improve recruitment and retention rates, while also generating more informative data.[86,87,88] Implementing a patient-centred approach can further facilitate the identification and rectification of hitherto neglected outcome domains, such as sleep disturbances or experiences with the intervention performed on them.[89,90] MCID serves as a benchmark for measuring meaningful improvement or deterioration in patients and should be used by researchers and clinicians alike to better formulate and guide clinical practice in future. Lastly, because the MCID thresholds are dynamic and context-specific, using the previously established MCID thresholds is strongly discouraged especially when the study population, intervention and the outcomes are different to the original one.
MCID is the smallest change in scores that is subjectively meaningful to patients. While the anchor-based approach estimates MCID by comparing change scores using an external anchor, the distribution-based approach estimates MCID values based on the statistical characteristics of scores within a sample. MCID is dynamic and context-specific, and there is no ‘gold standard’ method for estimating it. A range of MCID thresholds should be defined using multiple methods for a disease under targeted intervention, rather than a single absolute value.