Why Payers Won’t Use a Microscope to See the Benefit of Lecanemab

October 12, 2022

Article by:

Camm Epstein
Founder
Currant Insights

Optical microscopes — sometimes called light microscopes — use a range of objective lenses to view small objects. Objective lenses come in magnification powers commonly ranging from 4x to 100x, and, combined with the eyepiece, total magnification tops out at about 1,000x because of the limited resolving power of visible light.

Payers regard very small differences in efficacy as insignificant because such differences typically are not clinically significant. A difference that is statistically significant may not be clinically significant — a truth worth repeating 1,000 times.

Eisai and Biogen recently reported that lecanemab’s Phase 3 clinical trial, somewhat ironically named Clarity, showed a reduction of clinical decline in patients with early Alzheimer’s disease. Despite the companies’ efforts to shine a light on what they term a “highly statistically significant” reduction, payers likely won’t view this reduction as clinically significant and will surely see covering lecanemab as inflationary.

The very small benefit

The manufacturers’ spin on the Clarity trial was that, compared with placebo at 18 months, lecanemab treatment reduced cognitive decline by 27% on the Clinical Dementia Rating – Sum of Boxes (CDR-SB). The CDR-SB measures performance in six domains: memory, orientation, judgment and problem solving, community activities, home and hobbies, and personal care. Each domain is evaluated on a scale where 0 is no impairment, 0.5 is questionable, 1 is mild, 2 is moderate, and 3 is severe dementia. These six scores are then summed to create a “sum of the boxes” score that can range from 0 to 18. Compared with placebo, the total treatment difference in score change at 18 months was minus 0.45.
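
For readers who think in code, here is a minimal sketch of the sum-of-boxes arithmetic in Python; the domain scores shown are hypothetical and purely illustrative, not data from the trial.

```python
# Minimal sketch of the CDR-SB "sum of the boxes" arithmetic.
# The domain scores below are hypothetical and illustrative, not Clarity trial data.
CDR_DOMAINS = (
    "memory", "orientation", "judgment and problem solving",
    "community activities", "home and hobbies", "personal care",
)

def cdr_sum_of_boxes(domain_scores):
    """Sum six domain scores (each 0, 0.5, 1, 2, or 3) into a 0-to-18 total."""
    assert len(domain_scores) == len(CDR_DOMAINS)
    return sum(domain_scores)

# A hypothetical patient rated "mild" (1) in every domain scores 6 overall;
# the reported 0.45 treatment difference is smaller than one half-point step
# in any single domain.
print(cdr_sum_of_boxes([1, 1, 1, 1, 1, 1]))  # prints 6
```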

The manufacturers have released only scant information about the outcomes of this trial, making it nearly impossible to evaluate the results. But the 0.45 difference will surely require much closer examination. For now, converting the reported difference in score change to more familiar scales can help us put the effect size in perspective.

A 5-point scale from 1 to 5 is commonly used by researchers and even has a name — a Likert scale. A 0.45 difference on a 19-point scale like the CDR-SB converts to a 0.12 difference on a 5-point scale. That may sound like a rounding error.

An 11-point scale from 0 to 10 has useful psychometric properties — 0 has a familiar meaning and people have an easy time converting 10 to 100. Further, scales with additional points help to discriminate scores, an advantage CDR-SB has over other dementia measures using scales with fewer points. A 0.45 difference on a 19-point scale converts to a 0.26 difference on an 11-point scale. That kind of sounds like the needle moved, but not by much.

A 101-point scale from 0 to 100 is super easy to comprehend. A 0.45 difference on a 19-point scale converts to a 2.39 difference on a 101-point scale. Does that sound significant?
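
For those who want to check the arithmetic, here is a minimal sketch of the conversions above, assuming the simple point-count ratio (the reported difference multiplied by the target scale’s point count and divided by 19):

```python
# Rescale the reported 0.45 CDR-SB difference to scales with different numbers
# of points, using the simple point-count ratio described above.
def rescale(difference, from_points=19, to_points=5):
    return difference * to_points / from_points

for points in (5, 11, 101):
    print(points, round(rescale(0.45, 19, points), 2))
# prints: 5 0.12, 11 0.26, 101 2.39
```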

Using the CDR-SB scale, some experts define mild dementia as scores ranging between 4.5 and 9.0, a 4.5-point spread. Coincidentally, that 4.5-point spread holds 10 differences of 0.45. Viewed that way, a 0.45 difference appears quite small — only one tenth of the range for mild dementia.

But playing with numbers like this goes only so far. Experts in the field will offer opinions as to whether a 0.45 difference is clinically significant. And some of these experts have already questioned whether it is.

The power needed to see a benefit

If the sample size is large enough, very small differences — even differences with no practical significance — will be statistically significant. Increase the sample size further, and the statistically significant difference can become “highly statistically significant.” So, what’s the solution? Well, a power analysis is a good place to start. While there are other complexities related to the type of statistical analysis used, let’s keep things simple and say a power analysis requires a few basic inputs.

First, the power must be selected. Power is the probability of detecting a difference when there is a true difference — that is, of correctly rejecting the null hypothesis. Most researchers recommend a power between 0.8 and 0.9. A power of 0.9, for example, means that a true difference would be detected in 90 out of 100 studies. Power equals 1 minus β, where β is the chance of a type-II error (the false-negative rate).

Second, the alpha level must be selected. Alpha is the probability of declaring a difference when there is no true difference. The alpha level is often set at 0.05, meaning there is a 5% chance of a type-I error (the false-positive rate). Some clinical trial researchers prefer 0.001 (a 0.1% chance) to help ensure that results are clinically significant.

Third, the effect size must be determined. While there are different statistical approaches to this, the smallest effect size deemed to be clinically significant should be selected — that is, the smallest effect one should care about. If, for instance, there is consensus among experts that a 1-point difference on the CDR-SB scale is the smallest clinically significant difference, then that 1-point difference is the effect size to use.

If one has values for power, alpha level, and clinically significant effect size, then it is easy to calculate and, importantly, justify the sample size. And when using that sample size, statistical significance and clinical significance are aligned.
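
To make this concrete, here is a minimal sketch of such a calculation in Python using the statsmodels library. The power, alpha level, smallest meaningful difference, and standard deviation below are illustrative assumptions, not values reported for the Clarity trial.

```python
# Minimal sketch of a two-arm sample-size calculation. All inputs are
# illustrative assumptions, not values reported for the Clarity trial.
from statsmodels.stats.power import TTestIndPower

smallest_meaningful_difference = 1.0  # assumed smallest clinically significant CDR-SB difference
assumed_sd = 2.0                      # hypothetical standard deviation of the change score
effect_size = smallest_meaningful_difference / assumed_sd  # Cohen's d = 0.5

n_per_arm = TTestIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,              # 5% chance of a type-I error
    power=0.9,               # 90% chance of detecting a true difference
    alternative="two-sided",
)
print(round(n_per_arm))  # roughly 85 participants per arm under these assumptions
```

Halve the assumed clinically significant difference and the required sample size roughly quadruples; in general, the smaller the effect you power for, the larger the trial, which is exactly how very small effects end up statistically significant in very large studies.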

Sample size clarity

FDA guidance for industry states that a clinical trial’s sample size is usually determined by the primary objective of the trial and that, if it is determined on some other basis, this should be justified.

How did the Clarity trial initially estimate the sample size? Was a power analysis used? If so, what values were used for power, alpha level, and effect size? The FDA should require clarity and transparency for these values. And of these three values, the a priori effect size is of particular importance because it can be set more objectively, before the results are known.

When there is intense public and political pressure to approve a drug, as is the case with Alzheimer’s disease, asking experts a posteriori whether the effect size in the results is clinically significant is unnecessary and undesirable. While the experts will remain objective, the FDA — which abdicated its regulatory responsibility to reject Aduhelm — may cave. Manufacturers should be required to justify the effect size up front when designing the trial, not on the back end when the results are in and emotions are running high.

For the Clarity trial, why was the sample size initially estimated in 2019 at 1,556, but then increased to 1,766 in 2021 and increased yet again to 1,906 in 2022? The delta from the initial estimated enrollment to the actual enrollment is an increase of 22.5%. Was that because enrolling more participants was cheap? No, clinical trials are very expensive. Did the manufacturers have an early sense that the effect size was small and feared that a statistically significant difference would not be detected unless they increased the sample size? Or that they could achieve a “highly statistically significant” difference if they did so? Did an unanticipated safety concern emerge? Was the dropout rate due to adverse drug events higher than expected? Presumably, the sample-size adjustment was based on modified assumptions, and presumably these changes were justified and documented.

The FDA and its advisory committee should focus on the justification of the effect size when the sample size was initially determined, and closely examine the justification for increasing the sample size.

Under the microscope

Given what appears to be a very small effect and the significant cost implications, payers will examine the Clarity results under a microscope. Payers want larger samples for stronger evidence of safety — in this case, brain swelling — and subgroup analyses showing which groups are harmed more (or benefit more). They don’t want large samples simply to yield a statistically significant difference for efficacy, and they certainly don’t need or want a “highly statistically significant” difference. Ultimately, real-world evidence is what payers really, really want.

If you’ve looked through a microscope, you’ve likely experienced how disorienting a lens with higher magnification power can be. The object previously visible at a lower power can be seemingly lost at a higher power. With too much magnification power or too large a sample size, you can’t see the forest for the trees. That’s why payers likely won’t see the clinical benefit of lecanemab. And if that’s the case, then CMS and other payers either won’t cover lecanemab or will invoke Clarity’s inclusion and exclusion criteria to limit access.
