John Beigel’s problem was that not enough people were dying of Covid-19. Not that he wanted them to, of course. It’s just that, as bad as the pandemic has gotten, it hasn’t killed as many people as it seemed like it might. And Beigel, a researcher at the National Institute of Allergy and Infectious Diseases, was designing a big study to see if an experimental antiviral drug called remdesivir would work against the disease. He and his team needed the right “end point”—the thing they could count, turn into data, and analyze.
Mortality is a great end point. It’s right there in the name.
“But when we were thinking of end points, we thought a mortality study would need 3,000 or 4,000 people,” says Beigel, associate director for clinical research in NIAID’s Division of Microbiology and Infectious Diseases. Those high numbers would give his team enough “events”—which is to say, deaths—to get a statistically significant measurement of the drug at work. And they didn’t have enough time to get that many people enrolled in their US study. “We thought it was important to get the study done, to get a clinically meaningful end point, without taking the time we would need to do a mortality study,” he says.
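To see why a mortality end point demands thousands of participants, here is a rough back-of-the-envelope sketch. It uses the textbook normal-approximation formula for comparing two proportions. The death rates plugged in (12 percent on placebo, 9 percent on the drug) are hypothetical numbers chosen for illustration, as are the function name and its defaults. None of it comes from the NIAID protocol, and the team's actual calculation may have worked differently.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided test of two proportions.

    Standard normal-approximation formula; a sketch, not the study's method.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated                 # absolute difference in death rates
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical mortality rates, assumed for illustration only.
per_arm = patients_per_arm(p_control=0.12, p_treated=0.09)
print("per arm:", per_arm, "total:", 2 * per_arm)
```

Under those assumed rates the total lands in the low thousands, and the requirement climbs steeply as the expected difference between the two groups shrinks.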
This was back in February, before anybody really knew anything about Covid-19. So Beigel’s team tried a different tack, one familiar to scientists and regulatory agencies like the Food and Drug Administration (which issued the Emergency Use Authorization under which people are studying the drug). Participants in the study would get scores, every day, calculated on what’s called an ordinal scale. Healthy and released from the hospital, you get a score of 1. Dead, you get an 8. The other numbers were for everything in between—like whether the person has to get admitted to the hospital or needs oxygen or has to go on a mechanical ventilator.
Then they found another problem. “When we first wrote the protocol, the end point chosen was ordinal score at day 15,” Beigel says. “That is something we’ve used for influenza studies before, so we knew the FDA would be OK with it, and is something that actually matters to the subject.” That’s a good end point: It’s not just a statistical entity. It’s long enough to show an effect in many diseases, and it has clear clinical relevance.
“In March we started hearing reports that the course of the disease might be much longer, and that there were people in the hospital for three weeks, up to four weeks,” Beigel says. “What happens if the recovery is much later than day 15? You might actually have a significant difference, yet you wouldn’t show it.”
So Beigel’s team changed their end point: ordinal score at day 28. They hadn’t seen their data yet when they made the switch. That would’ve been an ethical no-no, chasing statistical significance by juking their methods. But juking ahead of data? Kosher, but they knew it’d be controversial. “It raised suspicion for our study,” Beigel says. “If this was something that is well known, like influenza, and we switched in the middle of the study, that would be really suspicious. For Covid, we haven’t seen anything like this.”
In a study published in late May in The New England Journal of Medicine (and previewed during a press conference at the White House), the team concluded that patients taking remdesivir recovered in a median time of 11 days, while people given a placebo took a median of 15. It was enough to get remdesivir added to the US standard of care, pronto—the first drug identified as having a beneficial effect on the new disease.
That sounds like an ending to remdesivir’s story, but it isn’t. Other data on the drug’s effectiveness has an even more slippery relationship with picking the right end points. In the midst of a global crisis, scientists are trying to solve an epistemologically intractable question. Defining whether a drug “works” has never been easy, a task vexed by methodological uncertainty, commercial pressures, statistical errors, or sometimes straight-out bad practices. Facing a new disease, researchers have to rethink what success even means. Is it lower mortality? Less disability upon recovery? Faster recovery? The answers are cryptic because the questions are just educated guesses.
Scientific studies have to reduce all the messiness of the real world to crunchable numbers. But the point of all this, remember, is to find actual treatments that help people fight Covid-19. When a study is over, health care workers have to be able to turn those fine-grain statistics back into something useful, into clinical procedures. That was hard enough in the Before Time. Trying to do useful science on a disease that’s only eight months old means coping not only with goalposts that move, but also with an undulating playing field and rules of the game that keep changing.
“What makes Covid special is that at the time of planning the studies, there’s been a lot of uncertainty about trajectory, about the different natural progression of the disease,” says Thomas Jaki, a statistician at Lancaster University who has written about designing Covid-19 trials. “For setting up and running a trial, especially early in the pandemic, one of the earliest challenges has been around time frame.”
Some researchers aren’t even sure that Beigel’s NIH-based study says enough about how useful the drug actually is. The NIH paper defined “improvement” as “either discharge from the hospital or hospitalization for infection-control purposes only,” and said the drug reduced the time it took to get that improvement by 31 percent. (It also counted mortality, but just as Beigel and the team worried, didn’t show a significant difference.) “Do you think that’s clinically very meaningful? I don’t know,” says Lee-Jen Wei, a biostatistician at Harvard’s School of Public Health, of that four-day difference. “You have to do more than just statistical significance. You need a clinically meaningful improvement. Otherwise why should I bother to pay?”
In June, the CEO of Gilead, the pharmaceutical company that makes remdesivir, wrote an open letter defending the drug’s high cost—$2,340 for six vials, a full five-day course of treatment. In terms of time-to-improvement, Daniel O’Day wrote, that’s good value: “Taking the example of the United States, earlier hospital discharge would result in hospital savings of approximately $12,000 per patient.”
Beigel, too, says that his group’s results meet that minimum level of proof the drug works. “If my parents had the disease, I’d want them to get this drug,” he says. “But we need to do better. That’s why we’re doing more studies.”
Those more definitive studies might still want to go after mortality as an end point. Other, bigger studies around the world have been able to marshal enough participants to study mortality at a certain time point. In the United Kingdom, the Randomised Evaluation of Covid-19 Therapy (or “Recovery”) trial has enrolled thousands of people and has released policy-changing studies of the antimalarial drug hydroxychloroquine (didn’t help) and the steroid dexamethasone (helped). But no central authority has emerged to turn the huge numbers of Covid-19 cases and hospitalizations into the kind of major studies that produce bomb-proof scientific conclusions. Instead, the US has a patchwork of smaller studies that use other end points to get statistical significance.
In fact, the other studies of remdesivir have been, so far, a little less comforting as to the drug’s necessity. Gilead funded a study on the appropriate dosing to use against Covid-19—5 days versus 10 days, with the end point being improvement on an ordinal scale on day 14. The difference in how many people in each group improved was negligible; about half the people in both groups improved by two points on the ordinal scale. (The study, conducted around the world and run by Gilead, appeared in NEJM just a few days after Beigel’s team’s work.)
Other researchers have questioned the scientific utility of the Gilead-run dosing study. It makes a little more sense in terms of the economics of marketing the drug, though. “It is extremely unusual to test two dosings of a drug when you don’t know whether it works at all, but it is highly strategic,” says Peter Bach, a physician and head of the Drug Pricing Lab at Memorial Sloan Kettering Cancer Center. “From the company’s perspective, there’s billions in them thar hills if they can get a positive study. But if it’s a negative study, it’s a tough row to hoe.”
In other words, Bach says, if Gilead does a robust, rigorous study of its own drug, the company can’t win. If the NIH’s result is good news, as it was, the company has wasted time and money confirming it. If it’s bad news, Gilead’s positive results don’t help. And if Gilead found that the drug didn’t work as well as the NIH’s study said, well, that’s even worse.
Again, though, Beigel’s group at the NIH is already doing that rigorous evaluation. In fact, says a Gilead spokesperson, that’s part of why they skipped to studying dosing. “Our goal with these studies was to answer the question of treatment duration,” emails Chris Ridley, senior director for media relations at Gilead. “Due to the overburdening of health care systems, the limited current supply of remdesivir, and the unestablished safety profile of remdesivir in this population, the ability to shorten treatment duration without reducing efficacy was an important question.” A five-day course would mean treatment is less expensive, and it would spread the limited supply of the drug out around the world. (Except that the US government quickly bought most of the global supply.)
That’s capitalism for you. Without these economic constraints, maybe Gilead could have mounted a bigger study to get at mortality more clearly. “This is what I’ve been lamenting. It’s the incentives themselves. The pharma industry’s acolytes and legions of economists who make money off the drug industry say it’s important that we pay high prices for drugs, to encourage innovation,” Bach says. “It also puts checks on doing the most rigorous evaluation of your hypotheses.”
But there’s more. In early July, Gilead researchers also presented a different cut of the study from which they’d taken the dosing data. This time they aimed not at ordinal status but for the ostensibly more clarifying end point of mortality. But the company released that data not as a peer-reviewed journal article or even a preprint, but as a conference poster and then as a press release. The Gilead data compared outcomes for 312 people with severe Covid-19 not to a randomized control group getting a placebo, but with a “retrospective cohort” of 818 people who did not participate in the study but had severe Covid-19 and got standard-of-care hospital treatment, without remdesivir. According to the company, the mortality rate was 7.6 percent on the drug and 12.5 percent without it, measured at day 14 of treatment.
This is weird for all sorts of reasons. First, if you don’t randomize the cohorts in a trial—who gets the drug and who gets the placebo, as a control—you risk the two groups getting different treatment. Sicker people might get more attention, or overall protocols might vary from hospital to hospital. That makes the two groups fundamentally incomparable. “Even in the absolute best of circumstances, we really struggle to do this and convince ourselves that what we’re seeing is a signal of the drug’s effect, as opposed to any number of confusing or confounding factors,” Bach says.
For something like Covid-19, where even mortality rates have ranged from small to upwards of 15 percent depending on when and where you look around the world, it’s even more challenging. And as the disease has spread, hospitals have altered their therapeutic approaches, changing when people go on ventilators, whether they lie on their backs or stomachs in bed, whether they get steroids, and so on. That means that subjects enrolled at the beginning of a study might get different treatment than subjects who enroll later. Again: very hard to compare outcomes.
Gilead’s July data-squirt didn’t account for any of that—because, says their spokesperson, the company wasn’t able to make enough of the non-drug they’d use in the trial as a placebo. “In the early stages of the pandemic, we not only had a limited supply of remdesivir but also a limited supply of the matched placebo required for placebo-controlled studies. We chose to prioritize manufacturing active drug over placebo, and we provided our supply of placebo to China and NIAID for their studies of remdesivir,” Ridley writes. “While the mortality benefit seen in this data analysis presented at the Virtual Covid-19 Conference (AIDS 2020: Virtual) is encouraging, it requires confirmation in prospective clinical trials.”
Observers indeed agree that further confirmation is necessary. “It’s literally not worthy of comment. It’s silly,” Bach says. “It just fails the most basic test for when you would want to rely on such an analysis.” (Even so, the announcement buoyed the entire US stock market, and goosed Gilead’s share price by 2 percent.)
So all these studies will continue. It’s still hard to know, exactly, how much real benefit remdesivir confers. Beigel’s team at NIH is still crunching the daily data—including virology and lab results—they collected from their cohorts. They’ve finished enrolling people into a study of the combined use of remdesivir and an anti-inflammatory drug called baricitinib. Gilead’s spokesperson says a fuller, peer-reviewed version of their mortality study is under review at a journal. Unbelievable as it may seem, the pandemic is still in its early stages. No one is anywhere near the end point.