Evidence behind a digital tech intervention remains scant
In the world of digital health, Silicon Valley-based Mindstrong stands out. It has a star-studded team and tens of millions of dollars in venture capital funding, including from Jeff Bezos’ VC firm.
It also has a captivating idea: that its app, based on cognitive functioning research, can help detect troubling mental health patterns by collecting data on a person’s smartphone usage—how quickly they type or scroll, for instance.
The promise of that technology has helped Mindstrong build incredible momentum since it launched last year; already more than a dozen counties in California have agreed to deploy the company’s app to patients.
Does the app live up to its promise? There’s no way to tell. Almost no one outside the company has any idea whether it works. Most of the company’s key claims aren’t yet backed up by published, peer-reviewed data, leading some experts to wonder whether the technology is ready for the real world.
“I wouldn’t waste all that time and money in the wild until they get sure that some of those things are as specific as they hope they are,” said Rosalind Picard, a researcher at MIT Media Lab who is familiar with Mindstrong’s work and whose own work aims to detect a person’s mood from smartphone and wearable data.
Even as one of the company’s executives, Dr. Tom Insel, acknowledged to STAT that the app isn’t perfect, the company’s CEO emphasized that Mindstrong could provide unprecedented insight into conditions like depression.
Mindstrong is not alone in pushing the frontiers of smartphone-based digital health. Many companies use so-called digital phenotyping, collecting scientific data on a person’s digital life, to gain insights into his or her physical or mental health.
The company’s app collects information about how people are typing and runs it through a machine learning algorithm to determine which data can predict their emotional state. The app is available in Apple’s app store, but requires a participant code to access it.
“We’ve done the validation work against the gold-standard clinical tests for depression, for anxiety, for cognitive decline, whether it’s memory or executive function,” said Dr. Paul Dagum, the company’s founder. “We’re confident, we’re already seeing some really exciting results.”
In the last year, Mindstrong’s footprint and reach have grown rapidly. The Palo Alto-based company’s workforce has doubled to 42 employees, and it made a sizable gift to Harvard’s school of public health. In February, it launched a partnership with Takeda to develop new biomarkers that will be able to aid the pharmaceutical giant’s clinical trials for depression treatments.
The idea is to use that typing data to establish a person’s “normal” pattern, which can then be compared against their typing habits on any given day. If the habits look off (slower or more agitated than normal), the app can alert a health care provider.
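The baseline comparison described above can be illustrated with a generic sketch. It is not Mindstrong’s algorithm, which the article describes only as machine learning applied to typing data. Instead, it assumes a single hypothetical typing metric (median milliseconds between keystrokes per day) and flags a day that falls more than two standard deviations from that person’s own average. The function names, the metric, the threshold, and the sample numbers are all illustrative assumptions.

```python
from statistics import mean, stdev


def build_baseline(daily_values: list[float]) -> tuple[float, float]:
    """Summarize a person's "normal" pattern as a mean and a sample standard deviation.

    daily_values: one number per day for a hypothetical typing metric,
    e.g. median milliseconds between keystrokes (an assumed feature).
    """
    if len(daily_values) < 2:
        raise ValueError("need at least two days to estimate a baseline")
    return mean(daily_values), stdev(daily_values)


def is_unusual(today: float, baseline: tuple[float, float], threshold: float = 2.0) -> bool:
    """Return True if today's value is more than `threshold` standard deviations
    from the baseline mean, in either direction (slower or faster than usual)."""
    mu, sigma = baseline
    if sigma == 0:
        # No day-to-day variation in the baseline: any change counts as unusual.
        return today != mu
    z = (today - mu) / sigma
    return abs(z) > threshold


# Illustrative week of baseline data (made-up numbers, in milliseconds).
baseline = build_baseline([210, 205, 215, 208, 212, 206, 214])
print(is_unusual(211, baseline))  # False: close to this person's norm
print(is_unusual(260, baseline))  # True: much slower than usual
```

A rule this simple cannot tell a shift in mood from, say, gloves or sticky fingers; it only shows what comparing one day against a personal baseline means.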
Abnormal patterns, Mindstrong says, might show up if a person is more depressed or anxious, or if just about anything else about their mental health changes. When asked which disorders Mindstrong might be able to detect, Dagum replied, “all of them.” (Dagum, a data scientist and physician, founded the company in 2017 with Rick Klausner, the founder and director of CAR-T pioneer Juno Therapeutics and Grail, a liquid biopsy company.)
Mindstrong officials told STAT that among the company’s most encouraging results is that its app can even predict how a person will feel next week, or at least how that person will score on the Hamilton Rating Scale for Depression, kind of like a weather app for your mood.
The data behind this claim will be published soon, said Insel, who is the former head of the National Institute of Mental Health and came to the company in 2017 after a short stint at Verily.
The app can detect a seven-point change on the Hamilton scale, Insel said. That kind of difference could indicate that a patient who is not normally depressed now shows signs of mild or moderate depression, or that a person with moderate depression now shows signs of a very severe condition.
“For a clinician and for someone taking care of a patient, knowing that, it could be very, very powerful,” Dagum said.
The company’s momentum has taken it to the cusp of a real-world deployment in California. About 15 counties, including Los Angeles County, the most populous county in the United States, will spend about $60 million over the next four years to bring apps from Mindstrong and other companies into their health care systems.
These counties hope apps will help them get better services to people with mental illnesses like depression, schizophrenia, bipolar disorder, and post-traumatic stress disorder.
The Mindstrong program itself is limited: Patients can choose voluntarily whether to use the app, which will be free to them, and that decision won’t affect the rest of the mental health services they can access.
So far, the Mindstrong app has only been used in controlled clinical settings and trials—including one run by a company developing new antidepressants and another done in a ketamine clinic. The company has also claimed that a “nationwide employer” and private substance abuse clinics in D.C. are using the app.
But other than the change on the Hamilton scale—which hasn’t yet gone through peer review and was disclosed to STAT in an interview—almost no data about how well Mindstrong’s technology works is available to independent observers.
The company’s website describes five completed clinical trials, but the company has not yet published the results of any of them. Only a handful of other published works, all from the past year, offer data that hint at how well the technology works or how broadly it applies.
The company published a 27-person pilot study in the journal npj Digital Medicine earlier this year. Dagum is also an author on three other works: a poster presented at the American College of Neuropsychopharmacology’s 2017 conference; a second poster reporting results from a wide variety of digital phenotyping techniques, not just typing; and a paper that describes a clinical trial protocol rather than results.
As Mindstrong moves toward a wider rollout, the scientific studies behind its claims will matter. Federal regulators, for one, have cracked down on commercial apps that misleadingly cite a study’s conclusions in their marketing.
Based on her own research, at least one expert in digital health and mood said she’s skeptical that Mindstrong can, in a general population, work as well as the company claims. MIT’s Picard said that while there are ways to predict or detect mood changes, you usually need more than just a single type of data to do so.
“I’m suspicious that a single modality like typing is going to be sufficient. It would be like saying there’s a single question [on a screening questionnaire] that a doctor could be using,” said Picard, who is also CEO of a company that, like Mindstrong, works on digital phenotyping.
“My guess is that their specificity to depression is going to be relatively low,” Picard said.
Her own research, for example, combines temperature and skin conductivity with phone calls and the amount of time spent on a phone to predict mood changes, and that approach is about 80 percent accurate.
Especially in the field of digital mental health, “we need more peer review,” said Dr. Steven Steinhubl, the director of digital medicine at Scripps Research Translational Institute. (Steinhubl is also the co-editor-in-chief of npj Digital Medicine.) Though he said he strongly believes in the potential of apps like Mindstrong, Steinhubl cautioned that peer review has a purpose.
“[Peer review] is a very imperfect system, but there’s really nothing in the peer-reviewed literature. That means that other experts aren’t able to weigh in,” he said. “If you have committees and other people reviewing something who maybe don’t have the same level of expertise, you’ll have people saying, ‘Yeah, that sounds good.’”
Other researchers have also found that neuropsychological tests, more broadly, have relatively low accuracy rates.
In a study that examined people who were already being treated for depression, one computerized test could accurately predict their condition in only about 40 percent of cases. Another study found a 44 percent accuracy rate for a similar computerized test used to examine people with major depression.
A neuropsychological test—if it’s used as a screening test—is “going to miss a lot of people who are depressed,” said Richard Porter, a psychology researcher based at the University of Otago in New Zealand who conducted one of the studies.
And even if depressed people do show some kind of cognitive impairment, it’s impossible to tell what caused it.
Many things other than mental illness might cause a person to perform poorly on cognitive tests—like living with another disorder, having a lower baseline performance on cognitive tests, having a drink or taking prescription, over-the-counter, or illegal drugs.
Mindstrong’s leaders aren’t worried about that kind of noise in their data. In their view, some of those factors are worth noting, both for patients and for the health care professionals working with them.
“You’re hungover, you didn’t sleep well, you didn’t take your medication, you have medication side effects, you’re having stress and challenges at work and at home. Those are things that we want to measure,” Dagum said.
But even Insel admitted that plenty of other factors could affect typing speed, and that Mindstrong hasn’t yet figured out how to account for them. Sticky fingers after lunch, full hands at an airport, gloves in winter, or a broken hand might all plausibly affect a person’s typing speed and, therefore, the app’s performance.
“One thing we’ve thought about is how we factor in those unusual environmental issues,” Insel said. “We’re working on that. But I can’t say that we’ve solved all of those possible issues.”
Insel and others linked with the company are fond of comparing their app to a smoke detector—something that’s intended to enhance humans’ senses to detect danger.
But part of the value of a smoke detector is that if it’s functioning properly, we know it isn’t going off at random. It only goes off in certain conditions and carries a specific message: Your house is on fire or about to be. Do something.
At least for now, that’s where Mindstrong differs from a smoke detector. There’s no way to tell, yet, how sensitive its algorithm is (how many real cases it catches) or how specific it is (how rarely it raises false alarms).
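Sensitivity and specificity are standard measures for any screening test, and a short calculation shows why both matter for the smoke detector comparison. The counts below are invented for illustration and are not Mindstrong data: imagine 1,000 people, 50 of whom are depressed, screened by a hypothetical test that catches 40 of the 50 and wrongly flags 95 of the other 950.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of people who have the condition that the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of people without the condition that the test leaves unflagged."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of alerts that correspond to a real case."""
    return true_pos / (true_pos + false_pos)


# Hypothetical screening results for 1,000 people (illustrative, not Mindstrong data).
tp, fn = 40, 10    # 50 people who are depressed: 40 flagged, 10 missed
tn, fp = 855, 95   # 950 people who are not: 855 left alone, 95 false alarms

print(f"sensitivity: {sensitivity(tp, fn):.2f}")  # 0.80
print(f"specificity: {specificity(tn, fp):.2f}")  # 0.90
print(f"share of alerts that are real: {positive_predictive_value(tp, fp):.2f}")  # 0.30
```

Even with 80 percent sensitivity and 90 percent specificity in this made-up group, only about 30 percent of alerts would point to a real case, because people without the condition far outnumber those with it. Figures like these are what would let clinicians judge whether an alert means the house is really on fire.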
Insel said that information is coming. He said the company has the data about the app’s accuracy—but he declined to provide those figures, citing papers pending publication. “[They] square very well with clinically used biomarkers,” he said.
California officials indicated that they have been shown some of that data. But they are nevertheless cautious about how the app will work in a new, different setting.
One official said there will be “clear writing” included with the state’s version of the app about what it can do, what it cannot do, and what goals the counties hope it will help them achieve.
Those goals are pretty lofty. At least some counties eventually plan to use it not only to supplement the existing system, but potentially to bring more people into its fold.
“We might be able to go to colleges, emergency departments, other places,” said Debbie Innes-Gomberg, a deputy director at the Los Angeles County Department of Mental Health. “There’s a process of identifying that they’re symptomatic, but [our target population is] people that are in our system and people who maybe need to be.”
And even before the app has launched in the five counties that originally signed on, the pilot has expanded: another 11 counties recently decided to join.
Still, Innes-Gomberg said, it’s going to be rolled out with caution. “We’re not going to oversell this.”
By Kate Sheridan for Scientific American