Faced with a deluge of data from functional magnetic resonance imaging (fMRI) studies, what are researchers to do? To deal with the problem, Tal Yarkoni, a postdoctoral fellow in the lab of Tor Wager at the University of Colorado at Boulder, and colleagues developed NeuroSynth, a text mining and data extraction tool that facilitates large-scale analyses of neuroimaging studies. As reported in a paper published online June 26 in Nature Methods, the new software automatically extracts fMRI data from published papers, tags the data based on the words used most frequently in the papers—such as “attention,” “emotion,” or “pain”—and uses those data to construct meta-analyses. Yarkoni took a break from the recent Human Brain Mapping meeting in Quebec City, Canada, to talk with PRF about what the software can (and cannot) do, what he has learned about pain, and what further developments are already in the works.
PRF: Could you briefly explain the tool you’ve developed?
Sure. The idea is that the neuroimaging literature at this point is pretty enormous—there are well over a thousand studies published every year now. We learn something from every study, but an individual study also has lots of limitations, because it’s very expensive to collect data. So, in practice, any given study only gives us a fractional picture of what’s going on.
The solution to that, which is a solution that people use in lots of other areas, is to do meta-analyses. You take a bunch of studies and combine the results so you get a consensus estimate. And the consensus estimate presumably should be closer to the truth than any individual study.
That’s been done a fair amount in the last five to 10 years in neuroimaging. But it’s really time consuming. Typically, people track down all the articles they think are relevant to the question, and then they have to manually code and enter all the brain activations reported in those articles; this can take literally hundreds of hours for something that involves, say, 100 studies. And as the literature grows, it’s going to become more and more unmanageable.
So the goal here was to automate as much of that as we can—to figure out an automated way to mine the literature so we can directly extract information.
PRF: So text mining is really the key thing here.
Yes. We extract two kinds of information. [First,] we take each article and count up the words used in that article. So if an article uses the term “pain” a lot, we assume it’s probably an article about pain. It’s not always true—the assumption can fail, but by and large it’s a reasonable assumption. And then we also extract [fMRI] activation coordinates from the text. There’s nothing state-of-the-art about it from a text processing perspective—it’s really just applying it to neuroimaging that’s novel.
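The tagging step Yarkoni describes, counting how often each term appears and keeping terms above a frequency cutoff, can be sketched roughly as follows. The function name, the cutoff value, and the toy "article" are all illustrative, not NeuroSynth's actual implementation:

```python
import re
from collections import Counter

def tag_article(text, terms, freq_cutoff=0.2):
    """Tag an article with every term of interest whose relative
    frequency in the text exceeds a cutoff (the cutoff here is
    deliberately crude and illustrative)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return {t for t in terms if counts[t] / len(words) >= freq_cutoff}

# A toy "article" in which pain-related words dominate.
article = "pain pain thermal pain stimulus emotion " * 10
print(tag_article(article, {"pain", "emotion", "memory"}))  # -> {'pain'}
```

As the interview notes, the assumption can fail: an article that merely discusses pain in passing would still be tagged if the word happened to clear the cutoff.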
PRF: So then once you’ve got the information pulled out of papers and put into a database, what can the software do with it?
We can do a number of different things. One is to simply generate meta-analysis images for many different terms—for instance, “pain,” “emotion,” “language,” and so on. Our database allows us to extract all the activations associated with a particular term, such as pain, and then we can combine those activations to produce a brain image showing where studies that talk about pain tend to report activation. Which, to a first approximation, gives us a consensus estimate of what the brain is doing when you’re in pain.
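The map-building step, pooling the activation coordinates of every study tagged with a term, can be illustrated with a toy voxel grid. The grid size, studies, and coordinates below are made up, and a real meta-analysis would compute a proper statistic rather than a raw count:

```python
import numpy as np

def meta_map(studies, term, shape=(4, 4, 4)):
    """Count, voxel by voxel, how many studies tagged with `term`
    report an activation there (a toy stand-in for a proper
    meta-analysis statistic)."""
    img = np.zeros(shape, dtype=int)
    for tags, coords in studies:
        if term in tags:
            for x, y, z in coords:
                img[x, y, z] += 1
    return img

# Made-up studies: (tags, list of reported activation coordinates).
studies = [
    ({"pain"}, [(1, 2, 3), (0, 0, 0)]),
    ({"pain"}, [(1, 2, 3)]),
    ({"emotion"}, [(2, 2, 2)]),
]
pain_map = meta_map(studies, "pain")
print(pain_map[1, 2, 3])  # voxel reported by both pain studies -> 2
```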
We can also address something called the problem of reverse inference, where you’re trying to ask: If you see a given pattern of activity, what does that imply about the mental state participants were in?
The “reverse” question has been difficult to address in previous studies. The reason is that, when you’re asking which regions are most specific to pain, you need to contrast pain with everything else—if you want to know what’s special about pain, you also need to know what’s shared with emotion, working memory, and all these other domains. And, of course, no single study can capture everything, and even manual meta-analyses can’t really do this, because nobody’s going to sit there and code 5,000 studies by hand. That’s a unique benefit of our approach.
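The reverse-inference problem he describes is, at heart, an application of Bayes' rule: flipping the forward probability P(activation given pain) into P(pain given activation) requires knowing how often the region activates in everything else. The numbers below are hypothetical, chosen only to show why a strong forward effect can still yield a weak reverse inference:

```python
def reverse_inference(p_act_given_term, p_term):
    """Bayes' rule: turn forward probabilities P(activation | term)
    and term base rates P(term) into posteriors P(term | activation)."""
    p_act = sum(p_act_given_term[t] * p_term[t] for t in p_term)
    return {t: p_act_given_term[t] * p_term[t] / p_act for t in p_term}

# A region that activates in 80% of pain studies but also in 40% of
# everything else -- and "everything else" is far more common.
post = reverse_inference({"pain": 0.8, "other": 0.4},
                         {"pain": 0.05, "other": 0.95})
print(round(post["pain"], 2))  # -> 0.1
```

Despite the 80 percent forward activation rate, seeing this region light up raises the probability of pain only to about 10 percent, which is exactly the kind of non-specificity the interview warns about.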
PRF: So you’re saying that many patterns of brain activity aren’t unique to particular cognitive states. Do you feel like that often leads to misleading conclusions in neuroimaging studies?
Absolutely—it’s a huge problem. Here’s an example. There are specific parts of the brain that tend to activate very widely, in lots of different tasks. The reason is that we think they’re doing something fairly broad—like supporting goal-directed attention, by which I mean that if I put you in a scanner and ask you to do a task, you have to focus your attention on something. So what often happens is people will report activation in some of these [attention-related] regions, like lateral frontal cortex and dorsal medial frontal cortex, and because they’re thinking of it in terms of what they’re interested in—they did a study of emotion, for instance—they’ll find regions that are activated, and they’ll conclude that these are involved in the processing of emotion. Or in the processing of pain. Or in almost anything, really. That’s a problem, because it’s very hard to determine whether that region is specific to any given task, unless you have all these other things to compare it to.
PRF: You’ve tried out the tools on a number of different cognitive states, and pain is one of them. Have you seen that any particular brain regions are less indicative of pain than has been assumed?
Yes. For example, people have talked a lot about the “pain matrix,” which refers to a set of regions that include the anterior cingulate cortex, the anterior insula, thalamus, and sensory cortices. The idea has been that the anterior regions code for the affective, or emotional, aspects of pain, whereas the posterior ones code for the sensory aspects. Both sets are important to the experience of pain, but what our results show is that the anterior ones are not specific to pain at all, and in some cases are actually less involved in pain than in other things. So, sure, it may be true that the anterior insula helps code the affective component of pain. But because it plays a similar role in many other domains, too, it’s not particularly diagnostic of pain. The regions that we think are more strongly diagnostic of pain are the posterior sensory regions.
I should also say that this isn’t really a new point, and we’re actually affirming what other people have said in other recent studies. But because of the size and comprehensiveness of our database, I think we have better evidence for it than people have had in the past.
PRF: Any other hypotheses that your findings raise about pain?
Because we can generate these maps for lots of different terms, we can compare how easy it is to distinguish pain from emotion, or language, or verbal processing, or lots of other tasks or states. And in some of the analyses we did, it looks like it’s easier to distinguish pain from other terms—easier than almost anything else. One way to interpret that is that pain seems to have a fairly distinctive neural signature. That’s interesting because people have sometimes suggested that pain really is just a particular kind of emotional state. While that may be true to some extent, it’s interesting that the neural signature seems more distinctive than that of almost any other state we’ve looked at. We don’t yet know why that is.
Another thing we can do is use this approach to classify images from individual subjects. So if you hand us a brain image and say, “We don’t know what this person was doing, or what state he or she was in,” we can try to determine which of two or more states is most likely—for instance, is it emotion, or pain? We can do this on a very limited scale, so it’s not like we’re reading people’s minds.
One thing that’s interesting is that, when we try to classify people who were in pain experiments, sometimes the classifier gets it right, and sometimes it gets it wrong. And when you look at the subjects that get classified wrong, there’s some evidence that those subjects’ neural signature actually looks more like emotion than pain. That’s interesting because we know that pain is a complex state, and part of what goes into it is your emotional and cognitive state—do you feel anxious, how much pain do you expect to receive, and so on? So, to speculate a bit, one potential application of this approach is to try to identify different mechanisms that contribute to pain—for instance, is someone reporting being in more pain because of anxiety, as opposed to because of an increase in some sensory signal?
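One simple way to realize the kind of classifier described above is to compare a new image against each term's meta-analysis map and pick the best match. This is a deliberately naive sketch, not the method used in the paper; the 1-D "maps" stand in for whole-brain images:

```python
import numpy as np

def classify(image, term_maps):
    """Assign an image to the term whose meta-analysis map it
    correlates with most strongly (a naive toy 'decoder')."""
    def corr(a, b):
        return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    return max(term_maps, key=lambda t: corr(image, term_maps[t]))

# Hypothetical per-term consensus maps, flattened to 1-D.
term_maps = {
    "pain": np.array([1.0, 0.0, 1.0, 0.0]),
    "emotion": np.array([0.0, 1.0, 0.0, 1.0]),
}
image = np.array([0.9, 0.1, 0.8, 0.2])  # resembles the pain map
print(classify(image, term_maps))  # -> pain
```

On this view, the misclassified pain subjects Yarkoni mentions would simply be those whose images happen to correlate more strongly with the emotion map than with the pain map.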
PRF: How do you envision that these meta-analysis tools might be useful to pain researchers?
Right now, I don’t think anyone’s going to look at the activation maps we’ve generated for pain-related terms and say, “Oh, nobody’s identified these brain regions before!” We’re really just confirming what people have previously shown. But what would be nice is if, as our database grows, we can start to do more specific analyses that tease apart different forms of pain—for instance, electrical shock, thermal pain, mechanical pain, chronic pain conditions, and so on—things for which people may not have published meta-analyses yet.
Down the line, we’re hopeful that this type of approach could actually have some diagnostic value—meaning researchers or clinicians will be able to take activation maps from individual subjects or patients and determine probabilistically how much pain those people are in, or what kind of pain is involved. But that’s probably a long way off.
In the nearer term, there are plenty of more minor but still useful applications. For example, one issue that comes up a lot in imaging studies is: Where do you look in the brain? You could look everywhere, but for statistical reasons, there are penalties to be paid for that. So people really want to figure out ahead of time which regions are best suited to testing a given hypothesis. With our maps, you can say, “I’m going to take the regions that show the strongest association with pain, and those are the ones I’m going to focus on.” It’s not so much that we didn’t know we should look at those regions before, but it provides a way of very concretely defining the boundaries of those regions, based on this consensus estimate.
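Defining regions of interest from a consensus map, as described here, amounts to thresholding the map into a binary mask. The threshold value and the tiny array below are arbitrary stand-ins:

```python
import numpy as np

def roi_mask(meta_image, threshold):
    """Binary region-of-interest mask: keep only voxels whose
    consensus statistic meets an (arbitrary) threshold."""
    return meta_image >= threshold

consensus = np.array([[0.1, 0.9],
                      [0.7, 0.2]])
mask = roi_mask(consensus, 0.5)
print(int(mask.sum()))  # -> 2 voxels survive the 0.5 threshold
```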
PRF: One technical question. When the software finds activation coordinates in a paper about pain, it doesn’t know that it’s actually looking at a table of activations in a pain state compared to a non-pain state, right? It might be some other kind of experiment altogether?
I would say that’s the biggest limitation to this right now. All we can say is that we have these coordinates, and they came from a study that used the term “pain” with high frequency. When we first started this, I wasn’t actually sure this would work at all. In practice, it does quite well, at least for very broad domains, like “pain.” But it definitely becomes much more of an issue if you want to know what, say, thermal pain looks like in the brain versus mechanical pain. We’re working on that, but as of now we don’t have any way to discriminate exactly what conditions the activations reflect.
PRF: Are there any improvements to the tools that people can expect to see coming along soon?
We have a bunch of things in the pipeline. We’d like to make the tools more accessible on the Web. Right now you can visualize the images on the Web, and you can download them, but you can’t generate new analyses that aren’t already on there. We’d like to move to a model where you can actually do that dynamically. I think that would be quite useful to people.
We also want to give people the ability to upload their own fMRI images, so they can use our tools to “decode” their images. So they’ll be told that, based on the literature, there’s a higher probability that the pattern of activation [in their image] implies pain than emotion, for instance. So that’s one line of future work: making the tools more accessible.
Another one is improving our analysis tools. We’re working on more sophisticated meta-analysis approaches that we’re hoping will give substantially more accurate results. That will be helpful because then we can really start pushing towards being able to decode individual images, and maybe one day even use this type of approach for purposes of clinical diagnosis.
PRF: Have you gotten any interesting feedback or new ideas coming in?
Yes, lots of people have suggested great ideas, and I maintain a long to-do list. One idea that a lot of people suggest is to look at trends over time. One issue in the imaging literature, as in any other science, is that there are fads, so we know that the results people report are colored by their biases. For instance, everyone thinks that the amygdala is involved in emotion. So, clearly, if you do a study of emotion, that’s the first place you’re going to look, and you might not look at other places that we don’t tend to think of as being involved in emotion. These biases come and go with time. One thing I’m planning to do is break the database down by era and ask, for example, are the neural correlates of pain different in 2010 than they were in 2000? Of course, we don’t think people process pain differently now. So if there are any differences, that would suggest it has to be something about the way researchers are reporting research.
PRF: Anything else that you think is important?
I just would include the qualifier that our ability to “decode” images right now is very, very limited. We can’t use these tools for any sort of clinical purpose at the moment. I don’t want anyone to think we can read minds with these tools, because we can’t. If you say you’re in a lot of pain, and our classifier says that based on your brain activity you’re not, I would trust your word over ours!
PRF: Thank you so much for discussing the new tools.