Monday, January 21, 2019

Please stop using this graph to argue your topic is popular

It seems to have become quite common to use this sort of "number of publications" graph to argue for the importance of one's own research area:


The graph shows that the number of papers including a particular search term in their title, abstract, or keywords has risen dramatically over the last few decades. In the example above, the search term is "meditation OR mindfulness", following an analysis reported by Van Dam et al. (2017).
These were just some data I had readily to hand; there's no intention to imply that mindfulness research is particularly prone to this kind of analysis.

One problem with this kind of analysis is that the number of scientific publications per year is also increasing for most disciplines. It's fairly easy to add this information to such a graph. For example, let's plot the number of papers including the word "psychology" in their title, abstract, or keywords, on the same axes:

This puts a slightly different perspective on things, and provides little support for the claim by Van Dam et al. that: "Over the past two decades, writings on mindfulness and meditation practices have saturated the public news media and scientific literature". The literature seems very far from saturated on this particular topic (depending on what 'saturated' is taken to mean; clearly it is not saturated in the way that, say, the US market for refrigerators is saturated).

If we express the number of "mindfulness OR meditation" papers as a percentage of the number of "psychology" papers, we get the following:

This graph gives us a different perspective from the one offered by Van Dam et al. The 'saturation' of mindfulness research in psychology stayed at a fairly stable, low level of 1-2% from 1975 to 2000. It rose to a peak of about 6% in 2012, but has been declining since.
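For anyone wanting to repeat this with their own search terms, the percentage calculation can be sketched in a few lines of Python. This is a minimal sketch, not the script used for the graphs here: the function name `share_of_field` is my own, and the counts are made-up numbers for illustration, not the real data.

```python
def share_of_field(topic_counts, field_counts):
    """Return topic papers as a percentage of field papers, keyed by year."""
    shares = {}
    for year, topic_n in sorted(topic_counts.items()):
        field_n = field_counts.get(year, 0)
        if field_n > 0:  # skip years with no baseline count to avoid dividing by zero
            shares[year] = 100.0 * topic_n / field_n
    return shares


# Made-up counts for illustration only; NOT the data behind the graphs in this post.
topic = {2000: 50, 2010: 400, 2018: 900}        # e.g. "mindfulness OR meditation"
field = {2000: 5000, 2010: 10000, 2018: 30000}  # e.g. "psychology"

for year, pct in share_of_field(topic, field).items():
    print(f"{year}: {pct:.1f}%")
```

With these invented numbers the raw topic count rises in every period, yet the share falls between the last two years, because the baseline grows faster. That is the pattern a raw-count graph hides.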

Of course, there are other, probably better, ways of calculating the 'market share' of a concept in the scientific literature than the method used here. The main point is simply to demonstrate that raw counts are a very poor measure.