In this, the 4th post in this series (the others on video abstracts, object oriented paper writing and freelance postdocs are here: 1,2,3), I would like to chat about a tough but important problem and present some proposals to address it, which vary from conservative to bordering on the extreme. Crazy ideas can be stimulating and fun, and I hope the proposals achieve at least one of these. They might even turn out to be useful. One can hope.
The question we address here is:
How should one rate a scientist’s research impact/quality? (and by extension that of a single paper or article).
As cans of worms go, this is a big one, and no blog post can do the subject justice. There are numerous proposals already out there but all of them have failings in one way or another. Indeed this is an optimisation problem: find the best way to rank scientists. But the no free lunch theorem of optimisation theory already suggests to us that we are doomed to failure given that the definition of “best” is highly subjective. We will not succeed; at best we can find partial fixes.
But this is an extremely important question with a very real impact on the future of science. Faculty positions and funding are more and more being decided by bibliometric indicators and since supervisors tend, I suspect, to spawn students in their own image, this will have implications for the long-term health of many academic fields.
With this brief background, I offer four proposals that address four different aspects of this question:
(1) How do we reward really influential papers?
Instead of awarding a paper a single citation if it is cited, as is currently done, count the multiplicity of citations within a paper. If your article is cited 10 times in paper X, it is likely to have been more important to X’s genesis than another article cited only once (probably in the introduction along with many others only cited once). Presumably there is an 80-20-like rule for the multiplicity of citations suggesting that multiple citations within a single paper should be weighted highly (in fact, I would suspect that only a couple of papers are cited more than once on average in a given paper).
I think this would provide a good system for estimating the true value of a paper. It would favour large papers which made substantial advances in our understanding of a field rather than light-weight papers which get large numbers of citations simply by being oft cited along with others in historical introductions. You know, the introductory sentences of the sort that go:
Previous work on brane worlds/dark energy/large extra dimension/dark matter annihilation includes [1-25].
It has long been argued that, for several reasons, authors tend to simply cut and paste from the bibliographies of their past papers when they are writing new articles. Getting large numbers of citations then requires only that you make it into many people’s bibliographies once. The “persistence of citation memory” will do the rest for you. Counting the multiplicity of citations will strongly disfavour these less important citations (I am tempted to call them unphysical/gauge citations). The fairly minor downside is that it would take more time to compute, since one would have to trawl through the files of the papers to extract the multiplicities.
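To make the idea concrete, here is a minimal sketch of how the multiplicities might be extracted from the body text of a paper. It assumes numeric bracket citations of the form [3], [2, 4] or [1-25]; the function name citation_multiplicities is my own invention, and real papers (author-year styles, LaTeX sources, PDFs) would need a good deal more care.

```python
import re
from collections import Counter

# One entry inside the brackets: a single number ("7") or a range ("1-25").
# Assumption: citations are numeric and written in square brackets.
_ENTRY = re.compile(r"(\d+)(?:\s*[-\u2013]\s*(\d+))?")


def citation_multiplicities(body_text):
    """Count how many times each numbered reference is cited in the text."""
    counts = Counter()
    # Grab the contents of every [...] group that has no nested brackets.
    for group in re.findall(r"\[([^\[\]]*)\]", body_text):
        for part in group.split(","):
            match = _ENTRY.fullmatch(part.strip())
            if match is None:
                continue  # not a citation, e.g. "[see text]"
            first = int(match.group(1))
            last = int(match.group(2) or first)
            # A range such as [1-25] counts once for each reference in it.
            counts.update(range(first, last + 1))
    return counts


text = (
    "Previous work on brane worlds includes [1-3]. "
    "We follow [2] closely; the method of [2, 4] is extended, "
    "and the result of [2] is recovered in a limit."
)
counts = citation_multiplicities(text)
print(counts.most_common(1))  # [(2, 4)]
```

In this toy text, reference 2 is cited four times while references 1, 3 and 4 appear once each, which is exactly the distinction the proposal wants to reward.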
(2) How do we measure influence?
Use something like the Google pagerank model*: rank all scientists in a field by some statistic, e.g. citation counts. Then go back to every citation and reweight it by the rankings of the authors doing the citing. Now you have ensured that getting cited by Ed Witten (or your favourite top-cited scientist) counts a lot more than being cited by the person ranked 1000th on the list. Once you have done this for all citations, compute the new rankings including all these new weightings and then iterate until convergence and Nirvana are achieved.
Although this would be time-consuming the first time round, it would be relatively easy to update thereafter. In fact, something similar is used in cricket to rank players, by folding in the skill of the players they were competing against.
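The iteration described above can be sketched in a few lines. This is an illustration under my own assumptions, not the actual PageRank algorithm: the input is a hypothetical table of author-to-author citation counts, each citation is weighted by the current score of the citing author, and I have borrowed a PageRank-style damping term (the damping parameter) so that the iteration settles down instead of oscillating.

```python
def influence_scores(citations, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively reweight citations by the score of the citing author.

    citations maps (citer, cited) -> number of citations (hypothetical data).
    Returns a dict of scores that sum to 1.
    """
    authors = sorted({name for pair in citations for name in pair})
    n = len(authors)

    # Step 1: the starting ranking is the raw citation count.
    raw = {a: 0.0 for a in authors}
    for (citer, cited), k in citations.items():
        raw[cited] += k
    total = sum(raw.values())
    scores = {a: (raw[a] / total if total else 1.0 / n) for a in authors}

    # Step 2: reweight every citation by the citer's score, and repeat.
    for _ in range(max_iter):
        new = {a: (1.0 - damping) / n for a in authors}
        for (citer, cited), k in citations.items():
            new[cited] += damping * k * scores[citer]
        norm = sum(new.values())
        new = {a: v / norm for a, v in new.items()}
        change = max(abs(new[a] - scores[a]) for a in authors)
        scores = new
        if change < tol:
            break
    return scores


# "a" and "b" each have one raw citation, but "a" is cited by the
# highly cited "top" while "b" is cited by the uncited "low".
example = {
    ("top", "a"): 1,
    ("low", "b"): 1,
    ("a", "top"): 5,
    ("b", "top"): 5,
    ("low", "top"): 1,
}
scores = influence_scores(example)
print(scores["a"] > scores["b"])  # True
```

On raw counts "a" and "b" are tied; after reweighting, "a" comes out ahead purely because of who did the citing. The same toy example also shows the "rich get richer" worry in miniature.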
One major criticism of this is the possibility that it will be a “rich get richer while the poor get poorer” scheme. This statistic would, I think, tend to reward centres of power, since scientists do tend to cite their collaborators, friends and acquaintances more than strangers, if for no other reason than that they are more likely to be aware of the work of people they have coffee with. On the other hand, it certainly does seem reasonable that a citation from a top expert in your field should count more than the citation of the local village idiot.
(3) How do we compare people in different sub-fields?
One of the key problems in this subject is how to compare apples and oranges. Someone in pure mathematics can legitimately feel aggrieved at listening to cosmologists talk about their multiple top-250 citation papers. There simply are fewer citations floating around in the world of pure maths. So how do we normalise different sub-fields to allow for meaningful comparison between e.g. a string theorist and a person working on number theory?
Sub-fields are notoriously difficult to define, so let’s avoid the thorns altogether. Instead, let’s use the bibliography of a paper to define the background against which the paper will be compared. So for example, imagine paper X cites 15 papers (call them y1,y2…y15) in its bibliography. This set defines a natural reference framework. Now compute the mean or median number of citations to the y1-y15 articles and compare this to the number of citations that X has received. The ratio of the two, citations to X divided by that mean or median, provides a fairly natural (I think) normalisation of the value of paper X.
What I like about this is that it automatically normalises papers to the sub-field as defined by the authors themselves by their choice of references and influences. This statistic tries to capture what fraction of relevant, available citations went to this particular paper. If a sub-field tends to cite very few articles on average, that is fine, it is already accounted for.
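As a minimal sketch of this normalisation (the function name and the numbers are made up for illustration), the ratio is just the citations to X divided by the mean or median citations of the papers in X's bibliography:

```python
from statistics import mean, median


def normalised_impact(paper_citations, reference_citations, use_median=False):
    """Citations to paper X relative to the papers X itself cites.

    paper_citations: number of citations X has received.
    reference_citations: citation counts of the papers in X's bibliography.
    Returns None when the baseline is zero and the ratio is undefined.
    """
    if not reference_citations:
        raise ValueError("paper X has an empty bibliography")
    average = median if use_median else mean
    baseline = average(reference_citations)
    if baseline == 0:
        return None
    return paper_citations / baseline


# Invented numbers: a paper in a heavily cited field and one in a
# sparsely cited field can end up with the same normalised score.
busy_field = normalised_impact(300, [100, 200, 300])
quiet_field = normalised_impact(15, [5, 10, 15])
print(busy_field, quiet_field)  # 1.5 1.5
```

The median is probably the safer choice in practice, since one famous review in the bibliography can drag the mean up a long way.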
Can this statistic be gamed? Yes: simply cite papers that have low numbers of citations and avoid papers with large numbers of citations. But by doing so one will pump up the citations of these unknown papers, making them less useful next time. Inappropriate citations can also be easily spotted by fellow scientists in the field, risking the loss of reputation, so it may not be too much of a problem. The statistic would certainly help in selection committees where one is comparing candidates coming from very different cultures.
(4) How do we incorporate the sentiment of the entire community into a ranking?
Here is a much, much more radical proposal: the stock market model. We are trying to evaluate the worth of a paper or person. One way is to define deterministic statistics, such as citation number or h-index, that can be independently measured. The great problem with these is that since the agents being measured are intelligent, they can adapt their behaviour to game the system.
The only way to get around this is to have a system that is too complex to game or one which uses the intelligence of the other agents in the system to counteract this. This is, of course, exactly what peer-review does or is meant to do. Except that peer-review is severely limited by shot noise and personal politics of friends and enemies.
The radical proposal here is to construct a stock market for academics. Each scientist would have an associated share “price” and, upon entering the market as a trader, be given a fixed amount of “money” with which to construct a portfolio by buying the shares of other scientists. Using the standard stock market rules, supply and demand would modulate the prices of shares of scientists, folding in the evaluation of the community of scientists. In a sense it is a nonperturbative resummation.
What I like about this is that it immediately makes clear the (perhaps) not-well-known truth that committees often make hiring decisions based not on what the candidates have achieved, but rather on the perception of what they will achieve in the future. In stock terms, one does not buy the most expensive shares; one buys the shares one thinks are going to increase in value in the future. Buy low, sell high. (Except that, of course, one often cannot sell one’s colleagues, no matter how much one would like to at times!)
I also like the fact that everyone would have two numbers associated with them: their stock price, reflecting the community opinion of their work, and their portfolio value. People who are great at picking winners would be recognised and valued for their insight into people, which might have a real impact on selection committees.
On the down side, part of me hates the whole idea with a passion. The recent recession has not exactly endeared the stock market and traders, with their rather sordid excesses and speculation, to the general public, and the idea of being publicly traded on a stock market is demeaning. There are also serious and potentially insurmountable questions around insider trading, trading circles, manipulating the system, speculation and so on, as well as the possibility of “Lord of the Flies” style ganging up on unfortunate victims who are seen as unfashionable. It is also not at all clear how one would implement this system in practice and how to get the scientists to play ball.
I should end by saying that even this most radical proposal is not completely new. Stock markets for ideas have been around for a while, with one example being the trading of shares around the question of when the Higgs will be discovered.
Thanks for reading and, as usual, I am interested in your thoughts…
– Bruce (@cosmo_bruce)
* I came up with this before I heard of the pagerank algorithm, honest.