Rethinking peer review

Today, eLife announced a radical change: it will no longer reject papers after review (see the Editorial by Eisen et al., 2022). Instead, the reviewed paper will be posted together with the reviews as an eLife Reviewed Preprint. A short evaluation summary, described in this blog post, will be added to the paper above the abstract. The authors can then respond to the reviews, post a revision, and have the paper re-evaluated. Once they are happy with the result, they can finalize the paper as the version of record to be listed on PubMed.

When I talk with colleagues about this, I get a wide range of reactions. Some predict that this will be the end of eLife: because eLife will now publish anything, publishing in eLife will lose all value.

Others say that the change does not go far enough. Perhaps papers should only ever be posted on preprint servers (such as bioRxiv), and we should leave it to the community of readers to evaluate papers after publication. I think both sides have good points, but here are the reasons why I think the new system is the right step forward.

What peer review should be

Peer review has two important functions. First, peer review should be a service to the authors to help them improve their paper. When I get reviews, I want to know which parts of the paper the reviewers liked and found important. I want to know which parts of the paper are unclear. I want to know which aspects of the analysis or interpretation the reviewers are concerned about, so I can think about the reasons and discuss them. And then I want to take all this feedback to make the paper better. Sometimes I even want to write a public response to the reviewers to be able to provide more details without disrupting the flow of the paper. After all, my reviewers are my ideal readers - they are usually well informed and will take more time to understand the paper than almost any other reader. So trying to convince them is the best way to ensure that I can convince anybody else.

Secondly, peer review should help readers evaluate papers more efficiently. Sure - after I have read a paper in my own field, I can form my own opinion. But when I read even slightly outside of my field, I often do not have enough knowledge to fully evaluate the data and methods. Knowing that the paper has been peer reviewed certainly helps. It would be even better to have more detailed information about which claims are - in the eyes of the reviewers - well supported and which are not. This is especially important for consumers of science, who often need to rely on expert judgement to separate well-supported claims from fake news.

Why peer review is broken

Currently, peer review does not serve either of these functions well. The entire process culminates in a reject-or-accept decision. Thus, the journal name becomes the only bit of evaluation available to the reader. It serves - for better or worse - as a proxy for quality: we expect that a Nature Neuroscience paper is better than a PLoS One paper. To protect their brand, journals strive for high rejection rates. The editorial triage stage in some journals is extremely selective, and solid work is turned away because it is not "novel or interesting" enough. This in turn leads authors to overstate the importance of their work and to hide weaknesses.

In this climate, peer reviews have become unnecessarily harsh. Rather than trying to improve a paper, reviewers often feel they need to point out its shortcomings to signal that the paper should not be accepted. And after rejection, the paper needs to be re-reviewed at a different journal, wasting more time for editors, reviewers, and authors.

Why evaluate at all?

So, if peer review is broken, why not simply get rid of it? Wouldn't it be much better if we actually read the papers that we want to evaluate? But if you have ever had to rank 200 CVs for a job search committee, you know that you start looking for shortcuts - at the latest by 3 a.m. The number of papers - and where those papers are published - becomes an important indicator of quality. These superficial indicators gain more weight the less information we have about the papers themselves. So, whether we like it or not, we often rely on quick heuristics to judge scientific quality - a case of bounded rationality: we need to make decisions with limited time and resources for information gathering (Simon, 1972). If all papers are only posted on preprint servers, journal names are removed as a quick-and-dirty quality indicator. Without providing an alternative, I fear we would start relying on the name of the institution, the number of social media engagements, or other metrics that are even more fraught with social bias.

eLife's new evaluation summary

So what would a better system look like? We need something that carries more than one bit of information, but that can still be used as a quick heuristic. A 2-3 sentence evaluation summary seems like the right level of detail. To make these evaluations informative and comparable, eLife now asks editors and reviewers to evaluate the paper on two dimensions: the Significance of Findings and the Strength of Support. While other dimensions are possible, these seem to be the two main factors that currently inform accept/reject decisions. The two dimensions should be judged independently. Technically impressive work that is mostly interesting to a restricted field may be judged a Valuable / Exceptional paper. In contrast, a paper with a novel and potentially revolutionary idea - but with only preliminary support for the central claims - could be judged Fundamental / Incomplete. Clearly, both papers deserve to be published and read.

Table 1: Two dimensions of evaluation. In our original proposal, Mihaela Iordanova and I used letters (A-F) and numbers (1-6) for the evaluation categories. They allow for a very succinct summary - we can talk about a C3 or B5 paper. But most people were - justifiably - too strongly reminded of primary-school grades. By embedding the terms into an evaluation summary, a much more nuanced view can be expressed. Which claims are incompletely supported? Why exactly is the paper valuable? The evaluation summary forces the reader to take in a few more bits of information than can be conveyed by a two-dimensional grade.

Significance of Findings
A. Landmark: Findings with profound implications that are expected to have widespread influence
B. Fundamental: Findings that substantially advance our understanding of major research questions
C. Important: Findings that have theoretical or practical implications beyond a single subfield
D. Valuable: The insights have theoretical or practical implications for a subfield
E. Useful: Findings that have focused importance and scope
F. If none of the above terms is used, the significance of the findings has not been made sufficiently clear

Strength of Support
1. Exceptional: Exemplary use of existing approaches that establishes new standards for the field
2. Compelling: Evidence that features methods, data, and analyses more rigorous than the current state of the art
3. Convincing: Appropriate and validated methodology in line with the current state of the art
4. Solid: Methods, data, and analyses broadly support the claims with only minor weaknesses
5. Incomplete: Main claims are only partially supported
6. Inadequate: Methods, data, and analyses do not support the primary claims

A number of people have argued that peer review should only evaluate how well the claims are supported by evidence, not how novel, interesting, or important the claims are. I agree that we are not very good at predicting the impact of a paper - that ultimate judgement is best left to post-publication review. Nonetheless, which paper is published in which journal is currently determined mostly by evaluations on the Significance of Findings dimension. By providing a two-dimensional scale, this dimension of perceived significance can be clearly separated from the dimension of technical quality. Having both axes allows the reader to flexibly focus on one or the other.

Balancing the power between editors and authors

If we want to design a publication system that promotes good science, we need to strike the right power balance between editors and reviewers on one side and authors on the other. In some journals, authors are held hostage to the whims of individual reviewers who demand meaningless additional control experiments, or who try to remove claims from the paper that they find objectionable. The post-review discussion at eLife, in which the editor and reviewers comment on each other's reviews, is one great tool to avoid such excesses. Post-review discussion will remain an integral part of the process, now with the additional function of agreeing on an evaluation summary.

The new system goes one step further and lets the authors decide when a paper has reached its final stage. This certainly allows authors to publish the paper they want to see on their record, not the paper that the editor or the reviewers want to see.

With this much power, will authors simply ignore the reviews? And will reviewers stop reviewing if they are being ignored anyway? Because eLife is no longer rejecting papers, I can certainly see the danger of this happening. Some authors may start to publish as many papers as possible - regardless of quality.

As one tool against this, eLife will still triage papers before review. Currently, editors discuss submitted papers and only select for review those papers that have a chance of being ultimately accepted. By rejecting many papers before review, eLife protects the brand of the journal name. With the new policy, we do not need to do this anymore - now the limiting factor is capacity: can we find an editor who is willing to review the paper and who is confident that they can find two more reviewers to do the same? With this, the triage stage will hopefully become less selective. Ultimately, however, reviewers' and editors' time is a scarce resource and needs to be allocated to the papers that are most likely worth the attention.

Once a paper is sent out for review, the evaluation summary and the published reviews are the only tools editors and reviewers have to ensure that the paper is as clear as it can be. For this, the evaluation summaries must be incisive and highly visible to the audience: they are the only way by which misleading or unsupported claims can be flagged and exceptional papers highlighted. For this reason, the evaluation summary will be printed at the top of the eLife Reviewed Preprint or version of record.

One part of me believes that most people in science are driven by good intentions: authors want to publish useful and clear papers, and reviewers want to provide constructive feedback. But given that early-career researchers in particular often work under immense pressure, and that a single high-profile publication can make or break a career, I also know that any system will be gamed. Our job as a community is to set the individual game-theoretic payoffs such that the entire system rewards better, fairer, and more transparent science. I really hope that eLife's new way of evaluating papers will contribute to this.

References

Eisen et al. (2022). Peer review without gatekeeping. eLife.

Simon (1972). Theories of bounded rationality. In: Decision and Organization.