Making Scientific Peer Review Scientific

Posted on 18.11.2022, 18:39, authored by Ivan Stelmakh

Nowadays, many important applications such as hiring, university admissions, and scientific peer review rely on the collective efforts of a large number of individuals. These applications often operate at an extremely large scale, which creates both opportunities and challenges. On the opportunity side, the large amount of data generated in these applications enables a novel data science perspective on the classical problem of decision-making. On the challenge side, in many of these applications, human decision-makers need to interact with various interfaces and algorithms and follow various policies. When not carefully designed, such interfaces, algorithms, and policies may lead to unintended consequences. Identifying and overcoming such unintended consequences is an important research problem. In this thesis, we explore these opportunities and tackle these challenges with the general goal of understanding and improving distributed human decision-making in a principled manner.

One application where the need for improvement is especially strong is scientific peer review. On the one hand, peer review is the backbone of academia, and the scientific community agrees on the importance of improving the system. On the other hand, peer review is a microcosm of distributed decision-making that features a complex interplay between noise, bias, and incentives. Thus, insights learned from this specific domain apply to many other areas where similar problems arise. All in all, in this thesis, we aim to develop a principled approach to scientific peer review, an important prerequisite for the fair, equitable, and efficient progression of science.

The three broad challenges that arise in peer review are noise, bias, and incentives. In this thesis, we work on each of these challenges:

  • Noise and reviewer assignment. A suitable choice of reviewers is a cornerstone of peer review: poor assignment of reviewers to submissions may result in a large amount of noise in decisions. Nowadays, the scale of many publication venues makes it infeasible to manually assign reviewers to submissions. Thus, stakeholders rely on algorithmic support to automate this task. Our work demonstrates that when such algorithmic support is not designed with application-specific constraints in mind, it can result in unintended consequences, compromising the fairness and accuracy of the process. More importantly, we make progress in developing better algorithms by (i) designing an assignment algorithm with strong theoretical guarantees and reliable practical performance, and (ii) collecting a dataset that enables other researchers to develop better algorithms for estimating the expertise of reviewers in reviewing submissions.
  • Bias and policies. Human decision-making is susceptible to various biases, including identity-related biases (e.g., race and gender) and policy-related biases (e.g., primacy effect). To counteract these biases in peer review, it is crucial to design peer-review policies in an evidence-based manner. With this motivation, we conduct a series of real-world experiments to collect evidence that informs stakeholders in their policy decisions. Our work reveals that while some of the commonplace biases (e.g., herding) are not present in peer review, there are other application-specific biases (e.g., resubmission bias) that significantly impact decisions. Additionally, we demonstrate that reliable testing for biases in peer review often requires novel statistical tools, as off-the-shelf techniques may lead to false conclusions.
  • Incentives and reviewing. Honesty is a core value of science, and peer review is built on the assumption that everyone involved in the process is honest. However, fierce competition in the academic job market and the large power a single reviewer has over the outcome of a submission create incentives for reviewers to consciously or subconsciously deviate from honest behavior. Our work offers (i) tools to test for such deviations, (ii) empirical evidence of the presence of wrong incentives, and (iii) potential solutions for incentivizing reviewers to put more effort into writing high-quality reviews.




Degree Type



Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)


Nihar B. Shah, Aarti Singh