Author: Ferdinando Patat
Peer review is the most common mechanism for assessing requests for resources across a large variety of scientific disciplines. One of the strongest criticisms of this paradigm is the limited reproducibility of the process, especially at heavily oversubscribed facilities. In this and a subsequent paper we address this specific aspect in a quantitative way, through a statistical study of proposal ranking at the European Southern Observatory.
For this purpose we analysed a sample of about 15000 proposals, submitted by more than 3000 Principal Investigators over 8 years. The proposals were reviewed by more than 500 referees, who assigned over 140000 grades in about 200 panel sessions.
After providing a detailed analysis of the statistical properties of the sample, the paper presents a heuristic model based on these findings, which is then used to provide quantitative estimates of the reproducibility of the pre-meeting process.
On average, about one third of the proposals ranked in the top quartile by one referee are ranked in the same quartile by any other referee of the panel. A similar value is observed for the bottom quartile.
In the central quartiles, the agreement fractions are only marginally above the value expected for a fully aleatory (random) process (25%). The agreement fraction between two panels composed of 6 referees is 55+/-5% (50% confidence level) for the top and bottom quartiles.
The corresponding fraction for the central quartiles is 33+/-5%. The model predictions are confirmed by the results obtained from bootstrapping the data for sub-panels composed of 3 referees, and are fully consistent with the NIPS experiment. The post-meeting phase will be presented and discussed in a forthcoming paper.
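The kind of quartile-agreement statistic quoted above can be illustrated with a minimal Monte Carlo sketch. This is a toy signal-plus-noise grading model, not the paper's actual heuristic model: each proposal is assumed to have an intrinsic merit, each referee grades it with independent noise, and the noise level chosen here is an illustrative assumption.

```python
import numpy as np

# Toy model (illustrative assumptions, not the paper's calibrated model):
# proposal merit ~ N(0, 1); each referee's grade = merit + N(0, sigma).
rng = np.random.default_rng(42)

n_proposals = 10_000
merit = rng.normal(0.0, 1.0, n_proposals)   # intrinsic proposal merit
noise_sigma = 1.0                           # assumed referee noise level

# Two referees grade the same proposals independently.
grades_a = merit + rng.normal(0.0, noise_sigma, n_proposals)
grades_b = merit + rng.normal(0.0, noise_sigma, n_proposals)

# Top-quartile membership according to each referee.
top_a = grades_a >= np.quantile(grades_a, 0.75)
top_b = grades_b >= np.quantile(grades_b, 0.75)

# Fraction of referee A's top-quartile proposals that referee B also
# places in the top quartile; 25% would be pure chance.
agreement = np.mean(top_b[top_a])
print(f"top-quartile agreement between two referees: {agreement:.2f}")
```

Varying `noise_sigma` moves the agreement fraction between the pure-chance floor of 25% (noise dominates) and 100% (no noise), which is the sense in which the observed one-third agreement constrains the referee-to-referee scatter.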