Why are these publications missing? Uncovering the reasons behind the exclusion of documents in free-access scholarly databases

Authors : Lorena Delgado-QuirósIsidro F. AguilloAlberto Martín-MartínEmilio Delgado López-CózarEnrique Orduña-MaleaJosé Luis Ortega

This study analyses the coverage of seven free-access bibliographic databases (Crossref, Dimensions—non-subscription version, Google Scholar, Lens, Microsoft Academic, Scilit, and Semantic Scholar) to identify the potential reasons that might cause the exclusion of scholarly documents and how they could influence coverage.

To do this, 116 k randomly selected bibliographic records from Crossref were used as a baseline. API endpoints and web scraping were used to query each database. The results show that coverage differences are mainly caused by the way each service builds their databases.

While classic bibliographic databases ingest almost the exact same content from Crossref (Lens and Scilit miss 0.1% and 0.2% of the records, respectively), academic search engines present lower coverage (Google Scholar does not find: 9.8%, Semantic Scholar: 10%, and Microsoft Academic: 12%). Coverage differences are mainly attributed to external factors, such as web accessibility and robot exclusion policies (39.2%–46%), and internal requirements that exclude secondary content (6.5%–11.6%).

In the case of Dimensions, the only classic bibliographic database with the lowest coverage (7.6%), internal selection criteria such as the indexation of full books instead of book chapters (65%) and the exclusion of secondary content (15%) are the main motives of missing publications.

URL : Why are these publications missing? Uncovering the reasons behind the exclusion of documents in free-access scholarly databases

DOI : https://doi.org/10.1002/asi.24839

Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic

Authors : Martijn Visser, Nees Jan van Eck, Ludo Waltman

We present a large-scale comparison of five multidisciplinary bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. The comparison considers scientific documents from the period 2008-2017 covered by these data sources. Scopus is compared in a pairwise manner with each of the other data sources.

We first analyze differences between the data sources in the coverage of documents, focusing for instance on differences over time, differences per document type, and differences per discipline.

We then study differences in the completeness and accuracy of citation links. Based on our analysis, we discuss strengths and weaknesses of the different data sources. We emphasize the importance of combining a comprehensive coverage of the scientific literature with a flexible set of filters for making selections of the literature.

URL : https://arxiv.org/abs/2005.10732