Authors : Jeremy Leipzig, Daniel Nüst, Charles Tapley Hoyt, Stian Soiland-Reyes, Karthik Ram, Jane Greenberg
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data into published results.
In addition to its role in research integrity, RCR has the capacity to significantly accelerate evaluation and reuse. This potential, together with wide support for the FAIR principles, has motivated interest in metadata standards supporting RCR.
Metadata provides context and provenance to raw data and methods and is essential to both discovery and validation. Despite this shared connection with scientific data, few studies have explicitly described the relationship between metadata and RCR.
This article employs a functional content analysis to identify metadata standards that support RCR functions across an analytic stack consisting of input data, tools, notebooks, pipelines, and publications.
Our article provides background context, explores gaps, and discovers component trends of embeddedness and methodology weight from which we derive recommendations for future work.
URL : The role of metadata in reproducible computational research
Original location : https://arxiv.org/abs/2006.08589
Authors : Markus Konkol, Daniel Nüst, Laura Goulier
Funding agencies increasingly ask applicants to include data and software management plans in their proposals. In addition, the author guidelines of scientific journals and conferences more often include a statement on data availability, and some reviewers reject unreproducible submissions.
This trend towards open science increases the pressure on authors to provide access to the source code and data underlying the computational results in their scientific papers.
Still, publishing reproducible articles is a demanding task that is not achieved simply by providing access to code scripts and data files. Consequently, several projects are developing solutions to support the publication of executable analyses alongside articles, taking into account the needs of the aforementioned stakeholders.
The key contribution of this paper is a review of applications addressing the issue of publishing executable computational research results. We compare the approaches across properties relevant for the involved stakeholders, e.g., provided features and deployment options, and also critically discuss trends and limitations.
The review can help publishers decide which system to integrate into their submission process, editors recommend tools to researchers, and authors of scientific papers adhere to reproducibility principles.
URL : https://arxiv.org/abs/2001.00484
Authors : Roman Vainshtein, Gilad Katz, Bracha Shapira, Lior Rokach
A multitude of factors are responsible for the overall quality of scientific papers, including readability, linguistic quality, fluency, semantic complexity, and, of course, domain-specific technical factors.
These factors vary from one field of study to another. In this paper, we propose a measure and method for assessing the overall quality of scientific papers in a particular field of study.
We evaluate our method in the computer science domain, but it can be applied to other technical and scientific fields. Our method is based on corpus linguistics techniques, which enable the extraction of the required information and knowledge associated with a specific domain.
For this purpose, we have created a large corpus, consisting of papers from very high impact conferences. First, we analyze this corpus in order to extract rich domain-specific terminology and knowledge.
Then we use the acquired knowledge to estimate the quality of scientific papers by applying our proposed measure. We examine our measure on high and low scientific impact test corpora.
Our results show a significant difference in the measure scores of the high- and low-impact test corpora. Second, we develop a classifier based on our proposed measure and compare it to a baseline classifier.
Our results show that the classifier based on our measure outperforms the baseline. Based on these results, the proposed measure and technique can be used for the automated assessment of scientific papers.
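As a rough illustration of this corpus-based approach, the sketch below builds a terminology profile from a reference corpus of high-impact papers and scores a new paper by how strongly it draws on that terminology. The tokenizer, the document-frequency weighting, and the scoring formula are illustrative assumptions, not the authors’ actual measure.

```python
# Hypothetical sketch: weight domain terms by document frequency in a
# high-impact reference corpus, then score a paper by the average weight
# of its tokens. Not the paper's actual measure.
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z][a-z-]+", text.lower())

def build_term_weights(reference_papers: list[str]) -> dict[str, float]:
    """Weight each term by how many reference papers it appears in."""
    doc_freq = Counter()
    for paper in reference_papers:
        doc_freq.update(set(tokenize(paper)))
    n = len(reference_papers)
    return {term: math.log1p(df) / math.log1p(n) for term, df in doc_freq.items()}

def quality_score(paper: str, term_weights: dict[str, float]) -> float:
    """Average domain-terminology weight over the paper's tokens."""
    tokens = tokenize(paper)
    if not tokens:
        return 0.0
    return sum(term_weights.get(t, 0.0) for t in tokens) / len(tokens)
```

A classifier like the one described above could then simply threshold this score, or feed it to a standard learner alongside other features.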
URL : https://arxiv.org/abs/1908.04200
Authors : Simona Ibba, Filippo Eros Pani, John Gregory Stockton, Giulio Barabino, Michele Marchesi, Danilo Tigano
One of the main tasks of a researcher is to properly communicate the results they obtain. The choice of the journal in which to publish the work is therefore very important. However, not all journals have suitable characteristics for the correct dissemination of scientific knowledge.
Some publishers turn out to be unreliable and, in exchange for payment, will publish whatever researchers submit. The authors call these untrustworthy outlets “predatory journals”.
The purpose of this paper is to analyse the incidence of predatory journals in computer science literature and present a tool that was developed for this purpose.
The authors focused their attention on editors, universities and publishers that are involved in this kind of publishing process. The starting point of their research is the list of scholarly open-access publishers and open-access stand-alone journals created by Jeffrey Beall.
Specifically, they analysed the presence of predatory journals in the search results obtained from Google Scholar in the engineering and computer science fields. They also studied how this incidence changed for articles published between 2011 and 2015.
The analysis shows that the phenomenon of predatory journals decreased somewhat in 2015, probably due to a greater awareness of the risks it poses to authors’ reputations.
The study focused on the computer science field, using a specific sample of queries; the authors developed a software tool to automatically submit queries to the search engine and to detect predatory journals using Beall’s list.
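A minimal sketch of the detection step, assuming journal names have already been extracted from the search-engine results: flag any name that appears on Beall’s list after simple normalization. The normalization and function names are illustrative assumptions; the querying of Google Scholar itself is omitted.

```python
# Hypothetical sketch: match normalized journal names against Beall's list.
import re

def normalize(name: str) -> str:
    """Lowercase and strip punctuation so near-identical names match."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

def flag_predatory(result_journals: list[str], bealls_list: list[str]) -> list[str]:
    predatory = {normalize(j) for j in bealls_list}
    return [j for j in result_journals if normalize(j) in predatory]

# Example with made-up journal names:
print(flag_predatory(
    ["Journal of Advanced Computer Science!", "ACM Computing Surveys"],
    ["Journal of Advanced Computer Science"],
))  # ['Journal of Advanced Computer Science!']
```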
URL : Incidence of predatory journals in computer science literature
DOI : https://doi.org/10.1108/LR-12-2016-0108
Authors : Marco Schmitt, Robert Jäschke
Twitter communication has permeated every sphere of society. Highlighting and sharing small pieces of information, whether with potentially vast audiences or with small circles of the interested, has value in almost any aspect of social life.
But what is the value exactly for a scientific field? We perform a comprehensive study of computer scientists using Twitter and their tweeting behavior concerning the sharing of web links.
Discerning the domains, hosts and individual web pages being tweeted and the differences between computer scientists and a Twitter sample enables us to look in depth at the Twitter-based information sharing practices of a scientific community.
Additionally, we aim at providing a deeper understanding of the role and impact of altmetrics in computer science and give a glance at the publications mentioned on Twitter that are most relevant for the computer science community.
Our results show a link-sharing culture that concentrates more heavily on public and professional quality information than the Twitter sample does. The results also show a broad variety in linked sources, and especially in linked publications: some publications are clearly related to community-specific interests of computer scientists, while others relate strongly to attention mechanisms in social media.
This reflects the observation that Twitter is a hybrid form of social media, sitting between an information service and a social network service.
Overall, the computer scientists’ style of usage leans toward the information-oriented side and, to some degree, toward professional usage. Altmetrics are therefore of considerable use in analyzing computer science.
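As an aside, the link-dissection step such a study depends on, splitting tweeted URLs into registered domain, host, and page, can be sketched with the Python standard library. The two-label domain heuristic below is an assumption for illustration; a real analysis would use the public-suffix list and would first resolve URL shorteners.

```python
# Hypothetical sketch: split a tweeted URL into (domain, host, path).
from urllib.parse import urlparse

def dissect(url: str) -> tuple[str, str, str]:
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Crude registered-domain heuristic: keep the last two labels.
    domain = ".".join(host.split(".")[-2:])
    return domain, host, parsed.path

print(dissect("https://journals.plos.org/plosone/article"))
# ('plos.org', 'journals.plos.org', '/plosone/article')
```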
URL : What do computer scientists tweet? Analyzing the link-sharing practice on Twitter
DOI : https://doi.org/10.1371/journal.pone.0179630
The case for open computer programs :
“Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.”
URL : http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html