FAIRness of Research Data in the European Humanities Landscape

Authors : Ljiljana Poljak Bilić, Kristina Posavec

This paper explores the landscape of research data in the humanities in the European context, delving into their diversity and the challenges of defining and sharing them. It investigates three aspects: the types of data in the humanities, their representation in repositories, and their alignment with the FAIR principles (Findable, Accessible, Interoperable, Reusable).

By reviewing datasets in repositories, this research determines the dominant data types, their openness, licensing, and compliance with the FAIR principles. It provides important insight into the heterogeneous nature of humanities data, their representation in repositories, and their alignment with the FAIR principles, highlighting the need for better accessibility and reusability to enhance the overall quality and utility of humanities research data.
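
The kind of review described above can be pictured as scoring each repository record against coarse proxies for the four FAIR principles. The sketch below is a hypothetical illustration only: the field names (`pid`, `access`, `format`, `license`) and the proxy criteria are invented for this example, not taken from the paper's actual protocol.

```python
def fair_score(record: dict) -> dict:
    """Return one boolean per FAIR principle, using coarse metadata proxies.

    Hypothetical criteria: a persistent identifier for Findable, open
    access for Accessible, a machine-readable format for Interoperable,
    and an explicit license for Reusable.
    """
    return {
        "findable": bool(record.get("pid")),           # has a persistent identifier
        "accessible": record.get("access") == "open",  # openly retrievable
        "interoperable": record.get("format") in {"csv", "json", "xml"},
        "reusable": bool(record.get("license")),       # explicit license statement
    }

record = {
    "pid": "doi:10.1234/abcd",
    "access": "open",
    "format": "csv",
    "license": "CC-BY-4.0",
}
print(fair_score(record))  # all four proxies are satisfied for this record
```

Real FAIR assessments are considerably more fine-grained (each principle has several sub-criteria), but aggregating simple checks like these is how repository-level compliance is typically summarized.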

URL : FAIRness of Research Data in the European Humanities Landscape

DOI : https://doi.org/10.3390/publications12010006

Analysis on open data as a foundation for data-driven research

Authors : Honami Numajiri, Takayuki Hayashi

Open Data, one of the key elements of Open Science, serves as a foundation for “data-driven research” and has been promoted in many countries. However, how publicly available Open Data is actually used in new styles of research, and the impact of such use, remain unclear.

Following a comparative coverage analysis against the OpenAIRE Graph, we analyzed the Data Citation Index, a comprehensive collection of research datasets and repositories with citation information from articles. The results reveal that different countries and disciplines tend to show different trends in Open Data.

In recent years, the number of datasets in repositories has increased dramatically across disciplines, and researchers are publishing more data. Furthermore, in some disciplines data citation rates are not high, yet the databases used are diverse.

URL : Analysis on open data as a foundation for data-driven research

DOI : https://doi.org/10.1007/s11192-024-04956-x

Handling Open Research Data within the Max Planck Society — Looking Closer at the Year 2020

Authors : Martin Boosen, Michael Franke, Yves Vincent Grossmann, Sy Dat Ho, Larissa Leiminger, Jan Matthiesen

This paper analyses the practice of publishing research data within the Max Planck Society in the year 2020. The central finding of the study is that up to 40% of the empirical text publications had research data available. The analysis focuses predominantly on the aggregation of the available data.

There are differences between the sections of the Max Planck Society, but they are not as great as one might expect. In the case of journals, it is also apparent that a data policy can increase the availability of data related to textual publications.

Finally, we found that the data-availability statement “upon (reasonable) request” does not work in practice.

URL : Handling Open Research Data within the Max Planck Society — Looking Closer at the Year 2020

arXiv : https://arxiv.org/abs/2402.18182

From Data Creator to Data Reuser: Distance Matters

Authors : Christine L. Borgman, Paul T. Groth

Sharing research data is complex, labor-intensive, expensive, and requires infrastructure investments by multiple stakeholders. Open science policies focus on data release rather than on data reuse, yet reuse is also difficult, expensive, and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when.

Data creators cannot anticipate all possible reuses or reusers; our goal is to identify factors that may aid stakeholders in deciding how to invest in research data, how to identify potential reuses and reusers, and how to improve data exchange processes.

Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality.

These dimensions are primarily social in character, with associated technical aspects that can decrease or increase distances between creators and reusers. We identify the order of expected influence on data reuse and the ways in which the six dimensions are interdependent.

Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies.

URL : From Data Creator to Data Reuser: Distance Matters

arXiv : https://arxiv.org/abs/2402.07926

Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?

Authors : Yulin Yu, Daniel M. Romero

Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement.

In this study, we examine the hypothesis inspired by the recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. We investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 6,000 datasets curated within one of the largest social science databases, ICPSR.

This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, combining datasets whose topics are atypically paired has the opposite effect: the use of such data is associated with fewer citations.

Third, younger and less experienced research teams tend to use atypical combinations of datasets in research at a higher frequency than their older and more experienced counterparts.

Lastly, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an under-utilized but powerful strategy correlated with the scientific and broader impact of scientific discoveries.

arXiv : https://arxiv.org/abs/2402.05024

Agile Research Data Management with Open Source: LinkAhead

Authors : Daniel Hornung, Florian Spreckelsen, Thomas Weiß

Research data management (RDM) in academic scientific environments is increasingly coming into focus as an important part of good scientific practice and as a topic with significant potential for saving time and money. Nevertheless, there is a shortage of appropriate tools that fulfill the specific requirements of scientific research.

We identified where the requirements in science deviate from those of other fields and proposed a list of requirements that RDM software should meet to be a viable option. We analyzed a number of currently available technologies and tool categories against these requirements and identified areas where no existing tools satisfy researchers’ needs.

Finally, we assessed the open-source research data management system (RDMS) LinkAhead for compatibility with the proposed features and found that it fulfills the requirements for semantic, flexible data handling, an area in which other tools show weaknesses.

URL : Agile Research Data Management with Open Source: LinkAhead

DOI : https://doi.org/10.48694/inggrid.3866

Building a Trustworthy Data Repository: CoreTrustSeal Certification as a Lens for Service Improvements

Authors : Cara Key, Clara Llebot, Michael Boock

The university library aims to provide university researchers with a trustworthy institutional repository for sharing data. The library sought CoreTrustSeal certification in order to measure the quality of data services in the institutional repository and to promote researchers’ confidence when depositing their work.

The authors served on a small team of library staff who collaborated to compose the certification application. They describe the self-assessment process as they iterated through cycles of compiling information and responding to reviewer feedback.

The application team gained an understanding of data repository best practices, shared knowledge about the institutional repository, and identified service improvements necessary to meet certification requirements. Based on the application and feedback, the team took measures to enhance preservation strategies, governance, and public-facing policies and documentation for the repository.

By pursuing and obtaining CoreTrustSeal certification, the university library gained a better understanding of high-quality data services and measurably improved those services.

URL : Building a Trustworthy Data Repository: CoreTrustSeal Certification as a Lens for Service Improvements

DOI : https://doi.org/10.7191/jeslib.761