InfoDoc MicroVeille
Current-awareness service for Library and Information Science // Collecting and Sharing research papers in Library and Information science ISSN 2429-3938

Cluster Analysis of Open Research Data: A Case for Replication Metadata

Posted on 3 February 2023 by Hans Dillaerts

Author: Ana Trisovic

Research data are often released upon journal publication to enable result verification and reproducibility. For that reason, research dissemination infrastructures typically support diverse datasets coming from numerous disciplines, from tabular data and program code to audio-visual files. Metadata, or data about data, is critical to making research outputs adequately documented and FAIR.

Aiming to contribute to the discussions on the development of metadata for research outputs, I conducted an exploratory analysis to determine how research datasets cluster based on what researchers organically deposit together. I use the content of over 40,000 datasets from the Harvard Dataverse research data repository as my sample for the cluster analysis.

I find that the majority of the clusters are formed by single-type datasets, while in the rest of the sample, no meaningful clusters can be identified. For the result interpretation, I use the metadata standard employed by DataCite, a leading organization for documenting a scholarly record, and map existing resource types to my results.

About 65% of the sample can be described with single-type metadata (such as Dataset, Software or Report), while the rest would require aggregate metadata types. Though DataCite supports an aggregate type such as a Collection, I argue that a significant number of datasets, in particular those containing both data and code files (about 20% of the sample), would be more accurately described as a Replication resource metadata type. Such a resource type would be particularly useful in facilitating research reproducibility.
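To make the single-type versus aggregate distinction concrete, here is a minimal Python sketch of how a dataset's file list might be mapped to a resource type. It is an illustration only, not the author's method: the extension table, the category names, and the `resource_type` helper are assumptions introduced here, and "Replication" is the type proposed in the abstract, not an existing DataCite value.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-category table; the paper's actual file-type
# taxonomy is not given in this abstract.
CATEGORY_BY_EXTENSION = {
    ".csv": "data", ".tab": "data", ".dta": "data",
    ".r": "code", ".py": "code", ".do": "code",
    ".pdf": "document", ".docx": "document",
}

# Datasets with a single file category map onto existing DataCite types.
SINGLE_TYPE = {"data": "Dataset", "code": "Software", "document": "Report"}


def categories(filenames):
    """Return the set of file categories present in one dataset."""
    found = set()
    for name in filenames:
        ext = PurePosixPath(name).suffix.lower()
        found.add(CATEGORY_BY_EXTENSION.get(ext, "other"))
    return found


def resource_type(filenames):
    """Assign a resource type from the mix of file categories (assumed rule)."""
    cats = categories(filenames)
    if not cats:
        return "Other"
    if len(cats) == 1:
        (only,) = cats
        return SINGLE_TYPE.get(only, "Other")
    if {"data", "code"} <= cats:
        # Proposed in the abstract; not an existing DataCite resource type.
        return "Replication"
    return "Collection"


# Made-up file lists, for illustration only.
sample = {
    "ds1": ["survey.csv"],
    "ds2": ["analysis.R", "survey.tab", "README.pdf"],
    "ds3": ["paper.pdf", "figure.png"],
}
counts = Counter(resource_type(files) for files in sample.values())
print(counts)
```

Under this assumed rule, any dataset that holds both data and code files is labelled Replication regardless of what else it contains, which mirrors the abstract's argument that such deposits are poorly served by the generic Collection type.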

URL: Cluster Analysis of Open Research Data: A Case for Replication Metadata

DOI: https://doi.org/10.2218/ijdc.v17i1.833

Ana Trisovic, FAIR Data, Metadata, open research data, Replicability, research data

