InfoDoc MicroVeille
Monitoring service dedicated to Library and Information Science: collecting and sharing research papers in Library and Information Science. ISSN 2429-3938

Comparison of Feature Learning Methods for Metadata Extraction from PDF Scholarly Documents

Posted on 10 January 2025 by Hans Dillaerts

Authors: Zeyd Boukhers, Cong Yang

The availability of metadata for scientific documents is pivotal both for advancing scientific knowledge and for adhering to the FAIR principles (i.e., Findability, Accessibility, Interoperability, and Reusability) of research findings. However, the lack of sufficient metadata in published documents, particularly those from smaller and mid-sized publishers, hinders their accessibility. This issue is widespread in some disciplines, such as the German Social Sciences, where publications often employ diverse templates. To address this challenge, our study evaluates various feature learning and prediction methods, including natural language processing (NLP), computer vision (CV), and multimodal approaches, for extracting metadata from documents with high template variance.

We aim to improve the accessibility of scientific documents and facilitate their wider use. To support our comparison of these methods, we provide comprehensive experimental results, analyzing their accuracy and efficiency in extracting metadata. Additionally, we provide valuable insights into the strengths and weaknesses of various feature learning and prediction methods, which can guide future research in this field.

arXiv: https://arxiv.org/abs/2501.05082

Cong Yang, metadata extraction, scientific communication, scientific documents, Zeyd Boukhers