Authors : Nadia Paola Valadez-de la Paz , Jose Antonio Vazquez-Lopez , Aidee Hernandez-Lopez , Jaime Francisco Aviles-Viñas, Jose Luis Navarro-Gonzalez, Alfredo Valentin Reyes-Acosta, Ismael Lopez-Juarez
Preliminary activities of searching and selecting relevant articles are crucial in scientific research to determine the state of the art (SOTA) and enhance overall outcomes.
While there are automatic tools for keyword extraction, these algorithms are often computationally expensive, storage-intensive, and reliant on institutional subscriptions for metadata retrieval. Most importantly, they still require manual selection of literature.
This paper introduces a framework that automates keyword searching in article abstracts to help select relevant literature for the SOTA by identifying key terms matching that we, hereafter, call source words. A case study in the food and beverage industry is provided to demonstrate the algorithm’s application.
In the study, five relevant knowledge areas were defined to guide literature selection. The database from scientific repositories was categorized using six classification rules based on impact factor (IF), Open Access (OA) status, and JCR journal ranking.
This classification revealed the knowledge area with the highest presence and highlighted the effectiveness of the selection rules in identifying articles for the SOTA. The approach included a panel of experts who confirmed the algorithm’s effectiveness in identifying source words in high-quality articles. The algorithm’s performance was evaluated using the 𝐹1 Score, which reached 0.83 after filtering out non-relevant articles.
This result validates the algorithm’s ability to extract significant source words and demonstrates its usefulness in building the SOTA by focusing on the most scientifically impactful articles.
URL : Automation Applied to the Collection and Generation of Scientific Literature