Bibliographic Metadata Harvesting to Support the Management of an Institutional Repository :
“This thesis approaches the problem of automatic harvesting of bibliographic metadata records from several indexing services, in the context of the population of institutional repositories. Since the manual insertion of records is a tedious and error-prone task, the automation of the process intends to facilitate the management of a repository. However, the automated harvesting of records has to deal with the problem of identifying authors and with the need to consolidate duplicate records retrieved from different services. In an approach to the automation of the aforementioned task, we introduce a system that proposes to harvest bibliographic metadata records from different information sources publicly available, identify and consolidate the retrieved records that are considered duplicates and make available the results of such consolidation to external parties that are interested in the information, such as an institutional repository. The proposed system was tested with real bibliographic metadata corresponding to scientific publications of a subset of faculty members at Instituto Superior Técnico. The results of the evaluation show that, despite the required time to identify and consolidate, the merged records contain a valid aggregation of all available information in the system and can be efficiently accessed by external entities through a machine-to-machine interface.”
URL : https://dspace.ist.utl.pt/bitstream/2295/1271450/1/dissertacao.pdf