Metadata enrichment in a medical institutional repository for evaluation purposes

Authors: 
Djeddou, Natalia
Godel, Sylvie
Iriarte, Pablo
Lausanne Medical University Library
Abstract: 

Background

At the Medical Faculty in Lausanne, the institutional repository plays a dual role. On the one hand, it is an OAI-PMH server that disseminates research results financed by public funds. On the other hand, it is a directory of academic publications that can be used to evaluate the Faculty's scientific output.

Methods

In Lausanne, the assessment of published medical articles, at either the individual or the institutional level, relies on collecting the following data for each reference:

  • the citation count reported by Thomson Reuters Web of Science (WoS),
  • the journal impact factor (IF) published in the Journal Citation Reports (JCR),
  • the Research Production Unit (RPU), an IF normalized by research field.

Other characteristics are also taken into account, such as the article type (original article, case report, review, etc.), the total number of authors per article and the author's position (first, middle or last).
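The per-reference data listed above can be held in a simple record, with the author's position derived from the author list. The following is a minimal sketch, not the repository's actual data model: the class `PublicationRecord` and the function `author_position` are hypothetical names, names are assumed to match exactly, and a sole author is assumed to count as "first".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublicationRecord:
    """Hypothetical per-reference record holding the evaluation data."""
    pmid: str
    article_type: str                      # e.g. "original article", "case report", "review"
    authors: List[str] = field(default_factory=list)
    citation_count: Optional[int] = None   # from WoS
    impact_factor: Optional[float] = None  # from JCR
    rpu: Optional[float] = None            # IF normalized by research field

    @property
    def n_authors(self) -> int:
        """Total number of authors of the article."""
        return len(self.authors)


def author_position(authors: List[str], name: str) -> str:
    """Classify `name` as 'first', 'last' or 'middle' author.

    Assumptions: names are matched exactly, and a sole author counts as 'first'.
    Raises ValueError if `name` is not in the list.
    """
    index = authors.index(name)
    if index == 0:
        return "first"
    if index == len(authors) - 1:
        return "last"
    return "middle"
```

Exact string matching is fragile for author names; a production version would presumably need some form of name disambiguation.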

To gather all these data, three main resources must be queried and their information merged:

  1. PubMed: provides publication metadata and publication type categorization.
  2. Web of Science: provides citation counts and complete author lists with affiliations.
  3. JCR: provides IFs and the journal subject categories used for RPU normalization.

Bibliometric data and some publication metadata elements are collected from WoS because PubMed has shortcomings, starting with its author lists and affiliations. For example, between 1983 and 1996 PubMed recorded at most 10 authors per article, and afterwards at most 25. Since 2000, PubMed has listed all authors but, surprisingly, only the institutional affiliation of the first author.
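One way to detect the author-list shortcoming described above is to flag PubMed records whose author count reaches the cap in force for their publication year. This is a hypothetical sketch: the periods follow the text, but the exact boundary years (and treating 1997 to 1999 as the 25-author period) are assumptions.

```python
from typing import Optional


def pubmed_author_cap(year: int) -> Optional[int]:
    """Maximum number of authors PubMed recorded for a publication year,
    following the periods quoted in the text (boundary years are assumed)."""
    if 1983 <= year <= 1996:
        return 10
    if 1997 <= year <= 1999:
        return 25
    return None  # no cap for this year, or none described in the text


def pubmed_list_may_be_truncated(year: int, n_pubmed_authors: int) -> bool:
    """True when the PubMed author list reaches the cap, so authors may be missing
    and the complete list should be taken from WoS instead."""
    cap = pubmed_author_cap(year)
    return cap is not None and n_pubmed_authors >= cap
```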

Merging information from these different sources relies on identifiers, web services and AJAX techniques. It is not an easy process, since bridges have to be built between the different databases. Three main identifiers (PMID for PubMed, UT for WoS and ISSN for JCR) play a major role in mapping information between the databases. These identifiers have to be collected and stored alongside the other metadata in the repository.
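The bridging role of the three identifiers can be illustrated with in-memory dictionaries standing in for the three databases. In this minimal sketch, `merge_record` and the dictionary layouts are hypothetical; the real process queries web services rather than local tables.

```python
from typing import Dict, Optional


def merge_record(
    pmid: str,
    pubmed: Dict[str, dict],     # PMID -> PubMed metadata (title, ISSN, publication type, ...)
    pmid_to_ut: Dict[str, str],  # PMID -> WoS UT, as collected from a WoS lookup
    wos: Dict[str, dict],        # UT -> WoS data (citation count, authors, affiliations)
    jcr: Dict[str, dict],        # ISSN -> JCR data (impact factor, subject categories)
) -> dict:
    """Merge one publication's data, using PMID, UT and ISSN as bridges."""
    record = {"pmid": pmid, **pubmed[pmid]}  # copy, so the PubMed table is not mutated
    ut: Optional[str] = pmid_to_ut.get(pmid)
    record["ut"] = ut
    if ut is not None and ut in wos:
        record.update(wos[ut])  # WoS values (e.g. full author list) override PubMed ones
    issn = record.get("issn")
    if issn in jcr:
        record.update(jcr[issn])
    return record
```

A missing bridge (no UT for a PMID, or an ISSN unknown to JCR) simply leaves the corresponding fields absent, which makes gaps easy to spot.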

The Lausanne Medical Library has taken responsibility for building and maintaining the techniques and processes that connect the international databases with the repository, and for delivering accurate bibliometric information to the faculty research evaluation unit on a regular basis.

Results

Using the PubMed and WoS web services, an AJAX technique was developed to import metadata into the repository entry form, allowing researchers and librarians to fill in the bibliographic fields by typing a single identifier (a PMID, a DOI or a UT). This kind of metadata import avoids input errors and saves time.
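Behind such a single-identifier form, the first step is recognizing which kind of identifier was typed and routing it to the right web service. The following sketch uses simplified, assumed identifier shapes (a DOI starts with "10." and a registrant code, a UT is 15 alphanumeric characters optionally prefixed "WOS:", a PMID is an integer of up to 8 digits). The PubMed request uses NCBI's E-utilities ESummary endpoint; the WoS call is omitted because its interface is not described here.

```python
import re
from urllib.parse import urlencode

# Simplified, assumed identifier shapes.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
UT_RE = re.compile(r"^(?:WOS:)?[A-Z0-9]{15}$", re.IGNORECASE)
PMID_RE = re.compile(r"^\d{1,8}$")


def identifier_type(value: str) -> str:
    """Return 'doi', 'ut' or 'pmid' for the identifier typed in the form."""
    v = value.strip()
    if DOI_RE.match(v):
        return "doi"
    if UT_RE.match(v):
        return "ut"
    if PMID_RE.match(v):
        return "pmid"
    raise ValueError(f"unrecognized identifier: {value!r}")


def pubmed_summary_url(pmid: str) -> str:
    """Build an NCBI E-utilities ESummary request returning PubMed metadata as JSON."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?" + query
```

The DOI test runs first so that a DOI is never mistaken for another identifier; a 15-character UT is checked before the shorter PMID pattern.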

In addition, the process collects important international identifiers such as the DOI and ISSN for every reference. When the repository metadata are complete and accurate, harvesting bibliometric data runs smoothly. For example, a web service can retrieve the WoS unique identifier (UT) and the citation count from a PMID or DOI alone. The reverse is harder, but a DOI can link to both the UT and the PMID. The IF and RPU are then assigned automatically to the publication metadata using the journal ISSN. Very often, however, the ISSN versions provided by PubMed differ from those indicated by JCR, so the librarians built a table mapping the ISSNs used in the two databases. This table was created using the complete list of ISSNs provided by the international ISSN registration agency (www.issn.org).
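ISSN-based IF assignment can be sketched as follows: normalize the ISSN, verify its check digit (weights 8 to 2 on the first seven digits, modulus 11, with "X" standing for 10), translate it through the PubMed-to-JCR mapping table if needed, then look up the IF. In this sketch, the function names, the plain dictionaries standing in for the mapping table and JCR, and the ISSN values in the test are all hypothetical; the sample ISSNs merely pass the check digit.

```python
import re
from typing import Dict, Optional


def normalize_issn(raw: str) -> str:
    """Return an ISSN in the canonical NNNN-NNNC form, with an upper-case X."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(compact) != 8:
        raise ValueError(f"not an ISSN: {raw!r}")
    return compact[:4] + "-" + compact[4:]


def issn_is_valid(issn: str) -> bool:
    """Check the ISSN check digit (weights 8..2, modulus 11, 'X' = 10)."""
    digits = issn.replace("-", "")
    if not re.fullmatch(r"\d{7}[\dX]", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


def impact_factor_for(
    issn: str,
    pubmed_to_jcr: Dict[str, str],  # hypothetical mapping table: PubMed ISSN -> JCR ISSN
    jcr_if: Dict[str, float],       # hypothetical JCR extract: ISSN -> impact factor
) -> Optional[float]:
    """Look up the IF, translating a PubMed ISSN through the mapping table first."""
    issn = normalize_issn(issn)
    jcr_issn = pubmed_to_jcr.get(issn, issn)  # fall back to the ISSN itself
    return jcr_if.get(jcr_issn)
```

Returning `None` rather than raising when no IF is found leaves unresolved ISSNs visible for manual review by the librarians.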

Standard numbers are not always used in a standardized way in international databases. Monitoring such discrepancies is part of the data curation role that librarians can play.
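As an illustration of this monitoring, DOIs often appear with different prefixes or letter case across databases. The following hypothetical sketch normalizes them (stripping a "doi:" label or a doi.org resolver URL, and lower-casing, since DOIs are case-insensitive) and reports records whose DOIs still disagree. Keying both sources by a shared record identifier such as the PMID is an assumption.

```python
import re
from typing import Dict, List, Tuple

_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip a 'doi:' label or resolver URL and lower-case the DOI."""
    return _DOI_PREFIX.sub("", raw.strip()).lower()


def doi_discrepancies(
    repo: Dict[str, str],      # record id -> DOI stored in the repository
    external: Dict[str, str],  # record id -> DOI reported by an external database
) -> List[Tuple[str, str, str]]:
    """List (record id, repository DOI, external DOI) that differ after normalization."""
    report = []
    for rec_id, doi in repo.items():
        other = external.get(rec_id)
        if other is not None and normalize_doi(doi) != normalize_doi(other):
            report.append((rec_id, doi, other))
    return report
```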

Keywords: 
Bibliometrics, Publishing, Bibliographic Databases
Category: 
Digital libraries
Type of presentation: 
Poster