The EHTOP: indexing Health resources in a multi-terminology/ontology and cross-lingual world

Authors: 
Grosjean, Julien, Rouen University Hospital, France
Kerdelhué, Gaétan, Rouen University Hospital, France
Merabti, Tayeb, Rouen University Hospital, France
Darmoni, Stéfan, Rouen University Hospital, France
Abstract: 

Introduction

The amount of health and biomedical information available online is constantly increasing. Many barriers, such as domain-specific terminologies and languages, can hinder universal access to this information. A growing number of institutions already provide health information in several languages: e.g. MedlinePlus provides health information for lay people in English and Spanish, whereas the European Medicines Agency provides drug information in all official languages of the European Union.

The European Health Terminology/Ontology Portal (EHTOP) is a repository designed to host health terminologies and ontologies (T/O) in various languages. It evolved from a server providing the NLM Medical Subject Headings (MeSH) in French and English into a service giving access to multiple T/O available not only in French and English but also in German, Italian, Dutch, Spanish, Danish and 17 other languages. EHTOP can be used both by humans and by computers, via Web services. Its main objective is to provide a single access point to health T/O, allowing dynamic browsing and navigation within a given terminology and its translations, or across different terminologies. The tool is primarily intended for librarians, who use it to index resources in a multi-terminology mode in order to cover several fields of knowledge, but its very broad scope opens it to many other uses.

Methods

To integrate T/O into EHTOP, three steps are necessary [1]:

  1. Designing a meta-model into which each terminology and ontology can be integrated. Such a model must be generic enough to represent each T/O, whatever its structure.
  2. Developing a process to load terminologies into EHTOP in a format compatible with standard knowledge representation languages.
  3. Building new, and integrating existing, inter- and intra-terminology semantic harmonization into EHTOP.

A generic model was designed for the database in order to fit all the terminologies into one global structure. A model of each terminology was then designed as a specialization of this meta-model. The purpose of the generic model is to factor out the artifacts (classes, relationships, attributes) common to all terminologies, thus facilitating the integration of multiple terminologies within a single platform. Some artifacts, although specific to certain terminologies, must nevertheless be represented to avoid losing the information that falls outside the generic model. A trade-off therefore has to be found between representing each terminology faithfully, with no loss of information, and factoring out the artifacts shared by terminologies, so that shared services can then be offered independently of any given terminology.
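
The meta-model and its specializations can be pictured with a few classes. This is a minimal sketch, not the EHTOP database schema described in [1]: the class names (`Term`, `Concept`, `MeshDescriptor`), their fields and the `extra` dictionary are hypothetical. Shared artifacts live in the generic `Concept`, terminology-specific ones either get a subclass or fall into `extra`, and a shared service such as `preferred_label` works identically for every terminology.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A label of a concept in one language (hypothetical structure)."""
    label: str
    lang: str                 # ISO 639-1 code, e.g. "en", "fr", "da"
    preferred: bool = False


@dataclass
class Concept:
    """Artifacts shared by all terminologies: the generic meta-model."""
    terminology: str          # e.g. "MeSH", "ICD-10", "WHO-ATC"
    code: str                 # identifier within its source terminology
    terms: list[Term] = field(default_factory=list)
    # Terminology-specific attributes with no dedicated slot are kept as
    # key/value pairs, so that importing a terminology loses no information.
    extra: dict[str, str] = field(default_factory=dict)

    def preferred_label(self, lang: str) -> Optional[str]:
        """A shared service that behaves the same for every terminology."""
        for term in self.terms:
            if term.lang == lang and term.preferred:
                return term.label
        return None


@dataclass
class MeshDescriptor(Concept):
    """Hypothetical specialization of the meta-model for MeSH."""
    tree_numbers: list[str] = field(default_factory=list)


asthma = MeshDescriptor(
    "MeSH", "D001249",
    terms=[Term("Asthma", "en", True), Term("Asthme", "fr", True)],
)
print(asthma.preferred_label("fr"))  # → Asthme
```

Keeping `extra` as a plain dictionary is one way to express the trade-off above: nothing is discarded, yet generic services only rely on the shared fields.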

The EHTOP website was designed as a graphical front end to a Web Service entirely dedicated to information retrieval and to associations between terms of several terminologies. The main objective was thus to separate content from presentation, in particular from the user interface.

The EHTOP Web Service was developed in compliance with Web Services standards, using SOAP (Simple Object Access Protocol) messages and a WSDL (Web Services Description Language) description. It exposes methods to search terms by descriptor or by database unique identifier. The SQL queries run against the database were specifically assessed and tuned to minimize response time.
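
A client calls such a service by posting a SOAP envelope. The sketch below only builds SOAP 1.1 request bodies with Python's standard `xml.etree.ElementTree`. The operation names `searchByDescriptor` and `searchById`, their parameters and the namespace `urn:example:ehtop` are hypothetical placeholders; the real names are those declared in the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:ehtop"  # hypothetical; the real one comes from the WSDL

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("eh", SERVICE_NS)


def build_request(operation: str, **params: str) -> bytes:
    """Wrap a call to `operation` with string arguments in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical calls mirroring the two kinds of search described above:
# by descriptor (label) and by database unique identifier.
by_descriptor = build_request("searchByDescriptor", descriptor="Asthma", lang="en")
by_identifier = build_request("searchById", id="12345")
print(by_descriptor.decode("utf-8"))
```

The resulting bytes would then be POSTed to the service endpoint (e.g. with `urllib.request`) with a `Content-Type: text/xml; charset=utf-8` header and the `SOAPAction` value declared in the WSDL.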

As EHTOP exploits SKOS (Simple Knowledge Organization System) files in RDF, the graphical interface that renders the final HTML was built with JSP (JavaServer Pages) files that apply XSL (eXtensible Stylesheet Language) transformations. Additional CSS (Cascading Style Sheets) and JavaScript functions improve the design of the website. The final HTML rendering is performed by the client's web browser.
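
To illustrate what the rendering layer extracts from a SKOS record, the sketch below reads multilingual `skos:prefLabel` elements from RDF/XML and turns them into an HTML list. It uses Python's `xml.etree.ElementTree` instead of the XSL stylesheets used by EHTOP, and the sample record and its concept URI are hypothetical; only the labels "Asthma", "Asthme" and "Astma" come from this text.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

# Hypothetical SKOS record; the concept URI is a placeholder.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/mesh/D001249">
    <skos:prefLabel xml:lang="en">Asthma</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Asthme</skos:prefLabel>
    <skos:prefLabel xml:lang="da">Astma</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(rdf_xml: str) -> dict[str, dict[str, str]]:
    """Map each skos:Concept URI to its preferred labels, keyed by language."""
    root = ET.fromstring(rdf_xml)
    return {
        concept.get(f"{{{RDF}}}about"): {
            label.get(XML_LANG): label.text
            for label in concept.findall(f"{{{SKOS}}}prefLabel")
        }
        for concept in root.iter(f"{{{SKOS}}}Concept")
    }


def to_html(labels: dict[str, str]) -> str:
    """Render one concept's labels as an HTML list, as a stylesheet might."""
    items = "".join(
        f'<li lang="{html.escape(lang)}">{html.escape(text)}</li>'
        for lang, text in sorted(labels.items())
    )
    return f"<ul>{items}</ul>"


labels = pref_labels(SAMPLE)["http://example.org/mesh/D001249"]
print(to_html(labels))
# → <ul><li lang="da">Astma</li><li lang="en">Asthma</li><li lang="fr">Asthme</li></ul>
```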

Results

EHTOP is freely available (URL: http://www.ehtop.eu/) and gives direct access to ICD-10 (International Statistical Classification of Diseases and Related Health Problems) and the FMA (Foundational Model of Anatomy). Access to the other T/O (MeSH, WHO-ATC, ICPC-2, ...) is restricted to the scientific community. A total of 32 terminologies are included in EHTOP, with 980,000 concepts, 2,300,000 synonyms, 222,800 definitions and 4,000,000 relations. Currently, 600 unique machines use this bilingual version, and 300 users have registered. Two qualitative evaluations were performed over the last two years with cohorts of Rouen Medical School students (September 2010 and September 2011): "Interest for teaching" scored 83.7% and "Design of the web site" scored 56.36%.

Since January 2010, the bilingual version of EHTOP has been used daily by CISMeF librarians to index health resources in the CISMeF catalogue in a multi-terminology mode. MeSH is still the core indexing language, but it is now complemented by the ATC (Anatomical Therapeutic Chemical) classification for the majority of drug information resources. SNOMED (Systematized Nomenclature of Medicine, International, version 3.5) and ICD-10 are also used whenever a concept is absent from MeSH.
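
The selection rule above can be written as a small decision function. This is a simplified sketch, not the CISMeF indexing workflow (which is carried out by librarians): the function `index_terms` is hypothetical, and plain dictionaries keyed by lower-cased labels stand in for real terminology lookups.

```python
from typing import Mapping

Lookup = Mapping[str, str]  # hypothetical: lower-cased label -> code


def index_terms(label: str, is_drug_resource: bool, mesh: Lookup, atc: Lookup,
                snomed: Lookup, icd10: Lookup) -> list[tuple[str, str]]:
    """Return the (terminology, code) pairs used to index one concept."""
    key = label.lower()
    chosen = []
    if key in mesh:
        chosen.append(("MeSH", mesh[key]))      # MeSH remains the core language
    if is_drug_resource and key in atc:
        chosen.append(("ATC", atc[key]))        # ATC complements MeSH for drugs
    if key not in mesh:
        # SNOMED and ICD-10 are consulted only when MeSH lacks the concept.
        for name, table in (("SNOMED", snomed), ("ICD-10", icd10)):
            if key in table:
                chosen.append((name, table[key]))
    return chosen


print(index_terms("Asthma", False, {"asthma": "D001249"}, {}, {}, {"asthma": "J45"}))
# → [('MeSH', 'D001249')]
```

Because Asthma is present in MeSH in this toy lookup, the ICD-10 code is not used; with an empty MeSH lookup the same call would fall back to `('ICD-10', 'J45')`.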

Among the regular users of the service, several other profiles have been identified: health professionals using controlled vocabularies to build structured queries against databases such as MEDLINE, students using rich terminologies as knowledge sources, and translators and linguists looking for information in the biomedical field.

Thanks to the concept-based design of the EHTOP multi-terminology, cross-lingual model, retrieval of resources indexed with EHTOP terms will be cross-lingual: for example, if a librarian indexes a resource in French with the MeSH descriptor D001249 "Asthme", a Danish user can query directly in Danish with "Astma".
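
Cross-lingual retrieval follows from indexing resources with concept identifiers rather than words: a query in any language is first resolved to concepts, then to resources. The in-memory `Catalogue` class below is a hypothetical toy, using exact case-insensitive label matching and only the three labels quoted above; it is not the CISMeF search engine.

```python
from collections import defaultdict

# Labels of one MeSH descriptor in three languages (only the labels quoted
# in the text; a real index would hold every synonym in every language).
LABELS = {"D001249": {"fr": ["Asthme"], "en": ["Asthma"], "da": ["Astma"]}}


class Catalogue:
    """Toy concept-based index: resources point to concepts, not to words."""

    def __init__(self, labels: dict[str, dict[str, list[str]]]):
        # Any label in any language -> the concept identifiers it names.
        self.label_index: dict[str, set[str]] = defaultdict(set)
        for concept_id, by_lang in labels.items():
            for terms in by_lang.values():
                for term in terms:
                    self.label_index[term.casefold()].add(concept_id)
        self.resources: dict[str, set[str]] = defaultdict(set)

    def index(self, resource_id: str, concept_id: str) -> None:
        """Record that a resource was indexed with a concept."""
        self.resources[concept_id].add(resource_id)

    def search(self, query: str) -> set[str]:
        """Return resources indexed with any concept whose label matches."""
        hits: set[str] = set()
        for concept_id in self.label_index.get(query.casefold(), set()):
            hits |= self.resources.get(concept_id, set())
        return hits


catalogue = Catalogue(LABELS)
catalogue.index("resource-1", "D001249")  # indexed by a French-speaking librarian
print(catalogue.search("Astma"))          # Danish query → {'resource-1'}
```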

The inter- and intra-terminology semantic harmonization of EHTOP content enables interoperability between T/O, which makes it possible to navigate from one T/O to another. This interoperability is useful for several purposes (specific indexing, semantic enrichment, ...). Moreover, the Natural Language Processing tools developed and validated by the CISMeF team [2] have been used to enrich several T/O (translations, addition of synonyms, definitions, relations, ...). Integrating T/O into EHTOP can thus increase their value (e.g. the CISMeF team manually added more than 10,000 synonyms to MeSH).
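
Navigation between T/O can be seen as a search in a graph whose nodes are (terminology, code) pairs and whose edges are inter-terminology mappings and intra-terminology relations. The sketch below uses breadth-first search to find the shortest chain of relations between two nodes. The relation names `exactMatch` and `broader` are borrowed from SKOS as an assumption about how such links could be labeled, and the three relations listed are illustrative rather than taken from EHTOP's mapping tables.

```python
from collections import deque
from typing import Optional

Node = tuple[str, str]            # (terminology, code)
Edge = tuple[Node, str, Node]     # (source, relation, target)

# Illustrative relations only: one inter-terminology mapping (both directions)
# and one intra-terminology hierarchical link inside ICD-10.
RELATIONS: list[Edge] = [
    (("MeSH", "D001249"), "exactMatch", ("ICD-10", "J45")),
    (("ICD-10", "J45"), "exactMatch", ("MeSH", "D001249")),
    (("ICD-10", "J45"), "broader", ("ICD-10", "J40-J47")),
]


def build_graph(relations: list[Edge]) -> dict[Node, list[tuple[str, Node]]]:
    """Adjacency list: node -> [(relation, neighbour), ...]."""
    graph: dict[Node, list[tuple[str, Node]]] = {}
    for source, relation, target in relations:
        graph.setdefault(source, []).append((relation, target))
    return graph


def path_between(graph, start: Node, goal: Node) -> Optional[list[Edge]]:
    """Breadth-first search for the shortest chain of relations, or None."""
    previous: dict[Node, Optional[tuple[Node, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:   # walk back to the start
                parent, relation = previous[node]
                path.append((parent, relation, node))
                node = parent
            return path[::-1]
        for relation, neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, relation)
                queue.append(neighbour)
    return None


graph = build_graph(RELATIONS)
for step in path_between(graph, ("MeSH", "D001249"), ("ICD-10", "J40-J47")):
    print(step)
```

Starting from the MeSH descriptor, the search first crosses the mapping to ICD-10 J45 and then follows the ICD-10 hierarchy up to the J40-J47 block, which is the kind of cross-terminology navigation described above.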

Discussion & Conclusion

The content of EHTOP can be compared to that of the UMLS [3]. Although the UMLS includes many more T/O, some distinctive T/O have been integrated into EHTOP, such as WHO-ATC, IUPAC and other French-language terminologies. To our knowledge, this tool is the only one of its kind: our team has compared it several times with BioPortal [4] and the EBI Ontology Lookup Service [5], and found those tools less user-friendly and less suited to "human" users.

EHTOP is a rich tool, useful to a wide range of users and applications: education, resource indexing, information retrieval and audits in terminology management. Our next goal is to improve the service further and offer users the best possible experience, in particular through a better graphical user interface adapted to specific profiles (students, librarians, translators, physicians, etc.).

Keywords: 
Terminology as topic, Vocabulary, controlled
Legend Figure: 
EHTOP screen capture
References: 
  1. Grosjean J, Merabti T, Griffon N, Dahamna B, Darmoni SJ. Multilingual multiterminology model to create the European Health Terminology/Ontology Portal. TIA 2011.
  2. Merabti T. Methods to map health terminologies: contribution to the semantic interoperability between health terminologies. PhD thesis, University of Rouen; 2010.
  3. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database issue):D267-D270.
  4. Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Musen MA, et al. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009;37(Web Server issue):W170-W173.
  5. Cote RG, Jones P, Apweiler R, Hermjakob H. The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006;7:97. PMID: 16507094.
Session: 
Session D. Global aspects of information
Ref: 
D2
Category: 
Health informatics
Type of presentation: 
Oral presentation