Health informatics

Semantic Search and Discovery in NLM Databases

Abstract: 

This presentation will describe the NLMplus semantic search and discovery engine. The NLMplus project was initiated by WebLib LLC in response to a challenge by the National Library of Medicine to make innovative use of NLM’s vast collection of biomedical data and services. The award-winning NLMplus app showcases a variety of natural language processing tools and solutions that improve access to NLM’s rich content.

NLMplus combines a number of leading-edge semantic knowledge resources and technologies, such as a biomedical knowledge base, a semantic search engine, a federated search engine, and a variety of smart content analysis and discovery services.

Users can concurrently access 60 NLM databases to find trusted information ranging from consumer health topics to drugs, news, clinical trials and translational medicine. One of the important innovations of NLMplus is WebLib’s Semantic Search Engine, which typically produces relevant search results with improved precision and recall from 1.6 million PubMed review articles. The reviews and meta-analyses are semantically indexed and searched on a WebLib server. The NLMplus application also sends conceptually enhanced user queries to NLM’s PubMed system to improve search results.
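
The abstract does not say how NLMplus builds its "conceptually enhanced" PubMed queries. The following is only a minimal sketch of one common technique, concept-based query expansion, under stated assumptions: the `CONCEPTS` table is a hypothetical stand-in for the NLMplus knowledge base, and the output uses PubMed's `[MeSH Terms]` and `[tiab]` (title/abstract) search tags.

```python
# Hypothetical concept table standing in for the NLMplus knowledge base.
CONCEPTS = {
    "heart attack": {
        "mesh": "Myocardial Infarction",
        "synonyms": ["heart attack", "myocardial infarction"],
    },
}


def enhance_query(user_query: str) -> str:
    """Expand a user phrase into a PubMed query string (illustrative only)."""
    concept = CONCEPTS.get(user_query.strip().lower())
    if concept is None:
        # No known concept: send the user's text unchanged.
        return user_query
    # OR together the controlled MeSH heading and free-text synonyms so that
    # both MeSH-indexed and not-yet-indexed citations can match.
    parts = [f'"{concept["mesh"]}"[MeSH Terms]']
    parts += [f'"{syn}"[tiab]' for syn in concept["synonyms"]]
    return "(" + " OR ".join(parts) + ")"


print(enhance_query("Heart attack"))
```

ORing a MeSH heading with title/abstract synonyms is a common way to raise recall; whether NLMplus does exactly this is not stated in the abstract.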

Providing flexible access to heterogeneous databases is a common challenge in medical libraries, biomedical research institutions and the health care industry. The same is true for non-biomedical content and applications. WebLib’s innovative semantic indexing and search technology, combined with universal search and discovery solutions for free and fee-based content, allows all types of organizations to better serve their diverse user communities, including the public, researchers, professionals, and policy and decision makers.

Type of presentation: 
Poster

The database of the publications by Estonian medical scientists at the Medical Information Centre of Tartu University Hospital

Abstract: 

Background

Estonia is among the smallest European Union member states. Although Estonia’s area is slightly larger than that of the Netherlands or Denmark, its population is only 1,339,662. In 2012 there were 5,881 physicians in Estonia, i.e. 439 physicians per 100,000 inhabitants. Tartu University Hospital is the only university hospital in Estonia. The Medical Information Centre of Tartu University Hospital provides information resources to the entire medical staff of Estonia. The use of the Hospital’s information resources is also reflected in a webometrics study based on data from the Ranking Web of World Hospitals: Tartu University Hospital ranks 65th among European hospitals and 257th among hospitals worldwide.
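
The physician density quoted above follows directly from the two counts; a quick Python check:

```python
population = 1_339_662  # population of Estonia, as quoted above
physicians = 5_881      # physicians in Estonia in 2012

# Physicians per 100,000 inhabitants = physicians / population * 100,000
density = physicians / population * 100_000
print(f"{density:.0f} physicians per 100,000 inhabitants")  # → 439 physicians per 100,000 inhabitants
```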

Goal

One of the most important tasks of the Medical Information Centre of Tartu University Hospital was to create a database of Estonian medical literature (the Database of the Publications of Estonian Medical Scientists). This database is the basic material for analysing the activity of Estonian medical scientists. The analysis addresses the following questions: How many articles were published during the ten years 2002 – 2011 by Estonian scientists and by Estonian medical scientists? How many papers were published by the medical staff of Tartu University Hospital? How many papers prepared by Tartu University Hospital in cooperation with other hospitals were published during the five years 2007 – 2011? In which international journals were the papers by medical scientists of Tartu University Hospital published?

Methods

The basic material for the bibliometric analysis consisted of the journals indexed in the Thomson Reuters Web of Knowledge databases and the Database of the Publications of Estonian Medical Scientists maintained at the Medical Information Centre of Tartu University Hospital.

Results

During the ten years 2002 – 2011, a total of 12,178 research papers by Estonian scientists were indexed in the Thomson Reuters Web of Knowledge (WOK), of which 3,292 were by Estonian medical scientists. This accounts for 27% of the total publication output of Estonian scientists. In recent years, publications by Estonian medical scientists have consistently accounted for about one-fourth of all scientific papers published in Estonia. The number of published papers clearly increased during the decade: in 2002 the Web of Knowledge databases contained 177 titles by Estonian medical scientists, and by 2011 this number had risen to 403, i.e. more than twofold. The number of medical papers grew in step with the overall output, so their share remained almost constant over the decade. The present bibliometric analysis shows a steady increase in the publication productivity of Estonian medical scientists.
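
The 27% share and the "more than twofold" growth can be checked from the counts given above:

```python
# Figures quoted above for 2002-2011 (Thomson Reuters Web of Knowledge)
all_papers = 12_178      # papers by all Estonian scientists
medical_papers = 3_292   # of which by Estonian medical scientists

share = medical_papers / all_papers * 100
print(f"Medical share: {share:.0f}%")         # → Medical share: 27%

# Growth of medical titles between the first and last year of the decade
titles_2002, titles_2011 = 177, 403
growth = titles_2011 / titles_2002
print(f"Growth 2002 to 2011: {growth:.2f}x")  # → Growth 2002 to 2011: 2.28x
```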

Conclusion

As is evident, the proportion of journals with a high impact factor among those that have continuously published papers by Estonian medical scientists is relatively large. During the last five years, the largest numbers of articles were contributed by the Children’s Clinic, the Neurology Clinic, the United Laboratories, the Internal Medicine Clinic and the Women’s Clinic of Tartu University Hospital. The Andrology Centre has published 122 articles during its five-year existence, i.e. about 24 articles per year, a result achieved through its physicians’ participation in several international research projects. To some extent, the distribution by clinic also shows in which fields of medicine the research of the Hospital’s medical scientists has been most productive.
Finally, Tartu University Hospital acknowledges its most productive scientist every year with the Tartu University Hospital Science Award.

Table 1. Ranking of the clinics of Tartu University Hospital on the basis of the number of published articles during the last 5 years (2007 – 2011):

Clinic                      2011  2010  2009  2008  2007  Total
Children’s Clinic             53   106    60    43    55    317
Neurology Clinic              66    54    48    41    47    256
United Laboratories           33    77    46    45    42    243
Internal Medicine Clinic      29    49    50    48    44    220
Women’s Clinic                50    48    33    34    28    193
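
The Total column is the sum of the five yearly counts, and the ranking follows from sorting by that total. The following Python snippet re-derives both from the table values:

```python
# Articles per clinic in Table 1, in column order 2011, 2010, 2009, 2008, 2007
articles = {
    "Children's Clinic": [53, 106, 60, 43, 55],
    "Neurology Clinic": [66, 54, 48, 41, 47],
    "United Laboratories": [33, 77, 46, 45, 42],
    "Internal Medicine Clinic": [29, 49, 50, 48, 44],
    "Women's Clinic": [50, 48, 33, 34, 28],
}

# Row totals, then clinics sorted by total in descending order
totals = {clinic: sum(counts) for clinic, counts in articles.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for rank, clinic in enumerate(ranking, start=1):
    print(f"{rank}. {clinic}: {totals[clinic]}")
```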

About half of the research papers by physicians of the Hospital were completed in cooperation with medical scientists from other countries and from different departments of the University of Tartu Faculty of Medicine.

Table 2. Papers prepared in cooperation with other Estonian hospitals and other countries:

The most important partners are from England, Sweden and Germany.

Table 3. Number of international journals (cited in Web of Knowledge) in which the medical scientists of the Tartu University Hospital have published their papers:

Journals with a high impact factor in which medical scientists of Tartu University Hospital have published their papers:

  • New England Journal of Medicine
  • The Lancet
  • Nature Reviews Drug Discovery
  • The Lancet Neurology
  • Annals of Neurology
  • Nature
  • Circulation

Table 4. Total number of publications by the medical staff of the Hospital and the number of articles published in international journals: 

Tartu University Hospital plays a major role in Estonian medical publishing: 80% of all medical publications in Estonia are produced by physicians of the Hospital and faculty members of the University of Tartu.

Legend Table: 
Table 1. Ranking of the clinics of Tartu University Hospital on the basis of the number of published articles during the last 5 years
References: 
  1. Ranking Web of World Hospitals [homepage on the Internet]. c2012 [updated January 2012; cited 2012 April 10]. Available from: http://hospitals.webometrics.info/
  2. Thomson Reuters Web of Knowledge [database on the Internet]. c2012 [cited 2012 April 10]. Available from: http://apps.webofknowledge.com/UA_GeneralSearch_input.do?product=UA&search_mode=GeneralSearch&SID=4ECC9cCKHAJmHAem9@i&preferencesSaved=
  3. The Database of the Publications of Estonian Medical Scientists [homepage on the Internet]. c2012 [updated April 2012; cited 2012 April 10]. Available from: http://www.kliinikum.ee/infokeskus/index.php?option=com_content&view=article&id=32&Itemid=18

 

 

Type of presentation: 
Poster

Impactia: automating the study of scientific production

Abstract: 

Introduction

The Andalusian Public Health System Virtual Library (Biblioteca Virtual del Sistema Sanitario Público de Andalucía, BV-SSPA) was set up in June 2006. It is a regional government initiative that aims to democratize health professionals’ access to quality scientific information, regardless of where they work.

Andalusia is a region with more than 8 million inhabitants and 100,000 health professionals working in 41 hospitals, 1,500 primary healthcare centres and 28 centres with non-clinical purposes (research, management and educational centres).

Objectives

The Department of Research, Development and Innovation (R+D+i) of the Andalusian Regional Government is responsible, among other duties, for evaluating the hospitals and centres of the Andalusian Public Health System (SSPA) in order to allocate its funding. One of the criteria used is the evaluation of scientific output, which is measured using bibliometrics.

It is well known that bibliometrics has a series of limitations and problems that should be taken into account, especially when it is used for purposes outside information science, such as career assessment or funding decisions.

A few years ago, each centre produced its bibliometric reports separately, without preset, well-defined criteria, which are essential when the results of different reports need to be compared. Some hospitals included Meeting Abstracts in their figures while others did not; the same happened with Errata, among many other differences.

Therefore, the main problem the Department of R+D+i faced when evaluating the health system was that the bibliometric data were inaccurate and the reports were not comparable.

To establish unified criteria for the whole system, the Department of R+D+i commissioned the BV-SSPA to carry out the annual analysis of the system’s scientific output using well-defined criteria and indicators, the most prominent of which is the Impact Factor.

Materials and Methods

As the Impact Factor is the bibliometric indicator that the virtual library is asked to consider, the Web of Science (WoS) database must be used, since the Impact Factor is owned and published by the same company that produces WoS. WoS includes the Science Citation Index (SCI), the Social Sciences Citation Index (SSCI) and the Arts & Humanities Citation Index. SCI and SSCI are used to gather all the documents, and the Journal Citation Reports (JCR) are used to obtain the Impact Factor and quartiles.

Unlike other bibliographic databases such as MEDLINE, the bibliometric database WoS includes the addresses of all the authors. To retrieve all the scientific output of the SSPA, we run general searches whose results are then processed by a tool developed by our library. We run nine different searches on the ‘address’ field: eight of them combine ‘Spain’ with each of the eight Andalusian provinces, and the ninth combines ‘Spain’ with all the cities that have health centres, since we detected that some authors do not include the province in their signatures. Some of the search strategies are:

  • AD=Malaga and AD=Spain
  • AD=Sevill* and AD=Spain
  • AD=SPAIN AND (AD=GUADIX OR AD=BAZA OR AD=MOTRIL)

Furthermore, the ‘year’ field is used to restrict the results to the period under study.
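
The nine address searches can also be generated programmatically. A minimal sketch, assuming the Web of Science advanced-search field tags `AD` (address) and `PY` (publication year); the province spellings other than the ‘Malaga’ and ‘Sevill*’ examples above, and the short town list, are illustrative assumptions rather than the BV-SSPA’s actual strategies.

```python
# Eight Andalusian provinces. Apart from "Malaga" and "Sevill*", taken from
# the examples above, the exact spellings/wildcards are assumptions.
PROVINCES = ["Almeria", "Cadiz", "Cordoba", "Granada",
             "Huelva", "Jaen", "Malaga", "Sevill*"]
# Illustrative subset of towns with health centres (from the example above).
TOWNS = ["GUADIX", "BAZA", "MOTRIL"]


def build_queries(year: int) -> list[str]:
    """Return the nine WoS advanced-search strings for one publication year."""
    period = f"PY={year}"
    # Eight searches: each province AND Spain.
    queries = [f"AD={p} AND AD=Spain AND {period}" for p in PROVINCES]
    # Ninth search: Spain AND any town, for authors who omit the province.
    towns = " OR ".join(f"AD={t}" for t in TOWNS)
    queries.append(f"AD=Spain AND ({towns}) AND {period}")
    return queries


for q in build_queries(2011):
    print(q)
```

Keeping ‘AD=Spain’ in every search excludes homonymous places abroad (for example, Córdoba in Argentina).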

To exploit the data, the BV-SSPA has developed a tool called Impactia, a web application that stores information about the documents generated by the SSPA in a database. Impactia automatically processes the retrieved documents and assigns them to their corresponding centres.

To classify documents automatically, it was necessary to capture the wide variability of centre names that authors use in their signatures. For example, Impactia knows that an author who signs as “Hospital Universitario Virgen Macarena”, “HVM” or “Hosp. Virgin Macarena” belongs to the same centre. The attached figure shows the variability found for the Empresa Publica Hospital de Poniente.
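
Impactia’s matching rules are not published in the abstract. A minimal sketch of one way to do this, assuming a hypothetical hand-curated `VARIANTS` table: each address is normalized (case, accents, punctuation) and then searched for any known variant as whole words.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical variant table; a real one would cover every SSPA centre.
VARIANTS = {
    "hospital universitario virgen macarena": "Hospital Universitario Virgen Macarena",
    "hosp virgin macarena": "Hospital Universitario Virgen Macarena",
    "hvm": "Hospital Universitario Virgen Macarena",
}


def normalize(text: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def assign_centre(address: str) -> Optional[str]:
    """Return the canonical centre whose variant appears as whole words."""
    norm = normalize(address)
    for variant, centre in VARIANTS.items():
        if re.search(rf"\b{re.escape(variant)}\b", norm):
            return centre
    return None


print(assign_centre("Hosp. Virgin Macarena, Serv. Cardiol., Sevilla, Spain"))
```

Matching on whole words keeps a short acronym such as ‘hvm’ from firing inside longer tokens; unmatched addresses return `None` and could be queued for manual review.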

Besides the documents from WoS, Impactia includes documents indexed in Scopus and other databases, which we search using strategies similar to those above.

Since health centres and hospitals produce a great deal of grey literature that is not indexed in databases, Impactia allows the centres to add these documents to the application, so that all SSPA scientific output is gathered and organised in one central place.
The librarians of each centre are responsible for locating this grey literature. They can also submit claims regarding the documents and indicators collected and calculated by Impactia.

Documents from WoS and Scopus are bulk-uploaded into Impactia every month.

One of the main issues we found while developing Impactia was the need to handle duplicate documents retrieved from different sources. Because titles may be written differently (with slashes, commas and so on), Impactia does not rely on titles: it detects duplicates using the ‘DOI’ field when available, or otherwise by comparing the page start, page end and ISSN fields. In this way duplicates can be removed reliably.
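
The rule described above (DOI when available, otherwise page start, page end and ISSN) can be sketched in Python. The `Record` type and function names are hypothetical, and the simple pairwise scan is chosen for clarity rather than speed:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    source: str            # e.g. "WoS" or "Scopus"
    doi: Optional[str]     # may be missing
    issn: str
    page_start: str
    page_end: str


def is_duplicate(a: Record, b: Record) -> bool:
    """DOI when both records have one, otherwise pages + ISSN."""
    if a.doi and b.doi:
        return a.doi.lower() == b.doi.lower()  # DOI names are case-insensitive
    return (a.issn, a.page_start, a.page_end) == (b.issn, b.page_start, b.page_end)


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each document (simple O(n^2) scan)."""
    unique: List[Record] = []
    for rec in records:
        if not any(is_duplicate(rec, kept) for kept in unique):
            unique.append(rec)
    return unique


wos = Record("WoS", "10.1000/xyz123", "1234-5678", "10", "20")
scopus = Record("Scopus", "10.1000/XYZ123", "1234-5678", "10", "20")
no_doi = Record("Scopus", None, "1234-5678", "10", "20")
other = Record("WoS", None, "1111-2222", "1", "5")
print(deduplicate([wos, scopus, no_doi, other]))
```

Page range plus ISSN alone cannot distinguish articles printed on the same pages of different volumes; adding volume or year to the comparison would be a natural refinement, though the abstract does not say whether Impactia does so.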

Results

The data gathered in Impactia are available to administrative teams and hospital managers through a simple web page that lets them see at any moment, with a single click, detailed information on the scientific output of their hospitals, including useful graphs such as the percentage of each document type, the journals in which their scientists usually publish, annual comparisons and bibliometric indicators. They can also compare the different centres of the SSPA.

Impactia also allows users to download the data, so that they can work with it or include it in their centres’ reports.

This application saves the health system many working hours. The work was previously done manually by forty-one librarians; now it is done by a single person at the BV-SSPA in two days a month.

To sum up, the benefits of Impactia are:

  • It has shown its effectiveness in the automatic classification, processing and analysis of the data.
  • It has become an essential tool that allows managers to evaluate the scientific output of their centres quickly and easily.
  • It optimizes the human resources of the SSPA, saving time and money.
  • It is the reference point for the Department of R+D+i when evaluating the scientific output of health staff.
Legend Figure: 
Name variability of Poniente Hospital
Type of presentation: 
Poster

The EHTOP: indexing Health resources in a multi-terminology/ontology and cross-lingual world

Abstract: 

Introduction

The amount of health and biomedical information available online is constantly increasing. Many barriers, such as domain-specific terminologies and languages, can interfere with universal access to this information. More and more institutions already provide health information in several languages: e.g. MedlinePlus provides health information for lay people in English and Spanish, whereas the European Medicines Agency provides drug information in all official European Union languages.

The European Health Terminology/Ontology Portal (EHTOP) is a repository designed to host health terminologies and ontologies (T/O) in various languages. It evolved from a server providing the NLM Medical Subject Headings (MeSH) in French and English into a service providing access to multiple terminologies and ontologies available not only in French and English but also in German, Italian, Dutch, Spanish, Danish and 17 other languages. EHTOP can be used both by humans and by computers via Web services. The main objective of EHTOP is to provide a single access point to health T/O, allowing dynamic browsing and navigation within a terminology and its translations or across different terminologies. The tool is mainly intended for librarians indexing resources in a multi-terminology mode in order to cover several fields of knowledge. Given its very broad scope, many other uses are possible.

Methods

To integrate T/O into EHTOP, three steps are necessary [1]:

  1. designing a meta-model into which each terminology and ontology can be integrated. Such a model must be generic enough to represent each T/O, whatever its structure.
  2. developing a process to include terminologies into EHTOP, resulting in a format compatible with standard knowledge representation languages.
  3. building and integrating existing and new inter- and intra-terminology semantic harmonization into EHTOP.

A generic model was designed for the database in order to fit all the terminologies into one global structure. Then, a model of each terminology was designed as a specialization of the meta-model. The purpose of this generic model is to factorize the artifacts (classes, relationships, attributes) common to all the terminologies, thus facilitating the integration of multiple terminologies within a single platform. Some artifacts, although specific to certain terminologies, must nevertheless be represented to avoid losing information that falls outside the generic model. Consequently, a trade-off must be found between faithfully representing each terminology with no loss of information and factoring out the artifacts shared by all terminologies, so that shared services can then be offered independently of any given terminology.
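
The abstract does not give EHTOP’s actual schema. As a hedged illustration of a generic meta-model specialized per terminology, the Python dataclasses below use hypothetical names (`Concept`, `MeshDescriptor`, `preferred_label`): shared artifacts live in the base class, terminology-specific ones in a subclass or an `extra` field, and a service written against the base class works for every terminology.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Artifacts common to all terminologies (hypothetical generic model)."""
    terminology: str   # e.g. "MeSH" or "ICD-10"
    code: str          # identifier within that terminology
    labels: dict[str, list[str]] = field(default_factory=dict)     # language -> terms
    relations: list[tuple[str, str]] = field(default_factory=list)  # (type, target code)
    # Terminology-specific artifacts kept so that no information is lost.
    extra: dict[str, str] = field(default_factory=dict)


@dataclass
class MeshDescriptor(Concept):
    """Specialization of the generic model for one terminology."""
    tree_numbers: list[str] = field(default_factory=list)


def preferred_label(concept: Concept, lang: str) -> str:
    """A shared service that works for any terminology built on Concept."""
    terms = concept.labels.get(lang, [])
    return terms[0] if terms else concept.code  # fall back to the code


asthma = MeshDescriptor("MeSH", "D001249", labels={"en": ["Asthma"], "fr": ["Asthme"]})
print(preferred_label(asthma, "fr"))
```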

The EHTOP website was designed as a graphical interface to a Web Service entirely dedicated to information retrieval and to associations between terms of several terminologies. The main objective was thus to separate content from presentation, in particular from the interface.

The EHTOP Web Service was developed according to Web Services standards, with SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language) signatures. It provides methods to search for terms by descriptor or by unique database identifier. The SQL queries on the database were specifically assessed and tuned to optimize response time.

As EHTOP exploits a SKOS (RDF) file, the graphical interface that renders the final HTML was built with JSP (JavaServer Pages) files that include XSL (eXtensible Stylesheet Language) functions. Additional CSS (Cascading Style Sheets) and JavaScript functions provide a better website design. The final HTML rendering is performed by the client browser.
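
EHTOP’s own pipeline uses JSP and XSL. The sketch below only illustrates, with Python’s standard-library ElementTree, how per-language preferred labels can be read from a SKOS RDF/XML file before rendering; the sample file and its concept URI are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Tiny hypothetical SKOS file; the concept URI is a placeholder.
SKOS_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/mesh/D001249">
    <skos:prefLabel xml:lang="en">Asthma</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Asthme</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(skos_xml: str) -> dict:
    """Map each concept URI to its preferred labels, keyed by language."""
    root = ET.fromstring(skos_xml)
    labels = {}
    for concept in root.findall("skos:Concept", NS):
        uri = concept.get(RDF_ABOUT)
        labels[uri] = {
            label.get(XML_LANG): label.text
            for label in concept.findall("skos:prefLabel", NS)
        }
    return labels


print(pref_labels(SKOS_SAMPLE))
```

This handles only the typed-node form (`skos:Concept` elements); SKOS files that use `rdf:Description` with an `rdf:type` would need an RDF-aware parser.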

Results

EHTOP is freely available and gives direct access to the ICD-10 (International Statistical Classification of Diseases and Related Health Problems) and the FMA (Foundational Model of Anatomy) (URL: http://www.ehtop.eu/). Access to the other T/O (MeSH, WHO-ATC, ICPC-2...) is restricted to the scientific community. A total of 32 terminologies are included in EHTOP, with 980,000 concepts, 2,300,000 synonyms, 222,800 definitions and 4,000,000 relations. Currently, 600 unique machines are using this bilingual version, and 300 users have registered. Two qualitative evaluations were performed in the last two years on cohorts of Rouen Medical School students (September 2010 and September 2011): "Interest for teaching" scored 83.7% and "Design of the web site" scored 56.36%.

Since January 2010, the bilingual version of EHTOP has been used daily by CISMeF librarians to index health resources in the CISMeF catalogue in a multi-terminology mode. MeSH is still the core indexing language, but it is now complemented by the ATC (Anatomical Therapeutic Chemical classification) for most drug information resources. SNOMED (Systematized Nomenclature of Medicine, International, version 3.5) and ICD-10 are also used whenever a concept is not present in MeSH.

Among the regular users of the service, several other profiles have been identified: health professionals using controlled vocabularies to build structured queries to databases such as MEDLINE, students using rich terminologies as knowledge sources, and translators and linguists looking for information in the biomedical field.

Thanks to the concept-based view of the EHTOP multi-terminology and cross-lingual model, retrieval of resources indexed with EHTOP terms becomes cross-lingual (e.g. if a librarian indexes a resource in French with the MeSH descriptor D001249 "Asthme", a Danish user can type a query directly in Danish: "Astma").
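
The mechanism in the example above can be sketched as follows: because resources are indexed by descriptor identifier rather than by a language-specific term, a query in any language only needs to be mapped to the identifier. The `LABELS` and `INDEX` tables are hypothetical stand-ins for EHTOP content and the CISMeF catalogue.

```python
# Hypothetical label table: one MeSH descriptor with labels in several languages.
LABELS = {
    "D001249": {"en": "Asthma", "fr": "Asthme", "da": "Astma"},
}
# Hypothetical catalogue index: resources are indexed by descriptor ID,
# not by a language-specific term.
INDEX = {"D001249": ["resource indexed in French"]}


def search(query: str, lang: str) -> list:
    """Map the query to descriptor IDs in the user's language, then look up resources."""
    q = query.strip().casefold()
    if not q:
        return []
    hits = []
    for descriptor_id, labels in LABELS.items():
        if labels.get(lang, "").casefold() == q:
            hits.extend(INDEX.get(descriptor_id, []))
    return hits


print(search("Astma", "da"))
```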

The inter- and intra-terminology semantic harmonization of the EHTOP content enables interoperability between T/O, which is useful for several purposes (specific indexing, semantic enrichment, ...) and makes it possible to navigate between T/O. Moreover, the Natural Language Processing tools developed and validated by the CISMeF team [2] were used to enrich several T/O (translations, addition of synonyms, definitions, relations, ...). Integrating a T/O into EHTOP can thus enrich it (e.g. the CISMeF team manually added more than 10,000 synonyms to MeSH).

Discussion & Conclusion

The content of EHTOP can be compared to the UMLS [3]. Although the UMLS contains many more T/O, some original T/O have been integrated into EHTOP, such as WHO-ATC, IUPAC and other terminologies in French. As far as we know, this tool is the only one of its kind: our team has performed several comparisons with BioPortal [4] and the EBI Ontology Lookup Service [5], and found those tools less user-friendly and less oriented toward "human" users.

EHTOP is a rich tool, useful for a wide range of applications and users: education, resource indexing, information retrieval and terminology management audits. The next goal is to improve on these results and provide the best possible service to users (a better graphical user interface, adapted to specific profiles: students, librarians, translators, physicians, etc.).

Legend Figure: 
EHTOP screen capture
References: 
  1. Grosjean J, Merabti T, Griffon N, Dahamna B, Darmoni SJ. Multilingual multiterminology model to create the European Health Terminology/Ontology Portal. TIA 2011.
  2. Merabti T. Methods to map health terminologies: contribution to the semantic interoperability between health terminologies. PhD thesis, University of Rouen; 2010.
  3. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. 2004;32:267-270.
  4. Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Musen MA, et al. BioPortal: ontology and integrated data resources at the click of a mouse. Nucleic Acids Research. 2009; Web Server Issue 10.
  5. Cote RG, Jones P, Apweiler R, Hermjakob H. The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006 Feb 28;7(1):97. PMID: 16507094.
Session: 
Session D. Global aspects of information
Ref: 
D2
Type of presentation: 
Oral presentation