Digital libraries

Text mining and copyright laws: a case for change in the medical research field

Abstract: 

Background

In mid-2011, the research team at the Department of Ambulatory Care and Community Medicine, University of Lausanne, decided to start a project on a new research topic: Shared Decision Making (SDM). The objective was to identify publication trends about SDM in 15 major internal medicine journals over the last 15 years. The team decided to use a "text mining" approach to systematically review all the articles published in these journals and automatically search for occurrences of SDM. The research team turned to the medical library for help in collecting the electronic publication files.

Methods

The software applications used in text mining make it possible to search through large sets of unstructured texts. The results are then clustered to extract trends and facts and to build new knowledge. In order to work consistently, all the text sources should be aggregated on a single local platform. However, electronic scientific publications are currently stored as licensed materials on publishers' sites, and bulk download of thousands of articles is not commonly permitted by licences. The library teamed up with the researchers to obtain all the permissions needed to compile the files for research purposes.
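As an illustration of the kind of search described above, the following minimal sketch counts, per publication year, the articles in a local corpus that mention SDM. It is not the research team's software: the search pattern, the record layout and the sample texts are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical search pattern; the abstract does not list the terms actually used.
SDM_PATTERN = re.compile(r"\bshared[\s-]+decision[\s-]+making\b", re.IGNORECASE)


def count_mentions_by_year(articles):
    """Return {year: number of articles containing at least one SDM mention}."""
    hits = Counter()
    for article in articles:
        if SDM_PATTERN.search(article["text"]):
            hits[article["year"]] += 1
    return dict(hits)


# Made-up records standing in for article texts aggregated on a local platform.
sample_corpus = [
    {"journal": "Journal A", "year": 2009,
     "text": "We studied shared decision making in primary care."},
    {"journal": "Journal B", "year": 2009,
     "text": "A trial of statin therapy."},
    {"journal": "Journal A", "year": 2010,
     "text": "Shared decision-making tools were evaluated."},
    {"journal": "Journal C", "year": 2010,
     "text": "Patients valued Shared  Decision Making."},
]

trend = count_mentions_by_year(sample_corpus)
print(sorted(trend.items()))
```

A search of this kind only works if every article text is available locally, which is why the bulk-download permissions matter.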

Results

Contacting publishers and exchanging information about the research project proved particularly cumbersome and time-consuming. After 6 months, only 5 out of 15 publishers had agreed to grant a licence extension giving the right to systematically download the articles for research purposes. Permission was usually granted under one main condition: all content downloaded for text-based analysis must be destroyed when the research is complete.

Conclusion

Due to the rapidly expanding body of electronic biomedical literature, text mining should become an essential process for research in the medical field. To allow this new research method to expand, copyright law and licences for electronic access have to be amended and new competences have to emerge in libraries and research centers.

Type of presentation: 
Poster

MATARKA: Cross-country development in the searchable Hungarian periodicals Table of Contents (TOC) database

Abstract: 

Introduction

In information services, librarians regularly face the problem of literature searching. Most of the information is found in journals. Sometimes a user knows the title or author of an article but not its exact publication details, and often the article cannot be found in any bibliography or on any website. In these cases the user or librarian has to check many issues and volumes of the periodicals, which takes a lot of time and persistence.

The table of contents (TOC) of a journal gives the quickest overview of the complete content of an issue.

This gave rise to the idea of building a database to support this service: librarians would collect the TOCs of the journals and make them searchable.

Objectives

In our poster we would like to present how important MATARKA is for Hungarian librarianship. Furthermore, we would like to show how medicine is represented in this project.

Brief description

MATARKA is an abbreviation of Magyar Folyóiratok Tartalomjegyzékeinek Kereshető Adatbázisa (Hungarian Periodicals Table of Contents Database). The name covers a database of the tables of contents (TOCs) of nearly 1300 Hungarian periodicals. The web address of the database is http://www.matarka.hu (the page also has an English interface).

MATARKA is valuable not only for searching and browsing current TOCs, but can also be used as:

  • a publication database of authors publishing in Hungarian journals,
  • a tool for checking bibliographic information,
  • a source of holdings and location information for the journals,
  • a source of TOCs of papers that have ceased publication,
  • in some cases, a source for even the first volumes of periodicals.

The database has other important benefits; for example, it is available online and free of charge. A large number of journals, covering a wide range of sciences, can be searched from a single website.

History

Beginning

In 2002, an independent, unique project was launched in the Library of the University of Miskolc (Hungary). The librarians planned an online tool for the TOCs of their subscribed periodicals. They developed a database with open source software (MySQL, PHP) and populated it with the TOCs of the 15 most important technical journals in their holdings.

A year later, 8 other libraries joined the project: libraries of Hungarian colleges and universities and the library of the Budapest City Archives. The subject coverage broadened accordingly: not only technical journals appeared in MATARKA, but also journals in the humanities, social sciences and applied sciences. Each library that joined focused on further journals useful to it and recorded their TOCs in MATARKA.

At present

The project is still based on voluntary, daily work of the librarians.

In 2004, a consortium was established. The cooperation is managed by the Library of the University of Miskolc.

In 2006, the MATARKA Association was created, so that the project can now compete for support and sponsorship.

MATARKA has now become well known: librarians, researchers and students use it every day. It is referenced by, and built into, many cross-national content provider projects.

Thanks to financial support received through a national tender, 600,000 records were added to the database last year.

This year MATARKA celebrates its 10th anniversary.

Developments

Completed developments

One of the first developments was the ability to attach URL links to the titles. If an article has freely accessible full text online, its URL is attached (the URLs are checked every day).

As a next step, on-site registration was introduced: users can register personally and set RSS alerts.

MATARKA now interoperates with other software and databases: bibliographic data for selected articles can be downloaded in TXT, HTML, MARC and RIS formats.
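To show what such an export can look like, here is a minimal sketch that serializes one article record as an RIS entry. It is not MATARKA's code: the record layout, the function name `to_ris` and the sample values are hypothetical, and only a handful of common RIS tags are used.

```python
def to_ris(record):
    """Serialize one journal-article record as an RIS entry.

    Each RIS line is a two-letter tag, two spaces, a hyphen, a space and the value.
    The dictionary keys used here are an assumption made for this sketch.
    """
    lines = ["TY  - JOUR"]  # reference type: journal article
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    field_tags = [
        ("title", "TI"), ("journal", "JO"), ("year", "PY"),
        ("volume", "VL"), ("issue", "IS"),
        ("start_page", "SP"), ("end_page", "EP"), ("url", "UR"),
    ]
    for key, tag in field_tags:
        value = record.get(key)
        if value:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # end of record
    return "\n".join(lines)


# Illustrative, made-up record.
sample_record = {
    "authors": ["Example, A."],
    "title": "Example article title",
    "journal": "Example Journal",
    "year": 2012,
    "volume": 1,
    "issue": 2,
    "start_page": 10,
    "end_page": 15,
}

ris_text = to_ris(sample_record)
print(ris_text)
```

A file made of such entries can be imported by most reference managers, which is what makes the format useful for passing records to other software.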

In 2007, a copy service was started. The National Széchényi Library (the national library of Hungary) took on this service. Users can select articles in MATARKA and send an order to the national library through the MATARKA order form.

Developments in the future

In the future, MATARKA will continue to improve its services, for example by sending TOCs automatically to e-mail addresses (on request). In line with new mobile technologies, MATARKA will also be made available on portable communications devices. MATARKA wants to extend the cooperation with other national services such as the Hungarian common catalogue (MOKKA) and the national document sending system (ODR).

Numeric data:

In the database 10 subjects are represented:

  1. Documentation. Books. Libraries
  2. Religion
  3. Social Sciences: Statistics. Demography. Sociology, Economy, Economics, Law, Politics, Administration, Education and teaching
  4. Natural Sciences: Conservation, Environmental science, Mathematics, Physics, Chemistry. Mineralogy, Earth sciences, Biology
  5. Applied sciences: Medicine, Health, Engineering, Agriculture, Industry
  6. Arts, games, sport
  7. Linguistics
  8. Literature
  9. History
  10. Geography

Statistics: as of 20/04/2012, the database contains 1,252 journal titles, 1,730,830 article titles and 253,898 authors; 301,182 titles have online full-text links. In March 2012, about 500 new article titles were processed per day on average. Usage statistics: 81,303 searches were run and 30,902 articles were downloaded.

Medical science in MATARKA

Our library, the National Health Policy Library, publishes the Hungarian Medical Bibliography (HMB), which is part of the Hungarian National Bibliography System. HMB has a long history dating back to the 1950s, so it was natural for our library to cooperate with MATARKA.

We joined this initiative in 2006. Our library was the first to process medical journals in this database. Our first large upload covered the last ten years of Orvosi Hetilap (the leading clinical weekly in Hungary): 520 issues, about 5,200 titles.

For example, we record the TOCs of:

  • Egészségügyi Menedzsment (=Health Management)
  • Egészségügyi Gazdasági Szemle (=Journal of Health Economy, accessible TOC in MATARKA from the starting year: 1963)
  • IME (=Informatics and Management in Healthcare; this journal is published in both print and online versions. The online full text is freely accessible and is linked in MATARKA)
  • Orvosi Könyvtáros (=Health Librarian)

Our library records 8 current health policy periodicals in MATARKA. Five colleagues work together on this project as part of their daily routine.

Other medical libraries

At present 14 libraries with 87 medical journals (more than 100 thousand articles) participate in this work.

In Hungary there is a medical bibliography, as well as many information portals and publisher sites. HMB is a professional bibliography of selected articles, in which peer-reviewed papers can be searched. Many medical information portals provide access to medical journals, and some publishers have websites, too. Without search skills it is very difficult to find the right portal or website. That is why MATARKA is so useful for medical librarianship: any user can find TOCs on one site, easily, quickly and free of charge.

Summary

The project was started by one library as a small, self-initiated effort to give users better, up-to-date information about the contents of periodicals. In ten years it has become a nationwide cooperation. MATARKA now has a consortium and an association.

As an innovative, independent venture, this work provides an invaluable service. That is why MATARKA is so successful and will continue to exist.

National Institute for Quality- and Organizational Development in Healthcare and Medicines (GYEMSZI) - Directorate General of IT and System Analysis - National Health Policy Library, Budapest, Hungary

References: 

  1. Burmeister E. MATARKA - magyar folyóiratok tartalomjegyzékeinek kereshető adatbázisa [MATARKA - Hungarian Periodicals Table of Contents Database]. Könyv, könyvtár, könyvtáros. 2003;12(12):36-43.
  2. Burmeister E, Kiss A, Gubán S. Nyolc könyvtár közös adatbázis építésének tapasztalatai [Experience of eight libraries in building common database]. Networkshop; Pécs. 2003.
  3. Bajnok L. MATARKA. Könyvtári kis híradó. 2007;12(1/2):9-10.
  4. Burmeister E. Újdonságok a MATARKÁ-ban [New features in MATARKA]. Könyvtári levelező/lap. 2009;21(3):15.
Type of presentation: 
Poster

Virtual Health Sciences Libraries in Social Networks

Type of presentation: 
Poster

The importance of “Threaded Publications” within the Health Libraries Community

Abstract: 

BioMed Central advocates free and complete access to scientific research. This is often associated with the publication of fully open access journals; however, BioMed Central has taken this concept further and is looking at access to all clinical trial-related publications, including scientific data, and at their interconnectivity.

‘Threaded publications’ is an initiative that was launched in January 2011 by BioMed Central to help increase transparency in science communication and enhance the discoverability of evidence-based health information. The concept seeks to address two current problems, disconnected articles relating to a specific clinical trial and sound research failing to be published, by providing a complete solution.

“Threaded publications” is not a new idea: it was proposed by Sir Iain Chalmers and Prof Doug Altman in an article in The Lancet in 1999: “Electronic publication of a protocol could be simply the first element in a sequence of 'threaded' electronic publications, which continues with reports of the resulting research (published in sufficient detail to meet some of the criticisms of less detailed reports published in print journals), followed by deposition of the complete data set.”

Marketing, financial incentives, and improved online links between clinical trial registration databases and journal articles are already in place. We are now working with CrossRef on further technical developments, with the aim of making threaded publications interoperable across multiple journals and publishers.

What a reader or ‘user’ might gain from threaded publications depends on their role. Systematic reviewers need to be able to access all available evidence, and to view all articles and data that are available on a specific trial. Research funders also need access to these data to judge whether funding for research, such as a trial, is justifiable. Practitioners and health care professionals need to find as much information as possible on a treatment to ensure that they offer the best possible patient care. Providing these user groups with the information they need, and improving information literacy among their users, is the principal task of health libraries. The “Threaded Publications” initiative will make it easier for librarians to find related information and data and provide them to their users, and will enhance the discoverability of evidence-based health information.

Type of presentation: 
Poster

Setting up an institutional repository based on the integrated library management system.

Abstract: 

INTRODUCTION

The Belgian Health Care Knowledge Centre (KCE) is a federal institution established in 2003. Its mission is to produce studies and reports to advise policy-makers about health care and health insurance, and also to ensure the extensive dissemination of those reports in the field.

The results of studies conducted at KCE are published as KCE reports, PDF documents made available on the KCE web site. Each PDF is also sent to the Belgian legal deposit for archiving purposes.

A paper version of each KCE report is produced and placed in the library, with a description in the library catalogue, which acts as a Z39.50 server. Records of the library catalogue are exported four times a year to the Belgian federal libraries catalogue: bib.belgium.be.

Descriptions of the reports are then further disseminated to specific databases, such as the CRD HTA database for the HTA series. This dissemination is done manually, however, and is thus difficult to extend in the context of a one-person library.

Still, automation could be set up using other library standards, more specifically the OAI-PMH interoperability standard. Institutional repositories are increasingly used by the scientific community to disseminate its scientific production, but also to discover "grey literature" such as KCE reports. Several dedicated document management systems exist for this purpose, including open source software (no licence cost).

Since institutional repositories are a good way to automatically extend the dissemination of KCE reports, the KCE library decided in 2006 to evaluate the opportunity to set up such a repository at KCE.

OBJECTIVES

To identify the most efficient way to implement an institutional repository for a small institution (one-person library).

METHODS

The different technical options for setting up an institutional repository were identified in several ways: learning from others by attending conferences and workshops; keeping up to date through mailing lists, news, newsletters and professional journals; searching the Web; and consulting the literature.

Setting up a dedicated Digital Asset Management System (DAMS) was compared with the other identified options; the main criterion was the number of additional steps that would be added to the existing workflow.

RESULTS

Three technical options were identified: a dedicated Digital Asset Management System (DAMS, such as GNU EPrints or DSpace), the existing Integrated Library Management System (ILS), or the website (Content Management System).

Creating an Institutional Repository (IR) based on dedicated software (DAMS) was rejected: this option would mean describing records a second time and managing an additional repository.

Making use of the website was not possible, since the Content Management System in use at that time did not provide such functionality.

Using the library workflow, based on the Integrated Library Management System (ILS), was thus identified as the most cost-effective option, since the roadmap of PMB, the ILS in use at KCE since 2006 (ref), listed this functionality. However, because the OAI-PMH server functionality of PMB was not yet implemented, and no resources were available to develop it or to contribute to its development, it was decided to wait until the developer community had implemented it.

In 2010, version 3.3.1 of PMB acquired the OAI-PMH server functionality, and the Institutional Repository was then activated. The records of the IR and the ILS are the same. The sets of the IR are based on the Collections and Sub Collections defined at the ILS level (KCE reports: Health Technology Assessment, Health Services Research, Good Clinical Practice). The newly activated IR was then registered with ROAR, OpenDOAR, DRIVER and OAIster.
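Once an OAI-PMH server is active, an external service can harvest the records of one set with a `ListRecords` request. The sketch below shows the general shape of such a harvest from the client side. It is not PMB's or KCE's code: the base URL, the set name `hta`, the function names and the sample response are placeholders, and the response is parsed from a string so that the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL, optionally restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_xml):
    """Return the Dublin Core titles of all records in a ListRecords response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iter(f"{OAI}record"):
        for title in record.iter(f"{DC}title"):
            titles.append(title.text)
    return titles


# Hand-written sample response; identifier, set name and title are made up.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <setSpec>hta</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example report title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

request_url = list_records_url("https://example.org/oai", set_spec="hta")
harvested_titles = extract_titles(SAMPLE_RESPONSE)
print(request_url)
print(harvested_titles)
```

Because harvesters work this way, registering the repository with directories and aggregators is enough for the records to be picked up without further manual export.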

DISCUSSION

Setting up an institutional repository is very important for an organization where the results of publicly funded research are not systematically published in peer-reviewed international journals: it helps researchers from other institutions discover our reports.

IT aspects are of course important, but the human aspect turns out to be the most important (1). In a small institution with a one-person library, using an existing workflow is thus preferable to adding another information system. In this context, the ILS, where all publications of the institution are already described by a professional, appears to be the best option to host the institutional repository: at KCE, adding a report to the IR requires only 4 additional clicks after the description in the ILS!

Setting up an institutional repository based on the integrated library management system is thus very cost-efficient, especially when the ILS is open source. This approach also reinforces the role of the librarian in disseminating the publications of the institution.

References: 
  • Jakobsson A. Establishing an Institutional Repository: A Step by Step Approach. 10th European Conference of Medical and Health Libraries, Cluj (Romania), 14 September 2006.
Type of presentation: 
Poster