27 July 2010. Bioinformatics resources have proliferated since the end of the Human Genome Project, obliging researchers to spend a great deal of time searching the web for them. The Biomedical Informatics Group (GIB) of the Universidad Politécnica de Madrid, based at the Facultad de Informática, has developed an innovative methodology, the first capable of automatically discovering and classifying bioinformatics resources from the scientific literature.
Today the scientific community has access to many on-line bioinformatics resources, and their number is growing exponentially. In biomedical research, researchers generate more and more resources (databases, software and many others), which should speed up scientific progress. However, discovering, locating and learning how to use new applications has a cost, especially in time, that most researchers cannot afford. Existing resources therefore need to be organized so that these searches are as straightforward as possible.
Led by Prof. Víctor Maojo, a team of researchers from the GIB at the UPM's Facultad de Informática (Guillermo de la Calle, Miguel García-Remesal, Diana de la Iglesia and Stefano Chiesa) has developed an innovative methodology designed to discover, retrieve and automatically classify bioinformatics resources from the specialized scientific literature. The resulting index of resources is freely available via the web application hosted at the server.
Natural Language Processing
The methodology uses natural language processing and artificial intelligence techniques to retrieve and automatically classify key information contained in scientific articles, primarily their abstracts. Each article is analysed morphologically, syntactically and semantically in search of a set of predefined patterns that identify, without user intervention, the resource's name, functionality and access URL and, in some cases, its inputs and outputs.
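To make the idea of pattern-based extraction concrete, here is a minimal Python sketch that pulls a resource name and an access URL out of an abstract with regular expressions. It rests on loud assumptions: the patterns, the `extract_resource` function and the `ResourceMention` type are hypothetical, they are not the GIB's actual patterns, and plain surface regexes stand in for the morphological, syntactic and semantic analysis the methodology performs. Functionality, inputs and outputs are not covered.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical surface patterns, for illustration only. They are NOT the
# patterns used by the GIB methodology, which also relies on morphological,
# syntactic and semantic analysis of the text.
URL_PATTERN = re.compile(r"https?://[^\s)\]>,;]+")
NAME_PATTERNS = [
    # e.g. "FooTool is a web server for ..."
    re.compile(r"\b([A-Z][A-Za-z0-9_-]+) is an? (?:new |novel )?"
               r"(?:tool|database|web server|program|software|method)"),
    # e.g. "We present FooTool, a database of ..."
    re.compile(r"\bWe (?:present|describe|introduce) ([A-Z][A-Za-z0-9_-]+), an? "),
]


@dataclass
class ResourceMention:
    name: Optional[str]
    url: Optional[str]


def extract_resource(abstract: str) -> ResourceMention:
    """Return the first resource name and access URL matched in an abstract."""
    name = None
    for pattern in NAME_PATTERNS:
        match = pattern.search(abstract)
        if match:
            name = match.group(1)
            break
    url_match = URL_PATTERN.search(abstract)
    # Drop a trailing full stop that ends the sentence rather than the URL.
    url = url_match.group(0).rstrip(".") if url_match else None
    return ResourceMention(name=name, url=url)
```

For example, `extract_resource("We present GeneFinderX, a web server for gene prediction. It is available at http://example.org/genefinderx.")` yields the name `GeneFinderX` and the URL `http://example.org/genefinderx`; both are invented for illustration.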
In addition, the resources are classified along two dimensions: (i) the application domain (e.g. DNA or proteins) and (ii) the category, that is, the functionality or type of the resource (e.g. alignment, database or annotation). For this classification, the application uses a taxonomy of domains and categories designed specifically for the purpose and based on existing taxonomies such as the Bioinformatics Links Directory (BLD).
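The two-dimensional classification can be sketched as keyword voting: each domain and category label is tied to trigger words, and an article receives, on each dimension, the label whose words occur most often in its text. This is a simplification under stated assumptions, not the method's actual classifier; the `DOMAINS` and `CATEGORIES` dictionaries are a tiny hypothetical stand-in for the purpose-built taxonomy.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical mini-taxonomy: label -> trigger keywords (illustrative only).
DOMAINS = {
    "DNA": {"dna", "genome", "genomic", "sequencing"},
    "Protein": {"protein", "proteins", "peptide", "proteomic"},
}
CATEGORIES = {
    "Alignment": {"alignment", "align", "aligner"},
    "Database": {"database", "repository"},
    "Annotation": {"annotation", "annotate"},
}


def best_label(tokens: list[str], taxonomy: dict[str, set[str]]) -> Optional[str]:
    """Return the label whose keywords occur most often, or None if none occur.

    Ties go to the label listed first in the taxonomy.
    """
    counts = Counter({label: sum(t in keywords for t in tokens)
                      for label, keywords in taxonomy.items()})
    label, hits = counts.most_common(1)[0]
    return label if hits > 0 else None


def classify(text: str) -> tuple[Optional[str], Optional[str]]:
    """Classify a text along the (domain, category) dimensions."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return best_label(tokens, DOMAINS), best_label(tokens, CATEGORIES)
```

For instance, `classify("A fast aligner for DNA sequencing reads.")` returns `("DNA", "Alignment")`. Keeping the two dimensions in separate dictionaries mirrors the text: domain and category are assigned independently.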
To validate the methodology, the UPM group ran a preliminary experiment on 400 articles indexed in the ISI Web of Knowledge. The group searched for the string "bioinformatics resources" and selected the 392 most relevant articles by impact factor. The remaining eight articles, which were unrelated to bioinformatics resources, were included as a control group to test the robustness of the method. A total of 376 resource names were automatically retrieved from this set of articles, a success rate of almost 95%.
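The reported figure can be reproduced with simple arithmetic. The article does not say which denominator its success rate uses, so this sketch computes it under both readings: the 392 relevant articles, and all 400 articles including the control group. It also assumes one target resource per article, which the text does not state explicitly; `success_rate` is a hypothetical helper.

```python
def success_rate(retrieved: int, total: int) -> float:
    """Percentage of target resources whose names were retrieved."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * retrieved / total


# The article does not state the denominator, so compute both readings.
for label, total in [("392 relevant articles", 392), ("all 400 articles", 400)]:
    print(f"{label}: {success_rate(376, total):.1f}%")
# 392 relevant articles: 95.9%
# all 400 articles: 94.0%
```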
The group has also built a web application, based on web services, through which the scientific community can access the index and search for resources by name, category and domain.
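Searching by name, category and domain amounts to filtering index records on up to three optional criteria. The in-memory sketch below illustrates only that query logic. The `Resource` record, the `search` function and the sample entries are hypothetical; the real application exposes the index through web services whose interface is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    url: str
    domain: str    # e.g. "DNA" or "Protein"
    category: str  # e.g. "Alignment", "Database" or "Annotation"


def search(index: Iterable[Resource],
           name: Optional[str] = None,
           category: Optional[str] = None,
           domain: Optional[str] = None) -> list[Resource]:
    """Return the resources that match every filter given (None means "any").

    Names match by case-insensitive substring; category and domain match exactly.
    """
    results = []
    for r in index:
        if name is not None and name.lower() not in r.name.lower():
            continue
        if category is not None and r.category != category:
            continue
        if domain is not None and r.domain != domain:
            continue
        results.append(r)
    return results
```

Combining filters with AND lets a user narrow a search step by step: `search(index, domain="DNA", category="Alignment")` returns only the DNA alignment tools in `index`.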
The key advantage of this method over existing resource indexes is that its index is created and updated automatically. Because the methodology is general-purpose, it is being applied as part of the European ACTION-Grid project, the first European Grid Computing, Biomedical Informatics and Nanoinformatics Initiative, coordinated by Prof. Víctor Maojo.
Both the methodology and the results have been published in leading conferences and journals in the field, such as BMC Bioinformatics: Guillermo de la Calle, Miguel García-Remesal, Stefano Chiesa, Diana de la Iglesia and Victor Maojo. BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature. BMC Bioinformatics 2009, 10:320. doi:10.1186/1471-2105-10-320.
This item on other websites
Medical News Today 29.07.2010
Thats Today 29.07.2010
Scientific Computing 29.07.2010
AlphaGalileo (English) 28.07.2010
Tendencias Informáticas 27.07.2010
Joven Club 27.07.2010
Enter The Grid 27.07.2010
ACM TechNews 27.07.2010