New tool for building thesauruses

Developed by the VAI group, the tool will be presented at the Euralex Congress in July

23 April 2008. The Validation and Business Applications Group (VAI), from the Universidad Politécnica de Madrid’s School of Computing (FIUPM), is to present Tesaurvai, a thesaurus-building software tool, at the 13th Euralex International Congress, to be held in Barcelona from 15 to 19 July.

Tesaurvai can extract, annotate and organize specialized terms taken from a collection of digitized texts. It complies with the ISO thesaurus-building standard and was developed by the VAI in conjunction with the Spanish National Research Council’s Institute of Documentary Studies on Science and Technology (formerly CINDOC).

Euralex is Europe’s most influential lexicographical congress. The InfoLex research group, based at the Universidad Pompeu Fabra’s College of Applied Linguistics, is organizing the 2008 event, which will bring together professional lexicographers, publishers, researchers, specialists and anyone with an interest in dictionaries of any kind.

2 in 1

Tesaurvai’s key innovation is that it combines, in a single tool, a terminology extractor capable of ranking and selecting terms of one to ten words with ISO-standard-compliant thesaurus-building capabilities. The extractor identifies the terms in the digital texts that are to be passed to the thesaurus builder. A thesaurus is a systematized list of terms representative of a domain.
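The source does not say how Tesaurvai’s extractor ranks candidate terms. As a rough illustration of the general technique only, the Python sketch below (not Tesaurvai’s own code, which is written in Java) collects every sequence of one to ten words within each sentence, discards sequences that begin or end with a stop word, and ranks the rest by frequency. The stop-word list, the `min_freq` threshold and the function names are hypothetical; practical term extractors usually combine frequency with further statistical or linguistic filters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stop-word list (an assumption);
# a real extractor would use a larger, language-specific list.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "on", "to", "for", "is", "are"}


def sentences(text):
    """Split text into sentences so that no term spans a sentence break."""
    return [s for s in re.split(r"[.!?;:]+", text) if s.strip()]


def candidate_terms(text, min_len=1, max_len=10, min_freq=2):
    """Return (term, frequency) pairs for word n-grams of min_len..max_len words.

    An n-gram is kept only if it neither starts nor ends with a stop word
    and occurs at least min_freq times. Results are ranked by frequency,
    then by length (longer terms first), then alphabetically.
    """
    counts = Counter()
    for sentence in sentences(text):
        tokens = re.findall(r"\w+", sentence.lower())
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    selected = [(term, c) for term, c in counts.items() if c >= min_freq]
    selected.sort(key=lambda tc: (-tc[1], -len(tc[0].split()), tc[0]))
    return selected


if __name__ == "__main__":
    sample = ("Cultural heritage documents describe cultural heritage sites. "
              "The cultural heritage archive is digitized.")
    print(candidate_terms(sample))
```

On the sample text, `candidate_terms` returns `[("cultural heritage", 3), ("cultural", 3), ("heritage", 3)]`: all three occur three times, and the two-word term ranks first because ties in frequency are broken in favour of longer terms.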

Tesaurvai conforms to international thesaurus-building and management standards and offers several functions. First, it can build thesauruses from scratch, from information extraction through term creation, editing and annotation. It makes it easy to establish relationships between terms and to run basic and advanced word searches. Second, it can import and export thesauruses as XML files. Finally, it can build alphabetical and systematic indices, which can be printed or exported as reports.
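The operations listed above (relating terms, searching, XML import and export, alphabetical indexing) can be modelled in a few lines. The Python sketch below is not taken from Tesaurvai; it assumes that the ISO standard in question is ISO 2788, whose relationships include broader term (BT), narrower term (NT), related term (RT) and the USE/UF pair linking non-preferred to preferred terms. The class name, the XML element names and the index layout are all hypothetical, since the source does not describe Tesaurvai’s data model or XML schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class Thesaurus:
    """In-memory thesaurus using ISO 2788-style relationships (an assumption)."""

    def __init__(self):
        self.terms = set()                # preferred terms (descriptors)
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)
        self.use = {}                     # non-preferred term -> preferred term (USE)

    def add_term(self, term):
        self.terms.add(term)

    def add_broader(self, term, broader):
        """Record 'term BT broader' together with its reciprocal 'broader NT term'."""
        self.terms.update((term, broader))
        self.broader[term].add(broader)
        self.narrower[broader].add(term)

    def add_related(self, a, b):
        """RT is symmetric, so it is stored in both directions."""
        self.terms.update((a, b))
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, non_preferred, preferred):
        """Record 'non_preferred USE preferred' (reciprocal: 'preferred UF non_preferred')."""
        self.terms.add(preferred)
        self.use[non_preferred] = preferred

    def search(self, word):
        """Case-insensitive substring search; non-preferred hits are redirected
        to their preferred term."""
        w = word.lower()
        hits = {t for t in self.terms if w in t.lower()}
        hits |= {pref for alias, pref in self.use.items() if w in alias.lower()}
        return sorted(hits)

    def alphabetical_index(self):
        """Preferred and non-preferred entries in alphabetical order."""
        entries = sorted(self.terms | set(self.use), key=str.lower)
        return [f"{e} USE {self.use[e]}" if e in self.use else e for e in entries]

    def to_xml(self):
        """Export to a simple, hypothetical XML layout: one <term> per descriptor."""
        root = ET.Element("thesaurus")
        for t in sorted(self.terms, key=str.lower):
            el = ET.SubElement(root, "term", name=t)
            for tag, targets in (("BT", self.broader[t]),
                                 ("NT", self.narrower[t]),
                                 ("RT", self.related[t])):
                for target in sorted(targets):
                    ET.SubElement(el, tag).text = target
            for alias in sorted(a for a, p in self.use.items() if p == t):
                ET.SubElement(el, "UF").text = alias
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, xml_text):
        """Import the layout written by to_xml; NT links are rebuilt from BT."""
        th = cls()
        for el in ET.fromstring(xml_text).iter("term"):
            name = el.get("name")
            th.add_term(name)
            for child in el:
                if child.tag == "BT":
                    th.add_broader(name, child.text)
                elif child.tag == "RT":
                    th.add_related(name, child.text)
                elif child.tag == "UF":
                    th.add_synonym(child.text, name)
        return th
```

For example, after `add_broader("Cathedrals", "Religious buildings")`, `add_related("Cathedrals", "Bishops")` and `add_synonym("Minsters", "Cathedrals")`, the alphabetical index is `["Bishops", "Cathedrals", "Minsters USE Cathedrals", "Religious buildings"]` and `search("minster")` is redirected to `["Cathedrals"]`. Storing each relationship together with its reciprocal is what keeps the structure consistent across an export and re-import.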

Available as of 2008

The tool has been developed in Java and runs on top of a database. Tesaurvai is compatible with any database management system that supports Java Database Connectivity (JDBC).

It was developed as part of the “Cultural heritage document search based on multilingual technical resources” (Patrilex) project, supported by the Ministry of Education, which aims to produce a methodology and tools for building multilingual lexical resources.

Tesaurvai is currently undergoing extensive testing. From July 2008, it will be available to any Internet user.