UPM School of Computing researchers open up a new road for the computational representation of languages

Researchers create an intelligent computational model of the descriptive grammar of the Spanish language

[16 July 2008]. Researchers from the Validation and Business Applications Group (VAI) at the Universidad Politécnica de Madrid’s School of Computing (FIUPM) have developed an intelligent computational model of the descriptive grammar of the Spanish language. This opens up new possibilities for the computational representation of languages and natural language processing applications.

Computational linguistics draws primarily on linguistic theories to build language representation models for computational applications. These linguistic theories are formal (i.e. mathematically expressible) models. Developing a model for any particular language takes from 5 to 10 years, yet the coverage of the resulting model is 55%. In other words, coverage is very limited and the cost is huge. This is an obstacle to developing useful applications for languages other than English and the other dominant languages.

To overcome this hurdle, the researchers Carolina Gallardo and Jesús Cardeñosa have examined the possibility of using descriptive grammars in place of linguistic theories. Even though they are not formal, descriptive grammars do represent real language use.

Despite their not very formal “look”, descriptive grammars do contain a great deal of linguistic knowledge, these researchers explain. The real strength of these descriptive grammars is that they exist for all languages, they are low cost and they can be used in the absence of linguistic experts.

The School of Computing researchers experimented with the Royal Academy of the Spanish Language’s Descriptive Grammar of the Spanish Language (GDLE) and built a computational model that will be applicable to descriptive grammars of other languages. The innovation of this research is that knowledge elicitation methodologies proper to knowledge engineering (a branch of artificial intelligence) were applied to the GDLE, which was used as a source of knowledge.

This model has been tested on a blackboard-based application, the blackboard being one of artificial intelligence’s undisputed designs for somewhat complex distributed applications. It has been tested on numerous cases and the results are promising.

The model will be useful for building natural language processing applications ranging from language analysis to generation and will be applicable to any language where a natural language model with reasonable coverage needs to be developed relatively quickly.

A preview of this work was published in the proceedings of IKE’08 (2008 International Conference on Information and Knowledge Engineering), held in Las Vegas (USA) from 14th to 17th July.