Focus

The Italian language in the new generation of Internet

Even in the digital era, language remains the primary and most natural key
to knowledge. Although English has so far served as a kind of lingua franca
of the Web, the participation of national cultures in the development of
the global information society can be measured by the quantity of documents
available on the web in languages other than English. The steep increase in
multilingual texts makes it essential to develop human language
technologies that transmit, receive, decode and extract information from
texts, with the overall goal of optimizing information access and exchange
while preserving cultural diversity.
The Istituto di Linguistica Computazionale (ILC) of the CNR (National
Research Council) in Pisa has for many years played a crucial role in the
development of technologies, products and services for the automatic
processing of Italian texts. ILC was among the first institutions in the
world to undertake research in Computational Linguistics, and today it is
widely recognised as a Centre of Excellence at both the international and
national levels.
Its research is carried out in constant cooperation with the major public
and private institutions operating in computational linguistics. ILC has
often played a leading role in the international community through
strategic planning, the coordination of international initiatives and the
proposal of new research paradigms.
The basic research carried out at ILC, with both ordinary CNR funding and
considerable external (European and national) funding, has led the
Institute to develop a robust platform of linguistic resources, methods,
models and tools for Italian natural language processing. These include:

  • basic linguistic resources, namely:

    1. large sets of formalised data, e.g. textual corpora annotated at
      different levels of linguistic description, computational lexicons,
      ontologies, semantic networks, terminologies, grammars, etc.;
    2. basic linguistic technologies and tools for multi-level analysis
      of texts;

  • tools and techniques for automatically acquiring linguistic and
    extra-linguistic knowledge (e.g. proper nouns, domain terminology) from
    large corpora and from the web, and for tuning to specific domains; in
    this way, innovative and dynamic linguistic resources are created that
    expand automatically and adapt to new contexts, geared towards the
    processing of multilingual semantic information and 'content';
  • definition of international standards for lexical, ontological,
    textual and multimodal resources.

From the application point of view, these technologies lie at the core of
a new generation of knowledge management applications that turn text
documents into structured digital knowledge. They also promote the use of
Italian for accessing information on the web, a necessary prerequisite for
Italian to be included in the future Semantic Web. Application scenarios
based on these human language technologies include "question answering",
e-content production and management, and man-machine interaction.