Focus

From text to knowledge: language technologies for Knowledge Management

Despite the huge and ever-growing volume of multimedia content, the vast majority of the digitised unstructured information we rely on for work, study or the practical needs of daily life is still conveyed through text. Effective access to this information means much more than being able to locate and use digitised texts: it requires fast, intelligent and responsive indexing and selection, in keeping with specific goals and perspectives. Computer technologies for the automated acquisition and management of text data are being developed to meet these needs, since having too much information is tantamount to having no information. However, in spite of recent progress, technologies such as Information Retrieval, Text Data Mining and Text Classification face a classical bottleneck: accessing content requires an understanding of the linguistic structures that represent content in texts. There is no knowledge without language knowledge.

The T2K (Text to Knowledge) system, designed and developed by the Institute of Computational Linguistics together with the Department of Linguistics of the University of Pisa, offers a sophisticated battery of tools for natural language processing, statistical text analysis and machine learning of language, dynamically integrated to provide an accurate representation of the content of vast repositories of unstructured documents. Text interpretation ranges from the acquisition of lexical and terminological resources to advanced syntactic and discourse representations and ontological/conceptual mapping. Interpretation results are annotated as XML metadata, which offers the further benefit of growing interoperability with automated content management systems aimed at personalised knowledge profiling.
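To give a rough idea of what annotating interpretation results as XML metadata can look like, here is a minimal sketch in Python. It is not the T2K implementation: the frequency-based term selection stands in for the far richer linguistic analysis described above, and the stopword list, the function names and the XML element names (`document`, `terms`, `term`) are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "are"}


def candidate_terms(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


def annotate(doc_id, text, top_n=3):
    """Build an XML metadata record; element names are invented for this sketch."""
    root = ET.Element("document", id=doc_id)
    terms = ET.SubElement(root, "terms")
    for term, freq in candidate_terms(text, top_n):
        ET.SubElement(terms, "term", frequency=str(freq)).text = term
    return ET.tostring(root, encoding="unicode")


sample = (
    "Waste disposal permits. "
    "Permits for waste disposal are issued by the municipality."
)
xml_record = annotate("doc-1", sample)
print(xml_record)
```

Because the record is plain XML, a content management system could read it with any standard XML parser, which is the kind of interoperability the paragraph above refers to.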

Prototype versions of the system are currently running on public administration portals and have been used for indexing e-learning materials. We are now working to integrate T2K technology into the scientific document management system of CNR.
