Institute of Computational Linguistics "Antonio Zampolli" (ILC)

Expertise

The research and development activities of ILC fall into four main areas of expertise:

Digital Humanities
Development of models, methods and techniques for the preservation, intelligent use, linguistic study (diachronic, synchronic and comparative) and philological study (ecdotic, i.e. textual-critical, and interpretative) of texts of interest to the Social Sciences and Humanities, with a focus on historical and literary texts.
The findings and methods of Computer Science are combined with the methodological approaches and theoretical models of Text Analysis and Philology, thus helping to transform the ways in which literary, archival and library documents are preserved, used, studied and published.

Natural Language Processing and Knowledge Management
Development of methods, models and techniques based on symbolic and probabilistic algorithms and on neural networks for Natural Language Processing (NLP) tasks, addressing language in its different varieties of use and with a focus on Italian, and for the extraction and representation of the knowledge encoded in texts.
The proposed technological solutions address the need for "intelligent" information search and management within large, continuously evolving document bases, and can be applied in numerous settings to meet the needs of society.
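
As a purely illustrative sketch of one task in this area, multilevel linguistic annotation, the snippet below tokenises, lemmatises, part-of-speech tags and dependency-parses an Italian sentence. It uses the open-source spaCy library and its pretrained it_core_news_sm model, which are assumptions chosen for illustration and are not ILC's own tools or pipelines.

# Illustrative sketch only: multilevel annotation of an Italian sentence with the
# open-source spaCy library (an assumption for this example, not an ILC tool).
# Requires: pip install spacy && python -m spacy download it_core_news_sm
import spacy

nlp = spacy.load("it_core_news_sm")   # pretrained Italian pipeline
doc = nlp("L'istituto sviluppa metodi per l'analisi automatica dei testi.")

# One line per token: surface form, lemma, part of speech, syntactic head, relation.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.head.text, token.dep_, sep="\t")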

Language Resources, Standards and Research Infrastructures
Development and management of language resources (computational lexicons, terminological and ontological repositories, corpora), with a focus on representing data according to international standards that guarantee sharing, interoperability and long-term preservation, in line with Open Science principles.
The technological solutions developed in this area support the construction of a distributed, cooperative research infrastructure offering new access, interoperability and sharing functionalities for language resources and tools.
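
As one concrete, purely illustrative example of such standards-based representation, the sketch below encodes a single lexical entry according to the W3C OntoLex-Lemon model using the open-source rdflib library; the entry, the example namespace and the lemma are invented for illustration and do not describe any specific ILC resource.

# Illustrative sketch only: one lexical entry in the W3C OntoLex-Lemon model,
# built with the open-source rdflib library. The namespace http://example.org/
# and the entry "parola" are invented for this example.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
EX = Namespace("http://example.org/lexicon/")

g = Graph()
g.bind("ontolex", ONTOLEX)
g.bind("ex", EX)

entry, form = EX["parola"], EX["parola_form"]
g.add((entry, RDF.type, ONTOLEX.LexicalEntry))      # the lexical entry itself
g.add((entry, ONTOLEX.canonicalForm, form))         # link to its canonical form
g.add((form, RDF.type, ONTOLEX.Form))
g.add((form, ONTOLEX.writtenRep, Literal("parola", lang="it")))  # written representation

print(g.serialize(format="turtle"))                 # exchangeable RDF Turtle output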

(Bio-)computational Models of Language Usage
Analysis of the factors governing the processes of language comprehension, production, learning and variation, and of the dynamic interactions between them. In particular, theoretical models of language use are developed and empirically tested through probabilistic methods for the study of corpora, lexicons and databases; computational simulations; and the study of experimental, clinical and acquisitional linguistic evidence.
The methodologies of formal representation and symbolic modelling are combined with the methods, data and investigative tools of disciplinary fields more directly oriented to the analysis of language use in purposeful and controlled settings, such as Psycholinguistics, Neurolinguistics, Sociolinguistics and Glottodidactics (language teaching).
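
As a purely illustrative, toy-scale example of the probabilistic corpus methods mentioned above, the sketch below estimates bigram transition probabilities by maximum likelihood from a tiny invented Italian sample; real studies rely on large corpora and far more refined models.

# Illustrative sketch only: maximum-likelihood bigram probabilities from a toy
# corpus. The sample sentence is invented; it stands in for a real corpus.
from collections import Counter

corpus = "il gatto dorme e il cane dorme".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))
history_counts = Counter(w1 for w1, _ in bigram_counts.elements())

# P(w2 | w1) = count(w1 w2) / count(w1 as a history)
for (w1, w2), c in sorted(bigram_counts.items()):
    print(f"P({w2} | {w1}) = {c / history_counts[w1]:.2f}")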

This variety of areas of expertise makes the Institute unique on the national and international scene. The competences developed in the individual areas are combined creatively, innovatively and productively within the Institute's research projects, collaborations and laboratories, drawing on professional skills and expertise that span the disciplines of Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.

Main research topics

o text analysis
o automatic multilevel linguistic annotation of text
o textual and multimodal, mono- and multi-lingual corpora
o Digital Humanities
o knowledge extraction from domain document bases
o digital philology
o information extraction and retrieval
o research infrastructures
o computational, mono- and multi-lingual lexicons
o digital lexicography
o minority languages
o machine learning and deep learning
o computational models of language use
o terminological repositories and ontologies
o semantic web
o linguistic simplification
o sentiment analysis and opinion mining
o definition of representation standards for language resources
o text mining
o Computer-Assisted Translation
o Natural Language Processing (NLP)
o assessment of language competence