Research activities | Consiglio Nazionale delle Ricerche

Institute of Computational Linguistics "Antonio Zampolli" (ILC)

Since its origins, ILC has divided its activities among different lines of research. At the beginning, research focused on the two historical "cores" of Computational Linguistics: on the one hand, "Humanistic Text Processing" (HTP), the use of computational methods and techniques to support humanistic studies, with particular regard to Philology; on the other hand, "Natural Language Processing" (NLP), aimed at the analysis of the linguistic structures underlying the text.
From the late 1980s onward, important synergies developed between the two lines of activity, giving rise to a strategic line of research dedicated to the design and construction of linguistic resources and infrastructures and to the definition of representation standards shared by the scientific community.
More recently, following the current developments in the field of Computational Linguistics, an innovative line of research has been created, focused on the definition of bio-computational models of language and cognition.

The lines of research outlined above remain strategic areas of activity and account for the richness and uniqueness of ILC.

RESEARCH LINES

Digital Humanities
Development of models, methods and techniques for the preservation, intelligent use, linguistic study (diachronic, synchronic, comparative) and philological study (ecdotic and interpretative) of texts of interest to the Social Sciences and Humanities, with a focus on historical and literary texts.
The findings and knowledge of Computer Science are combined with the methodological approaches and theoretical models of Text Analysis and Philology, thus contributing to the transformation of the ways in which literary, archival and library documents are preserved, used, studied and published.

Natural Language Processing and Knowledge Management
Development of methods, models and techniques based on symbolic and probabilistic algorithms and neural networks for Natural Language Processing (NLP) tasks across the different varieties of language use, with a focus on the Italian language, and for the extraction and representation of knowledge encoded within texts.
The proposed technological solutions address the need for "intelligent" information search and management within large, continuously evolving document bases, and can be used in numerous applications that serve the needs of society.

Language Resources, Standards and Research Infrastructures
Development and management of language resources (computational lexicons, terminological and ontological repositories, corpora), with a focus on the representation of data according to international standards that guarantee their sharing, interoperability and long-term preservation in line with Open Science principles.
The technological solutions developed in this area aim to build a distributed and cooperative research infrastructure that provides new access, interoperability and sharing functionalities for language resources and tools.

(Bio-)Computational models of language usage
Analysis of the factors governing the processes of comprehension, production, learning and variation of a language, and of the dynamic interactions between them. In particular, theoretical models of language use are developed and empirically verified through: probabilistic methods for the study of corpora, lexicons and databases; computational simulations; and the study of linguistic evidence of an experimental, clinical and acquisitional nature.
The methodologies of formal representation and symbolic modelling are combined with the methods, data and investigative tools of disciplinary fields more oriented to the analysis of language use in purposeful and controlled contexts, such as Psycho- and Neuro-Linguistics, Sociolinguistics and Glottodidactics.