Institute of Computational Linguistics "Antonio Zampolli" (ILC)

Research activities

Since its founding, ILC has divided its activities among several lines of research. Initially, research focused on the two historical "cores" of Computational Linguistics: on the one hand, "Humanistic Text Processing" (HTP), the use of computational methods and techniques to support humanistic studies, with particular regard to philology; on the other hand, "Natural Language Processing" (NLP), aimed at analyzing the linguistic structures underlying the text.
From the late 1980s onward, important synergies developed between the two lines of activity, giving rise to a strategic line of research dedicated to the design and construction of linguistic resources and infrastructures and to the definition of representation standards shared by the scientific community.
More recently, following current developments in the field of Computational Linguistics, an innovative line of research has been created, focused on defining bio-computational models of language and cognition.

The lines of research outlined above continue to be strategic areas of activity which constitute the richness and uniqueness of ILC.

RESEARCH LINES

Digital Humanities
Knowledge of and skills in computer science are combined with the methodological approaches and theoretical models of text analysis and philology, helping to transform the conservation, use, study and publication of literary, archival, and library documents. The technological solutions implemented offer new possibilities for investigating and sharing knowledge. They are integrated in a multi-module system whose independent but interconnected modules allow different methods of accessing, managing, studying and revising the text to interact and be combined.

Natural Language Processing and Knowledge Extraction
The techniques developed provide automatic access to text content and address a wide range of speakers' information needs: from semantic access to the text to the assessment of text structure as an indicator of accessibility and communicative efficiency. The technological solutions meet the requirements for search and "smart" management of the information available in large document databases and can be used in many commercial applications to meet the needs of society.

Linguistic Resources, Standards and Research Infrastructures
Research in language engineering and language resource production is optimized by adopting standards, exchanging good practices for interoperability, and reusing and repurposing available results in the form of data and tools. The main activities concern defining models for the creation, representation, extension and maintenance of computational lexicons, terminological and ontological repositories, corpora and language technologies. Furthermore, technological solutions are designed for developing a distributed and cooperative research infrastructure, aimed at establishing new functions for the access, interoperability, and sharing of linguistic resources and tools.

(Bio-)computational models of language usage
This line of research examines the factors that drive language comprehension, production, learning and variation, and their dynamic interactions. In particular, theoretical and computational models of language usage are developed and empirically assessed through: probabilistic models for investigating corpora, lexicons and linguistic databases; computer simulations; and analysis of experimental, clinical and developmental evidence. These objectives are pursued by integrating formal models of symbolic representation with the data and methodologies of disciplines that analyze language performance in ecological contexts of communicative interaction, such as psycho- and neurolinguistics, sociolinguistics and language teaching.