The Italian dictionary (OVI)

Research activities

1) The compilation of the Tesoro della Lingua Italiana delle Origini (TLIO). The TLIO is a new dictionary compiled directly from the texts contained in the OVI databases. The editorial norms were drawn up in 1996-97 (and later published in the Bollettino dell'OVI, vol. III, 1998, and on the OVI website); they have been slightly modified over the years (the latest version can be read on the OVI website). The first 1000 entries were ready at the end of 1998. Compilation then continued at an average of ca. 2000 entries a year, and at the end of 2006 there were ca. 17,000 entries. In 2007 and 2008 only 1000 new entries a year could be edited, because of insufficient funding. Since 2009 compilation has proceeded at the usual production rate, reaching about 45,000 delivered entries in November 2020. The final size is estimated at ca. 57,000 entries.
The TLIO is published online on the OVI website. At present, the web version of the dictionary can be consulted by entry as well as by word form (a form search returns the entries in which that form is included), with different search options; free-text search in the definitions is also available. The entries are signed by their compilers, and a special search function makes it possible to view a list of all the entries written by any compiler. A new integrated system for compiling and searching entries is being prepared, which will eventually lead to a completely structured dictionary.
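The three kinds of lookup described above (by word form, by free text in the definitions, and by compiler) can be pictured with a minimal Python sketch. It is only an illustration and does not reflect the actual OVI software: the Entry class, the function names and all sample data are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified dictionary entry (not the real TLIO schema)."""
    headword: str
    forms: tuple
    definition: str
    compiler: str


# Placeholder sample data invented for this sketch, not taken from the TLIO.
ENTRIES = [
    Entry("entry_a", ("form1", "form2"), "first sample definition", "Compiler A"),
    Entry("entry_b", ("form2", "form3"), "second sample definition", "Compiler B"),
    Entry("entry_c", ("form4",), "third sample text", "Compiler A"),
]


def build_form_index(entries):
    """Map each written form (lowercased) to the headwords that include it."""
    index = defaultdict(list)
    for entry in entries:
        for form in entry.forms:
            index[form.lower()].append(entry.headword)
    return index


def entries_for_form(index, form):
    """Return the headwords of all entries in which the given form occurs."""
    return sorted(index.get(form.lower(), []))


def search_definitions(entries, text):
    """Case-insensitive free-text search in the definitions."""
    needle = text.lower()
    return [e.headword for e in entries if needle in e.definition.lower()]


def entries_by_compiler(entries, compiler):
    """List the headwords of the entries signed by one compiler."""
    return [e.headword for e in entries if e.compiler == compiler]
```

In this sketch a form shared by two entries (here the placeholder "form2") returns both headwords, which mirrors the behaviour described for the form search.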

2) The implementation, development and maintenance of textual databases. The most important is the "Corpus TLIO per il vocabolario" (Old Italian database); it is also an independent linguistic research tool and is searchable through the Internet. As of August 2020, it contained 29,208,359 word occurrences in 2948 texts (536,261 different written forms). It includes the texts of the Corpus TLIO, lemmatized and searchable separately, which is the reference corpus for the compilation of the TLIO (22,029,916 occurrences in 2729 texts and 479,510 different written forms). The database includes, indexes and makes searchable all existing texts available in reliable editions and written in a variety of Italian before the year 1375 (or datable to the XIV century). Less systematically, it also includes texts written at the beginning of the XV century, texts in unsatisfactory editions that are nevertheless important enough to merit inclusion (these are tagged to remind database users of the quality of the text), and texts in other linguistic varieties (e.g. Friulan and Gallo-Romance dialects) that are nevertheless deemed useful for the study of Old Italian. The database has been partially lemmatized, and the lemmatization work is still ongoing (4,243,012 occurrences lemmatized).
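The figures quoted above distinguish word occurrences, different written forms and lemmatized occurrences. The following Python sketch shows how such counts relate to one another on a toy corpus. It is an assumption-laden illustration: the data structure, the function name and the sample tokens are invented here and say nothing about how the OVI databases are actually stored or indexed.

```python
from collections import Counter

# Invented toy corpus: each text is a list of (written_form, lemma) pairs,
# where lemma is None for occurrences that have not been lemmatized yet.
CORPUS = {
    "text_1": [("la", None), ("casa", "casa"), ("e", None),
               ("le", None), ("case", "casa")],
    "text_2": [("la", None), ("casa", "casa")],
}


def corpus_statistics(corpus):
    """Count texts, occurrences, distinct written forms and lemmatized occurrences."""
    form_counts = Counter()
    lemmatized = 0
    for tokens in corpus.values():
        for form, lemma in tokens:
            form_counts[form.lower()] += 1  # every token is one occurrence
            if lemma is not None:
                lemmatized += 1
    return {
        "texts": len(corpus),
        "occurrences": sum(form_counts.values()),
        "distinct_forms": len(form_counts),
        "lemmatized_occurrences": lemmatized,
    }
```

On the toy data this gives 7 occurrences, 5 distinct written forms and 3 lemmatized occurrences in 2 texts: "casa" and "case" count as two written forms but share one lemma.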

3) The development of software for lexicographical and linguistic purposes, in particular for the management of textual corpora similar to the one upon which the TLIO is based. Specifically, GATTO (version 4) makes it possible to create, modify and manage textual databases (i.e. databases containing complete texts) and to introduce lemmatization on different levels (lemmas and 'hyperlemmas'); GattoWeb makes it possible to search on the Web the corpora created using GATTO. OVI is developing a new tool for the editing and online publishing of dictionaries like the TLIO (which is at present searchable online thanks to software also produced by OVI). Research on automatic lemmatization is also under way, addressing the specific problems of Old Italian texts.