Focus

The GATTO and GattoWeb Software System

GATTO©
In order to build TLIO (Tesoro della Lingua Italiana delle Origini -
[Ancient Italian Language Treasure]), the Institute "Opera del
Vocabolario Italiano" (OVI) has for many years used lexicographic
software designed and developed entirely in-house.
This software, called GATTO© (Gestione degli Archivi Testuali del Tesoro
delle Origini - [Management of TLIO Textual Archives] - Copyright CNR
1999), is a tool for building electronic text corpora from texts that
have previously been typed and appropriately tagged with standard word processors.
At present the text files are ANSI-encoded and tagged with a custom mark-up system.
A new version, accepting both Unicode characters and XML mark-up, is forthcoming.
Users can modify a corpus by inserting or removing texts.
Other sections of the program offer corpus lemmatization tools, which
work either by tagging specific occurrences with their corresponding
lemmas or by establishing general links between forms and lemmas.
Multi-level lemmatization can be obtained by grouping forms or
lemmas under so-called hyper-lemmas, which can themselves be organized
in a hierarchical structure.
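To make the idea of multi-level lemmatization concrete, here is a minimal Python sketch of a hyper-lemma hierarchy. It is not GATTO's actual data model, which the text does not describe: the class `HyperLemma`, the function `forms_of` and the sample form-to-lemma links are all hypothetical. Each hyper-lemma may group lemmas, bare forms and child hyper-lemmas, and resolving it means collecting every form reachable through the hierarchy.

```python
from dataclasses import dataclass, field

# Hypothetical "general links" between lemmas and the forms that realize them.
LEMMA_FORMS: dict[str, set[str]] = {
    "amare": {"amo", "ama", "amava"},
    "amore": {"amore", "amor"},
}


@dataclass
class HyperLemma:
    """A grouping node that may contain lemmas, bare forms and child hyper-lemmas."""
    name: str
    lemmas: list[str] = field(default_factory=list)
    forms: list[str] = field(default_factory=list)
    children: list["HyperLemma"] = field(default_factory=list)


def forms_of(node: HyperLemma, lemma_forms: dict[str, set[str]]) -> set[str]:
    """Collect every form reachable from `node`, descending the whole hierarchy."""
    result = set(node.forms)              # forms grouped directly
    for lemma in node.lemmas:             # forms linked to grouped lemmas
        result |= lemma_forms.get(lemma, set())
    for child in node.children:           # recurse into sub-hyper-lemmas
        result |= forms_of(child, lemma_forms)
    return result


root = HyperLemma("am-", children=[
    HyperLemma("verbs", lemmas=["amare"]),
    HyperLemma("nouns", lemmas=["amore"], forms=["amori"]),
])
print(sorted(forms_of(root, LEMMA_FORMS)))
# ['ama', 'amava', 'amo', 'amor', 'amore', 'amori']
```

In this sketch a search by hyper-lemma would simply look for every form returned by `forms_of`, however deep the hierarchy goes.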
Texts with their associated lemmas and hyper-lemmas can be copied out of
the corpus, modified with standard word processors and then reinserted
into the original corpus or into a new one.
Each corpus, whether lemmatized or not, can be searched to extract
lexicographic information from all of its texts or from subsets of them,
defined dynamically as needed.
Searches locate occurrences of specific forms within the corpus; these
forms can be specified directly, obtained through their links to
lemmas or hyper-lemmas, or selected by properties such as
grammatical category and disambiguators.
Proximity searches, which may also include punctuation marks, are possible
as well. Results are displayed as a series of contexts, which can be saved as
RTF files. Other operations available on corpora include the generation
of indices locorum, form lists, lemma lists, incipitaria (lists of first
lines), and numerical and graphical statistics.
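Proximity searching that includes punctuation can be illustrated with the following Python sketch. The tokenizer and the functions `tokenize`, `proximity` and `context` are hypothetical, not GATTO's: the sketch simply assumes that each punctuation mark is a token of its own, so punctuation can be used as a search term and counts toward the distance between two forms.

```python
import re

# Hypothetical tokenizer: a run of word characters, or any single mark that is
# neither a word character nor whitespace (so "," and "." become tokens).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def proximity(tokens: list[str], first: str, second: str,
              max_distance: int) -> list[tuple[int, int]]:
    """Positions (i, j) where `second` follows `first` within max_distance tokens.

    Punctuation marks are ordinary tokens: they can be search terms and
    they count toward the distance.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for j in range(i + 1, min(i + 1 + max_distance, len(tokens))):
            if tokens[j] == second:
                hits.append((i, j))
    return hits


def context(tokens: list[str], i: int, j: int, width: int = 3) -> str:
    """A display context: the matched span plus `width` tokens on each side."""
    return " ".join(tokens[max(0, i - width): j + width + 1])


tokens = tokenize("Amor, ch'a nullo amato amar perdona.")
for i, j in proximity(tokens, "amor", "amar", 7):
    print(context(tokens, i, j))  # amor , ch ' a nullo amato amar perdona .
```

Counting punctuation toward the distance is a design choice of this sketch; a real tool might offer it as an option.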
Several options, available at every stage of a search, let users adapt
the program's behaviour to very different requirements. For example, a
passage taken from a text can be pasted into a dedicated window to find
the occurrences in the corpus of the forms it contains, while the forms
that never occur in the corpus are highlighted.
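The pasted-passage check can be sketched in Python as follows, assuming the corpus is summarized as a frequency table of lower-cased forms (a `collections.Counter`). The functions `corpus_counts`, `check_passage` and `highlight` are hypothetical; in this sketch forms never found in the corpus are wrapped in square brackets instead of being visually highlighted.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def corpus_counts(texts: list[str]) -> Counter:
    """Frequency of each (lower-cased) form across the corpus texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def check_passage(passage: str, counts: Counter) -> list[tuple[str, int, bool]]:
    """For each form in the pasted passage: (form, corpus frequency, never seen?)."""
    return [(form, counts[form], counts[form] == 0)
            for form in WORD_RE.findall(passage.lower())]


def highlight(passage: str, counts: Counter) -> str:
    """Return the passage with forms absent from the corpus wrapped in [brackets]."""
    return WORD_RE.sub(
        lambda m: m.group() if counts[m.group().lower()] else f"[{m.group()}]",
        passage,
    )


counts = corpus_counts(["Amor, ch'a nullo amato amar perdona"])
print(highlight("Amor mi mosse", counts))  # Amor [mi] [mosse]
```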
GATTO currently consists of more than 50,000 lines of code. When carrying out a search, the software does not read the contents of the text files but queries a database containing all the information extracted from the texts. It is a specialized tool, conceived and built for well-defined fields of application, and is therefore not easy to learn; for this reason it comes with complete
documentation, both printed and context-sensitive. Even so,
thanks to its wide range of functions, it is also used by research
groups outside OVI.
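The idea of searching a database of extracted information rather than the text files themselves can be sketched with Python's built-in sqlite3 module. The table layout (one row per occurrence: text identifier, position, form) and the functions `build_index` and `find` are assumptions for illustration, not GATTO's actual schema. Extraction happens once, when texts enter the corpus; each search is then an indexed query, which can also be restricted to a dynamically chosen subset of texts.

```python
import re
import sqlite3
from typing import Optional, Sequence

WORD_RE = re.compile(r"\w+")


def build_index(texts: dict[str, str]) -> sqlite3.Connection:
    """Extract every form occurrence once and store it in a relational table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE occ (text_id TEXT, pos INTEGER, form TEXT)")
    con.execute("CREATE INDEX occ_form ON occ (form)")
    rows = [
        (text_id, pos, form)
        for text_id, body in texts.items()
        for pos, form in enumerate(WORD_RE.findall(body.lower()))
    ]
    con.executemany("INSERT INTO occ VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def find(con: sqlite3.Connection, form: str,
         text_ids: Optional[Sequence[str]] = None) -> list[tuple[str, int]]:
    """Occurrences of `form`, optionally restricted to a subset of texts.

    `text_ids=None` (or empty) searches the whole corpus; the text files
    themselves are never reopened.
    """
    sql = "SELECT text_id, pos FROM occ WHERE form = ?"
    params: list = [form]
    if text_ids:
        sql += " AND text_id IN (" + ",".join("?" * len(text_ids)) + ")"
        params.extend(text_ids)
    return con.execute(sql + " ORDER BY text_id, pos", params).fetchall()


con = build_index({"A": "Amor, ch'a nullo amato", "B": "Amor mi mosse"})
print(find(con, "amor"))          # [('A', 0), ('B', 0)]
print(find(con, "amor", ["B"]))   # [('B', 0)]
```

Storing positions as well as forms is what would let the same table support context display and proximity queries in a fuller version of this sketch.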
GATTO is distributed free of charge on OVI's website,
www.ovi.cnr.it, along with full electronic documentation, tutorials and demo data.

GattoWeb(TM)
Corpora prepared for GATTO can be converted for online use with GattoWeb(TM), the Web version of GATTO. GattoWeb is restricted to search functions, which are essentially the same as those offered by GATTO. Like GATTO, GattoWeb relies on a relational database engine for its search operations.
From the web address gattoweb.ovi.cnr.it, lexicographic searches can be carried out on both the OVI textual corpus and other corpora managed by GattoWeb.