Database

CItA (Corpus Italiano di Apprendenti L1)

Institute

Institute of Computational Linguistics "Antonio Zampolli" (ILC)

Contact

Felice Dell'Orletta
Email: felice.dellorletta@ilc.cnr.it

Description

CItA (Corpus Italiano di Apprendenti L1) is the first freely available, digitized corpus of essays written by L1 learners of Italian. It was collected in 7 lower secondary schools in different areas of Rome: 3 in the historical center and 4 in the suburbs. The current version contains 1,353 essays (369,456 tokens in total), manually annotated for errors and their corrections; the corpus is continually updated. It is accompanied by a questionnaire of 34 questions on the students' biographical, socio-cultural and sociolinguistic background. The resource was jointly compiled by the ItaliaNLP Lab and the experimental pedagogists of the Department of Psychology of Developmental Processes and Socialization at the Sapienza University of Rome.

Web address

URL: http://www.italianlp.it/software-data/cita-corpus-italiano-di-apprendenti-l1/

Access mode

Freely downloadable from the Internet

Data typology

Textual corpus

Database type

Corpus