Database

PaCCSS-IT (Parallel Corpus of Complex-Simple Sentences for ITalian)

Institute

Institute of Computational Linguistics "Antonio Zampolli" (ILC)

Referent

Felice Dell'Orletta
Email: felice.dellorletta@ilc.cnr.it

Description

PaCCSS-IT is a corpus of aligned complex-simple sentence pairs for Italian. It contains about 63,000 sentence pairs extracted from the ItWaC corpus, the largest copyright-free corpus of contemporary Italian web texts. The resource was built with a new approach for automatically acquiring large corpora of paired sentences that capture structural transformations (such as deletion and reordering) and are particularly suitable for text simplification.

Web address

Url: http://www.italianlp.it/software-data/paccss-it-parallel-corpus-of-complex-simple-sentences-for-ital

Access mode

On-line

Data typology

Italian - Italian aligned sentences for text simplification

Database type

Corpus