Database

PaCCSS-IT (Parallel Corpus of Complex-Simple Sentences for ITalian)

Institute

Istituto di linguistica computazionale "Antonio Zampolli" (ILC)

Contact person

Felice Dell'Orletta
E-mail: felice.dellorletta@ilc.cnr.it

Description

PaCCSS-IT is a corpus of aligned complex-simple sentence pairs for Italian. It contains about 63,000 sentence pairs extracted from the ItWaC corpus, the largest copyright-free corpus of contemporary Italian web texts. To build the resource, a new approach was developed for automatically acquiring large corpora of paired sentences. The approach is able to capture structural transformations (such as deletion and reordering) and is particularly suitable for text simplification.

Web address

URL: http://www.italianlp.it/software-data/paccss-it-parallel-corpus-of-complex-simple-sentences-for-ital

Access mode

Online

Data type

Italian-Italian aligned sentences for text simplification

Database type

Corpus