PaCCSS-IT (Parallel Corpus of Complex-Simple Sentences for ITalian)
Institute
Istituto di linguistica computazionale "Antonio Zampolli" (ILC)
Contact
Felice Dell'Orletta
E-mail: felice.dellorletta@ilc.cnr.it
Description
PaCCSS-IT is a corpus of aligned complex-simple sentences for Italian. It contains about 63,000 sentence pairs extracted from the ItWaC corpus, the largest copyright-free corpus of contemporary Italian web texts. The resource was built with a new approach for automatically acquiring large corpora of paired sentences. The approach captures structural transformations (such as deletion and reordering) and is particularly suitable for text simplification.
Web address
URL: http://www.italianlp.it/software-data/paccss-it-parallel-corpus-of-complex-simple-sentences-for-ital
Access mode
Online
Data type
Italian-Italian aligned sentences for text simplification
Database type
Corpus