PaCCSS-IT (Parallel Corpus of Complex-Simple Sentences for ITalian)
Institute
Institute of Computational Linguistics "Antonio Zampolli" (ILC)
Contact person
Felice Dell'Orletta
Email: felice.dellorletta@ilc.cnr.it
Description
PaCCSS-IT is a corpus of aligned complex-simple sentences for Italian, containing about 63,000 sentence pairs extracted from the ItWaC corpus, the largest copyright-free corpus of contemporary Italian web texts. The resource was built with a new approach for automatically acquiring large corpora of paired sentences. The approach is designed to capture structural transformations (such as deletion and reordering) and is particularly suitable for text simplification.
Web address
Url: http://www.italianlp.it/software-data/paccss-it-parallel-corpus-of-complex-simple-sentences-for-ital
Access mode
On-line
Data typology
Italian - Italian aligned sentences for text simplification
Database type
Corpus