Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian (Conference proceedings contribution)

Type
Label
  • Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian (Conference proceedings contribution) (literal)
Year
  • 2004-01-01T00:00:00+01:00 (literal)
Alternative label
  • Panuzzi A., Picchi E., Moneglia M. (2004)
    Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian
    in LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon, 26-28 May 2004
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Panuzzi A., Picchi E., Moneglia M. (literal)
Start page
  • 563 (literal)
End page
  • 566 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://www.lrec-conf.org/lrec2004/ (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#titoloVolume
  • Proceedings: in LREC 2004: Fourth International Conference on Language Resources and Evaluation (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#volumeInCollana
  • 2 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • LREC 2004: Fourth International Conference on Language Resources and Evaluation, held in Memory of Antonio Zampolli. Lisbon, Portugal, 26th, 27th & 28 May 2004. Proceedings, Volume II, Paris, The European Language Resources Association (ELRA). 563-568. (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 4 (literal)
Note
  • PuMa (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • CNR ILC Pisa (literal)
Title
  • Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#isbn
  • 2-9517408-1-6 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autoriVolume
  • M.T.Lino, M.F.Xavier, F.Ferreira, R.Costa, R.Silvia (literal)
Abstract
  • The automatic lemmatization and morpho-syntactic annotation of spoken language is a fairly recent and complex task for Natural Language Processing. The state of the art for written corpora does not provide a satisfactory level of analysis for spontaneous spoken language (Uchimoto et al., 2002; Moreno & Guirao, 2003). The spontaneous speech corpus Italian C-ORAL-ROM has been tagged with Part of Speech (PoS) and morpho-syntactic information by using and adapting an existing tool trained on Italian written resources (PiTagger, developed by Eugenio Picchi, ILC-CNR Pisa). The impact of the spoken domain on performance remains within 10% of errors, as detected in the manual evaluation procedure. Some issues specific to spoken language emerged. Utterance boundaries should define the significant contexts for PoS statistics; moreover, the relevance of a series of phenomena related to prosodic parsing has been highlighted: fragmentation phenomena; a relative lack of information for all words adjacent to utterance boundaries; and under-specification of PoS for words in connection with secondary prosodic breaks and one-word utterances. (literal)