Consiglio Nazionale delle Ricerche

Product type: Contribution in conference proceedings
Title: Flexible Acquisition of Subcategorization Frames in Italian
Year of publication: 2012
Format: Electronic
Author(s): Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco; Russo, Irene
Author affiliations: Istituto di Linguistica Computazionale "A. Zampolli", CNR, Italy
CNR authors and affiliations
  • TOMMASO CASELLI
  • IRENE RUSSO
  • FRANCESCA FRONTINI
  • FRANCESCO RUBINO
  • VALERIA QUOCHI
Language(s)
  • English
Abstract: Lexica of predicate-argument structures are a useful resource for several NLP tasks. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed Italian data. The system acquires SCFs in an unsupervised manner. We created two gold standards for evaluating the system: the first combines information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data; the second was built by annotating data extracted from a specialized corpus (environmental domain). Data filtering is performed by means of the maximum likelihood estimate (MLE). The evaluation allowed us to identify the best empirical MLE threshold for creating a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned each extracted lexicon entry a confidence score based on its relative frequency, and we evaluated the extractor on domain-specific data. The confidence score lets end users select lexicon entries according to their reliability: one of the most interesting features of this work is that end users can customize the output of the SCF extractor, obtaining SCF lexica that differ in size and accuracy.
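The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the MLE of a frame is simply its relative frequency given the verb, and that a frame is kept when this value reaches a threshold. The function names, the frame labels, and the toy counts are hypothetical and are not taken from the paper or its web service.

```python
from collections import Counter, defaultdict


def filter_scfs(observations, threshold):
    """Keep (verb, frame) pairs whose relative frequency reaches the threshold.

    observations: iterable of (verb, frame) pairs, one per parsed occurrence.
    Returns {verb: {frame: relative_frequency}}.
    """
    pair_counts = Counter(observations)
    verb_counts = Counter()
    for (verb, _frame), n in pair_counts.items():
        verb_counts[verb] += n

    lexicon = defaultdict(dict)
    for (verb, frame), n in pair_counts.items():
        rel_freq = n / verb_counts[verb]  # MLE of P(frame | verb)
        if rel_freq >= threshold:
            lexicon[verb][frame] = rel_freq
    return dict(lexicon)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy observations, invented for illustration only.
observations = (
    [("mangiare", "subj#obj")] * 7
    + [("mangiare", "subj")] * 2
    + [("mangiare", "subj#comp-a")] * 1
)

# Hypothetical threshold; the paper's empirically chosen value is not given here.
lexicon = filter_scfs(observations, threshold=0.15)
print(sorted(lexicon["mangiare"]))

# The reported scores are mutually consistent: F1 from P=0.653 and R=0.557.
print(round(f1_score(0.653, 0.557), 3))
```

In this sketch the stored relative frequency doubles as a confidence score, so raising the threshold yields a smaller lexicon with fewer low-frequency frames, and lowering it yields a larger one.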
Abstract language: English
Other abstract: -
Language of other abstract: -
First page: 2842
Last page: 2848
Total pages: 7
Journal: -
Journal volume number: -
Series: -
Volume title: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Series volume number: -
Volume editor(s): Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis
ISBN: 9782951740877
DOI: -
Publisher
  • European Language Resources Association ELRA, Paris (France)
Refereed: Yes (international)
Publication status: -
Indexing (in controlled databases)
  • DBLP (Code: conf/lrec/CaselliRFRQ12)
  • ISI Web of Science (WOS) (Code: 000323927702149)
Keywords: lexicon, automatic acquisition, subcategorisation frames
Link (URL, URI): http://www.lrec-conf.org/proceedings/lrec2012/summaries/390.html
Conference title: Eighth International Conference on Language Resources and Evaluation (LREC'12)
Conference location: Istanbul, Turkey
Conference date(s): 23-25 May 2012
Relevance: International
Presentation: Contributed
Parallel title: -
Notes/Other information: -
CNR units
  • ILC — Istituto di linguistica computazionale "Antonio Zampolli"
CNR modules/activities/subprojects
  • IC.P02.005.001 : Risorse e Tecnologie Linguistiche: modelli, metodi di sviluppo, applicazioni, disegno di strategie internazionali
European projects
Attachments