Consiglio Nazionale delle Ricerche

Product type: Contribution in conference proceedings
Title: A lexicon for biology and bioinformatics: the BOOTStrep experience
Year of publication: 2008
Format
  • Electronic
  • Print
Author(s): Quochi V.; Monachini M.; Del Gratta R.; Calzolari N.
Author affiliations: Istituto di Linguistica Computazionale "A. Zampolli"
CNR authors and affiliations
  • VALERIA QUOCHI
  • RICCARDO DEL GRATTA
  • MONICA MONACHINI
  • NICOLETTA CALZOLARI
Language(s)
  • English
Abstract: -
Abstract language: English
Other abstract: This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource encoding different information types in one single integrated resource. In the design of the resource we follow the ISO/DIS 24613 "Lexical Mark-up Framework" standard, which ensures reusability of the information encoded and easy exchange of both data and architecture. The design of the resource also takes into account the needs of our text mining partners who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers: morphology, syntax and semantics; then, it briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database in fact comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.
Other abstract language: -
Pages from: 2285
Pages to: 2292
Total pages: -
Journal: -
Journal volume number: -
Series: -
Volume title: -
Series volume number: -
Volume editor(s): -
ISBN: 2-9517408-4-0
DOI: -
Publisher
  • European Language Resources Association ELRA, Paris (France)
Peer reviewed: Yes, international
Publication status: -
Indexing (in controlled databases)
  • ISI Web of Science (WOS) (Code: 000324028902062)
Keywords: Lexicon, Ontologies, Lexical database
Link (URL, URI): http://www.lrec-conf.org/proceedings/lrec2008/pdf/576_paper.pdf
Conference title: LREC 2008, Sixth International Conference on Language Resources and Evaluation
Conference location: Marrakech, Morocco
Conference date(s): 26 May - 1 June 2008
Relevance: International
Presentation: Contribution
Parallel title: -
Notes/Other information: -
CNR structures
  • ILC — Istituto di linguistica computazionale "Antonio Zampolli"
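CNR modules/activities/subprojects
  • IC.P02.005.001: Risorse e Tecnologie Linguistiche: modelli, metodi di sviluppo, applicazioni, disegno di strategie internazionali (Language Resources and Technologies: models, development methods, applications, design of international strategies)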
European projects: -
Attachments
A lexicon for biology and bioinformatics: the BOOTStrep experience (private document)
Document type: application/pdf

Historical data
Historical data cannot be modified; they were inherited from other systems (e.g. Gestione Istituti, PUMA, ...) and have historical value only.
Disciplinary area: Language & Linguistics
CIVR evaluation area: Sciences of antiquity, philological-literary and historical-artistic sciences
Notes: In: LREC - LREC 2008, Sixth International Conference on Language Resources and Evaluation (Palais des Congrès Mansour Eddahbi, Marrakech, Morocco, 26 May - 1 June 2008). Proceedings, pp. 2285 - 2292. Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias (eds.). European Language Resources Association (ELRA), 2008.