Consiglio Nazionale delle Ricerche

Product type: Journal article
Title: Multilingual POS Tagging by a Composite Deep Architecture Based on Character-Level Features and On-the-Fly Calculation of Enriched Word Embeddings
Year of publication: 2019
Format: -
Author(s): M. Pota, F. Marulli, M. Esposito, G. De Pietro, H. Fujita
Author affiliations: Institute for High Performance Computing and Networking - National Research Council of Italy (ICAR-CNR), Naples, Italy; Faculty of Software and Information Science, Iwate Prefectural University, Iwate, 020-0193, Japan
CNR authors and affiliations
  • FIAMMETTA MARULLI
  • GIUSEPPE DE PIETRO
  • MASSIMO ESPOSITO
  • MARCO POTA
Language(s)
  • English
Abstract: The field of Natural Language Processing (NLP) is benefiting greatly from models and methodologies adopted from Artificial Intelligence. In particular, Part-Of-Speech (POS) tagging is a building block for many NLP applications. In this paper, a POS tagging system based on a deep neural network is proposed. It comprises a static, task-independent pre-trained model that represents word semantics enriched with morphological information by approximating the Word Embedding representation learned by the fastText model from an unlabelled corpus, so that common and known words as well as rare and Out-of-Vocabulary words are handled consistently. A character-level representation of words is learned dynamically for the POS tagging task and is concatenated to the former. This joint representation is fed to the main network, comprising a Bi-LSTM layer, which is trained to associate a sequence of tags with a sequence of words. The effectiveness of the proposed system's contributions with respect to the state of the art is demonstrated by an extensive experimental campaign, which provides evidence that POS tagging accuracy improves when using Word Embeddings enriched with morphological information, when estimating embeddings for both known and unknown words, and when concatenating Word Embeddings with character-level information of the same size. Similar trends are obtained for two languages with different characteristics, namely English and Italian: in both cases, the overall accuracy on the POS tagging test set increased with respect to the most advanced existing systems, with particular improvements in the accuracy on Out-of-Vocabulary words. Finally, the method has a general basis and could be used effectively for all languages, particularly those with rich morphology.
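The word-representation step summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fastText-style character n-grams (with `<` and `>` boundary markers) stand in for the enriched embedding, the hashed pseudo-random vectors stand in for trained parameters, and `char_level_features` is a hypothetical placeholder for the character-level representation that the paper learns during tagging. The dimension `DIM` and all function names are assumptions made for illustration, and the Bi-LSTM tagger itself is omitted.

```python
import hashlib
import random

DIM = 8  # toy embedding size (assumption; not taken from the paper)


def char_ngrams(word, n_min=3, n_max=6):
    """Return fastText-style character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def ngram_vector(ngram, dim=DIM):
    """Deterministic pseudo-random vector standing in for a trained n-gram embedding."""
    seed = int.from_bytes(hashlib.md5(ngram.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def subword_embedding(word, dim=DIM):
    """Average the n-gram vectors, so unseen (Out-of-Vocabulary) words also get a vector."""
    grams = char_ngrams(word) or [f"<{word}>"]  # fallback for very short words
    total = [0.0] * dim
    for gram in grams:
        for k, value in enumerate(ngram_vector(gram, dim)):
            total[k] += value
    return [t / len(grams) for t in total]


def char_level_features(word, dim=DIM):
    """Hypothetical placeholder: normalised character counts folded into `dim` buckets."""
    buckets = [0.0] * dim
    for ch in word:
        buckets[ord(ch) % dim] += 1.0
    length = max(len(word), 1)
    return [b / length for b in buckets]


def joint_representation(word, dim=DIM):
    """Concatenate the two same-sized vectors, as the abstract describes."""
    return subword_embedding(word, dim) + char_level_features(word, dim)


if __name__ == "__main__":
    print(char_ngrams("cat"))
    print(len(joint_representation("unseenword")))
```

In a full tagger, one such joint vector per token would be passed, in sentence order, to the Bi-LSTM layer that predicts the tag sequence.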
Abstract language: English
Other abstract: -
Other abstract language: -
First page: 309
Last page: 323
Total pages: 15
Journal: Knowledge-based systems
Active since 1987
Publisher: Butterworths, London
Country of publication: United Kingdom
Language: English
ISSN: 0950-7051
Key title: Knowledge-based systems
Title proper: Knowledge-based systems.
Abbreviated title: Knowl.-based syst.
Journal volume number: 164
Journal issue: -
DOI: 10.1016/j.knosys.2018.11.003
Refereed: Yes (international)
Publication status: Published version
Indexing (in controlled databases): -
Keywords: NLP, POS tagging, Deep neural networks, Bi-LSTM, Out of Vocabulary
Link (URL, URI): https://doi.org/10.1016/j.knosys.2018.11.003
Parallel title: -
License: -
Embargo expiry: -
Acceptance date: -
Notes/Other information: -
CNR structures
  • ICAR — Istituto di calcolo e reti ad alte prestazioni
CNR modules/activities/subprojects
  • DIT.AD022.050.001 : Sistemi Cognitivi
European projects: -
Attachments
PDF (private document)
Document type: application/pdf