Consiglio Nazionale delle Ricerche

Product type: Journal article
Title: Word-class embeddings for multiclass text classification
Year of publication: 2021
Format
  • Electronic
  • Print
Author(s): Moreo A.; Esuli A.; Sebastiani F.
Author affiliations: CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy
CNR authors and affiliations
  • ANDREA ESULI
  • ALEJANDRO DAVID MOREO FERNANDEZ
  • FABRIZIO SEBASTIANI
Language(s)
  • English
Abstract: Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using six popular neural architectures and six widely used and publicly available datasets for multiclass text classification. One further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
Abstract language: English
Other abstract: -
Other abstract language: -
Pages from: 911
Pages to: 963
Total pages: -
Journal: Data mining and knowledge discovery
Active since 1997
Publisher: Kluwer Academic Publishers - Dordrecht
Country of publication: United States of America
Language: English
ISSN: 1384-5810
Key title: Data mining and knowledge discovery
Journal volume: 35
Journal issue: 3
DOI: 10.1007/s10618-020-00735-3
Peer reviewed: Yes (international)
Publication status: Published version
Indexing (in controlled databases)
  • Scopus (Code: 2-s2.0-85101294470)
  • ISI Web of Science (WOS) (Code: 000619695200001)
Keywords: Machine learning, Text classification, Language models, Neural networks, Deep learning
Link (URL, URI): https://link.springer.com/article/10.1007/s10618-020-00735-3
Parallel title: -
License: -
Embargo expiry: 18/02/2022
Date of acceptance: -
Notes/Other information: preprint also deposited on arXiv: https://arxiv.org/abs/1911.11506
CNR structures
  • ISTI — Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"
CNR modules/activities/sub-projects: -
European projects
Attachments
WORD-CLASS EMBEDDINGS FOR MULTICLASS TEXT CLASSIFICATION
Description: Pre-print
Document type: application/pdf
Word-class embeddings for multiclass text classification (private document)
Description: Published version
Document type: application/pdf