Spanish word embeddings learned on word association norms


By: Gómez-Adorno H., Reyes-Magaña J., Bel-Enguix G., Sierra G.

Published: 1 Jan 2019
Category: Computer science (miscellaneous)

Abstract:
Word embeddings are vector representations of words in an n-dimensional space, used in many natural language processing tasks. Learning good-quality word embeddings normally requires a large training corpus. In this work, we present a method based on the node2vec algorithm for learning embeddings from paths in a graph. We used a collection of Word Association Norms in Spanish to build a graph of word connections. The nodes of the network correspond to the words in the corpus, whereas each edge corresponds to a pair of words given in a free association test. We evaluated our word vectors on human-annotated benchmarks, achieving better results than vectors trained on a billion-word corpus with methods such as word2vec, fastText, and GloVe. © 2019 CEUR-WS. All rights reserved.
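
The graph construction and path sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue/response pairs are hypothetical toy data, the graph is assumed to be undirected and unweighted, and the walks are uniform, which corresponds to node2vec only in the special case p = q = 1. The function names (`build_graph`, `random_walk`, `generate_walks`) are made up for this example.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (cue, response) pairs such as a free association
# test might produce. Illustrative only, not taken from the paper.
ASSOCIATIONS = [
    ("perro", "gato"),
    ("perro", "hueso"),
    ("gato", "ratón"),
    ("ratón", "queso"),
    ("queso", "leche"),
    ("leche", "gato"),
]


def build_graph(pairs):
    """Build an adjacency map: one node per word, one edge per word pair.

    Assumption: edges are undirected and unweighted.
    """
    graph = defaultdict(set)
    for cue, response in pairs:
        graph[cue].add(response)
        graph[response].add(cue)
    # Sorted neighbor lists keep the sampling reproducible for a fixed seed.
    return {word: sorted(neighbors) for word, neighbors in graph.items()}


def random_walk(graph, start, length, rng):
    """Uniform random walk (the p = q = 1 special case of node2vec)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph, walks_per_node=2, length=5, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


graph = build_graph(ASSOCIATIONS)
walks = generate_walks(graph)
```

Each walk is a sequence of words that plays the role of a sentence. In a node2vec-style pipeline, these sequences would then be passed to a skip-gram model to obtain the word vectors; that training step is omitted here.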

Affiliations:
Gómez-Adorno H.:
 Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad de México, Mexico

Reyes-Magaña J.:
 Instituto de Ingeniería, Universidad Nacional Autónoma de México, Ciudad de México, Mexico

 Facultad de Matemáticas, Universidad Autónoma de Yucatán, Mérida, Yucatán, Mexico

Bel-Enguix G.:
 Instituto de Ingeniería, Universidad Nacional Autónoma de México, Ciudad de México, Mexico

Sierra G.:
 Instituto de Ingeniería, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
ISSN: 1613-0073
Publisher: CEUR-WS, United States of America
Document type: Conference Paper
Volume: 2369 Issue:
Pages:
