Spanish word embeddings learned on word association norms
By:
Gómez-Adorno H., Reyes-Magaña J., Bel-Enguix G., Sierra G.
Published:
1 Jan 2019
Category:
Computer science (miscellaneous)
Abstract:
Word embeddings are vector representations of words in an n-dimensional space used for many natural language processing tasks. A large training corpus is needed to learn good-quality word embeddings. In this work, we present a method based on the node2vec algorithm for learning embeddings from paths in a graph. We used a collection of Word Association Norms in Spanish to build a graph of word connections. The nodes of the network correspond to the words in the corpus, whereas the edges correspond to pairs of words given in a free association test. We evaluated our word vectors on human-annotated benchmarks, achieving better results than embeddings trained on billion-word corpora with word2vec, fastText, and GloVe. © 2019 CEUR-WS. All rights reserved.
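The pipeline the abstract describes can be sketched minimally: build a graph whose nodes are words and whose edges are cue-response pairs from a free association test, then generate random walks over it to serve as "sentences" for a skip-gram model. The sketch below uses only the standard library, toy Spanish word pairs (illustrative, not the paper's actual norms), and uniform random walks; node2vec proper biases each step with its p and q return/in-out parameters.

```python
import random
from collections import defaultdict

# Hypothetical cue-response pairs from a free word association test
# (illustrative data, not the paper's Spanish norms).
pairs = [
    ("perro", "gato"), ("perro", "hueso"), ("gato", "raton"),
    ("sol", "luna"), ("sol", "calor"), ("luna", "noche"),
]

# Undirected graph: nodes are words, edges are associations.
graph = defaultdict(set)
for cue, response in pairs:
    graph[cue].add(response)
    graph[response].add(cue)

def random_walk(start, length, rng):
    """Uniform random walk; node2vec instead biases steps via p/q."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(sorted(graph[walk[-1]])))
    return walk

# A few walks per node; these play the role of training "sentences"
# for a skip-gram model (e.g. gensim's Word2Vec) that would produce
# the final word vectors.
rng = random.Random(42)
walks = [random_walk(word, 5, rng) for word in sorted(graph) for _ in range(3)]
```

In the full method, the walk corpus replaces a large text corpus, which is why good vectors can be learned from comparatively small association norms.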
Affiliations:
Gómez-Adorno H.:
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Reyes-Magaña J.:
Instituto de Ingeniería, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Facultad de Matemáticas, Universidad Autónoma de Yucatán, Mérida, Yucatán, Mexico
Bel-Enguix G.:
Instituto de Ingeniería, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Sierra G.:
Instituto de Ingeniería, Universidad Nacional Autónoma de México, Ciudad de México, Mexico