Tweets classification using corpus dependent tags, character and POS N-grams
By:
González-Gallardo C.E., Montes A., Sierra G., Núñez-Juárez J.A., Salinas-López A.J., Ek J.
Published:
1 Jan 2015
Category:
Computer Science (miscellaneous)
Abstract:
This paper is part of the Author Profiling task at the PAN 2015 contest, in which participants had to predict the gender, age, and personality traits of Twitter users in four languages (Spanish, English, Italian, and Dutch). Our approach classifies tweets using stylistic features represented by character N-grams and POS N-grams. The main idea behind character N-grams is to extract as much as possible of the information encoded inside the tweet (emoticons, character flooding, use of capital letters, etc.). POS N-grams were obtained using Freeling, and certain tokens were relabeled with Twitter-dependent tags. The results were very satisfactory: our global ranking score was 83.46%.
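As a rough illustration of the two feature types named in the abstract, the sketch below counts character N-grams in a tweet and builds N-grams over a POS tag sequence after relabeling Twitter-specific tokens. It is a minimal sketch, not the authors' implementation: the tag names (`MENTION`, `HASHTAG`, `URL`), the regular expressions, and the function names are all assumptions, since the abstract does not list the actual tag set, and the input POS tags are supplied by hand instead of coming from Freeling.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Count all overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical Twitter-dependent tags (assumption: the abstract does not
# specify which tokens were relabeled or what the tags were called).
TWITTER_TAGS = [
    (re.compile(r"^@\w+$"), "MENTION"),
    (re.compile(r"^#\w+$"), "HASHTAG"),
    (re.compile(r"^https?://\S+$"), "URL"),
]


def relabel(tagged_tokens):
    """Return the tag sequence, replacing tags of Twitter-specific tokens.

    tagged_tokens is a list of (token, pos_tag) pairs, e.g. the output of
    any POS tagger; here the pairs are written by hand.
    """
    tags = []
    for token, pos in tagged_tokens:
        for pattern, twitter_tag in TWITTER_TAGS:
            if pattern.match(token):
                pos = twitter_tag
                break
        tags.append(pos)
    return tags


def tag_ngrams(tags, n):
    """Count all overlapping n-grams over a sequence of tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


if __name__ == "__main__":
    # Character flooding ("goool") shows up as repeated-letter n-grams.
    print(char_ngrams("goool :)", 3))
    tagged = [("@user", "NN"), ("loves", "VB"), ("#nlp", "NN")]
    print(tag_ngrams(relabel(tagged), 2))
```

Because character N-grams are taken over the raw string, emoticons, capitalization, and repeated letters are preserved as features without any tokenization step; the resulting counts could then be fed to any standard classifier.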
Affiliations:
González-Gallardo C.E.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico
Montes A.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico
Sierra G.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico
Núñez-Juárez J.A.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico
Salinas-López A.J.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico
Ek J.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico