Tweets classification using corpus dependent tags, character and POS N-grams


By: González-Gallardo C.E., Montes A., Sierra G., Núñez-Juárez J.A., Salinas-López A.J., Ek J.

Published: 1 Jan 2015
Category: Computer Science (miscellaneous)

Abstract:
This paper is part of the Author Profiling task at the PAN 2015 contest, in which participants had to predict the gender, age, and personality traits of Twitter users in four languages (Spanish, English, Italian, and Dutch). Our approach takes into account stylistic features represented by character N-grams and POS N-grams to classify tweets. The main idea of using character N-grams is to extract as much as possible of the information encoded inside the tweet (emoticons, character flooding, use of capital letters, etc.). POS N-grams were obtained using Freeling, and certain tokens were relabeled with Twitter-dependent tags. The results were very satisfactory; our global ranking score was 83.46%.
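
As an illustration of why character N-grams can capture emoticons, character flooding, and capitalization, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `char_ngrams`, the sample tweet, and the choice of N = 2 are assumptions made only for this example, and the abstract does not state which N values or preprocessing the paper used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count all overlapping character n-grams in text (hypothetical helper).

    Case, punctuation, and whitespace are kept on purpose, so that
    emoticons, repeated characters, and capital letters stay visible
    as features.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Invented sample tweet with flooding ("OOO"), capitals, and emoticons.
tweet = "SOOO happy :) :)"
counts = char_ngrams(tweet, 2)
```

Here `counts[":)"]` and `counts["OO"]` are both 2, while `counts["so"]` is 0 because case is preserved; such counts could then be fed to a classifier as stylistic features.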

Affiliations:
González-Gallardo C.E.:
 Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Montes A.:
 Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Sierra G.:
 Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Núñez-Juárez J.A.:
 Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Salinas-López A.J.:
 Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Ek J.:
 Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico
ISSN: 1613-0073
Publisher:
CEUR-WS, United States of America
Document type: Conference Paper
Volume: 1391 Issue:
Pages:
