Tweets classification using corpus dependent tags, character and POS N-grams

By: González-Gallardo C.E., Montes A., Sierra G., Núñez-Juárez J.A., Salinas-López A.J., Ek J.

Published: 1 Jan 2015

Category: Computer Science (miscellaneous)

Web: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84982787149&partnerID=40&md5=c046b253958114512b5d0d865286830b

Abstract:
This paper describes our participation in the Author Profiling task at the PAN 2015 contest, in which participants had to predict the gender, age and personality traits of Twitter users in four different languages (Spanish, English, Italian and Dutch). Our approach takes into account stylistic features represented by character N-grams and POS N-grams to classify tweets. The main idea of using character N-grams is to extract as much as possible of the information encoded inside the tweet (emoticons, character flooding, use of capital letters, etc.). POS N-grams were obtained using Freeling, and certain tokens were relabeled with Twitter-dependent tags. The results were very satisfactory; our global ranking score was 83.46%.
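As an illustration of the two feature types named in the abstract, the following is a minimal sketch and not the authors' implementation. It counts overlapping character N-grams in raw tweet text (so emoticons, repeated characters and capitalization are preserved) and counts N-grams over a sequence of POS tags. The function names `char_ngrams` and `tag_ngrams`, the sample tweet, and the tag labels (including `HASHTAG`) are assumptions made for the example; the record does not list the actual Freeling tags or the Twitter-dependent tags used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every overlapping character n-gram in text.

    The text is deliberately not lowercased or stripped, so case,
    spaces, punctuation and emoticons all contribute features.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tag_ngrams(tags, n):
    """Count every overlapping n-gram over a sequence of tag labels."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Hypothetical tweet showing character flooding, capitals and emoticons.
tweet = "SOOO happy :) :)"
bigrams = char_ngrams(tweet, 2)

# Hypothetical tag sequence; "HASHTAG" stands in for a Twitter-dependent tag.
tags = ["NOUN", "VERB", "HASHTAG"]
tag_bigrams = tag_ngrams(tags, 2)
```

In this sketch the bigram `":)"` and the bigram `"OO"` each occur twice in the sample tweet, which is the kind of stylistic signal the abstract attributes to character N-grams.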

Affiliations:
González-Gallardo C.E.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Montes A.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Sierra G.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Núñez-Juárez J.A.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Salinas-López A.J.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

Ek J.:
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Mexico

ISSN: 1613-0073

CEUR Workshop Proceedings

Publisher
CEUR-WS, United States of America

Document type: Conference Paper
Volume: 1391 Issue:
Pages:

