Author Clustering Using Hierarchical Clustering Analysis: Notebook for PAN at CLEF 2017


By: Gómez-Adorno H., Aleman Y., Vilariño D., Sanchez-Perez M.A., Pinto D., Sidorov G.

Published: 1 Jan 2017
Category: Computer science (miscellaneous)

Abstract:
This paper presents our approach to the Author Clustering task at PAN 2017. We performed a hierarchical clustering analysis of different document features: typed and untyped character n-grams, and word n-grams. We experimented with two feature representation methods, the log-entropy model and tf-idf, while tuning minimum frequency thresholds to reduce dimensionality. Our system was ranked 1st in both subtasks: author clustering and authorship-link ranking.
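
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the linkage criterion, the similarity measure, or the stopping rule, so average linkage, cosine similarity, a fixed similarity threshold, and a smoothed idf are all assumptions here, as are the function names and the toy documents. Only untyped character n-grams with tf-idf weighting and a minimum frequency threshold are shown.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return the untyped character n-grams of a text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf_vectors(docs, n=3, min_freq=1):
    """Build sparse tf-idf vectors (dicts) over character n-grams.

    n-grams whose total corpus count is below min_freq are dropped,
    which reduces dimensionality. The smoothed idf is an assumption.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = {g for g, f in total.items() if f >= min_freq}
    df = Counter(g for c in counts for g in c if g in vocab)
    n_docs = len(docs)
    return [
        {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
         for g, tf in c.items() if g in vocab}
        for c in counts
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def agglomerative(vectors, threshold=0.5):
    """Average-linkage agglomerative clustering (assumed criterion).

    Repeatedly merges the most similar pair of clusters and stops
    when the best average similarity falls below the threshold.
    """
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            ((link(a, b), (x, y))
             for x, a in enumerate(clusters)
             for y, b in enumerate(clusters) if x < y),
            key=lambda t: t[0],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical toy corpus: two pairs of near-duplicate documents.
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "quantum flux regulators hum",
    "quantum flux regulator hums",
]
clusters = agglomerative(tfidf_vectors(docs, n=3, min_freq=1), threshold=0.5)
print(clusters)
```

Each returned cluster is a list of document indices. The pairwise cosine similarities computed inside `agglomerative` could also serve as scores for ranking authorship links, though the abstract does not say how the authors scored links.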

Affiliations:
Gómez-Adorno H.:
 Instituto Politécnico Nacional (IPN), Center for Computing Research (CIC), Mexico City, Mexico

Aleman Y.:
 Benemérita Universidad Autónoma de Puebla (BUAP), Faculty of Computer Science, Puebla, Mexico

Vilariño D.:
 Benemérita Universidad Autónoma de Puebla (BUAP), Faculty of Computer Science, Puebla, Mexico

Sanchez-Perez M.A.:
 Instituto Politécnico Nacional (IPN), Center for Computing Research (CIC), Mexico City, Mexico

Pinto D.:
 Benemérita Universidad Autónoma de Puebla (BUAP), Faculty of Computer Science, Puebla, Mexico

Sidorov G.:
 Instituto Politécnico Nacional (IPN), Center for Computing Research (CIC), Mexico City, Mexico
ISSN: 1613-0073
Publisher: CEUR-WS, United States of America
Document type: Conference Paper
Volume: 1866
Issue:
Pages:
