Author Clustering Using Hierarchical Clustering Analysis: Notebook for PAN at CLEF 2017
By:
Gómez-Adorno H., Aleman Y., Vilariño D., Sanchez-Perez M.A., Pinto D., Sidorov G.
Published:
1 Jan 2017
Category:
Computer science (miscellaneous)
Abstract:
This paper presents our approach to the Author Clustering task at PAN 2017. We performed a hierarchical clustering analysis of different document features: typed and untyped character n-grams, and word n-grams. We experimented with two feature representation methods, the log-entropy model and tf-idf, while tuning minimum frequency thresholds to reduce dimensionality. Our system ranked 1st in both subtasks: author clustering and authorship-link ranking.
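The pipeline the abstract describes (character n-gram features, a frequency threshold, a tf-idf weighting, then hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance measure, the linkage criterion, or how the number of clusters is chosen, so cosine distance, average linkage, a fixed `k`, and a smoothed idf are all assumptions here, and every function name is hypothetical. Only untyped character n-grams are shown.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Untyped character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf_vectors(docs, n=3, min_freq=1):
    """Sparse tf-idf vectors (dicts); n-grams whose corpus frequency
    is below min_freq are dropped to reduce dimensionality."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = {g for g, f in total.items() if f >= min_freq}
    df = Counter(g for c in counts for g in c if g in vocab)
    n_docs = len(docs)
    # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
    return [{g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
             for g, tf in c.items() if g in vocab}
            for c in counts]

def cosine_distance(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain.
    Assumption: k is given; the abstract does not say how it is selected."""
    m = len(vectors)
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i in range(m) for j in range(i + 1, m)}

    def linkage(a, b):
        return sum(dist[(min(i, j), max(i, j))]
                   for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(m)]
    while len(clusters) > k:
        # Find the closest pair of clusters and merge it.
        _, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

docs = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "quantum flux zigzag vortex",
    "quantum flux zigzag vertex",
]
clusters = agglomerative(tfidf_vectors(docs, n=3, min_freq=1), k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Raising `min_freq` shrinks the vocabulary before weighting, which is the dimensionality-reduction knob the abstract mentions; swapping the weighting inside `tfidf_vectors` for a log-entropy scheme would cover the other representation.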
Affiliations:
Gómez-Adorno H.:
Instituto Politécnico Nacional (IPN), Center for Computing Research (CIC), Mexico City, Mexico
Aleman Y.:
Benemérita Universidad Autónoma de Puebla (BUAP), Faculty of Computer Science, Puebla, Mexico
Vilariño D.:
Benemérita Universidad Autónoma de Puebla (BUAP), Faculty of Computer Science, Puebla, Mexico
Sanchez-Perez M.A.:
Instituto Politécnico Nacional (IPN), Center for Computing Research (CIC), Mexico City, Mexico
Pinto D.:
Benemérita Universidad Autónoma de Puebla (BUAP), Faculty of Computer Science, Puebla, Mexico
Sidorov G.:
Instituto Politécnico Nacional (IPN), Center for Computing Research (CIC), Mexico City, Mexico