Exploring Text Representations for Detecting Automatically Generated Text
By:
Villegas-Trejo Z., Gómez-Adorno H., Ojeda-Trueba S.-L.
Published:
1 Jan 2023
Category:
Computer science (miscellaneous)
Abstract:
Advances in artificial intelligence (AI) have produced models that can generate text automatically, and it has become increasingly difficult to tell machine-generated text from human-written text simply by reading it. This capability becomes a problem when the models are used maliciously or to create fake content. This article presents our approach to the AuTexTification task at IberLEF 2023, focusing on two subtasks. The first subtask is a binary classification problem: distinguishing text written by humans from text generated by AI. The second subtask is a multi-class problem involving six text generation models (A, B, C, D, E, and F). Both subtasks are conducted in English and Spanish. Our objective is to determine accurately whether a given text was written by a human or generated by AI, and also to identify the text generation model used. We extract features such as Bag-of-Words (BoW), N-gram structure, and others. Experimental evaluation is performed using Logistic Regression, Random Forest, and Support Vector Machine algorithms. Our results demonstrate that incorporating additional features improves the accuracy of text identification. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
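The abstract names Bag-of-Words and N-gram features but does not say how they were computed. As an illustration only, the following minimal sketch shows one common way to build such features using the Python standard library. The tokenization rule, the choice of word unigrams and bigrams, and the function names (`tokenize`, `ngram_counts`, `extract_features`, `vectorize`) are all assumptions made for this example and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule)."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous word n-grams; each n-gram is joined with spaces."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def extract_features(text, max_n=2):
    """Combine counts of word 1-grams (Bag-of-Words) up to max_n-grams."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngram_counts(tokens, n))
    return features


def vectorize(texts, max_n=2):
    """Map each text to a count vector over a shared, sorted vocabulary."""
    per_text = [extract_features(t, max_n) for t in texts]
    vocabulary = sorted(set().union(*per_text))
    matrix = [[feats[term] for term in vocabulary] for feats in per_text]
    return vocabulary, matrix


if __name__ == "__main__":
    vocab, rows = vectorize(["The cat sat.", "The cat ran."])
    for term, column in zip(vocab, zip(*rows)):
        print(term, column)
```

Count vectors of this kind are the sort of input that classifiers such as Logistic Regression, Random Forest, or a Support Vector Machine can be trained on; which representation and settings the authors actually used is not stated in this record.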
Affiliations:
Villegas-Trejo Z.:
Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
Gómez-Adorno H.:
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
Ojeda-Trueba S.-L.:
Instituto de Ingeniería, Universidad Nacional Autónoma de México, Mexico City, Mexico