Exploring Text Representations for Detecting Automatically Generated Text

By: Villegas-Trejo Z., Gómez-Adorno H., Ojeda-Trueba S.-L.

Published: 1 Jan 2023

Category: Computer science (miscellaneous)

Web: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85175298558&partnerID=40&md5=7f0b626b542a66f2bdae9552b036edd5

Abstract:
In today's rapidly advancing world of technology, artificial intelligence (AI) models have emerged that can generate text automatically. It has become increasingly challenging to distinguish machine-generated text from human-written text simply by reading it. This capability becomes a problem when these models are used to create fake content or for other malicious purposes. This article presents our approach to the AuTexTification task at IberLEF 2023, focusing on two subtasks. The first subtask involves binary classification, distinguishing between text written by humans and text generated by AI. The second subtask is a multi-class problem involving six text generation models (A, B, C, D, E, and F). Both subtasks are conducted in English and Spanish. Our objective is to accurately determine whether a given text is authored by a human or generated by AI and also to detect the text generation model used. We extract features such as Bag-of-Words (BoW), N-gram structure, and others. Experimental evaluation is performed using Logistic Regression, Random Forest, and Support Vector Machine algorithms. Our results demonstrate that incorporating additional features improves the accuracy of text identification. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
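
The Bag-of-Words and N-gram features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regex tokenizer, the function names (`tokenize`, `word_ngrams`, `extract_features`, `vectorize`) and the `max_n` parameter are assumptions made for illustration, and the record does not specify the paper's preprocessing or its additional features.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed, simple scheme)."""
    return re.findall(r"\w+", text.lower())


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Count unigrams (Bag-of-Words) plus word n-grams up to length max_n."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    return features


def vectorize(texts, max_n=2):
    """Map each text to a count vector over a shared, sorted vocabulary."""
    counts = [extract_features(t, max_n) for t in texts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    vocabulary, matrix = vectorize(["The cat sat.", "The cat ran."])
    for term, column in zip(vocabulary, zip(*matrix)):
        print(" ".join(term), list(column))
```

The resulting count matrix is the kind of input that a classifier such as Logistic Regression, Random Forest, or a Support Vector Machine would be trained on, with one row per text and one label per row (human versus generated, or the generating model).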

Affiliations:
Villegas-Trejo Z.:
Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico

Gómez-Adorno H.:
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico

Ojeda-Trueba S.-L.:
Instituto de Ingeniería, Universidad Nacional Autónoma de México, Mexico City, Mexico

ISSN: 1613-0073

CEUR Workshop Proceedings

Publisher
CEUR-WS, United States of America

Document type: Conference Paper
Volume: 3496 Issue:
Pages:
