"Bend the truth": Benchmark dataset for fake news detection in Urdu language and its evaluation
By:
Amjad, Maaz, Sidorov, Grigori, Zhila, Alisa, Gomez-Adorno, Helena, Voronkov, Ilia, Gelbukh, Alexander
Published:
1 Jan 2020
Abstract:
The paper presents a new corpus for fake news detection in the Urdu
language, along with a baseline classification and its evaluation. With
the escalating use of the Internet worldwide and the substantially
increasing impact of readily available ambiguous information,
the challenge of quickly identifying fake news in digital media in various
languages becomes more acute. We provide a manually assembled and
verified dataset containing 900 news articles, 500 annotated as real and
400 as fake, allowing the investigation of automated fake news
detection approaches in Urdu. The news articles in the truthful subset
come from legitimate news sources, and their validity has been manually
verified. For the fake subset, the known difficulty of finding fake news
was addressed by hiring professional journalists, native speakers of Urdu,
who were instructed to intentionally write deceptive news articles. The
dataset covers 5 topics: (i) Business, (ii) Health, (iii) Showbiz,
(iv) Sports, and (v) Technology. To establish our Urdu dataset as a
benchmark, we performed baseline classification. We crafted a variety of
text representation feature sets including word n-grams, character
n-grams, function word n-grams, and their combinations. After applying
a variety of feature weighting schemes, we ran a series of classifiers
on the train-test split. The results show sizable performance gains by
the AdaBoost classifier, with 0.87 F1(Fake) and 0.90 F1(Real). We provide
the results evaluated against different metrics for convenient comparison
with future research. The dataset is publicly available for research
purposes.
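As a rough illustration of two ideas mentioned in the abstract, the sketch below counts word and character n-grams and computes a per-class F1 score of the kind reported as F1(Fake) and F1(Real). It is not the authors' pipeline: the function names, the whitespace tokenization, and the toy labels are assumptions made for this example, and no feature weighting or AdaBoost training is shown.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count contiguous word n-grams; whitespace tokenization is an assumption."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n):
    """Count contiguous character n-grams (by code point, spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def f1_for_class(y_true, y_pred, label):
    """F1 score with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration; they are not taken from the dataset.
y_true = ["real", "real", "fake", "fake", "real"]
y_pred = ["real", "fake", "fake", "fake", "real"]

f1_fake = f1_for_class(y_true, y_pred, "fake")
f1_real = f1_for_class(y_true, y_pred, "real")
print(f"F1(Fake) = {f1_fake:.2f}, F1(Real) = {f1_real:.2f}")
```

Reporting F1 separately for each class, rather than a single accuracy figure, keeps the score for the smaller fake class visible when the classes are not balanced, as with the 500/400 split described above.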
Affiliations:
Amjad, Maaz:
Inst Politecn Nacl, Ctr Invest Comp CIC, Mexico City, DF, Mexico
Sidorov, Grigori:
Inst Politecn Nacl, Ctr Invest Comp CIC, Mexico City, DF, Mexico
Zhila, Alisa:
Inst Politecn Nacl, Ctr Invest Comp CIC, Mexico City, DF, Mexico
Gomez-Adorno, Helena:
Univ Nacl Autonoma Mexico, Inst Invest Matemat Aplicadas & Sistemas IIMAS, Mexico City, DF, Mexico
Voronkov, Ilia:
Moscow Inst Phys & Technol, Moscow, Russia
Gelbukh, Alexander:
Inst Politecn Nacl, Ctr Invest Comp CIC, Mexico City, DF, Mexico