Advancements in Communication and Systems

Unified Detection: Enhancing Information Reliability through Machine Learning Classification of Fake, Spam, and Legitimate Content

Authors: Saharsh Gupta, Ayush Goyal, Mehul Kumar, Saurav Kumar and Nishi Jain


Publishing Date: 13-09-2024

ISBN: 978-81-955020-7-3

DOI: https://doi.org/10.56155/978-81-955020-7-3-48

Abstract

The continued increase of spam and fake information in the digital age has raised serious concerns about the veracity and authenticity of digital content. To tackle this issue, we have developed a unified machine learning-based classification system that distinguishes between spam, fake, and legitimate information, addressing a gap in existing solutions, which typically focus on either spam or fake information alone. In this research, we collected a diverse dataset that includes data from the YouTube spam collection and from email and SMS spam databases, as well as fake news from some of the largest fake news datasets available: the WELFake and GossipCop fake news datasets. Our approach applies preprocessing and feature extraction steps, followed by the implementation of different machine learning models such as Logistic Regression, k-Nearest Neighbors, XGBoost, Extra Trees Classifier, and Random Forest. The performance of these models was examined using performance metrics such as accuracy, precision, recall, and F1 score. Among the baseline models, the Extra Trees Classifier performed best, followed by XGBoost and Random Forest, and the Voting Classification significantly improved the accuracy of these baseline models. This unified approach offers a comprehensive solution for automating the filtering of digital content, substantially enhancing the reliability of information by simultaneously addressing multiple types of misinformation. This study contributes to the development of scalable tools that can be deployed across various platforms to ensure the integrity of digital information.
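
As a rough illustration of the voting and evaluation steps named in the abstract, the sketch below combines the label predictions of several base models by majority vote and then computes accuracy, precision, recall, and F1 score over the three classes. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say whether hard or soft voting was used or how the metrics were averaged, so hard voting and macro averaging are assumed here, and the function names, the tie-breaking rule, and the toy predictions are hypothetical.

```python
from collections import Counter

# The three classes named in the abstract.
LABELS = ("spam", "fake", "legitimate")


def hard_vote(predictions_per_model):
    """Majority vote per sample across models (hard voting is an assumption).

    Ties are broken in favour of the earliest model in the list whose
    label reaches the top vote count (an illustrative choice).
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        top = max(counts.values())
        voted.append(next(p for p in sample_preds if counts[p] == top))
    return voted


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data, invented for illustration only (not results from the paper).
y_true = ["spam", "fake", "legitimate", "spam", "fake", "legitimate"]
model_a = ["spam", "fake", "legitimate", "fake", "fake", "spam"]
model_b = ["spam", "legitimate", "legitimate", "spam", "fake", "legitimate"]
model_c = ["fake", "fake", "spam", "spam", "fake", "legitimate"]

voted = hard_vote([model_a, model_b, model_c])
ensemble_scores = macro_metrics(y_true, voted)
single_scores = macro_metrics(y_true, model_a)
```

In this toy setup each base model makes different mistakes, so the majority vote can recover labels that any single model gets wrong; whether that holds on real data depends on how correlated the base models' errors are.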

Keywords

Spam and fake information, Classification system, Machine learning, Voting Classification, Unified approach

Cite as

Saharsh Gupta, Ayush Goyal, Mehul Kumar, Saurav Kumar and Nishi Jain, "Unified Detection: Enhancing Information Reliability through Machine Learning Classification of Fake, Spam, and Legitimate Content", In: Ashish Kumar Tripathi and Vivek Shrivastava (eds), Advancements in Communication and Systems, SCRS, India, 2024, pp. 541-556. https://doi.org/10.56155/978-81-955020-7-3-48
