Fake News Classification in 2024 News Articles

Session Number

CMPS(ai) 15

Advisor(s)

Courtland VanDam, MIT Lincoln Laboratory

Discipline

Computer Science

Start Date

17-4-2025 10:45 AM

End Date

17-4-2025 11:00 AM

Abstract

The spread of false information through digital news outlets has driven the development of strong machine learning models for identifying fake news. Using a labeled dataset, this study investigates how well different classification and embedding strategies differentiate between fake and authentic news. We compare deep learning architectures such as convolutional neural networks (CNNs) and transformers against conventional machine learning classifiers such as logistic regression, support vector machines, and random forests. To evaluate how word embedding techniques affect classification performance, we also examine TF-IDF, Word2Vec, and BERT embeddings. Our findings show that transformer-based models, in particular fine-tuned BERT variants, outperform conventional methods in precision and recall by making better use of contextual semantics. However, lightweight models pairing TF-IDF with logistic regression provide competitive performance at significantly lower computational cost.
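The lightweight baseline mentioned above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the study's actual code: the texts, labels, and hyperparameters below are placeholder assumptions, not the dataset or settings used in the work.

```python
# Sketch of a TF-IDF + logistic regression fake-news baseline.
# Texts and labels are toy placeholders, not the study's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled examples: 1 = fake, 0 = authentic.
texts = [
    "Scientists confirm miracle cure hidden by governments",
    "City council approves new budget for road repairs",
    "Celebrity spotted with aliens, insiders claim",
    "Local school wins regional robotics competition",
]
labels = [1, 0, 1, 0]

# TF-IDF maps each article to a sparse weighted term vector;
# logistic regression then learns a linear decision boundary over it.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

predictions = model.predict(["Insiders claim hidden miracle cure"])
print(predictions)
```

Because the pipeline trains in seconds on CPU, it makes a natural reference point against which the heavier fine-tuned BERT models are compared.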
