Detection and Identification of Fake News: Binary Content Classification with Pre-trained Language Models
Fake news has emerged as a critical problem for society and professional journalism. Many individuals consume news via online media, such as social networks and news websites, so the demand for automatic fake news detection is growing. There is still no agreed-upon definition of fake news, since the term covers various concepts, such as clickbait, propaganda, satire, hoaxes, and rumors. This results in a broad landscape of machine learning approaches with varying accuracy in detecting fake news. This master's thesis focuses on a binary content-based classification approach using a bidirectional Transformer (BERT) to detect fake news in online articles. BERT is pre-trained as a language model and then fine-tuned on a labeled dataset. The FakeNewsNet dataset is used to test two variants of the model (cased and uncased) on articles, using only the body text, only the title, and a concatenation of both. Additionally, both models were tested with different preprocessing steps. Across all 29 experiments, the models achieved high accuracy without overfitting. Using the body text or the concatenation resulted in five models with a test accuracy of 87%, whereas using only titles resulted in 84%. This suggests that short statements may already be sufficient for fake news detection with language models. The preprocessing steps also appear to have no major impact on the predictions. It is concluded that Transformer models, such as BERT, are a promising approach to detecting fake news, since they achieve notable results even without a large dataset.
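As an illustration, the three input variants (title only, body only, and their concatenation) and the cased/uncased distinction could be prepared along the following lines. This is a minimal sketch, not the thesis's actual pipeline: the field names (`title`, `text`), the character cap standing in for BERT's 512-token limit, and the `preprocess` helper are assumptions introduced for the example.

```python
def preprocess(text, cased=False):
    """Mimic the cased/uncased model distinction: the uncased
    variant lowercases all input text before tokenization."""
    return text if cased else text.lower()


def build_inputs(article, variant="both", max_chars=2000):
    """Return the text fed to the classifier for one article.

    `article` is assumed to be a dict with "title" and "text" keys;
    the real FakeNewsNet schema may differ.
    """
    title = article.get("title", "").strip()
    body = article.get("text", "").strip()
    if variant == "title":
        text = title
    elif variant == "body":
        text = body
    elif variant == "both":
        # Concatenation variant: title first, then body text.
        text = f"{title} {body}".strip()
    else:
        raise ValueError(f"unknown variant: {variant}")
    # Rough length cap as a stand-in for BERT's 512-token limit.
    return text[:max_chars]


article = {"title": "Example Headline", "text": "Body of the article."}
print(build_inputs(article, variant="both"))       # concatenated input
print(preprocess("Example Headline", cased=False)) # uncased variant
```

In a full pipeline, the string returned by `build_inputs` would be tokenized and passed to the fine-tuned classifier; this sketch only covers the input construction step.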