A Comparative Study of Fake News Classification in Brazilian Portuguese using Retrieval-Augmented Generation

Authors

  • Victoria Reis
  • Felipe Ramos de Oliveira
  • Nelson Francisco Favilla Ebecken

Keywords:

Fake news, RAG, Portuguese

Abstract

This study explores a supervised Retrieval-Augmented Generation (RAG) pipeline for automatic misinformation classification in Brazilian Portuguese. The method combines semantic retrieval over dense embeddings with a traditional machine learning classifier, a Support Vector Machine (SVM) trained on TF-IDF features. Experiments were conducted on the largest publicly available Brazilian Portuguese fake news dataset, which the authors previously consolidated from multiple sources, and performance was compared with results reported in the literature for classical methods such as SVM. The results indicate that supervised RAG is competitive, but its gains over traditional approaches may be limited when the dataset is balanced and linguistically homogeneous. A detailed error analysis is presented, and the potential of supervised RAG in more challenging, low-resource scenarios is discussed.
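The abstract does not specify how retrieved items are fed to the classifier, so the following is only a minimal sketch of one plausible retrieval step, not the authors' implementation. It assumes each labeled training item already has a dense embedding, retrieves the top-k most similar items by cosine similarity, and appends them to the query text so that a downstream TF-IDF and SVM classifier (not shown) could see them. The vectors, texts, and the helper names `cosine`, `retrieve`, and `augment` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k index entries most similar to query_vec.

    index is a list of (vector, text, label) tuples.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return ranked[:k]


def augment(query_text, neighbours):
    """Append retrieved labeled texts to the query (an assumed design choice)."""
    context = " ".join(f"[{label}] {text}" for _, text, label in neighbours)
    return f"{query_text} {context}"


# Toy index with made-up 3-dimensional embeddings; a real pipeline would
# obtain these vectors from a dense sentence encoder.
index = [
    ([1.0, 0.0, 0.0], "toy fake item A", "fake"),
    ([0.9, 0.1, 0.0], "toy fake item B", "fake"),
    ([0.0, 1.0, 0.0], "toy true item C", "true"),
    ([0.0, 0.9, 0.1], "toy true item D", "true"),
]

query_vec = [0.95, 0.05, 0.0]  # hypothetical embedding of the query text
neighbours = retrieve(query_vec, index, k=2)
labels = [label for _, _, label in neighbours]
augmented = augment("query text", neighbours)
print(labels)
print(augmented)
```

In this sketch the augmented string would then be vectorized with TF-IDF and passed to the SVM; other designs, such as using the neighbours' labels as extra features, are equally consistent with the abstract.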

Published

2025-12-01

Issue

Section

Articles