Detecting Hate Speech on Brazilian Social Media: New Dataset and Analysis

Authors

  • Felipe Ramos de Oliveira UFRJ - Universidade Federal do Rio de Janeiro
  • Victoria Dias Reis UFRJ - Universidade Federal do Rio de Janeiro
  • Nelson Francisco Favilla Ebecken UFRJ

DOI:

https://doi.org/10.55592/cilamce.v6i06.8208

Keywords:

Dataset, classification, machine learning

Abstract


Social media plays a crucial role in human interaction, facilitating communication and self-expression. However, the proliferation of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in Portuguese because of its rich vocabulary, complex grammar, and regional variation. To address this challenge, we introduce TuPy-E, the largest annotated Portuguese corpus dedicated to hate speech detection. We analyze the corpus using BERT and GPT-2 models, contributing to both the academic understanding of hate speech detection and its practical application.

Published

2024-12-02

Section

Computational Intelligence Techniques for Optimization and Data Modeling