Article ID: 2025SMP0005
Social networks have become essential communication channels; however, they also enable the propagation of harmful content that undermines societal well-being. Existing methods for detecting harmful posts predominantly use binary classification frameworks, which cannot distinguish between specific types of harmful content and suffer from severe class imbalance when extended to multiclass scenarios. In this study, we present a novel heterogeneous graph-based approach for the multiclass classification of harmful social media content that specifically addresses the multiclass imbalance problem inherent in this domain. We propose a structure-aware oversampling technique that extends the heterogeneous graph transformer architecture to identify three distinct categories of harmful content: misinformation, biased opinions, and inflammatory rhetoric. Our method enhances the GraphSMOTE algorithm with network-specific constraints, generating synthetic nodes while preserving the complex interconnections characteristic of social media networks. These constraints maintain the semantic integrity of user-post and post-element relationships while addressing class imbalance. Extensive experiments on Japanese COVID-19 vaccine-related social media data demonstrate the effectiveness of our method.
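
The oversampling idea summarized above can be illustrated with a minimal sketch of SMOTE-style interpolation in a node embedding space, the core step that GraphSMOTE-type methods build on. This is an illustration under stated assumptions rather than the proposed method: the function name `smote_oversample`, the plain-list embedding format, and the nearest-neighbour selection rule are hypothetical, and the sketch covers neither edge generation for the synthetic nodes nor the network-specific constraints on user-post and post-element relationships.

```python
import math
import random


def smote_oversample(embeddings, labels, minority_label, n_new, seed=0):
    """Return n_new synthetic embeddings for one minority class.

    Hypothetical helper: each synthetic point lies on the line segment
    between a randomly chosen minority node and its nearest neighbour
    of the same class (plain SMOTE interpolation).
    """
    rng = random.Random(seed)
    minority = [e for e, y in zip(embeddings, labels) if y == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority nodes to interpolate")

    synthetic = []
    for _ in range(n_new):
        anchor = rng.choice(minority)
        # Nearest same-class neighbour by Euclidean distance,
        # excluding the anchor object itself.
        neighbour = min(
            (e for e in minority if e is not anchor),
            key=lambda e: math.dist(anchor, e),
        )
        delta = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            [a + delta * (b - a) for a, b in zip(anchor, neighbour)]
        )
    return synthetic


# Toy usage: class 1 is the minority class with two 2-D embeddings.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
toy_labels = [1, 1, 0, 0, 0]
new_nodes = smote_oversample(toy_embeddings, toy_labels, minority_label=1, n_new=3)
```

In a heterogeneous graph, each synthetic post node would additionally need edges to user and element nodes. Constraining how those edges are created is the part that the structure-aware technique addresses and that this sketch leaves out.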