RADICAL IDEOLOGY MINING IN ARABIC TWEETS USING MACHINE LEARNING
Abstract
Extremists and terrorists misuse the internet and social media platforms to spread propaganda, disseminate their messages, and recruit new members. Arabic is the primary language used by extremist Islamists. While there is a significant body of research on English-language content, little work addresses extracting the main idea from Arabic text, and the field remains immature, with fewer publications and resources. Automated processing of Arabic dialects is challenging because of the lack of orthographic standards and the scarcity of annotated data and public resources. The lack of studies on detecting extremism in Islamic networks, linguistic ambiguity, and the use of metaphorical language are among the most challenging problems facing Arabic NLP researchers. To address the limited availability of data, this research presents a dataset of 40,000 Arabic tweets, carefully tagged and filtered to include both radical and non-radical tweets. Machine Learning (ML) was employed to automate the identification of extremist content: a Support Vector Machine (SVM) with an RBF kernel was trained on TF-IDF features and evaluated on 20,004 test samples, achieving an accuracy of 91%.
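The two building blocks named in the abstract, TF-IDF weighting and the RBF kernel, can be illustrated with a minimal plain-Python sketch. It is not the authors' pipeline and does not train an SVM. The whitespace tokenizer, the smoothed IDF variant, the L2 normalization, the value `gamma=1.0`, and the toy English documents are all assumptions made for illustration; real Arabic tweets would need proper normalization and tokenization first.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of strings.

    Assumption: tokens are obtained by a plain whitespace split.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2). gamma is a hypothetical setting."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Toy placeholder documents, not taken from the dataset.
docs = ["peace and dialogue", "peace and tolerance", "join the fight"]
vocab, vecs = tfidf_vectors(docs)
k_similar = rbf_kernel(vecs[0], vecs[1])    # documents sharing two terms
k_different = rbf_kernel(vecs[0], vecs[2])  # documents sharing no terms
print(k_similar > k_different)
```

An SVM with an RBF kernel classifies a new tweet from a weighted sum of such kernel values against its support vectors, so tweets whose TF-IDF vectors lie close together tend to receive the same label.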
License

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.