Detection of Racism on Multilingual Social Media: An NLP Approach
This paper compares several text vectorization methods and machine learning algorithms for detecting racism on multilingual social media. We train classification models on Facebook comments and tweets in three languages: English, French and Arabic. For English-language comments, k-nearest neighbors (KNN) combined with TF-IDF works best, with an accuracy of 78.34%. For French, a support vector machine (SVM) classifier with bag-of-words (BOW) features reaches an accuracy of 82.56%. For Arabic, KNN coupled with BOW reaches an accuracy of 91.13%. Overall, our results suggest that the combination of SVM and TF-IDF is the best choice for detecting racism on social media that contains English, French and Arabic content at the same time. As part of this work, we also present a new annotated dataset of social media comments in the three languages.
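The abstract does not describe the authors' implementation, so the following is only a minimal pure-Python sketch of two of the ingredients it names: BOW and TF-IDF vectorization, each paired with a KNN classifier. The function names, the cosine-similarity metric, the smoothed IDF formula, the value of `k`, and the toy comments and labels are all assumptions made for illustration; none of them come from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real Arabic or French text
    # would need language-aware preprocessing).
    return text.lower().split()


def bow_vector(tokens):
    # Bag-of-words: raw term counts as a sparse dict.
    return dict(Counter(tokens))


def fit_idf(tokenized_docs):
    # Smoothed inverse document frequency learned from the training set.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Term count weighted by IDF; terms unseen in training are dropped.
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    # Majority vote among the k training vectors most similar to the query.
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy examples, not taken from the paper's dataset.
train_texts = [
    "those people are all criminals send them back",
    "they do not belong here send them away",
    "people like them are all the same",
    "lovely weather for the match today",
    "the new phone camera is great",
    "great match today what a goal",
]
train_labels = ["racist", "racist", "racist", "neutral", "neutral", "neutral"]

train_tokens = [tokenize(t) for t in train_texts]
idf = fit_idf(train_tokens)
bow_train = [bow_vector(t) for t in train_tokens]
tfidf_train = [tfidf_vector(t, idf) for t in train_tokens]

for text in ["send those people back", "what a great match"]:
    tokens = tokenize(text)
    bow_label = knn_predict(bow_vector(tokens), bow_train, train_labels)
    tfidf_label = knn_predict(tfidf_vector(tokens, idf), tfidf_train, train_labels)
    print(text, "| BOW+KNN:", bow_label, "| TF-IDF+KNN:", tfidf_label)
```

Swapping `bow_vector` for `tfidf_vector` changes only the feature weights, which is the kind of vectorizer-versus-classifier comparison the abstract reports per language. A real experiment would use a held-out test split and a library implementation rather than this toy loop.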