Classifying Hate Speech Using a Two-Layer Model
Social media and other online sites are increasingly scrutinized as platforms for cyberbullying and hate speech. Many machine learning algorithms, such as support vector machines, have been adopted to create classification tools that identify and potentially filter patterns of negative speech. While effective for prediction, these methodologies yield models that are difficult to interpret. In addition, many studies focus on classifying comments as either negative or neutral, rather than further separating negative comments into subcategories. To address both of these concerns, we introduce a two-stage model for classifying text. With this model, we illustrate the use of internal lexicons: collections of words, generated from a pre-classified training dataset of comments, that are specific to several subcategories of negative comments. In the first stage, a machine learning algorithm classifies each comment as negative or neutral (or, more generally, target or nontarget). The second stage of model building leverages the internal lexicons (called L2CLs) to create features specific to each subcategory. These features, along with others, are then used in a random forest model to classify the comments into the subcategories of interest. We demonstrate our approach using two sets of data. Supplementary materials for this article are available online.
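The two-stage pipeline described above can be sketched in miniature as follows. This is an illustrative sketch only: the lexicon contents, the keyword rule standing in for the stage-one classifier (an SVM or similar in the paper), and the count-argmax scorer standing in for the stage-two random forest are all hypothetical placeholders, not the authors' actual model.

```python
# Toy sketch of a two-stage comment classifier. The real model trains a
# machine learning classifier for stage 1 and a random forest over
# lexicon-derived (and other) features for stage 2.

# Hypothetical internal lexicons (L2CLs): one word list per subcategory,
# which in the actual model are generated from a pre-classified training set.
L2CLS = {
    "threat": {"hurt", "destroy", "attack"},
    "insult": {"stupid", "idiot", "loser"},
}

def stage_one_is_target(comment: str) -> bool:
    """Stage 1: label a comment as target (negative) vs. nontarget.
    Placeholder rule: any lexicon word marks the comment as target."""
    words = set(comment.lower().split())
    return any(words & lexicon for lexicon in L2CLS.values())

def lexicon_features(comment: str) -> dict:
    """Stage 2 features: per-subcategory counts of lexicon words."""
    words = comment.lower().split()
    return {cat: sum(w in lexicon for w in words)
            for cat, lexicon in L2CLS.items()}

def stage_two_subcategory(comment: str) -> str:
    """Placeholder for the random forest: pick the subcategory whose
    lexicon matches the comment most strongly."""
    feats = lexicon_features(comment)
    return max(feats, key=feats.get)

def classify(comment: str) -> str:
    """Full pipeline: stage 1 filters targets, stage 2 assigns a subcategory."""
    if not stage_one_is_target(comment):
        return "neutral"
    return stage_two_subcategory(comment)
```

For example, `classify("you are a stupid loser")` returns `"insult"`, while `classify("have a nice day")` returns `"neutral"`. The design point is the separation of concerns: stage 1 need only solve a binary target/nontarget problem, while the subcategory-specific L2CL features are consulted only for comments that survive that filter.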