Decoding violence against women: analysing harassment in Middle Eastern literature with machine learning and sentiment analysis

Humanities

H. Q. Low, P. Keikhosrokiani, et al.

This study by Hui Qi Low, Pantea Keikhosrokiani, and Moussa Pourya Asl applies natural language processing and machine learning to examine depictions of sexual harassment in twelve Anglophone Middle Eastern novels. A logistic regression classifier distinguished physical from non-physical harassment with 75.8% accuracy, and sentiment analysis revealed a predominance of negative sentiment in harassment passages, with fear especially prominent in instances of physical harassment.

Introduction
The study addresses the challenge of extracting reliable, structured insights on sexual harassment from Middle Eastern literary texts, a task hindered by the scale of data, cognitive limits, and interpretive biases in manual analysis. It aims to (1) identify and classify instances of sexual harassment in Anglophone Middle Eastern literature into physical and non-physical offenses, and (2) characterize the sentiments and emotions associated with these instances. The work situates itself in the context of rising reported harassment in the region and the increasing use of computational methods to complement traditional interpretive scholarship. By leveraging NLP, rule-based detection, machine learning, and deep learning, the authors seek a scalable, less biased approach for typology detection and sentiment/emotion characterization to inform broader understanding and intervention efforts.
Literature Review
Background outlines three sexual harassment categories—gender harassment, unwanted sexual attention, and sexual coercion—and discusses how patriarchal norms, honor cultures, and modesty expectations in Middle Eastern societies shape and perpetuate these behaviors, often silencing victims and rationalizing harassment. The review surveys computational literary studies and text classification techniques (e.g., Rocchio, boosting/bagging, logistic regression, naïve Bayes, k-NN, SVM, decision trees, random forests, CRF, semi-supervised methods; and deep learning families such as RNNs, attention, memory-augmented, GNNs, Siamese networks, and transformer-based models). Prior harassment-related NLP works include Rezvan et al. (2020) on harassment classification on Twitter (GBM outperforming others), Wright et al. (2017) on street harassment narratives with POS-based analysis, and Yin et al. (2009) on online harassment detection with hybrid models. Sentiment/emotion analysis techniques are contrasted between supervised approaches and lexicon-based methods (SentWordNet, MPQA, LIWC), noting trade-offs (domain specificity vs. semantic coverage). Related works include Alwaneh/Alavi et al. (2021) on harassment detection with multiple ML classifiers (SGD/Linear SVC ~80% accuracy) and Aslam et al. (2022) showing LSTM-GRU outperforming for sentiment (0.99) and emotion (0.91) on Twitter cryptocurrency data. The review emphasizes both the promise and biases of computational approaches in social and literary analysis.
Methodology
Data: Twelve Anglophone novels set in Middle Eastern contexts (e.g., Barakat’s Balcony on the Moon, Mikhail/Weiss’s The Beekeeper, Navai’s City of Lies). Texts (EPUB/PDF) were converted to plain text using EbookLib and PyPDF2.

Preprocessing: sentence tokenization (NLTK), contraction expansion, POS tagging, word tokenization (alphabetic-only regex), lowercasing, stopword removal (extended with frequent verbs and character names), and lemmatization. The corpus totals 58,458 sentences, with per-book statistics reported.

Rule-based detection and manual labeling: a published sexual harassment lexicon (Rezvan et al., 2020) flagged an initial 570 sentences (~1% of the corpus). Human review assessed conceptual relevance and confirmed 108 sentences as true instances of sexual harassment. Each was labeled with (a) harassment type (gender harassment, unwanted sexual attention, sexual coercion) and (b) offense type (physical vs. non-physical), yielding 65 physical and 43 non-physical instances. The examples show that lexical hits alone do not imply harassment without context, which necessitated manual interpretation.

Model 1 (offense-type classification, physical vs. non-physical): TF-IDF features with dimensionality reduction via PCA; a 70/30 train/test split on the 108 labeled instances. Algorithms: k-NN, logistic regression (LR), random forest (RF), multinomial naïve Bayes (MNB), stochastic gradient descent (SGD), and support vector classification (SVC). Baseline models were built and then tuned with GridSearchCV (hyperparameter grids provided), and evaluated on accuracy, precision, recall, and F1.

Sentiment labeling (for analysis and deep learning training): lexicon-based sentiment with NLTK VADER produced positive/negative/neutral/compound scores; the compound score was mapped to three labels (negative < 0, positive > 0, neutral = 0).
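The Model 1 pipeline described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the toy sentences, their labels, and the small `C` grid are placeholders, and a densifying step is added because scikit-learn's PCA requires a dense matrix:

```python
# Sketch of a TF-IDF -> PCA -> logistic regression pipeline tuned with
# GridSearchCV, mirroring the paper's Model 1 setup (70/30 split).
# Sentences and labels below are illustrative placeholders only.
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy data standing in for the 108 curated instances (1 = physical, 0 = non-physical).
sentences = [
    "he grabbed her arm in the street",
    "he pushed her against the wall",
    "he touched her shoulder without consent",
    "he seized her wrist and blocked her path",
    "he shoved her near the market",
    "he pulled her scarf forcefully",
    "he shouted lewd comments at her",
    "he sent her threatening messages",
    "he stared and whistled at her",
    "he made degrading remarks about her",
    "he spread rumours about her honour",
    "he demanded that she smile at him",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # PCA needs a dense array, so convert the sparse TF-IDF output.
    ("densify", FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 70/30 train/test split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.3, random_state=42, stratify=labels
)

# GridSearchCV tunes the classifier; the grid here is a stand-in.
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(X_train, y_train)
acc = grid.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")
```

On the real 108-sentence dataset the same structure would be evaluated with accuracy, precision, recall, and F1, as the paper reports.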
Emotion labeling: emotions were extracted with Text2Emotion (happy, angry, surprise, sad, fear); the highest-scoring emotion per sentence was assigned as its label.

Model 2 (deep learning sentiment and emotion classification): using the full sentence set (~58k sentences) labeled via the lexicons, an LSTM-GRU sequential model with GloVe embeddings was built. Architecture: Embedding, LSTM(128, return_sequences=True) + Dropout(0.5), GRU(64) + Dropout(0.5), Dense output with softmax. Optimization: Adam with categorical cross-entropy, trained for up to ~20 epochs with the best epoch selected (e.g., ~6 epochs for sentiment). Outputs: a 3-way sentiment classifier and a 5-way emotion classifier, evaluated by accuracy (with sample distributions and plots). Word cloud visualizations were produced for nouns and verbs in harassment sentences.
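The two lexicon-to-label mappings used to generate training labels can be sketched as below. The score dictionaries only mimic the output shapes of NLTK VADER (a compound score in [-1, 1]) and Text2Emotion (five emotion scores); the function names are illustrative, not from the paper's code:

```python
# Sketch of the label-mapping step: VADER compound -> 3 sentiment labels,
# and Text2Emotion scores -> the single highest-scoring emotion.

def sentiment_label(compound: float) -> str:
    """Map a VADER compound score to the paper's three sentiment labels."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"  # compound == 0

def emotion_label(scores: dict) -> str:
    """Assign the highest-scoring Text2Emotion category as the sentence label."""
    return max(scores, key=scores.get)

print(sentiment_label(-0.62))  # -> negative
print(emotion_label({"Happy": 0.0, "Angry": 0.1, "Surprise": 0.2,
                     "Sad": 0.1, "Fear": 0.6}))  # -> Fear
```

Applying these two functions across all ~58k sentences yields the 3-way sentiment labels and 5-way emotion labels used to train the LSTM-GRU classifiers.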
Key Findings
- Corpus scale and detection: 58,458 sentences processed across 12 novels. The rule-based lexicon flagged 570 sentences (~1% of the corpus); manual review confirmed 108 conceptually harassing sentences, comprising 65 physical and 43 non-physical offense instances.
- Offense-type classification (physical vs. non-physical): test-set accuracies of roughly LR 0.758, RF 0.760, k-NN 0.636, MNB 0.606, SGD 0.667, and SVC 0.606. The paper highlights LR (75.8% accuracy) as the selected final model.
- Sentiment distribution (lexicon-based) over the full corpus: positive 18,653; neutral 24,451; negative 15,354.
- Sentiment classification (LSTM-GRU): the 3-label classifier achieved 84.5% accuracy, with the epoch selected to avoid overfitting.
- Emotion analysis and classification: lexicon-based analysis finds fear and surprise prevalent in harassment contexts, with physical harassment showing a higher range and intensity of fear than non-physical. The 5-label LSTM-GRU emotion classifier achieved 80.8% accuracy.
- Qualitative lexical patterns: word clouds and n-grams show frequent nouns and verbs related to women, rape, family, and fear, reflecting victims' experiences and contexts in the narratives.
Discussion
The framework successfully addresses the research aims by detecting and classifying sexual harassment instances in Middle Eastern Anglophone literature and characterizing associated sentiment/emotion. The combination of rule-based detection with manual validation overcame the pitfalls of purely lexical matching, improving precision in identifying true harassment content. Supervised ML could then operate on curated labels to distinguish physical vs. non-physical offenses with competitive performance (LR ~75.8% accuracy). Lexicon-based sentiment labeling at scale enabled training an LSTM-GRU sentiment classifier achieving strong accuracy (84.5%), while emotion detection and classification indicated that harassment-related passages skew negatively, with fear and surprise particularly prominent—especially in physical harassment. These findings reinforce theoretical understandings of harassment’s psychological and social impacts and demonstrate the feasibility and value of computational literary analysis to derive structured insights from extensive texts. The results provide an empirical basis for future, larger-scale analyses and potentially inform policy, education, and advocacy by elucidating patterns and emotional contexts of harassment in cultural narratives.
Conclusion
The study presents a computational framework for mining sexual harassment in Anglophone Middle Eastern literature, integrating rule-based detection, manual annotation, classical ML for offense-type classification, and LSTM-GRU deep learning for sentiment and emotion classification. The ML classifier for physical vs. non-physical harassment attained approximately 75.8% accuracy (logistic regression), and the LSTM-GRU sentiment model achieved 84.5% accuracy; the emotion classifier reached 80.8%. Analyses indicate that harassment-related sentences predominantly exhibit negative sentiment, with physical harassment associated with stronger fear signals than non-physical harassment. The work demonstrates a scalable approach for typology identification and affective profiling in literary corpora. Future directions include: expanding and diversifying datasets; involving multiple lexicons and expert annotators to reduce labeling bias; exploring ensemble and transformer-based models; adapting the framework to non-English texts; and tailoring models specifically to harassment-related sentences to improve domain performance.
Limitations
- Lexicon-based detection and labeling can introduce bias and false positives/negatives; many lexical hits do not indicate harassment without context, necessitating manual review.
- The small labeled dataset for offense-type classification (108 instances) limits model generalizability and stability.
- Sentiment/emotion labels for deep learning were derived from lexicons across heterogeneous sentences (not solely harassment-related), which may weaken topic specificity.
- Cultural and domain specificity: findings and models are tailored to Anglophone Middle Eastern literature and may not generalize to other languages or regions without adaptation.
- Training time and potential class imbalance issues; limited interpretability for some models.
- Subjectivity remains in manual annotation; relying on few annotators can introduce systematic biases.