The language of happiness in self-reported descriptions of happy moments: Words, concepts, and entities



A. Moreno-Ortiz, C. Pérez-Hernández, et al.

Explore the fascinating language of happiness in this insightful study by Antonio Moreno-Ortiz, Chantal Pérez-Hernández, and María García-Gámez. Utilizing text analytics on the HappyDB corpus, the research uncovers how sentiment words and semantic classes reveal our sources of happiness, shining a light on the influence of commercial products and services. Dive in to discover the linguistic expressions that shape our joy!

Introduction
Happiness research spans numerous fields, with self-reported assessments being a common method despite limitations like recency bias and cultural context. The language of happiness itself presents challenges due to the lack of exact semantic equivalents across languages. Social media offers a readily available source of text data for studying sentiment and emotions. Previous studies have used sentiment dictionaries to analyze social media text, but this approach has limitations, including incomplete lexicon coverage, the challenges of multiword expressions and contextual variations, and the fact that happiness expressions might not always contain sentiment-laden words. This study uses the HappyDB corpus to investigate the sources of happiness mentioned in self-reported happy moments, considering these limitations. The research questions address the relevance of sentiment words, differences in sentiment word usage across happiness categories, and the identification of happiness sources as materialized in language.
Literature Review
Seligman's theory of authentic happiness posits three sources: pleasure, engagement, and meaning. Other researchers have emphasized the roles of social interactions, work, and leisure. Visakko and Voutilainen added mental states, social entities, and material possessions, distinguishing between happiness itself and happiness as an experience. Kahneman's distinction between the experiencing self and the remembering self further complicates the expression of happiness. Existing sentiment lexicons, like the Harvard General Inquirer, MPQA, and SenticNet, offer varying degrees of sentiment information but have limitations in multiword expression coverage and contextual understanding.
Methodology
The study uses the HappyDB corpus, a crowd-sourced collection of over 100,000 self-reported happy moments. After data cleaning to remove duplicates and low-quality entries, the remaining 91,608 moments were analyzed using several NLP techniques. Basic text statistics were computed, including word counts and sentence lengths. Sentiment analysis was performed using both a machine-learning approach (DistilBERT) and a lexicon-based approach (Lingmotif). Keywords were extracted with the TextRank algorithm and grouped with K-means clustering. Named entity recognition (NER) employed a hybrid approach combining a neural architecture with linguistic pattern matching. Keywords and their groupings were examined within each of HappyDB's seven categories: ACHIEVEMENT, AFFECTION, BONDING, ENJOY THE MOMENT, EXERCISE, LEISURE, and NATURE.
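To make the lexicon-based idea concrete, here is a minimal sketch of how a polarity lexicon can label a text. It is not Lingmotif and does not reproduce the study's pipeline: the lexicon `TOY_LEXICON`, the `classify` function, and the three example sentences are all hypothetical and invented for illustration (they are not taken from HappyDB).

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
# Assumption for illustration only; NOT the Lingmotif lexicon.
TOY_LEXICON = {
    "happy": 1, "great": 1, "love": 1, "enjoyed": 1, "wonderful": 1,
    "sad": -1, "sick": -1, "lost": -1, "stressful": -1,
}

def classify(moment: str) -> str:
    """Label a text by summing the polarities of the lexicon words it contains."""
    tokens = re.findall(r"[a-z']+", moment.lower())
    score = sum(TOY_LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented example sentences (not from HappyDB).
moments = [
    "I had a wonderful dinner with my wife.",
    "I bought a new phone.",
    "My son finally recovered after being sick for a week.",
]
labels = [classify(m) for m in moments]
for moment, label in zip(moments, labels):
    print(f"{label:>8}: {moment}")
```

In this toy setup the second sentence contains no lexicon word and the third contains only a negative one, so neither is labeled positive even though both could plausibly describe a happy moment. A real lexicon-based tool also handles multiword expressions and context, which this sketch deliberately ignores.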
Key Findings
Basic text statistics revealed high variability in the length of responses and a surprisingly low average number of adjectives. Sentiment analysis showed that a significant portion of happy moments were not classified as positive, indicating that happiness is not exclusively expressed through positive words. A chi-square test revealed significant differences in the proportion of positive classifications across categories. Lexicon analysis identified common positive terms across categories but also category-specific expressions. Keyword analysis revealed conceptual classes associated with each HappyDB category. For example, ACHIEVEMENT contained clusters related to consumer products, work, school, and health. AFFECTION and BONDING showed different keyword clusters, despite initial similarity, with AFFECTION focusing on family and BONDING on friends. ENJOY THE MOMENT focused on food and entertainment, while EXERCISE emphasized physical activities. LEISURE featured entertainment, and NATURE focused on pleasant weather and natural landscapes. Named entity analysis revealed a significant number of commercial brands and products, highlighting the role of consumerism in the expression of happiness.
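As a rough illustration of the kind of chi-square test of independence mentioned above, the following sketch computes Pearson's statistic for a category-by-classification table. The counts are hypothetical and are not the study's data; only three of the seven categories are shown to keep the table small, and the helper name `chi_square` is an assumption of this example.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts (NOT from the study): [positive, not positive] per category.
observed = [
    [70, 30],  # ACHIEVEMENT
    [85, 15],  # AFFECTION
    [55, 45],  # EXERCISE
]
stat, dof = chi_square(observed)

# 5.991 is the standard critical value of chi-square at alpha = 0.05 with 2 degrees of freedom.
CRITICAL_VALUE_DF2 = 5.991
print(f"chi2 = {stat:.2f}, dof = {dof}, significant = {stat > CRITICAL_VALUE_DF2}")
```

With these made-up counts the statistic exceeds the critical value, so the proportion of positive classifications would be judged to differ across the three categories. The full seven-category comparison would have six degrees of freedom and a different critical value.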
Discussion
The findings suggest that happiness is not solely expressed through positive words but involves a broader linguistic landscape that includes neutral and even negative elements. The gap between the expected share of positive classifications and the actual results highlights the limitations of relying solely on sentiment words to measure happiness. Negative items were lexically more specific than the more generic positive items, which is consistent with the notion of negativity bias, whereby negative events have a larger impact on psychological state than positive ones. The keyword analysis revealed different types of happiness: external happiness associated with social interactions, and internal happiness associated with individual accomplishments. Named entity analysis demonstrated the considerable influence of commercial products and services on self-reported happiness.
Conclusion
This study demonstrates that the automatic measurement of self-reported happiness requires approaches beyond simple sentiment analysis. The diverse NLP techniques used effectively identified happiness sources beyond traditional social and leisure activities, highlighting the role of newness and commercial products. Future research could explore cross-cultural variations in happiness expression and investigate the long-term impact of consumerism on well-being.
Limitations
The HappyDB corpus has limitations, including potential sampling biases due to the crowd-sourcing method and age demographics. The data cleaning process, while thorough, may have inadvertently removed some valid data points. The study focuses solely on the English language, limiting the generalizability of findings to other linguistic contexts.