A classification and recognition algorithm of key figures in public opinion integrating multidimensional similarity and K-shell based on supernetwork

Interdisciplinary Studies

G. Wang, Y. Wang, et al.

Discover an innovative classification algorithm for identifying key figures in online public opinion, developed by Guanghui Wang, Yushan Wang, Kaidi Liu, and Shu Sun. This research shows how integrating multidimensional similarity and K-shell analysis within a four-dimensional supernetwork of public opinion dissemination outperforms traditional methods, with insights drawn from a case study of the China Eastern Airlines incident.

Introduction
The proliferation of social media has intensified the frequency and complexity of online public opinion events, and anonymity, virtual identities, and decentralised communication make these events hard to manage. Key figures (users) drive the formation, evolution, and spread of public opinion; however, traditional analyses often focus on explicit ties (e.g., forwarding) in single-layer networks, overlook implicit ties such as emotions, opinions, and roles, and typically identify only one role (the opinion leader). This study addresses the gap with a supernetwork-based approach that models the multidimensional nature of public opinion dissemination on Sina Weibo and identifies multiple key-figure roles simultaneously: opinion leaders (global cores), focus figures (local cores), and communication figures (bridges/structural-hole spanners). The aim is to improve fine-grained recognition of key figures, filter out pseudocore nodes, and better support emergency management and cyberspace governance.
Literature Review
Prior work shows that public opinion dissemination is shaped by temporal, spatial, and social distances, and that user attributes (emotions, opinions, motivation) affect polarization and the spread of misinformation. Government strategies and social media platforms also influence dissemination. Social network analysis identifies influential nodes via topology-based metrics and models: K-shell, mixed-degree/lowest-degree decompositions, collective influence (CI) centrality, multilayer/multidimensional networks, and supernetwork approaches (e.g., SuperedgeRank). Opinion dynamics models and structural hole detection leverage shortest paths, average distance, and weighted centrality. Recent machine learning methods (e.g., SRBM+, DeepInf, semantic ML systems) predict social influence from structure and attributes. Multidimensional similarity, rooted in the law of attraction, leverages content, time, and topology similarities for leader identification and link prediction. Gaps remain: most methods rely on single data dimensions or static structures, offer limited interpretability for complex structured/unstructured data, and yield coarse-grained identification that focuses mainly on opinion leaders while neglecting role differentiation and bridge nodes.
Methodology
Overview: The study constructs a four-layer supernetwork of public opinion (social-psychology-opinion-convergent) and proposes a classification and recognition algorithm that combines a multidimensional similarity index with K-shell decomposition. An influence index fuses both measures, and a quadrant-based rule classifies each node as an opinion leader, focus figure, communication figure, or ordinary figure.
Supernetwork model (SNP): SNP is defined over four layers connected by interlayer superedges:
- Social network (Gs): nodes are users; directed, weighted edges represent forwarding relationships. Adjacency SL_ij = 1 if user i forwarded user j, else 0.
- Opinion network (Gk): nodes are opinions (keywords/topics); edges connect opinions that co-occur in the same post (KL_ij = 1 if they co-occur).
- Psychological network (Gp): nodes are emotion categories (positive, neutral, negative) with transformation/association edges; PL encodes whether two emotions are directly related or transformable into each other.
- Convergent network (Gc): nodes are roles (opinion leader, focus figure, communication figure, ordinary figure), with edges indicating that one role can transform into another.
- Superedges (SE): each superedge links one node from each layer: a user s expressing opinion k under emotion p while playing role c.
Simplifying assumption: each superedge carries one role, one subject, and one emotion, but may link that subject to multiple opinions (one-to-many). Association matrices (SP, SK, SC, PK, PC, KC) connect the layers.
Multidimensional similarity index between social nodes Si and Sj:
- Construct a star/polygonal structure around Si and Sj from their shared neighbours across layers.
- Components:
  1) Social–opinion similarity (Sim^o): the proportion of opinion nodes connected to both Si and Sj relative to their opinion connections.
  2) Social–psychological similarity (Sim^p): the proximity of the averaged emotion intensities associated with Si and Sj (weighted by emotional intensity and SP links).
  3) Social inner-edge similarity (Sim^s): the overlap/topological similarity of their shared neighbours within the social layer.
- Aggregate weighted similarity: Sim_ij = w_p Sim^p_ij + w_k Sim^o_ij + w_s Sim^s_ij, with equal weights (w_p = w_k = w_s = 1/3).
- Average node similarity: Sim_i = the average of Sim_ij over all j.
K-shell decomposition:
- Measures global structural importance by iteratively pruning nodes by degree, which assigns each node a K-shell index k_i. The maximum shell index (k_max) denotes the most central shell.
Influence index fusion and normalisation:
- Both Sim_i and k_i are min–max normalised to [0,1].
- Influence index: Inf_i = sqrt(Sim_i^2 + k_i^2), i.e., the Euclidean distance of node i from the origin in the normalised (Sim, k) plane, so Inf_i ranges from 0 to sqrt(2) ≈ 1.41.
Classification rules (quadrant method; the first letter gives the similarity level, the second the K-shell level):
- K-shell threshold: nodes in the k_max shell are High (H); all others are Low (L).
- Similarity threshold: the top 30% of nodes by Sim_i are High; the rest are Low (following prior work).
- Quadrants: HH = opinion leaders; LH = focus figures (high K-shell, lower similarity); HL = communication figures (high similarity, lower K-shell); LL = ordinary figures.
Empirical pipeline on the Sina Weibo case:
- Data: event window 2022-03-21 to 2022-03-31; 49,728 posts and comments collected, including user metadata, interactions, social links, and content.
- Preprocessing: removal of links, emoticons, and symbols; Chinese word segmentation (CAS, lexical packages, Jieba); DF/TF/entropy-based stopword removal; 10,191 valid keywords retained.
- Opinion extraction: keyword extraction and syntax analysis; LDA topic modelling with manual checking produced 82 core opinions; topics were then abstracted and generalised.
- Sentiment analysis: a BERT-based semi-supervised classifier trained on the event corpus labels emotions as negative, neutral, or positive with intensity scores (positive (0,1], neutral 0, negative [-1,0)).
- Temporal segmentation: gestational (0–6 h), explosion (6–28 h), duration (28–72 h), recovery (4–10 days).
- Network analysis: construct the social graph for each period; compute the Sim components from the SP and SK associations and the social topology; compute K-shell indices per period; normalise, compute Inf, and classify nodes into roles with the quadrant rules.
- Validation: AUC of classification performance compared with CI, forwarding volume, degree centrality, K-shell, and multidimensional similarity; network destructive experiments (selective attacks on the top 1%, 2%, and 5% of nodes) comparing how network connectivity decays; fine-grained inspection of role assignments and of baseline errors.
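The similarity index can be sketched in Python, but only under loudly stated assumptions, because the summary describes the components in words rather than formulas. In this hypothetical version, Sim^o is the Jaccard overlap of the two users' opinion sets (the SK association), Sim^p is 1 - |mean_i - mean_j| / 2 over emotion intensities in [-1, 1] (the SP association, without the paper's intensity weighting), and Sim^s is the Jaccard overlap of social-layer neighbour sets with forwarding direction ignored. The `Superedge` class, the `similarity` function, and the toy data are illustrative, not the authors' code.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass(frozen=True)
class Superedge:
    """One superedge: a user voicing one or more opinions under one emotion.

    Follows the stated simplification (one subject, one emotion, one role,
    one-to-many opinions). The role field is carried but unused here.
    """
    user: str
    opinions: frozenset
    emotion: float  # intensity in [-1, 1]; 0 means neutral
    role: str = "ordinary figure"


def jaccard(a: set, b: set) -> float:
    """Shared members relative to the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(superedges, social_neighbours, w_p=1/3, w_k=1/3, w_s=1/3):
    """Return pairwise Sim_ij and per-node average Sim_i (hypothetical formulas)."""
    opinions = {}  # SK association: user -> set of opinions
    emotions = {}  # SP association: user -> list of emotion intensities
    for se in superedges:
        opinions.setdefault(se.user, set()).update(se.opinions)
        emotions.setdefault(se.user, []).append(se.emotion)

    users = sorted(set(opinions) | set(social_neighbours))
    pair = {}
    for i, j in combinations(users, 2):
        sim_o = jaccard(opinions.get(i, set()), opinions.get(j, set()))
        if i in emotions and j in emotions:
            # Means in [-1, 1] differ by at most 2, so this lies in [0, 1].
            sim_p = 1 - abs(mean(emotions[i]) - mean(emotions[j])) / 2
        else:
            sim_p = 0.0
        sim_s = jaccard(social_neighbours.get(i, set()),
                        social_neighbours.get(j, set()))
        pair[(i, j)] = pair[(j, i)] = w_p * sim_p + w_k * sim_o + w_s * sim_s

    average = {
        u: (mean(pair[(u, v)] for v in users if v != u) if len(users) > 1 else 0.0)
        for u in users
    }
    return pair, average


# Toy data, invented for illustration only.
edges = [
    Superedge("a", frozenset({"safety", "rescue"}), 0.8),
    Superedge("b", frozenset({"rescue"}), 0.6),
    Superedge("c", frozenset({"rumour"}), -0.9),
]
neighbours = {"a": {"x", "b"}, "b": {"x", "a"}, "c": {"y"}}
pair, avg = similarity(edges, neighbours)
```

Users a and b share an opinion, similar positive emotions, and a common neighbour, so their Sim_ab is far higher than Sim_ac. The equal default weights mirror the 1/3 weighting stated above.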
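K-shell decomposition itself is a standard algorithm; a minimal pure-Python version of the iterative degree pruning described above is sketched below. It treats the forwarding graph as undirected, which is an assumption because the summary does not say how edge direction is handled. For large graphs, `networkx.core_number` computes the same shell (core) indices.

```python
def k_shell(adjacency: dict) -> dict:
    """Return the K-shell index of every node via iterative degree pruning.

    `adjacency` maps each node to the set of its neighbours and must be
    symmetric (undirected); forwarding direction is ignored.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Remove every node whose residual degree is <= k; each removal can
        # push neighbours down to <= k, so keep pruning until none are left.
        frontier = [v for v in remaining if degree[v] <= k]
        while frontier:
            v = frontier.pop()
            if v not in remaining:
                continue  # already pruned through another neighbour
            remaining.discard(v)
            shell[v] = k
            for u in adjacency[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        frontier.append(u)
        k += 1
    return shell


# Triangle a-b-c with a tail a-d-e hanging off node a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
shells = k_shell(graph)
# shells: a, b, c -> 2 (the triangle); d, e -> 1 (the tail)
```

Node a has the highest degree (3) yet sits in the same shell as b and c, which illustrates why K-shell captures global position rather than raw connectivity.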
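The fusion and quadrant steps can be sketched as follows, taking per-node Sim_i and k_i as input. Min–max normalisation, Inf_i = sqrt(Sim_i^2 + k_i^2), the k_max split, and the top-30% similarity split follow the text above; the `classify` and `min_max` names, the tie handling at the 30% cut-off (input order), and the toy values are assumptions of this sketch.

```python
from math import sqrt


def min_max(values: dict) -> dict:
    """Min-max normalise to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {v: (x - lo) / span if span else 0.0 for v, x in values.items()}


def classify(sim: dict, shell: dict, top_share: float = 0.30) -> dict:
    """Map each node to (role, Inf) using the quadrant rules."""
    sim_n = min_max(sim)
    k_n = min_max({v: float(k) for v, k in shell.items()})
    k_max = max(shell.values())

    # "High" similarity = top 30% of nodes by Sim_i; ties keep input order.
    ranked = sorted(sim, key=sim.get, reverse=True)
    high_sim = set(ranked[: max(1, round(top_share * len(ranked)))])

    roles = {}
    for v in sim:
        inf = sqrt(sim_n[v] ** 2 + k_n[v] ** 2)
        s_high, k_high = v in high_sim, shell[v] == k_max
        if s_high and k_high:
            role = "opinion leader"        # HH
        elif k_high:
            role = "focus figure"          # LH
        elif s_high:
            role = "communication figure"  # HL
        else:
            role = "ordinary figure"       # LL
        roles[v] = (role, inf)
    return roles


# Toy inputs (hypothetical): raw Sim_i already spans [0, 1]; shells span 1..3.
sim = {"leader": 1.0, "bridge": 0.67, "bridge2": 0.48, "focus": 0.2,
       "o1": 0.1, "o2": 0.05, "o3": 0.3, "o4": 0.0, "o5": 0.15, "o6": 0.25}
shell = {"leader": 3, "bridge": 2, "bridge2": 2, "focus": 3,
         "o1": 1, "o2": 1, "o3": 2, "o4": 1, "o5": 1, "o6": 1}
roles = classify(sim, shell)
```

With shells spanning 1 to 3, normalisation maps k = 1 to 0 and k = 2 to 0.5, so a node with Sim = 0.67 gets Inf = 0.67 at k = 1 and about 0.84 at k = 2. These values are close to the 0.67 and 0.83 reported for explosion-period communication figures, which suggests, though the summary does not confirm it, that k_i was normalised with k_min = 1.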
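Splitting posts into the four lifecycle phases before building per-period social graphs might look like the sketch below. The record layout `(timestamp, author, source_author)`, the `phase_of` and `split_by_phase` names, and the onset time in any usage are hypothetical. The boundary handling is also an assumption: "recovery (4–10 days)" is read here as starting right at 72 h and ending at day 10.

```python
from datetime import datetime, timedelta
from typing import Optional

# Upper bound (hours after event onset) of each lifecycle phase.
# Assumption: recovery starts at 72 h and ends at day 10 (240 h); the
# summary's "4-10 days" leaves the 72-96 h span ambiguous.
PHASE_UPPER_HOURS = [
    ("gestational", 6),
    ("explosion", 28),
    ("duration", 72),
    ("recovery", 240),
]


def phase_of(post_time: datetime, onset: datetime) -> Optional[str]:
    """Return the lifecycle phase of a post, or None if outside the window."""
    hours = (post_time - onset) / timedelta(hours=1)
    if hours < 0:
        return None
    for name, upper in PHASE_UPPER_HOURS:
        if hours < upper:
            return name
    return None


def split_by_phase(posts, onset: datetime) -> dict:
    """Group (timestamp, author, source_author) forwarding records by phase."""
    groups = {name: [] for name, _ in PHASE_UPPER_HOURS}
    for ts, author, source in posts:
        name = phase_of(ts, onset)
        if name is not None:
            groups[name].append((author, source))
    return groups
```

Each phase's list of (author, source_author) pairs can then serve as the edge list of that period's social graph.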
Key Findings
- Dataset and segmentation: 49,728 Weibo items over 10 days; event lifecycle phases defined (gestational 0–6 h, explosion 6–28 h, duration 28–72 h, recovery 4–10 days).
- Opinion and sentiment: 82 core opinions extracted; sentiments classified as positive, neutral, or negative, with example intensities (e.g., 0.9506 positive, -0.8471 negative, 0 neutral).
- Social network topology by period (n nodes, m edges, average degree, maximum degree dmax, clustering coefficient C, average shortest path length l, maximum K-shell index ks, modularity λ):
    Period       n      m      Avg deg  dmax  C      l      ks  λ
    Gestational  2110   2024   1.903    433   0.019  5.226  2   0.893
    Explosion    10246  10429  1.984    4134  0.002  4.064  3   0.792
    Duration     14913  15063  1.998    7557  0.007  3.480  3   0.680
    Recovery     13738  14175  2.009    2610  0.033  4.419  3   0.882
- Identification examples (explosion period):
  • Opinion leaders (HH): China News Agency (Inf=1.15), Mo Chen Mo Chen (Inf=1.13), China Daily (Inf=1.10).
  • Focus figures (LH; k=3, lower Sim): Li Sweet Sauce (Inf=1.04), China News Network (Inf=1.04), Modern Express (Inf=1.03).
  • Communication figures (HL; high Sim, k<3): Ideological recording body (Sim=0.67, k=2, Inf=0.83), Daqiqi is Daqipa (Sim=0.48, k=2, Inf=0.69), Yeah, spinach flavour Pop Rocks (Sim=0.67, k=1, Inf=0.67).
- Other periods (illustrative top entries):
  • Gestational: Modern Express (Sim=0.99, k=2, Inf=1.41), etc.
  • Duration: People's Daily (Sim=0.80, k=3, Inf=1.28), with others at Inf≈1.05–1.20; communication figures with Sim=0.67 at k=1–2 (Inf=0.67–0.83).
  • Recovery: CCTV News (Sim=0.87, k=3, Inf=1.33) as opinion leader; focus figures include Sina News (Inf=1.06); communication figures include Nibelungen's Dream and Penguin 9156 (Sim=0.67, k=2, Inf=0.83).
- AUC validation (explosion period example): the influence index (classification and recognition) reached AUC=0.7229, versus CI=0.7152, forwarding volume=0.4390, degree centrality=0.6711, K-shell=0.6416, and multidimensional similarity=0.5536. Across periods, the proposed method generally outperforms the baselines.
- Network destructive experiments (selective attacks on top-ranked nodes): under a 1% attack in the explosion period, the remaining connectivity for CI, forwarding volume, degree centrality, K-shell, multidimensional similarity, and the proposed method was 48%, 66%, 55%, 64%, 63%, and 48% of the initial connectivity, respectively. Across attack ratios (1%, 2%, 5%) and periods, the proposed method caused greater network damage than degree centrality, K-shell, forwarding volume, and multidimensional similarity; CI was sometimes comparable, and in select gestational (incubation) period cases it caused more damage.
- Fine-grained validation: baselines tend to misclassify roles. Degree centrality confuses local cores with global cores; K-shell cannot distinguish functions and misses bridge nodes; multidimensional similarity may erroneously elevate pseudocore nodes or mislabel bridges as leaders. The integrated method identifies global cores, local cores, and bridge nodes simultaneously and filters out pseudocore nodes.
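The summary reports AUC values without saying how the positive (true key-figure) labels were obtained. Assuming a hypothetical labelled set `positives`, AUC for any ranking index (Inf, CI, degree centrality, and so on) can be computed as the probability that a randomly chosen positive node scores above a randomly chosen negative one, with ties counted as one half. The `auc` function below is an illustrative sketch of that standard formulation, not the authors' evaluation code.

```python
def auc(scores: dict, positives: set) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half. This checks all P * N pairs, which is fine for a
    sketch; a rank-based formula scales better to tens of thousands of nodes.
    """
    pos = [s for v, s in scores.items() if v in positives]
    neg = [s for v, s in scores.items() if v not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative node")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels: a and c are the "true" key figures.
example = auc({"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}, {"a", "c"})  # 0.75
```

An uninformative index that gives every node the same score yields 0.5, which is a useful reference point for the forwarding-volume result of 0.4390 reported above.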
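For the destructive experiments, the summary does not define "connectivity". The sketch below assumes it is the size of the largest connected component relative to the intact graph, a common choice in node-attack studies: remove the top 1%, 2%, or 5% of nodes by an index, then measure what survives. The function names are hypothetical, and the graph is assumed to be undirected with symmetric neighbour sets.

```python
from collections import deque


def largest_component(adjacency: dict, removed: set) -> int:
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over surviving nodes
            v = queue.popleft()
            size += 1
            for u in adjacency[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best


def remaining_connectivity(adjacency: dict, scores: dict, fraction: float) -> float:
    """Remove the top `fraction` of nodes by score; return LCC / original LCC."""
    ranked = sorted(adjacency, key=lambda v: scores.get(v, 0.0), reverse=True)
    removed = set(ranked[: max(1, round(fraction * len(ranked)))])
    return largest_component(adjacency, removed) / largest_component(adjacency, set())


# Hypothetical example: a 10-node path whose middle node 4 scores highest.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
left = remaining_connectivity(path, {i: -abs(i - 4) for i in range(10)}, 0.10)
```

Removing node 4 splits the path into pieces of 4 and 5 nodes, so `left` is 0.5. A lower remaining fraction means the index found more structurally critical nodes, which is how the 48% result for the proposed method reads as greater damage than the 55% to 66% of most baselines.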
Discussion
The study demonstrates that public opinion diffusion is jointly driven by multiple roles: opinion leaders (global cores), focus figures (local cores), and communication figures (bridges/structural-hole spanners). By integrating structural centrality (K-shell) with cross-layer multidimensional similarity (opinions, emotions, and social topology), the proposed approach captures both explicit and implicit drivers of influence. This dual perspective improves fine-grained recognition: it distinguishes local from global cores and identifies the bridge nodes critical for cross-community diffusion, capabilities that single-metric or single-layer methods lack. Empirical results on a major Weibo event show higher discriminative performance (AUC), greater network disruption under targeted removal, and clearer role delineation, indicating practical value for early warning, intervention, and governance in online public opinion management.
Conclusion
This work introduces a four-dimensional supernetwork model (social-psychology-opinion-convergent) and a combined multidimensional similarity plus K-shell algorithm to classify and recognize key figures in online public opinion. It simultaneously identifies opinion leaders, focus figures, and communication figures, outperforming baseline methods in sensitivity and effectiveness, and filtering out pseudocore nodes while highlighting important bridge nodes. Future directions include: (1) finer-grained role taxonomy, including detection of potential future key figures (e.g., potential leaders/communicators) for proactive intervention; (2) extending and validating the method for key topic identification and improving model applicability across events with multiple viewpoints and topics; (3) developing supernetwork-based machine learning methods that combine complex structures and interactions with unsupervised learning for automated behavior, topic, and trend analysis.
Limitations
- Threshold selection: The quadrant classification relies on thresholds (k_max for K-shell; top 30% for similarity). Although guided by the literature, these thresholds introduce subjectivity and may affect results in other contexts.
- Simplifying superedge assumption: Each superedge is constrained to one role, one subject, and one emotional tendency (with one-to-many opinions). Real-world interactions may involve more complex many-to-many mappings.
- Single-platform, single-event validation: Empirical verification uses one Sina Weibo event within a 10-day window, which may limit generalizability across platforms, languages, and event types.
- Equal weighting in similarity: The multidimensional similarity uses equal weights for the psychological, opinion, and social components; optimal weights may vary by event, and learning or tuning them could improve performance.
- Coarse sentiment categories: Emotions are grouped into three classes; finer affective states or more nuanced sentiment dynamics could further refine similarity and role detection.