DiffNet++: A Neural Influence and Interest Diffusion Network for Social Recommendation

L. Wu, J. Li, et al.

Discover DiffNet++, a neural framework that jointly models influence diffusion in social networks and interest diffusion in user–item graphs, using iterative aggregation and multi-level attention to learn richer user embeddings and boost social recommendation performance on real datasets.
Introduction

The paper addresses the challenge of data sparsity in collaborative filtering by leveraging social connections and user-item interactions. Traditional social recommendation methods largely rely on first-order social neighbors and ignore higher-order influence diffusion across the social network. Likewise, collaborative filtering on the user-item bipartite graph typically considers only first-order interactions, missing higher-order collaborative signals. The research question is how to jointly model higher-order social influence diffusion and higher-order interest diffusion to learn better user embeddings for social recommendation. The authors reformulate social recommendation as link prediction in a heterogeneous graph comprising a user-user social graph and a user-item interest graph. They propose DiffNet++, which unifies influence diffusion and interest diffusion via iterative, multi-hop message passing and a multi-level attention mechanism that balances contributions from social neighbors, item neighbors, and a user's own embedding. This unified modeling aims to capture both global social influence and collaborative interest patterns for improved recommendation accuracy.
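
As a concrete illustration of this heterogeneous-graph view (a toy sketch with made-up data, not taken from the paper), the user–user social graph and the user–item interest graph can be held as two sparse adjacency matrices, and recommendation then amounts to predicting unobserved entries of the user–item matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# hypothetical toy data: 4 users, 3 items
M, N = 4, 3
social_edges = [(0, 1), (1, 2), (3, 0)]          # user a follows/trusts user b
interactions = [(0, 2), (1, 0), (2, 2), (3, 1)]  # user a consumed item i

# user-user social graph S (M x M) and user-item interest graph R (M x N),
# which together form the heterogeneous graph used for link prediction
rows, cols = zip(*social_edges)
S = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(M, M))
rows, cols = zip(*interactions)
R = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(M, N))

# social recommendation = predicting new user-item links, i.e. the zero entries of R
```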

Literature Review

The paper surveys three strands of work: (1) Classical CF and social recommendation models such as matrix factorization (PMF), BPR, SVD++, SocialMF, TrustSVD, ContextMF, and CNSR. These approaches incorporate social regularization or social neighbors but typically use only first-order relations. (2) Graph-based recommendation models applying GCNs on user-item graphs (GC-MC, NGCF, PinSage) and on user-user social graphs (DiffNet). These models exploit higher-order structure but generally focus on a single graph type (either social or interest). (3) Attention mechanisms in recommendation and GNNs (NAIS, Graph Attention Networks, GraphRec), which learn importance weights over neighbors or aspects but often limit themselves to first-order structures. The gap identified is the absence of a unified framework that simultaneously models higher-order social influence and higher-order interest diffusion with adaptive attention across and within graphs.

Methodology

DiffNet++ consists of four components:
  • Embedding layer: initializes free latent embeddings for users (P ∈ R^{M×D}) and items (Q ∈ R^{N×D}).
  • Fusion layer: fuses the free embeddings with optional feature vectors (user x_a, item y_i) through learnable transformations to produce the initial fused embeddings u_a^0 = g(W1·[p_a, x_a]) and v_i^0 = g(W2·[q_i, y_i]). If attributes are unavailable, the fusion layer degenerates to u_a^0 = p_a and v_i^0 = q_i.
  • Influence and interest diffusion layers: perform K layers of iterative updates. For items, interest diffusion aggregates neighboring user embeddings in the user-item graph with attention, v̂_i^{k+1} = Σ_{a∈R_i} η_{ia}^{k+1} u_a^k, followed by the residual update v_i^{k+1} = v̂_i^{k+1} + v_i^k; the attention weights η_{ia}^{k+1} are computed by an MLP over [v_i^k, u_a^k] and normalized. For users, the update combines three sources with a residual connection, u_a^{k+1} = u_a^k + γ_{a1}^{k+1} P_a^{k+1} + γ_{a2}^{k+1} Q_a^{k+1}, where P_a^{k+1} = Σ_{b∈S_a} α_{ab}^{k+1} u_b^k aggregates social influence and Q_a^{k+1} = Σ_{i∈R_a} β_{ai}^{k+1} v_i^k aggregates interest. The node-level attentions α_{ab}^{k+1} and β_{ai}^{k+1} are learned by MLPs over [u_a^k, u_b^k] and [u_a^k, v_i^k], respectively, and normalized. The graph-level attentions γ_{a1}^{k+1} and γ_{a2}^{k+1} are learned by an MLP conditioned on u_a^k and the attentively aggregated node-level representations, with γ_{a1}^{k+1} + γ_{a2}^{k+1} = 1, letting each user balance social versus interest diffusion.
  • Prediction layer: to mitigate GCN over-smoothing, the final user and item representations concatenate the embeddings from all layers, u_a^* = [u_a^0 || ... || u_a^K] and v_i^* = [v_i^0 || ... || v_i^K], and predictions use the inner product r̂_{ai} = (u_a^*)^T v_i^*. Training uses a pair-wise BPR loss with negative sampling and L2 regularization, minimizing Σ −ln σ(r̂_{ai} − r̂_{aj}) + λ||Θ||^2, optimized with Adam.
A matrix formulation enables efficient implementation: the layer-wise attention matrices for item aggregation (H^{k+1}), social influence (α^{k+1}), interest (β^{k+1}), and graph-level fusion (Γ^{k+1}) are used to update U^{k+1} and V^{k+1} via sparse matrix multiplications. Space complexity is comparable to embedding-based models (dominated by P and Q); time complexity grows linearly in users, items, and diffusion depth K, with per-layer cost O(M(L_u+L_i)D + N L_u D). The model also generalizes to settings without attributes by removing the fusion layer.
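
To make the layer update and training objective concrete, the following is a minimal, self-contained NumPy sketch of one diffusion layer with node-level and graph-level attention, layer-wise concatenation for prediction, and the BPR loss. It is an illustration under simplifying assumptions rather than the authors' implementation: the attention MLPs are replaced by single linear scoring vectors, the graphs are dense boolean matrices instead of sparse ones, and all names (edge_attention, diffusion_layer, w_alpha, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # numerically stable softmax used to normalize all attention scores
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def edge_attention(queries, keys, w, mask):
    # toy node-level attention: score(a, b) = w . [query_a || key_b],
    # restricted to edges of the graph (mask), then normalized per query node
    d = queries.shape[1]
    scores = (queries @ w[:d])[:, None] + (keys @ w[d:])[None, :]
    scores = np.where(mask, scores, -1e9)   # non-neighbors get negligible weight
    return softmax(scores, axis=1)


def diffusion_layer(U, V, S, R, w_alpha, w_beta, w_eta, w_gamma):
    """One DiffNet++-style layer (dense toy version).

    U: (M, D) user embeddings at depth k      V: (N, D) item embeddings at depth k
    S: (M, M) boolean user-user social graph  R: (M, N) boolean user-item graph
    """
    # interest diffusion for items: attend over the users who consumed each item
    eta = edge_attention(V, U, w_eta, R.T)          # (N, M)
    V_next = V + eta @ U                            # residual update

    # node-level attention for users: social neighbors and consumed items
    alpha = edge_attention(U, U, w_alpha, S)        # (M, M)
    beta = edge_attention(U, V, w_beta, R)          # (M, N)
    P_agg = alpha @ U                               # aggregated social influence
    Q_agg = beta @ V                                # aggregated interest

    # graph-level attention: per-user weights balancing the two channels
    s_social = np.concatenate([U, P_agg], axis=1) @ w_gamma
    s_interest = np.concatenate([U, Q_agg], axis=1) @ w_gamma
    gamma = softmax(np.stack([s_social, s_interest], axis=1), axis=1)  # (M, 2)
    U_next = U + gamma[:, :1] * P_agg + gamma[:, 1:] * Q_agg
    return U_next, V_next


def predict_scores(U_layers, V_layers):
    # final representations concatenate the embeddings from every depth 0..K
    U_star = np.concatenate(U_layers, axis=1)
    V_star = np.concatenate(V_layers, axis=1)
    return U_star @ V_star.T                        # r_hat[a, i]


def bpr_loss(r_hat, users, pos_items, neg_items, params, lam=0.01):
    # pairwise BPR: an observed item should outscore a sampled negative item
    diff = r_hat[users, pos_items] - r_hat[users, neg_items]
    reg = lam * sum((p ** 2).sum() for p in params)
    return -np.log(1.0 / (1.0 + np.exp(-diff))).sum() + reg


# toy usage: 5 users, 4 items, D = 8, K = 2 diffusion depths
M, N, D, K = 5, 4, 8, 2
U, V = rng.normal(size=(M, D)), rng.normal(size=(N, D))
S = rng.random((M, M)) < 0.4
R = rng.random((M, N)) < 0.5
w_alpha, w_beta, w_eta, w_gamma = (rng.normal(size=2 * D) for _ in range(4))

U_layers, V_layers = [U], [V]
for _ in range(K):
    U, V = diffusion_layer(U, V, S, R, w_alpha, w_beta, w_eta, w_gamma)
    U_layers.append(U)
    V_layers.append(V)

r_hat = predict_scores(U_layers, V_layers)          # (M, N) preference scores
loss = bpr_loss(r_hat, users=np.array([0, 1]), pos_items=np.array([1, 2]),
                neg_items=np.array([3, 0]), params=U_layers + V_layers)
```

In the paper's formulation these aggregations are expressed as the sparse matrix products noted above and all parameters are trained end-to-end with Adam; this sketch only mirrors the per-layer computation flow.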

Key Findings
  • DiffNet++ consistently outperforms baselines across four datasets (Yelp, Flickr, Epinions, Dianping). Reported top-10 improvements over the best baseline: approximately +14% (Yelp), +21% (Flickr), +12% (Epinions), and +4% (Dianping).
  • Best performance typically occurs at embedding dimension D=64 and diffusion depth K=2. Using K=2 yields superior HR (Hit Ratio) and NDCG (Normalized Discounted Cumulative Gain); increasing to K=3 slightly degrades performance, indicating that 2-hop structures suffice and deeper layers may introduce noise (a sketch of these ranking metrics follows this list).
  • Representative HR/NDCG at D=64 for DiffNet++: Yelp HR=0.3694, NDCG=0.2263; Flickr HR=0.1832, NDCG=0.1420; Epinions HR=0.3503, NDCG=0.2288; Dianping HR=0.2713, NDCG=0.1605.
  • Multi-level attention improves performance over average weighting. On Flickr, graph-level attention improves HR by ~4.67% and NDCG by ~4.36% over average; combining node-level and graph-level attention further boosts results (overall +5.71% HR, +6.85% NDCG compared to average).
  • Attention value analysis: At k=1, mean graph-level weight favors social influence (Yelp γ_1≈0.7309; Flickr γ_1≈0.8381). At k=2, Yelp still favors social (γ_1≈0.6888), while Flickr shifts strongly to interest (γ_2≈0.9273), reflecting dataset-specific balancing.
  • Performance under sparsity: DiffNet++ shows larger gains for users with fewer ratings. For users with <8 ratings, DiffNet++ improves by 22.4% (Yelp) and 45.0% (Flickr) over the best baseline.
  • Runtime: Although more expensive than single-graph GNNs, training remains practical. Average epoch times (seconds): Yelp 7.72, Flickr 7.21, Epinions 4.69, Dianping 25.62; convergence typically within <100 epochs (≤1 hour on largest dataset).
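
For reference, the reported HR and NDCG values follow standard top-N ranking definitions. Below is a minimal sketch under a common single-held-out-item setup; it is a generic formulation for illustration, not the paper's exact evaluation code, and the function name is hypothetical.

```python
import numpy as np

def hr_and_ndcg_at_n(ranked_items, held_out_item, n=10):
    """HR@N and NDCG@N for one user with a single held-out positive item.

    ranked_items: item ids sorted by predicted score (best first)
    held_out_item: the ground-truth item hidden during training
    """
    top_n = list(ranked_items[:n])
    if held_out_item not in top_n:
        return 0.0, 0.0                      # miss: both metrics are zero
    rank = top_n.index(held_out_item)        # 0-based position in the top-N list
    hr = 1.0                                 # hit ratio is 1 when the item appears
    ndcg = 1.0 / np.log2(rank + 2)           # single relevant item => ideal DCG = 1
    return hr, ndcg

# example: the held-out item sits at position 3 (0-based) of the ranked list
print(hr_and_ndcg_at_n([42, 7, 13, 99, 5], held_out_item=99, n=10))  # (1.0, ~0.43)
```
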
Discussion

The findings demonstrate that jointly modeling higher-order social influence and interest diffusion yields superior user embeddings and recommendation accuracy compared to approaches focusing on a single graph or only first-order neighborhoods. The multi-level attention mechanism effectively adapts the fusion of social and interest signals at both node and graph levels to individual users, enabling flexible weighting that reflects user variability in susceptibility to social influence versus personal interests. Concatenating embeddings across layers in the prediction stage mitigates over-smoothing as depth increases, preserving informative multi-hop signals. The empirical results across diverse datasets substantiate the central hypothesis: integrating higher-order structures from both user-user and user-item networks improves top-N recommendation, especially under data sparsity. These outcomes underscore the relevance of heterogeneous graph modeling and attentive fusion in social recommendation systems.

Conclusion

The paper introduces DiffNet++, a unified neural framework that captures both higher-order social influence diffusion and higher-order interest diffusion for social recommendation via iterative graph convolutions and a multi-level attention mechanism. By reformulating social recommendation as link prediction on a heterogeneous graph, DiffNet++ improves recommendation performance across four datasets and is particularly effective in sparse scenarios. The architecture mitigates over-smoothing through layer-wise embedding concatenation and remains efficient in training time. Future work includes exploring graph reasoning techniques to generate explainable paths underlying user behaviors, enhancing interpretability of the diffusion processes.

Limitations
  • Computational overhead: Joint diffusion across two graphs with multi-level attention increases training time compared to simpler CF or single-graph GNNs.
  • Sensitivity to hyperparameters: Performance depends on diffusion depth K (with K=2 optimal empirically) and embedding dimension; deeper layers can lead to over-smoothing or noise.
  • Data requirements: While adaptable, the approach assumes availability of a user-user social graph and a user-item interaction graph; benefits may diminish where one modality is extremely sparse or noisy.
  • Attention stability: Learned attention weights may vary across datasets, potentially requiring careful tuning or regularization to avoid overfitting in small or noisy graphs.
  • Implicit feedback setting: Experiments focus on implicit feedback; generalization to explicit ratings or other feedback types may require additional calibration.