Deep Item-based Collaborative Filtering for Top-N Recommendation
F. Xue, X. He, et al.
The study addresses the limitations of current item-based collaborative filtering (ICF) methods, which primarily model only second-order (pairwise) item relations. ICF represents users by their consumed items and estimates relevance via item-item similarity, offering advantages in accuracy, interpretability, and ease of online personalization over user-based methods. However, linear and shallow models miss higher-order relations (e.g., multiple items sharing attributes or co-occurring due to complementarity) that influence user choices. The research questions are whether modeling nonlinear, higher-order item interactions within ICF can improve top-N recommendation performance and how attention mechanisms can further refine the importance of pairwise interactions. The work proposes DeepICF, a neural network framework that stacks nonlinear layers above pairwise interaction modeling to capture higher-order relations, aiming to significantly enhance recommendation accuracy and interpretability.
Early ICF approaches (ItemKNN) used heuristic statistical measures (cosine similarity, Pearson correlation) to compute item-item similarity, requiring manual tuning and offering limited generalization. Learning-based ICF advanced with SLIM, which directly learns a sparse, non-negative item-item similarity matrix via regression, and FISM, which factorizes the item-similarity matrix under a low-rank assumption using item embeddings. NAIS further introduced attention to weigh pairwise item similarities dynamically. HOSLIM extended SLIM to higher-order relations by mining frequent itemsets and learning itemset-item similarities, but it relies on a support threshold and aggregates itemset effects linearly and statically. CDAE applied auto-encoder architectures to learn item similarity but remains limited by linear inner-product interactions. These works motivate a unified, end-to-end, nonlinear neural solution that automatically captures higher-order item relations.
The proposed DeepICF framework models higher-order item relations via a neural architecture inspired by Neural Collaborative Filtering (NCF), with key differences tailored to ICF:
- Input and Embeddings: The target item i is represented by a one-hot ID embedding p_i ∈ R^k. The user u is represented by a multi-hot encoding of her interacted items R_u^+, each mapped to an embedding q_j ∈ R^k, yielding Q_u = {q_j | j ∈ R_u^+}.
- Pairwise Interaction Layer: For each historical item j, compute element-wise product v_j = q_j ⊙ p_i, forming V_{ui} = {v_j} to encode second-order item relations.
- Pooling Layer: Aggregates the variable-size set V_{ui} into a fixed-size vector e_{ui} (see the pooling and model sketches after this list):
  • DeepICF: weighted average pooling with normalization, e_{ui} = (1 / (|R_u^+|−1)^α) Σ_{j∈R_u^+\{i}} (q_j ⊙ p_i), where α controls smoothing across users with different history lengths.
  • DeepICF+a: attention-based pooling, e_{ui} = (1 / (|R_u^+|−1)^α) Σ_{j∈R_u^+\{i}} a(v_j)·v_j, where a(v) = softmax'(h^T ReLU(Wv + b)) learns the varying importance of pairwise interactions; softmax' is a smoothed softmax whose exponent β accounts for history length.
- Deep Interaction Layers: Stack L fully connected layers with ReLU activations to capture nonlinear higher-order interactions among items: e_1 = ReLU(W_1 e_{ui} + b_1), …, e_L = ReLU(W_L e_{L−1} + b_L).
- Prediction Layer: Final score ŷ_{ui} = z^T e_L + b_u + b_i, where z is a global weight vector and b_u, b_i are user and item biases modeling user activity and item popularity.
- Learning: Optimize a pointwise binary cross-entropy (log) loss with negative sampling (NS negatives per positive, typically NS = 4), applying a sigmoid to the predictions and L2 regularization (λ), primarily on the deep-layer weights, to mitigate overfitting (see the training sketch after this list).
- Pre-training: Initialize the item embeddings p_i, q_j with FISM-learned embeddings to speed convergence and improve performance.
- Time Complexity: DeepICF inference costs O(k|R_u^+| + Σ_{l=1}^L d_{l−1}d_l); DeepICF+a adds the attention cost, yielding O(k'k|R_u^+| + Σ_{l=1}^L d_{l−1}d_l), where k' is the attention hidden size.
- Connections: FISM and NAIS are recovered as special cases by removing the deep layers and using average or attention pooling with a linear projection, showing that DeepICF generalizes prior ICF models.
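To make the pairwise interaction and pooling layers concrete, here is a minimal PyTorch sketch; tensor shapes, helper names, and default values (e.g., attn_size=16, β=0.5) are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pairwise_interactions(q_hist, p_target):
    """q_hist: (n, k) embeddings of the user's historical items (target item excluded).
    p_target: (k,) embedding of the target item i.
    Returns V_ui with rows v_j = q_j ⊙ p_i, shape (n, k)."""
    return q_hist * p_target  # element-wise product, broadcast over the history dimension


def average_pooling(v, alpha=0.5):
    """DeepICF pooling: e_ui = (1 / n^alpha) * sum_j v_j, with n = |R_u^+| - 1."""
    n = v.shape[0]
    return v.sum(dim=0) / (n ** alpha)


class AttentionPooling(nn.Module):
    """DeepICF+a pooling: weight each v_j by a(v_j) = smoothed_softmax(h^T ReLU(W v_j + b)).
    The paper also allows a (1/n^alpha) prefactor; alpha = 0 reportedly works best here."""

    def __init__(self, k, attn_size=16, beta=0.5):
        super().__init__()
        self.proj = nn.Linear(k, attn_size)           # W, b
        self.h = nn.Linear(attn_size, 1, bias=False)  # h
        self.beta = beta                              # smoothing exponent on the softmax denominator

    def forward(self, v):                                    # v: (n, k)
        scores = self.h(F.relu(self.proj(v))).squeeze(-1)    # (n,)
        weights = torch.exp(scores)
        weights = weights / (weights.sum() ** self.beta)     # smoothed softmax over the history
        return (weights.unsqueeze(-1) * v).sum(dim=0)        # e_ui: (k,)
```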
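Continuing the sketch above, a compact module can wire the embeddings, pooling, deep ReLU layers, and biased prediction layer together; the class name, tower sizes, and toy ids below are assumptions chosen for illustration.

```python
class DeepICF(nn.Module):
    """Embedding lookups, pairwise interaction, pooling, L ReLU layers, and the
    prediction layer with user/item biases (a sketch, not the reference code)."""

    def __init__(self, num_users, num_items, k=16, hidden=(32, 16, 8), use_attention=True):
        super().__init__()
        # p: target-item embeddings p_i; q: history-item embeddings q_j.
        # Both can be initialized from FISM-learned factors (the paper's pre-training step).
        self.p = nn.Embedding(num_items, k)
        self.q = nn.Embedding(num_items, k)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_bias = nn.Embedding(num_items, 1)
        self.attention = AttentionPooling(k) if use_attention else None

        layers, d_in = [], k
        for d_out in hidden:                      # deep interaction layers
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
            d_in = d_out
        self.mlp = nn.Sequential(*layers)
        self.z = nn.Linear(d_in, 1, bias=False)   # global projection vector z

    def forward(self, user, hist_items, target_item, alpha=0.5):
        """user, target_item: scalar LongTensors; hist_items: (n,) LongTensor of the
        user's consumed items with the target removed. Returns the raw score (logit)."""
        v = pairwise_interactions(self.q(hist_items), self.p(target_item))
        e_ui = self.attention(v) if self.attention else average_pooling(v, alpha)
        e_L = self.mlp(e_ui)
        return (self.z(e_L) + self.user_bias(user) + self.item_bias(target_item)).squeeze()


# Example: score item 99 for user 3 given a 3-item history (toy ids).
model = DeepICF(num_users=100, num_items=500)
score = model(torch.tensor(3), torch.tensor([10, 42, 77]), torch.tensor(99))
```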
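The pointwise training objective described in the Learning item can then be sketched as follows; uniform negative sampling, Adam, and the single-example update are simplifications assumed for illustration, not the paper's exact training setup.

```python
import random


def train_step(model, optimizer, user, hist_items, pos_item, num_items, ns=4, weight_decay=1e-5):
    """One pointwise update: score the positive item and `ns` sampled negatives,
    push them toward labels 1 and 0 with a binary cross-entropy (log) loss."""
    seen = set(hist_items.tolist()) | {pos_item}
    negatives = random.choices([i for i in range(num_items) if i not in seen], k=ns)
    candidates = [pos_item] + negatives
    labels = torch.tensor([1.0] + [0.0] * ns)

    scores = torch.stack([model(user, hist_items, torch.tensor(i)) for i in candidates])
    loss = F.binary_cross_entropy_with_logits(scores, labels)
    # L2 regularization on the deep-layer parameters, which the paper identifies
    # as the main source of overfitting.
    loss = loss + weight_decay * sum(w.pow(2).sum() for w in model.mlp.parameters())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Example usage (toy tensors; reuses `model` from the previous sketch):
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_step(model, optimizer, user=torch.tensor(3),
           hist_items=torch.tensor([10, 42, 77]), pos_item=99, num_items=500)
```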
- Overall performance: DeepICF and DeepICF+a consistently outperform strong baselines (FISM, eALS, BPR, MLP, YouTube Rec, HOSLIM) on MovieLens and Pinterest under leave-one-out evaluation with HR@10 and NDCG@10 (see the evaluation sketch after this list).
• MovieLens (embedding size 16):
- FISM: HR@10=0.6685, NDCG@10=0.3954
- DeepICF: HR@10=0.6881, NDCG@10=0.4113
- DeepICF+a: HR@10=0.7084, NDCG@10=0.4380
• Pinterest (embedding size 16):
- FISM: HR@10=0.8763, NDCG@10=0.5529
- DeepICF: HR@10=0.8806, NDCG@10=0.5631
- DeepICF+a: HR@10=0.8835, NDCG@10=0.5666
- Relative NDCG@10 improvements over FISM: ~4.0% (DeepICF) and ~10.8% (DeepICF+a) on MovieLens; ~1.9% (DeepICF) and ~2.5% (DeepICF+a) on Pinterest. The reported improvements are statistically significant (p < 0.05).
- Higher-order interactions help: Stacking nonlinear hidden layers yields better performance; deeper architectures capture complex item relations (e.g., DeepICF-3/4 often best across embedding sizes).
- Attention improves second-order modeling: DeepICF+a’s attention-based pooling differentiates historical items’ influence, improving both HR and NDCG; qualitative attention visualizations align with category/genre relevance.
- Pre-training utility: Initializing embeddings with FISM speeds convergence and improves accuracy (e.g., HR gains ~0.8–1.3% at k=16), compared to random initialization.
- Hyper-parameter sensitivity:
  • Normalization α: the best α for DeepICF varies by dataset (MovieLens ~0.4–0.5; Pinterest ~0.5–1); DeepICF+a performs best with α=0 and a properly tuned β.
  • Embedding size k: performance generally improves with larger k; DeepICF+a mitigates small-k weaknesses on denser datasets.
  • Negative sampling: the optimal number of negatives per positive is around 4; increasing negatives up to ~4 improves performance for both DeepICF variants.
- Comparative insights: Item-based deep models are more robust on sparser datasets (Pinterest) than user-based models; YouTube Rec’s reliance on sequence/time limits its effectiveness on datasets lacking temporal signals.
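For reference, a minimal sketch of the leave-one-out metrics used above, assuming the DeepICF-style model interface from the methodology sketches; the helper name and single-user signature are illustrative assumptions.

```python
import math
import torch


def hr_ndcg_at_n(model, user, hist_items, test_item, negatives, top_n=10):
    """Rank the held-out test item against sampled negatives (99 in the paper).
    HR@N is 1 if the test item lands in the top N; NDCG@N discounts it by rank."""
    candidates = [test_item] + list(negatives)
    with torch.no_grad():
        scores = torch.stack([model(user, hist_items, torch.tensor(i)) for i in candidates])
    rank = int((scores > scores[0]).sum().item())   # candidates scored above the positive
    if rank < top_n:
        return 1.0, 1.0 / math.log2(rank + 2)       # HR@N = 1, NDCG@N = 1 / log2(position + 2)
    return 0.0, 0.0
```

Averaging these per-user values over the test set yields HR@10 and NDCG@10 figures of the kind reported above.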
The findings confirm that modeling nonlinear, higher-order item interactions within ICF substantially enhances top-N recommendation accuracy. By explicitly forming pairwise interaction vectors and learning higher-order dependencies through deep neural layers, DeepICF captures complex itemset effects that linear models (SLIM, HOSLIM, FISM) miss. The attention mechanism further refines second-order interactions by weighting historically relevant items more strongly, aligning with intuitive user decision patterns and improving explainability. Performance gains across two different datasets, statistical significance, and robustness to hyper-parameter settings demonstrate the approach’s relevance to industrial recommender systems, especially for implicit-feedback scenarios and sparse data conditions.
DeepICF introduces a deep neural framework for item-based collaborative filtering that unifies second-order and higher-order interaction modeling, overcoming the linearity and uniform-weight limitations of prior ICF models. Extensive experiments on MovieLens and Pinterest show significant improvements over strong baselines, and the attentional variant (DeepICF+a) achieves further gains and better explainability. Future directions include: (1) incorporating heterogeneous item relations and side information (attributes, content, co-occurrence) into the framework; (2) providing finer-grained, feature-level explanations to increase user trust; and (3) modeling sequential preference evolution via reinforcement learning or memory networks.
- Scope restricted to implicit-feedback, pure collaborative filtering without side information; incorporation of attributes, context, and content is deferred to future work.
- The deep models are prone to overfitting due to fully connected MLP layers, requiring careful regularization and benefiting from FISM-based pre-training.
- The attention and normalization mechanisms introduce additional hyper-parameters (α, β, attention size) that require dataset-specific tuning.
- Evaluation uses leave-one-out with sampled negatives (99), which, while standard, may not reflect full catalog ranking or real-time constraints.
- The prediction layer uses a global projection vector; more fine-grained user/item-aware projections are suggested but not explored here.