Deep Item-based Collaborative Filtering for Top-N Recommendation
F. Xue, X. He, et al.
The study addresses the limitations of current item-based collaborative filtering (ICF) methods, which primarily model only second-order (pairwise) item relations. ICF represents users by their consumed items and estimates relevance via item-item similarity, offering advantages in accuracy, interpretability, and ease of online personalization over user-based methods. However, linear and shallow models miss higher-order relations (e.g., multiple items sharing attributes or co-occurring due to complementarity) that influence user choices. The research questions are whether modeling nonlinear, higher-order item interactions within ICF can improve top-N recommendation performance and how attention mechanisms can further refine the importance of pairwise interactions. The work proposes DeepICF, a neural network framework that stacks nonlinear layers above pairwise interaction modeling to capture higher-order relations, aiming to significantly enhance recommendation accuracy and interpretability.
Early ICF approaches (ItemKNN) used heuristic statistical measures (cosine similarity, Pearson correlation) to compute item-item similarity, requiring manual tuning and offering limited generalization. Learning-based ICF advanced with SLIM, which directly learns a sparse, non-negative item-item similarity matrix via regression, and FISM, which factorizes the item-similarity matrix under a low-rank assumption using item embeddings. NAIS further introduced attention to weigh pairwise item similarities dynamically. HOSLIM extended SLIM to higher-order relations by mining frequent itemsets and learning itemset-item similarities, but it relies on a support threshold and aggregates itemset effects linearly and statically. CDAE applied auto-encoder architectures to learn item similarity but remains limited by linear inner-product interactions. These works motivate a unified, end-to-end, nonlinear neural solution that automatically captures higher-order item relations.
The proposed DeepICF framework models higher-order item relations via a neural architecture inspired by Neural Collaborative Filtering (NCF), with key differences tailored to ICF:
- Input and Embeddings: The target item i is represented by a one-hot ID embedding p_i ∈ R^k. The user u is represented by a multi-hot encoding of her interacted items R_u^+, each mapped to an embedding q_j ∈ R^k, yielding Q_u = {q_j | j ∈ R_u^+}.
- Pairwise Interaction Layer: For each historical item j, compute element-wise product v_j = q_j ⊙ p_i, forming V_{ui} = {v_j} to encode second-order item relations.
- Pooling Layer: Aggregates the variable-size set V_{ui} into a fixed-size vector e_{ui} (see the pooling and model sketches after this list):
  • DeepICF: weighted average pooling with normalization, e_{ui} = (1 / (|R_u^+|−1)^α) Σ_{j∈R_u^+\{i}} (q_j ⊙ p_i), where α controls smoothing across users with different history lengths.
  • DeepICF+a: attention-based pooling, e_{ui} = (1 / (|R_u^+|−1)^α) Σ_{j∈R_u^+\{i}} a(v_j)·v_j, where a(v) = softmax'(h^T ReLU(Wv + b)) learns the varying importance of pairwise interactions; softmax' is a smoothed softmax whose exponent β accounts for history length.
- Deep Interaction Layers: Stack L fully connected layers with ReLU activations to capture nonlinear higher-order interactions among items: e_1 = ReLU(W_1 e_{ui} + b_1), …, e_L = ReLU(W_L e_{L−1} + b_L).
- Prediction Layer: Final score ŷ_{ui} = z^T e_L + b_u + b_i, where z is a global weight vector and b_u, b_i are user and item biases modeling user activity and item popularity.
- Learning: Optimize a pointwise binary cross-entropy (log) loss with negative sampling (NS negatives per positive, typically NS = 4), applying a sigmoid to the predictions and L2 regularization (λ), primarily on the deep-layer weights, to mitigate overfitting (see the training sketch after this list).
- Pre-training: Initialize the item embeddings p_i, q_j with FISM-learned embeddings to speed convergence and improve performance.
- Time Complexity: DeepICF inference costs O(k|R_u^+| + Σ_{l=1}^L d_{l−1}d_l); DeepICF+a adds the attention cost, yielding O(k'k|R_u^+| + Σ_{l=1}^L d_{l−1}d_l), where k' is the attention hidden size.
- Connections: FISM and NAIS are recovered as special cases by removing the deep layers and using average or attention pooling with a linear projection, showing that DeepICF generalizes prior ICF models.
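To make the pairwise interaction and pooling layers concrete, here is a minimal PyTorch sketch; tensor shapes, helper names, and default values (e.g., attn_size=16, β=0.5) are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pairwise_interactions(q_hist, p_target):
    """q_hist: (n, k) embeddings of the user's historical items (target item excluded).
    p_target: (k,) embedding of the target item i.
    Returns V_ui with rows v_j = q_j ⊙ p_i, shape (n, k)."""
    return q_hist * p_target  # element-wise product, broadcast over the history dimension


def average_pooling(v, alpha=0.5):
    """DeepICF pooling: e_ui = (1 / n^alpha) * sum_j v_j, with n = |R_u^+| - 1."""
    n = v.shape[0]
    return v.sum(dim=0) / (n ** alpha)


class AttentionPooling(nn.Module):
    """DeepICF+a pooling: weight each v_j by a(v_j) = smoothed_softmax(h^T ReLU(W v_j + b)).
    The paper also allows a (1/n^alpha) prefactor; alpha = 0 reportedly works best here."""

    def __init__(self, k, attn_size=16, beta=0.5):
        super().__init__()
        self.proj = nn.Linear(k, attn_size)           # W, b
        self.h = nn.Linear(attn_size, 1, bias=False)  # h
        self.beta = beta                              # smoothing exponent on the softmax denominator

    def forward(self, v):                                    # v: (n, k)
        scores = self.h(F.relu(self.proj(v))).squeeze(-1)    # (n,)
        weights = torch.exp(scores)
        weights = weights / (weights.sum() ** self.beta)     # smoothed softmax over the history
        return (weights.unsqueeze(-1) * v).sum(dim=0)        # e_ui: (k,)
```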
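Continuing the sketch above, a compact module can wire the embeddings, pooling, deep ReLU layers, and biased prediction layer together; the class name, tower sizes, and toy ids below are assumptions chosen for illustration.

```python
class DeepICF(nn.Module):
    """Embedding lookups, pairwise interaction, pooling, L ReLU layers, and the
    prediction layer with user/item biases (a sketch, not the reference code)."""

    def __init__(self, num_users, num_items, k=16, hidden=(32, 16, 8), use_attention=True):
        super().__init__()
        # p: target-item embeddings p_i; q: history-item embeddings q_j.
        # Both can be initialized from FISM-learned factors (the paper's pre-training step).
        self.p = nn.Embedding(num_items, k)
        self.q = nn.Embedding(num_items, k)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_bias = nn.Embedding(num_items, 1)
        self.attention = AttentionPooling(k) if use_attention else None

        layers, d_in = [], k
        for d_out in hidden:                      # deep interaction layers
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
            d_in = d_out
        self.mlp = nn.Sequential(*layers)
        self.z = nn.Linear(d_in, 1, bias=False)   # global projection vector z

    def forward(self, user, hist_items, target_item, alpha=0.5):
        """user, target_item: scalar LongTensors; hist_items: (n,) LongTensor of the
        user's consumed items with the target removed. Returns the raw score (logit)."""
        v = pairwise_interactions(self.q(hist_items), self.p(target_item))
        e_ui = self.attention(v) if self.attention else average_pooling(v, alpha)
        e_L = self.mlp(e_ui)
        return (self.z(e_L) + self.user_bias(user) + self.item_bias(target_item)).squeeze()


# Example: score item 99 for user 3 given a 3-item history (toy ids).
model = DeepICF(num_users=100, num_items=500)
score = model(torch.tensor(3), torch.tensor([10, 42, 77]), torch.tensor(99))
```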
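The pointwise training objective described in the Learning item can then be sketched as follows; uniform negative sampling, Adam, and the single-example update are simplifications assumed for illustration, not the paper's exact training setup.

```python
import random


def train_step(model, optimizer, user, hist_items, pos_item, num_items, ns=4, weight_decay=1e-5):
    """One pointwise update: score the positive item and `ns` sampled negatives,
    push them toward labels 1 and 0 with a binary cross-entropy (log) loss."""
    seen = set(hist_items.tolist()) | {pos_item}
    negatives = random.choices([i for i in range(num_items) if i not in seen], k=ns)
    candidates = [pos_item] + negatives
    labels = torch.tensor([1.0] + [0.0] * ns)

    scores = torch.stack([model(user, hist_items, torch.tensor(i)) for i in candidates])
    loss = F.binary_cross_entropy_with_logits(scores, labels)
    # L2 regularization on the deep-layer parameters, which the paper identifies
    # as the main source of overfitting.
    loss = loss + weight_decay * sum(w.pow(2).sum() for w in model.mlp.parameters())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Example usage (toy tensors; reuses `model` from the previous sketch):
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_step(model, optimizer, user=torch.tensor(3),
           hist_items=torch.tensor([10, 42, 77]), pos_item=99, num_items=500)
```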
- Overall performance: DeepICF and DeepICF+a consistently outperform strong baselines (FISM, eALS, BPR, MLP, YouTube Rec, HOSLIM) on MovieLens and Pinterest under leave-one-out evaluation with HR@10 and NDCG@10 (see the evaluation sketch after this list).
• MovieLens (embedding size 16):
- FISM: HR@10=0.6685, NDCG@10=0.3954
- DeepICF: HR@10=0.6881, NDCG@10=0.4113
- DeepICF+a: HR@10=0.7084, NDCG@10=0.4380
• Pinterest (embedding size 16):
- FISM: HR@10=0.8763, NDCG@10=0.5529
- DeepICF: HR@10=0.8806, NDCG@10=0.5631
- DeepICF+a: HR@10=0.8835, NDCG@10=0.5666
- Relative NDCG@10 improvements over FISM: ~4.0% (DeepICF) and ~10.8% (DeepICF+a) on MovieLens; ~1.9% (DeepICF) and ~2.5% (DeepICF+a) on Pinterest. The reported improvements are statistically significant (p < 0.05).
- Higher-order interactions help: Stacking nonlinear hidden layers yields better performance; deeper architectures capture complex item relations (e.g., DeepICF-3/4 often best across embedding sizes).
- Attention improves second-order modeling: DeepICF+a’s attention-based pooling differentiates historical items’ influence, improving both HR and NDCG; qualitative attention visualizations align with category/genre relevance.
- Pre-training utility: Initializing embeddings with FISM speeds convergence and improves accuracy (e.g., HR gains ~0.8–1.3% at k=16), compared to random initialization.
- Hyper-parameter sensitivity:
  • Normalization α: the best α for DeepICF varies by dataset (MovieLens ~0.4–0.5; Pinterest ~0.5–1); DeepICF+a performs best with α=0 and a properly tuned β.
  • Embedding size k: performance generally improves with larger k; DeepICF+a mitigates small-k weaknesses on denser datasets.
  • Negative sampling: the optimal number of negatives per positive is around 4; increasing negatives up to ~4 improves performance for both DeepICF variants.
- Comparative insights: Item-based deep models are more robust on sparser datasets (Pinterest) than user-based models; YouTube Rec’s reliance on sequence/time limits its effectiveness on datasets lacking temporal signals.
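For reference, a minimal sketch of the leave-one-out metrics used above, assuming the DeepICF-style model interface from the methodology sketches; the helper name and single-user signature are illustrative assumptions.

```python
import math
import torch


def hr_ndcg_at_n(model, user, hist_items, test_item, negatives, top_n=10):
    """Rank the held-out test item against sampled negatives (99 in the paper).
    HR@N is 1 if the test item lands in the top N; NDCG@N discounts it by rank."""
    candidates = [test_item] + list(negatives)
    with torch.no_grad():
        scores = torch.stack([model(user, hist_items, torch.tensor(i)) for i in candidates])
    rank = int((scores > scores[0]).sum().item())   # candidates scored above the positive
    if rank < top_n:
        return 1.0, 1.0 / math.log2(rank + 2)       # HR@N = 1, NDCG@N = 1 / log2(position + 2)
    return 0.0, 0.0
```

Averaging these per-user values over the test set yields HR@10 and NDCG@10 figures of the kind reported above.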
The findings confirm that modeling nonlinear, higher-order item interactions within ICF substantially enhances top-N recommendation accuracy. By explicitly forming pairwise interaction vectors and learning higher-order dependencies through deep neural layers, DeepICF captures complex itemset effects that linear models (SLIM, HOSLIM, FISM) miss. The attention mechanism further refines second-order interactions by weighting historically relevant items more strongly, aligning with intuitive user decision patterns and improving explainability. Performance gains across two different datasets, statistical significance, and robustness to hyper-parameter settings demonstrate the approach’s relevance to industrial recommender systems, especially for implicit-feedback scenarios and sparse data conditions.
DeepICF introduces a deep neural framework for item-based collaborative filtering that unifies second-order and higher-order interaction modeling, overcoming the linearity and uniform-weight limitations of prior ICF models. Extensive experiments on MovieLens and Pinterest show significant improvements over strong baselines, and the attentional variant (DeepICF+a) achieves further gains and better explainability. Future directions include: (1) incorporating heterogeneous item relations and side information (attributes, content, co-occurrence) into the framework; (2) providing finer-grained, feature-level explanations to increase user trust; and (3) modeling sequential preference evolution via reinforcement learning or memory networks.
- Scope restricted to implicit-feedback, pure collaborative filtering without side information; incorporation of attributes, context, and content is deferred to future work.
- The deep models are prone to overfitting due to fully connected MLP layers, requiring careful regularization and benefiting from FISM-based pre-training.
- The attention and normalization mechanisms introduce additional hyper-parameters (α, β, attention size) that require dataset-specific tuning.
- Evaluation uses leave-one-out with sampled negatives (99), which, while standard, may not reflect full catalog ranking or real-time constraints.
- The prediction layer uses a global projection vector; more fine-grained user/item-aware projections are suggested but not explored here.