Revisiting recommender systems: an investigative survey

O. A. S. Ibrahim, E. M. G. Younis, et al.

This review maps the evolution of recommender systems, classifying collaborative, content-based, and hybrid approaches, and highlights how machine learning and deep learning tackle cold start, filter bubbles, and personalization. It also stresses fairness, transparency, and trust, and points to future directions.

Introduction

Recommender systems (RSs) have become essential decision-support tools as users face information overload online, with applications in e-commerce, social media, healthcare, finance, and travel. Historically, recommendations were sourced from social contacts; today, internet-connected devices and social platforms drive large-scale personalized recommendations. Three key challenges persist: (1) filter bubbles limit exposure to diverse content; (2) cold start for new users and items impedes accurate personalization; and (3) traditional collaborative filtering (CF) and content-based techniques may underfit complex, evolving user preferences. Recent advances integrate machine learning (ML), reinforcement learning, and especially deep learning (e.g., neural collaborative filtering) to model nonlinear user–item interactions and implicit feedback at scale. Market analyses forecast substantial growth of recommendation engines through 2030, underscoring their strategic importance to industry. Parallel work in human–AI interaction emphasizes aligning RSs with human values, transparency, and trust, and acknowledges knowledge gaps that arise because proprietary industrial practices are not fully disclosed. The paper argues for a holistic view that complements algorithmic advances with human factors and real-world deployment considerations.

Contributions: the survey (1) clarifies the classifications and attributes of three principal RS categories (collaborative filtering, personalized/context-aware systems, and content-based systems); (2) organizes prior work within a systematic taxonomy; and (3) articulates drawbacks of existing approaches and outlines future research avenues, including robustness, fairness, trust, and serendipity.

Literature Review

The survey adopts a mixed-method, semi-systematic literature review to capture the breadth of RS research amid evolving terminology. Sources include Springer, ScienceDirect, and IEEE Xplore, supplemented by snowballing from reference lists and expert knowledge to identify additional relevant papers. Papers were included when the specified search terms appeared in the title or keywords; arXiv-only preprints were excluded. The review quantifies topic prevalence (approximate shares): recommender system (~32%), collaborative filtering (~17%), personalized recommendation (~11%), fairness (~9%), content-based (~7%), shilling attacks (~6%), review papers (~5%), and biased recommendation (~3%). It situates the work against prior surveys focused on single topics (e.g., fairness, personality-aware RSs, generative-model RSs, popularity bias, trustworthiness) and positions this survey as broader, spanning CF, personalized/context-aware, content-based, and robustness topics. The taxonomy (Fig. 1) organizes RSs into two branches: Personalized (CF, both model-based and neighborhood-based; context-aware; mathematical models) and Content-based (ML-driven).

Methodology

Design: Mixed-method, semi-systematic review.
Data sources: Springer, ScienceDirect, IEEE Xplore; expert-informed snowballing to capture studies potentially missed by keyword search.
Scope and query: Broad search terms to accommodate non-standardized RS terminology. Filtering required presence of terms in title/keywords; arXiv-only preprints were excluded.
Research questions: (1) How has the theory of RS algorithms evolved over time? (2) How can theoretical developments in RS be translated into useful applications?
Search terms (indicative): recommender system, trustworthy, collaborative filtering, e-commerce, shilling attacks, hybrid recommender system, personalized recommendation system, context-aware recommender system, fairness of recommender systems, deep learning for recommender systems, content-based recommender systems, recommender systems using generative models, review of recommender systems, ranking recommendations, biased recommendation.
Example Boolean query (compressed): ("recommender systems" OR "RS" OR "RecSys") AND ("LSA" OR "semantic analysis") AND ("rule-based" OR "knowledge-based systems") AND ("matrix factorization" OR "deep learning" OR "neural collaborative filtering" OR "graph neural networks" OR "LLMs" OR "sequential models" OR "hybrid recommender systems") AND ("precision" OR "recall" OR "F1" OR "RMSE" OR "MAE" OR "novelty" OR "diversity" OR "serendipity" OR "user satisfaction") AND (application domains, e.g., e-commerce, media streaming, education, healthcare) AND ("algorithm evolution" OR "scalability" OR "bias mitigation" OR "cold start" OR "real-time" OR "contextual").
Screening: Inclusion if query terms present in title/keywords; exclusion of arXiv-only preprints.
Analysis: Categorization under the proposed taxonomy; comparative synthesis of techniques, challenges, and evaluation metrics; identification of gaps and future directions.

Key Findings

Taxonomy and scope: RSs are categorized into collaborative filtering (CF), content-based (CB), hybrid, and personalized/context-aware systems, spanning neighborhood-based (user-/item-based) and model-based (MF, probabilistic, rule-based, deep learning) approaches.
Collaborative filtering: Memory-based methods rely on similarity among users/items; model-based methods include latent-factor models (e.g., SVD/MF, probabilistic MF), clustering (K-means), and neural models (NCF). Formalizations include user-based CF with weighted deviations and item-based CF using cosine/Pearson similarity. Trust-aware variants (trust-weighted mean, trust-based CF, TidalTrust) mitigate adversarial bias and cold start by propagating trust.
Content-based/IR models: Vector Space Model (TF-IDF weighting) and probabilistic IR (Okapi Two-Poisson; BM25 with typical parameters K1≈1.2, b≈0.7) remain foundational for CB-RSs; multiple similarity functions (cosine, Jaccard, Dice, Euclidean) are discussed. Learning-to-Rank methods (pointwise, pairwise, listwise; hybrids like LambdaMART) improve ranking effectiveness.
Personalized/context-aware RSs: Context (time, location, device, social) enriches modeling but exacerbates cold start; solutions include context-aware MF (tensorization), multi-domain transfer, pretrained language models (PLMs) with prompt/prefix tuning, multimodal fusion (text, image, audio, video), and GNNs/Transformers for sequential and graph-structured signals.
Robustness and bias: Shilling attacks (push/nuke) and gray sheep users introduce bias. Detection metrics include prediction difference before/after user removal and deviations in user ratings. Trust-aware frameworks and social trust networks bolster robustness but depend on trustworthy graph information and security.
Algorithm landscape (representative): ALS, BPR, BiVAE, Caser, NCF, GRU4Rec, SASRec (Transformers), DKN, LSTUR, NAML/NRMS/NPA (news), xDeepFM, RBM, LightFM, GeoIMC, RLRMC, SAR—each with strengths (scalability, sequential modeling, knowledge integration) and weaknesses (cold start, interpretability, complexity, data hunger).
Ethical and human-centered aspects: Increasing emphasis on fairness, transparency, and user trust; balancing beyond-accuracy objectives (diversity, novelty, serendipity) with relevance.
Quantified topic prevalence (approximate from Fig. 2): recommender system ~32%, collaborative filtering ~17%, personalized recommendation ~11%, fairness ~9%, content-based ~7%, shilling attacks ~6%, review ~5%, biased recommendation ~3%.
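The user-based CF formulation listed under Collaborative filtering above (a user's mean rating plus similarity-weighted deviations of neighbours' ratings) can be illustrated with a minimal Python sketch. The function names, the toy rating matrix, and the choice to compute Pearson similarity over co-rated items only are assumptions made for illustration; the survey summary does not specify these details.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over items both users rated (NaN marks unrated)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0  # too little overlap to estimate a correlation
    x = a[mask] - a[mask].mean()
    y = b[mask] - b[mask].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0

def predict_user_based(ratings, user, item):
    """Predict a rating as the user's mean plus weighted neighbour deviations."""
    user_mean = float(np.nanmean(ratings[user]))
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and neighbours who have not rated the item.
        if other == user or np.isnan(ratings[other, item]):
            continue
        sim = pearson_sim(ratings[user], ratings[other])
        num += sim * (ratings[other, item] - np.nanmean(ratings[other]))
        den += abs(sim)
    # Fall back to the user's own mean when no informative neighbour exists.
    return user_mean if den == 0 else user_mean + num / den

# Hypothetical toy data: rows are users, columns are items, NaN = unrated.
toy = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, 4.0, 1.0],
    [1.0, 1.0, 5.0, 5.0],
])
print(predict_user_based(toy, user=0, item=2))
```

Subtracting each neighbour's mean before weighting compensates for users who rate systematically high or low, which is the point of the "weighted deviations" form. The prediction is not clipped to the rating scale in this sketch.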

Discussion

The review addresses the evolution of RS theory by tracing progression from neighborhood-based CF and classical IR/VSM/PM models to latent factorization (SVD/MF), probabilistic modeling, and modern deep architectures (NCF, CNN/RNN/Transformer-based sequential models, GNNs), alongside trust-aware and context-aware extensions. It connects theory to practice by detailing how ML-driven models capture nonlinear user–item dynamics, fuse multimodal/contextual signals, and enable large-scale, real-time personalization. Robustness considerations (bias/shilling detection and trust propagation) and human-centered objectives (fairness, transparency, serendipity) are mapped to application needs in domains such as e-commerce, media, healthcare, and education. The synthesis indicates that hybridization—combining CF, CB, context, and trust with deep representation learning—best aligns algorithmic advances with practical constraints (scalability, cold start, security, and ethics), guiding deployment strategies and metric selection beyond accuracy (e.g., diversity, novelty, serendipity, user satisfaction).
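As a concrete instance of the classical probabilistic IR models in this progression, the following is a minimal sketch of BM25 scoring using the parameter values reported in the Key Findings (K1≈1.2, b≈0.7). The function name `bm25_scores`, the whitespace-tokenized toy documents, and the smoothed, non-negative IDF variant are assumptions for illustration and are not taken from the survey; the sketch also assumes a non-empty collection of non-empty documents.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.7):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative (an assumed variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical item descriptions, already tokenized.
docs = [
    "deep learning recommender".split(),
    "cooking recipes".split(),
    "recommender systems survey recommender".split(),
]
print(bm25_scores("recommender".split(), docs))
```

In a content-based recommender, the query would typically be built from a user's profile terms and the documents from item descriptions; `k1` controls how quickly repeated terms saturate and `b` controls how strongly long documents are penalized.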

Conclusion

The survey systematizes RS methodologies across CF, CB, hybrid, and personalized/context-aware paradigms, highlighting deep learning’s role in modeling complex preferences and improving accuracy, as well as trust-aware mechanisms for robustness. It underscores the need to incorporate fairness, transparency, and serendipity to mitigate filter bubbles and bias. Recommended future directions include: (1) enhanced trust-aware techniques (trust propagation, trust-based filtering, hybrid models leveraging social trust) for cold start and robustness; (2) tighter integration of CF and ML for multimodal, context-rich, interpretable, and fair recommendations; (3) addressing proprietary knowledge gaps by fostering reproducible research and bridging academic–industry practice; and (4) principled bias mitigation with reliable serendipity measurement and optimization, scalable to real-world systems.

Limitations

Scope and coverage: As a semi-systematic survey, inclusion depends on chosen sources and search terms; exclusion of arXiv-only preprints and reliance on title/keyword filtering may omit relevant work.
Lack of empirical benchmarking: The study is theoretical and does not conduct new experiments; comparative claims rely on cited literature, not unified empirical evaluation across datasets.
Proprietary opacity: Many industrial RS practices are undisclosed, limiting generalizability from academic methods to production systems.
Visual/figure-derived estimates: Reported keyword prevalence from Fig. 2 is approximate; actual counts may vary.
Evolving field: Rapid advances (e.g., LLM-based RS, reinforcement learning at scale) may outpace the survey’s coverage window.
