Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data
A. Vertsel and M. Rumiantsau
This paper by Aliaksei Vertsel and Mikhail Rumiantsau examines hybrid approaches that combine rule-based systems with Large Language Models (LLMs) to extract actionable business insights from complex datasets. It shows how this combination can overcome the limitations of traditional systems and standalone models, enabling more reliable data extraction and insight generation.
~3 min • Beginner • English
Introduction
As organizations grapple with increasingly complex and diverse data sets, the demand for advanced techniques that can extract valuable insights has grown rapidly. Traditional rule-based systems often struggle to keep pace with the intricacies of modern business data, while standalone AI models, although powerful, still have limitations in certain scenarios.
In response to these challenges, the concept of hybrid approaches has emerged as a compelling solution. By combining the strengths of rule-based systems and AI models, hybrid approaches offer the potential to enhance the process of data extraction and uncover meaningful insights from diverse data sources. In this paper, we explore the use of LLM-powered and rule-based systems to address the complexities of data extraction in the field of business intelligence.
The following sections will investigate the details of this hybrid approach, assessing its effectiveness in navigating the complexities of business data and extracting actionable insights.
Methodology
The paper proposes and evaluates hybrid LLM-powered and rule-based methods for generating business insights from structured data, detailing pipeline design, preprocessing strategies, insight extraction techniques, narrative generation methods, and several hybrid architectures, followed by benchmarking.
Hybrid approach and considerations (Section 2):
- Combines interpretable AI techniques (e.g., LIME conceptually), rule-based systems, and supervised document classification to extract insights. LLMs integrate with rule-based logic to enhance natural language understanding/generation and capture nuanced user interests and goals.
- Key considerations: data quality and preprocessing; domain knowledge for rule design and feature selection; scalability and computational resources; ethical concerns, bias propagation, interpretability challenges, and resource intensity.
Business insight generation data pipeline (Section 3):
- Data preprocessing and normalization: cleaning, integration, transformation, reduction. Rule-based methods ensure consistency and control; LLM-based methods add adaptability and handle unstructured text; trade-offs in resource intensity and predictability.
- Data preprocessor structures: sequential cleaning, integration, transformation, and reduction steps.
- Rule-based vs LLM-based preprocessing: rule-based offers efficiency and determinism; LLM-based offers adaptability, especially for unstructured text.
- Experimental approach: building a data preprocessor via LLM code generation using input-output dataset examples. Steps: define examples; design task framework; use LLM to infer transformations; generate scripts (e.g., Python/SQL); validate/refine; automate/iterate; continuous learning.
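The validate/refine step in this workflow can be sketched as a simple check: a candidate transformation (in the paper's setup, an LLM-generated script) is accepted only if it reproduces every known input-output example pair. The example pairs and the candidate function below are illustrative assumptions, not the paper's actual data.

```python
# Hypothetical sketch: validate a candidate (e.g., LLM-generated) transform
# against known input-output example pairs before automating it.

def validate_transform(transform, examples):
    """Return True if the transform reproduces every expected output."""
    return all(transform(inp) == out for inp, out in examples)

# Example pairs the LLM would be asked to infer a transformation from:
examples = [
    ({"revenue": "1,200", "date": "2023-01-01"},
     {"revenue": 1200.0, "date": "2023-01-01"}),
    ({"revenue": "950", "date": "2023-01-02"},
     {"revenue": 950.0, "date": "2023-01-02"}),
]

# A candidate script (hand-written here; LLM-generated in the paper's setup):
def candidate(row):
    out = dict(row)
    out["revenue"] = float(row["revenue"].replace(",", ""))
    return out

print(validate_transform(candidate, examples))  # True -> safe to automate
```

Transforms that fail validation would be sent back to the LLM for refinement, closing the automate/iterate loop.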
Business insights extraction (Section 4):
- Hybrid approach: rule-based filters for patterns/anomalies; LLMs add context and deeper interpretation.
- Insight categories: general anomalous measurement shifts; dimension-specific anomalies; spikes; all-time highs; top dimensions; dimension comparisons.
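Two of the insight categories above (spikes and all-time highs) lend themselves to simple deterministic filters. A minimal sketch, with illustrative thresholds and data rather than the paper's exact rules:

```python
# Rule-based filters for two insight categories: spikes (value far above a
# rolling baseline) and all-time highs. Window, factor, and the series are
# illustrative assumptions.

def find_spikes(series, window=3, factor=2.0):
    """Indices where a value exceeds `factor` x the mean of the prior window."""
    spikes = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if baseline > 0 and series[i] > factor * baseline:
            spikes.append(i)
    return spikes

def find_all_time_highs(series):
    """Indices where a value strictly exceeds every earlier value."""
    highs, running_max = [], float("-inf")
    for i, value in enumerate(series):
        if value > running_max:
            highs.append(i)
            running_max = value
    return highs

sessions = [100, 110, 105, 400, 120, 115, 500]
print(find_spikes(sessions))          # [3, 6]
print(find_all_time_highs(sessions))  # [0, 1, 3, 6]
```

In the hybrid approach, the indices such filters flag would then be passed to an LLM for contextual interpretation.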
Overview of approaches (Section 5):
- Rule-based: high precision, resource efficiency, determinism, interpretability, customizability; challenges with scalability, flexibility, rule complexity, missing nuances, maintenance overhead.
- LLM-based: adaptable, handles unstructured data, rich insights, reduced maintenance; challenges with resources, interpretability, precision on structured tasks, data needs, bias risk, and precise math.
Natural language narrative generation (Section 6):
- Rule-based narratives: precise, efficient, consistent, customizable, deterministic; challenges with scalability, rule development complexity, missing nuance, maintenance, rigidity.
- LLM narratives: adaptable, nuanced, scalable, less maintenance, engaging; challenges with compute cost, interpretability, precision for metric-bound narratives, dependence on training data quality, and bias/inaccuracy risks.
Hybrid architectures (Section 7):
- 7.1 LLM-based insight generation from chunked data: chunk unprocessed data to fit token limits; craft prompts per chunk; process sequentially/parallel; synthesize insights. Advantages: scalability, focused analysis, parallelism. Challenges: integrating chunk insights, optimal chunking strategy, resource intensity, prompt engineering complexity.
- 7.2 Sequential data processing and insight generation: preprocess; extract targeted fragments; enrich with expert prompts; generate atomic insights via LLM; summarize into final report. Advantages: targeted analysis, expert guidance, depth. Challenges: prompt complexity, integration of insights, avoiding harmful fragmentation.
- 7.3 Hybrid rule-based + LLM summarization: rules-based engine generates atomic insights; LLM synthesizes into coherent report. Advantages: precision/reliability of rules and high-quality reporting via LLM; efficient division of labor. Challenges: integration complexity, rule maintenance, summarization accuracy.
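The division of labor in architecture 7.3 can be sketched as follows: a deterministic rule engine produces atomic insight strings (so all numbers come from code, not the model), and an LLM synthesizes them into a report. The metric names, the significance threshold, and `summarize_with_llm` are hypothetical; the latter stands in for a real LLM API call.

```python
# Illustrative sketch of architecture 7.3: a rule engine emits atomic
# insights; an LLM turns them into one narrative report.

def rule_engine(metrics):
    """Deterministic atomic insights; numbers come from rules, not the LLM."""
    insights = []
    for name, (prev, curr) in metrics.items():
        change = (curr - prev) / prev * 100
        if abs(change) >= 10:  # illustrative significance threshold
            direction = "increased" if change > 0 else "decreased"
            insights.append(f"{name} {direction} by {abs(change):.1f}%.")
    return insights

def summarize_with_llm(atomic_insights):
    # Placeholder: in practice, send the insights to an LLM with a
    # summarization prompt and return its narrative response.
    return " ".join(atomic_insights)

metrics = {"Sessions": (1000, 1250), "Conversions": (80, 70)}
report = summarize_with_llm(rule_engine(metrics))
print(report)  # Sessions increased by 25.0%. Conversions decreased by 12.5%.
```

Because the LLM only rephrases precomputed statements, this split preserves the rules' numerical precision while gaining the LLM's narrative quality.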
Benchmarking setup (Section 8):
- Task: extract important business events from time-series datasets as readable insights.
- Data: from 30 corporate Google Analytics 4 and Google Ads accounts via APIs over ~2 years.
- Models: GPT-4 via native API for LLM components.
- Evaluations: precision of mathematical operations; proper name hallucinations; recall of important insights; overall user satisfaction (likes-to-dislikes ratio). Mitigations tested include rule-based precalculation of totals/averages, name hashing, source-specific chunking and summarization.
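The name-hashing mitigation tested here can be illustrated as a replace/restore pair: proper names are swapped for opaque tokens before the data reaches the LLM (so the model cannot misspell or invent them) and decoded in the response. The campaign names below are illustrative, and the round trip assumes the model echoes tokens back unchanged.

```python
# Sketch of the name-hashing mitigation for proper-name hallucinations:
# replace known names with stable hash tokens pre-prompt, decode post-response.

import hashlib

def hash_names(text, names):
    """Replace each known proper name with a stable short hash token."""
    mapping = {}
    for name in names:
        token = "ENT_" + hashlib.sha1(name.encode()).hexdigest()[:8]
        mapping[token] = name
        text = text.replace(name, token)
    return text, mapping

def decode_names(text, mapping):
    """Restore original names in the LLM's response."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text

prompt, mapping = hash_names("Spring_Sale_DE outperformed Brand_US.",
                             ["Spring_Sale_DE", "Brand_US"])
# ... prompt goes to the LLM; suppose it echoes the tokens back ...
print(decode_names(prompt, mapping))  # Spring_Sale_DE outperformed Brand_US.
```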
Key Findings
Benchmark results across rule-based, LLM-only, and hybrid pipelines:
1) Precision of mathematical operations (processing efficiency):
- Rule-based: 100%
- LLM: 63%
- Hybrid (rule-based precalculation + LLM analysis): 87%
2) Proper name hallucinations (number of errors):
- Rule-based: 0%
- LLM: 12%
- Hybrid (name hashing + LLM analysis + hash decoding): 3%
3) Recall of important business insights (processing efficiency):
- Rule-based: 71%
- LLM: 67%
- Hybrid (source-specific chunking + LLM analysis + LLM summarization): 82%
4) Overall user satisfaction (likes-to-dislikes ratio):
- Rule-based: 1.79
- LLM: 3.82
- Hybrid: 4.60
Additional observations:
- Rule-based preprocessing of totals/averages improves LLM precision on metric calculations but increases prompt size, trading recall for precision.
- Name hashing substantially reduces proper name hallucinations in hybrid setups.
- Hybrid pipelines balance precision, recall, and narrative quality better than single-method approaches.
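The precalculation observation above can be sketched concretely: totals and averages are computed deterministically in code and the finished numbers are placed into the prompt, so the LLM interprets rather than calculates. The field names, rows, and prompt wording are illustrative assumptions.

```python
# Sketch of rule-based precalculation feeding an LLM prompt: the model
# receives computed statistics instead of raw rows to aggregate itself.

def precalculate(rows, metric):
    """Deterministically compute summary statistics for one metric."""
    values = [row[metric] for row in rows]
    total = sum(values)
    return {"total": total, "average": total / len(values),
            "minimum": min(values), "maximum": max(values)}

def build_prompt(stats, metric):
    """Embed precomputed numbers in the prompt sent to the LLM."""
    return (f"Daily {metric}: total={stats['total']}, "
            f"average={stats['average']:.1f}, min={stats['minimum']}, "
            f"max={stats['maximum']}. Describe the key takeaway.")

rows = [{"clicks": 120}, {"clicks": 90}, {"clicks": 150}]
stats = precalculate(rows, "clicks")
print(build_prompt(stats, "clicks"))
```

The trade-off noted above also shows up here: each precomputed statistic adds prompt tokens, which under a fixed token limit leaves less room for the raw data the model uses for recall.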
Discussion
The study set out to evaluate whether integrating rule-based systems with LLMs improves business insight generation from structured data relative to purely rule-based or purely LLM approaches. The benchmarking indicates that hybrids consistently mitigate the weaknesses of each method: they approach rule-based precision on mathematical operations while outperforming both rule-based and LLM-only methods on recall and user satisfaction. Specifically, rule-based precalculation addresses LLM weaknesses in numerical precision; hashing names curbs hallucinations; and chunking plus LLM summarization enhances coverage of insights. The architectures demonstrate practical strategies to overcome token limits (chunking), target analysis (fragment extraction with expert prompts), and maintain interpretability and control (rules generating atomic insights).
These results are important for BI workflows where accuracy, coverage, and communicability are all critical. The hybrid approach delivers more actionable narratives aligned with business objectives while preserving determinism where needed (e.g., metric calculations). However, trade-offs remain: increased complexity in system design and integration, dependence on prompt engineering, and higher computational costs. Overall, the findings support adopting hybrid pipelines as a robust, scalable, and adaptable solution for enterprise insight generation, particularly when integrating structured metrics with contextual narrative needs.
Conclusion
In conclusion, hybrid LLM-powered and rule-based systems provide a compelling, balanced pathway for generating accurate, comprehensive, and accessible business insights from structured data. By combining rule-based precision and transparency with LLM adaptability and narrative fluency, organizations can improve recall, reduce hallucinations, maintain numerical accuracy, and enhance user satisfaction. Future research should optimize hybrid pipeline components (e.g., prompt engineering automation, rule maintenance tooling), address computational efficiency, broaden datasets beyond GA4/Ads to other domains, and further evaluate bias and interpretability safeguards. As data complexity grows, such hybrid architectures are poised to become integral to strategic, data-driven decision-making.
Limitations
- Ethical and bias considerations: LLMs can propagate training data biases and produce inaccurate or biased insights; interpretability remains challenging.
- Computational/resource demands: Training, fine-tuning, and running LLMs (especially with chunking/parallelization) require substantial resources.
- Prompt engineering and integration complexity: Effective prompts, chunking strategies, and seamless handoffs between rule-based modules and LLMs are non-trivial and time-consuming to design and maintain.
- Rules maintenance: Rule-based components require ongoing updates to align with evolving business logic and data schemas.
- Numerical precision trade-offs: Precomputing totals/averages improves precision but increases prompt size, potentially reducing recall under token limits.
- Generalizability of benchmarks: Results are based on data from 30 GA4 and Google Ads accounts over ~2 years and one LLM (GPT-4); findings may not generalize to other domains, datasets, or models without further validation.