Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data
A. Vertsel and M. Rumiantsau
This paper by Aliaksei Vertsel and Mikhail Rumiantsau examines hybrid approaches that combine rule-based systems with Large Language Models (LLMs) to extract actionable business insights from complex datasets. It shows how this combination can overcome the limitations of traditional systems and standalone models, enabling more reliable data extraction and insight generation.
~3 min • Beginner • English
Introduction
As organizations grapple with increasingly complex and diverse data sets, the demand for advanced techniques that can extract valuable insights has grown rapidly. Traditional rule-based systems often struggle to keep pace with the intricacies of modern business data, while standalone AI models, although powerful, still have limitations in certain scenarios.
In response to these challenges, the concept of hybrid approaches has emerged as a compelling solution. By combining the strengths of rule-based systems and AI models, hybrid approaches offer the potential to enhance the process of data extraction and uncover meaningful insights from diverse data sources. In this paper, we explore the use of LLM-powered and rule-based systems to address the complexities of data extraction in the field of business intelligence.
The following sections will investigate the details of this hybrid approach, assessing its effectiveness in navigating the complexities of business data and extracting actionable insights.
Methodology
The paper proposes and evaluates hybrid LLM-powered and rule-based methods for generating business insights from structured data, detailing pipeline design, preprocessing strategies, insight extraction techniques, narrative generation methods, and several hybrid architectures, followed by benchmarking.
Hybrid approach and considerations (Section 2):
- Combines interpretable AI techniques (e.g., LIME conceptually), rule-based systems, and supervised document classification to extract insights. LLMs integrate with rule-based logic to enhance natural language understanding/generation and capture nuanced user interests and goals.
- Key considerations: data quality and preprocessing; domain knowledge for rule design and feature selection; scalability and computational resources; ethical concerns, bias propagation, interpretability challenges, and resource intensity.
Business insight generation data pipeline (Section 3):
- Data preprocessing and normalization: cleaning, integration, transformation, reduction. Rule-based methods ensure consistency and control; LLM-based methods add adaptability and handle unstructured text; trade-offs in resource intensity and predictability.
- Data preprocessor structures: sequential cleaning, integration, transformation, and reduction steps.
- Rule-based vs LLM-based preprocessing: rule-based offers efficiency and determinism; LLM-based offers adaptability, especially for unstructured text.
- Experimental approach: building a data preprocessor via LLM code generation using input-output dataset examples. Steps: define examples; design task framework; use LLM to infer transformations; generate scripts (e.g., Python/SQL); validate/refine; automate/iterate; continuous learning.
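The validate/refine step in this workflow can be sketched as a simple check: a candidate transformation (in the paper's setup, an LLM-generated script) is accepted only if it reproduces every known input-output example pair. The example pairs and the candidate function below are illustrative assumptions, not the paper's actual data.

```python
# Hypothetical sketch: validate a candidate (e.g., LLM-generated) transform
# against known input-output example pairs before automating it.

def validate_transform(transform, examples):
    """Return True if the transform reproduces every expected output."""
    return all(transform(inp) == out for inp, out in examples)

# Example pairs the LLM would be asked to infer a transformation from:
examples = [
    ({"revenue": "1,200", "date": "2023-01-01"},
     {"revenue": 1200.0, "date": "2023-01-01"}),
    ({"revenue": "950", "date": "2023-01-02"},
     {"revenue": 950.0, "date": "2023-01-02"}),
]

# A candidate script (hand-written here; LLM-generated in the paper's setup):
def candidate(row):
    out = dict(row)
    out["revenue"] = float(row["revenue"].replace(",", ""))
    return out

print(validate_transform(candidate, examples))  # True -> safe to automate
```

Transforms that fail validation would be sent back to the LLM for refinement, closing the automate/iterate loop.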
Business insights extraction (Section 4):
- Hybrid approach: rule-based filters for patterns/anomalies; LLMs add context and deeper interpretation.
- Insight categories: general anomalous measurement shifts; dimension-specific anomalies; spikes; all-time highs; top dimensions; dimension comparisons.
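Two of the insight categories above (spikes and all-time highs) lend themselves to simple deterministic filters. A minimal sketch, with illustrative thresholds and data rather than the paper's exact rules:

```python
# Rule-based filters for two insight categories: spikes (value far above a
# rolling baseline) and all-time highs. Window, factor, and the series are
# illustrative assumptions.

def find_spikes(series, window=3, factor=2.0):
    """Indices where a value exceeds `factor` x the mean of the prior window."""
    spikes = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if baseline > 0 and series[i] > factor * baseline:
            spikes.append(i)
    return spikes

def find_all_time_highs(series):
    """Indices where a value strictly exceeds every earlier value."""
    highs, running_max = [], float("-inf")
    for i, value in enumerate(series):
        if value > running_max:
            highs.append(i)
            running_max = value
    return highs

sessions = [100, 110, 105, 400, 120, 115, 500]
print(find_spikes(sessions))          # [3, 6]
print(find_all_time_highs(sessions))  # [0, 1, 3, 6]
```

In the hybrid approach, the indices such filters flag would then be passed to an LLM for contextual interpretation.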
Overview of approaches (Section 5):
- Rule-based: high precision, resource efficiency, determinism, interpretability, customizability; challenges with scalability, flexibility, rule complexity, missing nuances, maintenance overhead.
- LLM-based: adaptable, handles unstructured data, rich insights, reduced maintenance; challenges with resources, interpretability, precision on structured tasks, data needs, bias risk, and precise math.
Natural language narrative generation (Section 6):
- Rule-based narratives: precise, efficient, consistent, customizable, deterministic; challenges with scalability, rule development complexity, missing nuance, maintenance, rigidity.
- LLM narratives: adaptable, nuanced, scalable, less maintenance, engaging; challenges with compute cost, interpretability, precision for metric-bound narratives, dependence on training data quality, and bias/inaccuracy risks.
Hybrid architectures (Section 7):
- 7.1 LLM-based insight generation from chunked data: chunk unprocessed data to fit token limits; craft prompts per chunk; process sequentially/parallel; synthesize insights. Advantages: scalability, focused analysis, parallelism. Challenges: integrating chunk insights, optimal chunking strategy, resource intensity, prompt engineering complexity.
- 7.2 Sequential data processing and insight generation: preprocess; extract targeted fragments; enrich with expert prompts; generate atomic insights via LLM; summarize into final report. Advantages: targeted analysis, expert guidance, depth. Challenges: prompt complexity, integration of insights, avoiding harmful fragmentation.
- 7.3 Hybrid rule-based + LLM summarization: rules-based engine generates atomic insights; LLM synthesizes into coherent report. Advantages: precision/reliability of rules and high-quality reporting via LLM; efficient division of labor. Challenges: integration complexity, rule maintenance, summarization accuracy.
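The division of labor in architecture 7.3 can be sketched as follows: a deterministic rule engine produces atomic insight strings (so all numbers come from code, not the model), and an LLM synthesizes them into a report. The metric names, the significance threshold, and `summarize_with_llm` are hypothetical; the latter stands in for a real LLM API call.

```python
# Illustrative sketch of architecture 7.3: a rule engine emits atomic
# insights; an LLM turns them into one narrative report.

def rule_engine(metrics):
    """Deterministic atomic insights; numbers come from rules, not the LLM."""
    insights = []
    for name, (prev, curr) in metrics.items():
        change = (curr - prev) / prev * 100
        if abs(change) >= 10:  # illustrative significance threshold
            direction = "increased" if change > 0 else "decreased"
            insights.append(f"{name} {direction} by {abs(change):.1f}%.")
    return insights

def summarize_with_llm(atomic_insights):
    # Placeholder: in practice, send the insights to an LLM with a
    # summarization prompt and return its narrative response.
    return " ".join(atomic_insights)

metrics = {"Sessions": (1000, 1250), "Conversions": (80, 70)}
report = summarize_with_llm(rule_engine(metrics))
print(report)  # Sessions increased by 25.0%. Conversions decreased by 12.5%.
```

Because the LLM only rephrases precomputed statements, this split preserves the rules' numerical precision while gaining the LLM's narrative quality.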
Benchmarking setup (Section 8):
- Task: extract important business events from time-series datasets as readable insights.
- Data: from 30 corporate Google Analytics 4 and Google Ads accounts via APIs over ~2 years.
- Models: GPT-4 via native API for LLM components.
- Evaluations: precision of mathematical operations; proper name hallucinations; recall of important insights; overall user satisfaction (likes-to-dislikes ratio). Mitigations tested include rule-based precalculation of totals/averages, name hashing, source-specific chunking and summarization.
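The name-hashing mitigation tested here can be illustrated as a replace/restore pair: proper names are swapped for opaque tokens before the data reaches the LLM (so the model cannot misspell or invent them) and decoded in the response. The campaign names below are illustrative, and the round trip assumes the model echoes tokens back unchanged.

```python
# Sketch of the name-hashing mitigation for proper-name hallucinations:
# replace known names with stable hash tokens pre-prompt, decode post-response.

import hashlib

def hash_names(text, names):
    """Replace each known proper name with a stable short hash token."""
    mapping = {}
    for name in names:
        token = "ENT_" + hashlib.sha1(name.encode()).hexdigest()[:8]
        mapping[token] = name
        text = text.replace(name, token)
    return text, mapping

def decode_names(text, mapping):
    """Restore original names in the LLM's response."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text

prompt, mapping = hash_names("Spring_Sale_DE outperformed Brand_US.",
                             ["Spring_Sale_DE", "Brand_US"])
# ... prompt goes to the LLM; suppose it echoes the tokens back ...
print(decode_names(prompt, mapping))  # Spring_Sale_DE outperformed Brand_US.
```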
Key Findings
Benchmark results across rule-based, LLM-only, and hybrid pipelines:
1) Precision of mathematical operations (processing efficiency):
- Rule-based: 100%
- LLM: 63%
- Hybrid (rule-based precalculation + LLM analysis): 87%
2) Proper name hallucinations (number of errors):
- Rule-based: 0%
- LLM: 12%
- Hybrid (name hashing + LLM analysis + hash decoding): 3%
3) Recall of important business insights (processing efficiency):
- Rule-based: 71%
- LLM: 67%
- Hybrid (source-specific chunking + LLM analysis + LLM summarization): 82%
4) Overall user satisfaction (likes-to-dislikes ratio):
- Rule-based: 1.79
- LLM: 3.82
- Hybrid: 4.60
Additional observations:
- Rule-based preprocessing of totals/averages improves LLM precision on metric calculations but increases prompt size, trading recall for precision.
- Name hashing substantially reduces proper name hallucinations in hybrid setups.
- Hybrid pipelines balance precision, recall, and narrative quality better than single-method approaches.
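The precalculation observation above can be sketched concretely: totals and averages are computed deterministically in code and the finished numbers are placed into the prompt, so the LLM interprets rather than calculates. The field names, rows, and prompt wording are illustrative assumptions.

```python
# Sketch of rule-based precalculation feeding an LLM prompt: the model
# receives computed statistics instead of raw rows to aggregate itself.

def precalculate(rows, metric):
    """Deterministically compute summary statistics for one metric."""
    values = [row[metric] for row in rows]
    total = sum(values)
    return {"total": total, "average": total / len(values),
            "minimum": min(values), "maximum": max(values)}

def build_prompt(stats, metric):
    """Embed precomputed numbers in the prompt sent to the LLM."""
    return (f"Daily {metric}: total={stats['total']}, "
            f"average={stats['average']:.1f}, min={stats['minimum']}, "
            f"max={stats['maximum']}. Describe the key takeaway.")

rows = [{"clicks": 120}, {"clicks": 90}, {"clicks": 150}]
stats = precalculate(rows, "clicks")
print(build_prompt(stats, "clicks"))
```

The trade-off noted above also shows up here: each precomputed statistic adds prompt tokens, which under a fixed token limit leaves less room for the raw data the model uses for recall.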
Discussion
The study set out to evaluate whether integrating rule-based systems with LLMs improves business insight generation from structured data relative to purely rule-based or purely LLM approaches. The benchmarking indicates that hybrids consistently mitigate the weaknesses of each method: they approach rule-based precision on mathematical operations while outperforming both rule-based and LLM-only methods on recall and user satisfaction. Specifically, rule-based precalculation addresses LLM weaknesses in numerical precision; hashing names curbs hallucinations; and chunking plus LLM summarization enhances coverage of insights. The architectures demonstrate practical strategies to overcome token limits (chunking), target analysis (fragment extraction with expert prompts), and maintain interpretability and control (rules generating atomic insights).
These results are important for BI workflows where accuracy, coverage, and communicability are all critical. The hybrid approach delivers more actionable narratives aligned with business objectives while preserving determinism where needed (e.g., metric calculations). However, trade-offs remain: increased complexity in system design and integration, dependence on prompt engineering, and higher computational costs. Overall, the findings support adopting hybrid pipelines as a robust, scalable, and adaptable solution for enterprise insight generation, particularly when integrating structured metrics with contextual narrative needs.
Conclusion
In conclusion, hybrid LLM-powered and rule-based systems provide a compelling, balanced pathway for generating accurate, comprehensive, and accessible business insights from structured data. By combining rule-based precision and transparency with LLM adaptability and narrative fluency, organizations can improve recall, reduce hallucinations, maintain numerical accuracy, and enhance user satisfaction. Future research should optimize hybrid pipeline components (e.g., prompt engineering automation, rule maintenance tooling), address computational efficiency, broaden datasets beyond GA4/Ads to other domains, and further evaluate bias and interpretability safeguards. As data complexity grows, such hybrid architectures are poised to become integral to strategic, data-driven decision-making.
Limitations
- Ethical and bias considerations: LLMs can propagate training data biases and produce inaccurate or biased insights; interpretability remains challenging.
- Computational/resource demands: Training, fine-tuning, and running LLMs (especially with chunking/parallelization) require substantial resources.
- Prompt engineering and integration complexity: Effective prompts, chunking strategies, and seamless handoffs between rule-based modules and LLMs are non-trivial and time-consuming to design and maintain.
- Rules maintenance: Rule-based components require ongoing updates to align with evolving business logic and data schemas.
- Numerical precision trade-offs: Precomputing totals/averages improves precision but increases prompt size, potentially reducing recall under token limits.
- Generalizability of benchmarks: Results are based on data from 30 GA4 and Google Ads accounts over ~2 years and one LLM (GPT-4); findings may not generalize to other domains, datasets, or models without further validation.