Analyzing the past, improving the future: a multiscale opinion tracking model for optimizing business performance

S. Sigari and A. H. Gandomi

Discover an innovative unsupervised learning system designed to extract and classify topics along with their emotion scores from text streams, enhancing business decision-making. This research was conducted by Salman Sigari and Amir H. Gandomi.
Introduction

As the volume of information continues to grow, firms are adapting their strategies to grow their market share. Managers and leaders formulate strategy from competitive tactics and operational measures. Developing sustained leadership requires understanding external and internal information (news, media, and social networks), listening to the internal context (internal surveys, career websites, and social networks), analyzing and summarizing information, and communicating the results effectively. The emergence of social media has given online users a place to express and share their thoughts and opinions on different topics and events. In the market and e-commerce, understanding the role of social media, news, and customer reviews is becoming increasingly critical. Companies and service providers monitor user responses to their products and services on platforms such as Twitter, LinkedIn, Facebook, and YouTube, as well as news sites.

Discussions about current issues, complaints, and sentiments on the Internet are an excellent source of information. Developing knowledge from extracted data (mostly unstructured) has been shown to have a significant effect on sales and consumer decision-making. Unlike structured data, unstructured text, audio, and video are complex and difficult to analyze, so prior work has stressed the importance of efficient, appropriate methods for mining heterogeneous datasets. Businesses and service providers often lack the knowledge and time to determine whether a product chosen from among the many competitors reviewed across numerous websites is the right one. Some approaches identify metaphors and topics at the sentence level by targeting sentences, surrounding contexts, emotions, and cognitive words; however, recent cognitive neuroscience studies argue that purely structural explanations are insufficient without emotional factors. Qualitative content analysis is common, while quantitative and combined techniques are less used. There is little social science research on the benefits of analyzing large corpora with AI, and linguistic theories have been applied insufficiently to intersubjective information construction. Language is influenced by embodied cognition, and text mining enables testing existing or new research questions against rich, contextualized data.

To collect and analyze large, heterogeneous, unstructured data, multiscale modeling and automation are necessary to reduce uncertainty about information, products, and services. Data must be classified automatically in real time as positive, negative, or neutral. The characteristics of social media content pose challenges to NLP. This study addresses these issues by presenting a competitive intelligence process that integrates information to monitor the market environment and predict customer behavior, and it reviews recent literature on online text mining for market prediction.

The following contributions are made as a result of the work:

  1. We provide a brief overview of relevant economic and information technology concepts and their relationship to proposed solutions.
  2. We review recent significant published literature relevant to the project.
  3. We describe an online data collection and classification method for creating BI analysis for strategic decision-making by detecting and extracting subjective information from human-generated comments, feedback, and suggestions.
  4. We present findings and the proposed feature engineering process, outlining how data are transformed into knowledge.
  5. We conclude with findings and potential future research directions.
Literature Review

Background: Information can be extracted from text using statistical analysis, computational linguistics, and machine-learning techniques. Human-generated text can be converted into meaningful information and summaries to support evidence-based decisions. Despite the wealth of information in web archives, growing text volume and noise make analysis and extraction difficult. Integrating machine learning with NLP has been used to predict return on investment, for example by relating movie-script features to box-office success with high accuracy.

Human behavioral coding: Automatic sentiment analysis can be combined with human coding to capture behavioral, cultural, and contextual nuances from large text collections. Supervised learning constructs training sets and learning models to infer labels, enabling conceptually coherent models, performance measurement, and validation. Supervised methods have been applied in multiple sentiment analysis contexts.

Behavioral economics: Price reflects perceived value, and media coverage can affect market dynamics. Market participants are subject to cognitive biases, and investor sentiment (optimism or pessimism) influences behavior and market valuations.

Social network text mining techniques: Prior work on Twitter sentiment extracted emotion tokens and classified emotions but faced limits in representation and real-time stream handling. Real social networks are heterogeneous, often with key leaders influencing consensus. Methods have examined social influence and popularity but often without polarity. Various approaches combine lexicon-based and supervised learning, random-walk polarity inference, rule-based classification, multilingual processing with minimal linguistic resources, and collaborative filtering for sparsity, each with advantages and limitations.

Polarity mining: Emotion, sentiment, attitude, and subjectivity mining extracts and analyzes opinions from unstructured reviews about products, organizations, individuals, and events. Sentiment analysis operates at the document, sentence, and aspect levels; aspect-based methods identify sentiments tied to specific aspects, offering granular insights. Noise removal from raw data is necessary. Both consumers and businesses benefit from polarity mining. Supervised approaches treat it as a classification task; unsupervised approaches infer the semantic orientation of words and phrases and aggregate them into a text-level polarity. Studies show differing impacts of positive versus negative messages and a bias toward negativity; limited research exists on shifting negative reviews to positive over time.

Methodology

Methodology: algorithms and problem formulation. The proposed real-time decision-support model combines cognitive intelligence with NLP, transforming unstructured text from online resources and social media into subjective knowledge for real-time decision support. The algorithm has been collecting data from online sources for more than two years, updating a data lake monthly. The model identifies data types, performs feature engineering that eliminates redundancy by detecting highly correlated features early, and transforms features into tables from which analysts and end users can learn extracted topics and monitor improvements in real time as textual data is continuously gathered.

Data collection and preprocessing. Relevant data sources include news websites, blogs and forums, official websites, corporate documents (reports, internal surveys, earnings call transcripts), personal text (chats, emails, SMS, tweets), and open-ended survey responses. Text is collected via APIs or scraping. Preprocessing includes keeping only relevant text, removing unimportant characters (extra spaces, formatting tags), segmentation, lowercasing, and stemming/lemmatization. Stopwords are removed. Texts are transformed into a document-term matrix; term weights use TF-IDF to emphasize specificity.
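As a concrete illustration of the document-term step, here is a minimal pure-Python sketch of TF-IDF weighting over pre-tokenized documents. The smoothed IDF variant used here is an assumption; the source does not specify which formulation was applied.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a document-term matrix with TF-IDF weights.

    docs: list of pre-tokenized documents (lowercased, stemmed terms).
    Returns (vocabulary, matrix) where matrix[i][j] is the TF-IDF weight
    of vocabulary[j] in docs[i].
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    # Smoothed inverse document frequency emphasizes term specificity:
    # terms appearing in every document get the minimum weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        matrix.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, matrix
```

In this weighting, a term shared by all documents (like "service" below) scores lower than a term specific to one document, which is the specificity emphasis the text describes.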

Text mining for service experience and cognitive DSS. Cognitive decision-support systems leverage cognitive processes (intelligence, perception), situation awareness, and mental models to aid decisions in uncertain, dynamic contexts. The model detects and extracts subjective information from reviews and news, evaluating texts and features for richness before recommendation.

Proposed algorithm. The algorithm pipeline: automatically identify relevant sources; clean text (retain relevant elements, remove noise, lemmatize, segment into words, sentences, paragraphs); classify and assign topics to each review; validate topics; compute sentiment scores at review, paragraph, and sentence levels; aggregate topic modeling with sentiment scores at the service level; compute the most accurate sentiment score per topic; return top-N topics with weighted sentiment per class. Distinct sentiment scores are assigned at sentence and paragraph levels; the final topic sentiment is a weighted combination. Topic modeling uses LDA to represent documents by topic probabilities; TF-IDF features feed LDA, which partitions the corpus into topics. Feature engineering prioritizes informative features and mitigates the curse of dimensionality by reducing correlated or low-utility features.
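The final weighted combination of sentence- and paragraph-level sentiment can be sketched as follows. The source states only that the topic score is a weighted combination; the weights and the averaging scheme here are illustrative assumptions.

```python
def topic_sentiment(sentence_scores, paragraph_scores,
                    w_sent=0.6, w_para=0.4):
    """Combine sentence- and paragraph-level sentiment into one topic score.

    sentence_scores / paragraph_scores: polarity values in [-1, 1].
    w_sent and w_para are illustrative weights, not values from the paper.
    """
    if not sentence_scores and not paragraph_scores:
        return 0.0
    # Average each level separately, then blend the two levels.
    s = sum(sentence_scores) / len(sentence_scores) if sentence_scores else 0.0
    p = sum(paragraph_scores) / len(paragraph_scores) if paragraph_scores else 0.0
    return w_sent * s + w_para * p
```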

Text to structured data. Text is ingested from varied formats (JSON, XML, PDF, MS Word, HTML). Advanced feature engineering integrated with NER, n-grams, and POS tagging extracts names and locations; additional features include reviewer location, occupation, tenure, engagement metrics (likes, shares, impressions), and publication metadata (date, source). A polarity score is added. A compound sentiment score is computed by aggregating topic probabilities, sentiment scores, and word contributions across documents.
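One way to realize the compound score described above is a probability-weighted average: each document contributes its sentiment to a topic in proportion to its LDA probability for that topic. The exact aggregation formula is not given in the source, so this scheme is an assumption consistent with the description.

```python
def compound_topic_score(docs):
    """Probability-weighted compound sentiment per topic across documents.

    docs: list of (topic_probs, sentiment) pairs, where topic_probs maps
    topic name -> LDA probability for that document, and sentiment is the
    document's polarity in [-1, 1].
    """
    num, den = {}, {}
    for probs, sentiment in docs:
        for topic, p in probs.items():
            # Accumulate probability-weighted sentiment and total weight.
            num[topic] = num.get(topic, 0.0) + p * sentiment
            den[topic] = den.get(topic, 0.0) + p
    return {t: num[t] / den[t] for t in num if den[t] > 0}
```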

Transforming information into knowledge. LDA topic modeling interprets latent topics from large-scale documents, creating a vector space of topics with document-topic probabilities. Two feature types are emphasized: bigrams (e.g., adjective–noun, noun–noun collocations like “Service Delivery”) and LDA topics. Simplified NLP and statistical techniques extract these collocations for downstream classification and sentiment aggregation. Feature selection balances predictive performance, computational efficiency, and avoidance of highly correlated features that add cost without benefit.
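The collocation extraction can be sketched with a frequency-based bigram counter. This is a simplified stand-in: the paper additionally filters by POS patterns (adjective–noun, noun–noun), which would require a tagger; the `min_count` threshold is an illustrative assumption.

```python
from collections import Counter

def top_bigrams(tokenized_docs, min_count=2, n=10):
    """Return the most frequent adjacent-word collocations (bigrams).

    tokenized_docs: list of token lists, one per document.
    min_count: minimum corpus frequency for a bigram to qualify.
    """
    counts = Counter()
    for tokens in tokenized_docs:
        # Count every adjacent pair within the document.
        counts.update(zip(tokens, tokens[1:]))
    frequent = [(bg, c) for bg, c in counts.most_common() if c >= min_count]
    return [" ".join(bg) for bg, _ in frequent[:n]]
```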

Key Findings

Datasets and features. The corpus focused on news and online business service reviews collected over more than two years. Category counts and engineered features (Table 1):

  • News: 3,310 documents; original features: 6; optimized features: 14
  • Social networks: 6,960 documents; original features: 13; optimized features: 24
  • Online business review websites: 2,103 documents; original features: 6; optimized features: 14
  • Surveys and emails: 1,988 documents; original features: 5; optimized features: 17

Total documents: 14,361.

Correlation insights (Fig. 5). Notable correlations include Reviewer Score with Sentiment Score (0.48), Review Source with Reviewer Location (0.45). The correlation matrix underscores the need to manage highly correlated features to avoid negative impacts on classifiers.
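Managing highly correlated features can be done greedily: keep a feature only if it is not too correlated with any feature already kept. A minimal sketch with the standard Pearson coefficient; the 0.9 threshold is an illustrative assumption, not a value from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def prune_correlated(features, threshold=0.9):
    """Keep each feature only if |r| < threshold against all kept features.

    features: dict name -> list of values (one per document).
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept
```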

Classification performance (Tables 2–3). Top 10 negative topics (examples):

  • Service delivery: Precision 0.92, Recall 0.89, F1 0.90, Support 412
  • Call center: P 0.89, R 0.86, F1 0.88, Support 248
  • Customer service: P 0.91, R 0.89, F1 0.90, Support 213

Top 10 positive topics (examples):
  • Service delivery: P 0.96, R 0.91, F1 0.93, Support 198
  • Proactive solution: P 0.91, R 0.93, F1 0.92, Support 193
  • Action plan: P 0.96, R 0.92, F1 0.94, Support 68
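The reported metrics follow the standard definitions. For reference, a minimal computation from per-topic confusion counts (true positives, false positives, false negatives):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from per-topic confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```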

Compound sentiment (Tables 4–5). Weighted negative topic scores (top):

  • Service delivery: −0.3422; Call center: −0.3118; Customer service: −0.3011; Product management: −0.2332; Innovation solution: −0.2109

Weighted positive topic scores (top):
  • Service delivery: 0.4711; Proactive solution: 0.4588; Management team: 0.427; Innovative solution: 0.3182; Follow-up status: 0.2406

Temporal trends (Figs. 6–7). From January 2017 to March 2018, topics and sentiment scores were tracked; from March 2018 onward, live modeling introduced accuracy scores for decision-makers. After implementation, negative topics decreased while positive topics increased. Consistent with cognitive psychology, negative effects moderated more slowly than positive ones, which is reflected in the differing trend slopes.

Operational insights. The model integrates location, time, and reviewer position to track opinions; generates high-granularity tables suitable for real-time BI tools; and supports market opinion tracking with continuous online–offline data updates.

Discussion

The study addresses the need for real-time, automated analysis of heterogeneous, unstructured textual data to inform business decisions. By combining LDA-based topic modeling with multiscale sentiment scoring at sentence, paragraph, and document levels and aggregating to topics with weighted sentiments, the system captures nuanced, aspect-level opinions that traditional word-level sentiment may miss. The feature engineering pipeline extracts contextual attributes (location, occupation, tenure, engagement metrics) to enrich competitive intelligence.

Empirically, the system achieves strong classification metrics across top positive and negative topics and produces interpretable compound sentiment scores that align with operational categories (e.g., service delivery, call centers). Correlation analyses inform feature selection and model parsimony. Temporal analyses indicate that, after deployment, negative-topic prevalence declines and positive-topic prevalence increases, suggesting actionable feedback loops where identified issues can be addressed and strengths reinforced. Overall, the findings support the utility of multiscale opinion tracking for improving short- and long-term business performance and decision-making.

Conclusion

This work presents an end-to-end, cognitive, multiscale opinion-tracking model that integrates NLP, LDA topic modeling, and weighted sentiment aggregation to transform unstructured text into actionable business intelligence. Contributions include: an online–offline data pipeline with advanced preprocessing and feature engineering; multilevel sentiment computation aggregated to topics; and real-time tracking and reporting for decision support. Results show robust topic classification, interpretable compound sentiment scores for key service dimensions, and favorable temporal trends following model implementation.

Future directions include extended longitudinal evaluation to assess sustained impact, integration of multimodal sources (videos, voice recordings from platforms like YouTube, TikTok, and call centers) to capture richer emotional signals, and continued benchmarking with human-in-the-loop validation to refine accuracy and reliability.

Limitations
  • Data availability is limited; datasets are available only upon reasonable request to the corresponding author.
  • Ideal, fully representative datasets are expensive and difficult to obtain; the model integrates online and offline data but coverage may be incomplete.
  • Live modeling began in March 2018; evaluation relies on periodic human-supervised benchmarking, and automated precise accuracy calculation is planned for future iterations.
  • Current implementation focuses primarily on text; multimodal extensions (audio/video) are proposed but not yet incorporated, potentially limiting emotional nuance capture.
  • Correlated features can negatively impact classifiers if not controlled; while feature engineering mitigates this, residual correlations may affect generalizability.