

Low-cost and efficient prediction hardware for tabular data using tiny classifier circuits

K. Iordanou, T. Atkinson, et al.

This research by Konstantinos Iordanou, Timothy Atkinson, Emre Ozer, Jedrzej Kufel, Grace Aligada, John Biggs, Gavin Brown and Mikel Luján presents a methodology for automatically generating tiny predictor circuits for tabular data classification. Focusing on maximizing prediction accuracy while minimizing hardware area and power, the study shows that these compact classifiers can significantly outperform conventional machine learning implementations in hardware efficiency.

Introduction
The study addresses whether accurate, low-cost, resource-efficient hardware classifiers can be synthesized automatically and directly from tabular data, without mapping to predefined ML models or conventional accelerators. Conventional DNN-based approaches excel on homogeneous data (images, audio, text) and are supported by programmable accelerators, but tabular data consist of weakly correlated, heterogeneous features, a regime in which tree-based methods often outperform deep learning. The authors propose auto tiny classifiers: an evolutionary approach that directly searches the space of combinational logic circuits for tabular classification, producing a verified circuit (a "sea of gates") that is auto-translated to RTL. The goal is to achieve competitive predictive accuracy while drastically reducing area and power, enabling always-on, near-sensor, and flexible-electronics applications.
Literature Review
The paper situates its work among AutoML, Neural Architecture Search (NAS), and Neural Architecture and Implementation Search (NAIS). AutoML and NAS optimize models/architectures but require manual RTL translation and verification; NAIS co-designs a chosen NN with a known accelerator. For tabular data, gradient-boosted trees (XGBoost, CatBoost) are strong baselines, and well-tuned MLPs can be competitive with state-of-the-art tree models, often outperforming specialized deep tabular models like TabNet. Prior graph-based genetic programming (e.g., Cartesian GP) and evolutionary circuit synthesis typically target fully known truth tables; here, only a fraction of the truth table (labeled samples) is known, and circuits are evolved directly for supervised classification. Related work on flexible and near-sensor computing motivates low-cost, task-specific hardware for resource-constrained scenarios.
Methodology
The approach adapts Evolving Graphs by Graph Programming (EGGP) with a 1+λ evolutionary scheme to synthesize combinational classifier circuits directly from tabular data.

Representation: circuits are directed acyclic graphs with input nodes (features after binary encoding), function nodes (logic operators; symmetric, all of the same arity), and output nodes (classification bits). Inactive nodes (those with no path to an output) enable neutral drift.

Initialization: given a number of function nodes n and a function set F, function nodes are instantiated with random functions and connected uniformly at random to available prior nodes; outputs connect to any existing node. The hyperparameter n controls graph size.

Mutation: with mutation rate p, the numbers of node and edge mutations are drawn from the binomial distributions B(n, p) and B(E, p), where E is the number of edges. A node mutation replaces a node's function with a different one from F; an edge mutation retargets an edge to a new valid source while avoiding cycles and no-ops.

Fitness and termination: fitness is balanced accuracy on the labeled data. Training-set fitness guides parent replacement, while validation-set fitness selects the best-discovered solution to mitigate overfitting. Evolution terminates if validation fitness fails to improve by at least a threshold y within κ generations, or when a maximum generation count G is reached. Further hyperparameters are λ (children per generation) and p (mutation rate).

Data encoding: numerical and categorical features are encoded to bits using user-selected strategies: quantization (equal-width bins), quantiles (equal-frequency bins), one-hot, or Gray encoding, with a user-chosen number of bits per input (typically 2 or 4). For comparisons, tiny classifiers report the best accuracy over encodings with 2 or 4 bits per input; MLP baselines are quantized to 2 bits for a resource-optimized comparison.

Hardware toolflow (Auto tiny classifiers): the evolved Boolean expressions are auto-translated into synthesizable Verilog (assign-based RTL with wrappers). A standard EDA flow produces netlists and area/power/timing reports; for a full chip, place-and-route, clock-tree synthesis, DRC/LVS, and GDS generation follow. Designs include local input/output buffers sized to the number of encoded input bits and output bits (1 bit for binary, several bits for multiclass classification). The method targets both silicon (a 45 nm PDK for synthesis) and flexible thin-film-transistor processes (0.8 µm FlexIC) for fabrication and testing.

Evaluation protocol: 33 tabular datasets (OpenML, UCI, Kaggle) are split 80/20 into training and test sets, with 50% of the training data held out for validation during evolution. Baselines are AutoGluon (XGBoost, TabNeuralNet, NNFastAITab), Google TabNet, and NAS-optimized MLPs (best and smallest, plus 2-bit quantized versions). Hyperparameter sweeps assess gate-count constraints (50–300 gates), termination generations, and termination iterations. ASIC synthesis compares tiny classifiers to hardware-implemented baselines (the 2-bit quantized smallest MLP and XGBoost) for two datasets (blood: binary; led: 10-class) using Synopsys Design Compiler at 45 nm, 1.1 V and 1 GHz; FlexIC implementations at 0.8 µm are realized and tested via Cadence flows.
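To make the encoding step concrete, the following is a minimal Python sketch of equal-width quantization into 2-bit, optionally Gray-coded, circuit inputs. The function names and binning details are illustrative assumptions, not the authors' toolflow.

```python
# Illustrative sketch (not the authors' code): encode one numeric feature column
# into 2-bit codes via equal-width bins, optionally Gray-coded so that adjacent
# bins differ in a single bit.
import numpy as np

def quantize_equal_width(col, n_bits=2):
    """Map a numeric column to integer bin indices using equal-width bins."""
    n_bins = 2 ** n_bits
    lo, hi = float(np.min(col)), float(np.max(col))
    width = (hi - lo) / n_bins if hi > lo else 1.0   # guard against constant columns
    return np.clip(((col - lo) / width).astype(int), 0, n_bins - 1)

def to_gray(idx):
    """Binary-reflected Gray code of the bin indices."""
    return idx ^ (idx >> 1)

def encode_feature(col, n_bits=2, gray=True):
    idx = quantize_equal_width(np.asarray(col, dtype=float), n_bits)
    codes = to_gray(idx) if gray else idx
    # Unpack each code into its constituent bits (most significant bit first).
    return np.stack([(codes >> b) & 1 for b in range(n_bits - 1, -1, -1)], axis=1)

# Example: one feature column becomes two binary circuit inputs per sample.
print(encode_feature([0.1, 0.4, 0.5, 0.9], n_bits=2))
```

In the same spirit, the sketch below illustrates the 1+λ loop on a toy feed-forward gate-list representation: mutation counts are drawn from binomial distributions, ties are accepted to permit neutral drift, fitness is balanced accuracy, and evolution stops when validation fitness plateaus. The representation, function set, and termination details are simplified stand-ins for the paper's EGGP machinery (for example, the single output is hard-wired to the last gate and no-op mutations are not excluded).

```python
# A self-contained toy sketch of the 1 + lambda evolutionary loop (a simplification
# of EGGP, not the paper's implementation).  A circuit is a list of 2-input gates
# over binary-encoded features; fitness is balanced accuracy on labelled samples.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

GATES = [np.logical_and, np.logical_or, np.logical_xor,
         lambda a, b: ~(a & b)]                        # NAND on boolean arrays

def random_circuit(n_inputs, n_gates, rng):
    """Circuit = (n_inputs, gates, output node).  Each gate (fn, src_a, src_b) may
    read any earlier node, so the graph is acyclic by construction; for simplicity
    the single output bit is hard-wired to the last gate."""
    gates = [(rng.integers(len(GATES)),
              rng.integers(n_inputs + g), rng.integers(n_inputs + g))
             for g in range(n_gates)]
    return (n_inputs, gates, n_inputs + n_gates - 1)

def predict(circuit, X_bits):
    n_inputs, gates, out = circuit
    nodes = [X_bits[:, i].astype(bool) for i in range(n_inputs)]
    for f, a, b in gates:
        nodes.append(GATES[f](nodes[a], nodes[b]))
    return nodes[out].astype(int)

def mutate(circuit, p, rng):
    n_inputs, gates, out = circuit
    gates = [list(g) for g in gates]
    # Numbers of node and edge mutations drawn from B(n, p) and B(E, p).
    for _ in range(rng.binomial(len(gates), p)):
        gates[rng.integers(len(gates))][0] = rng.integers(len(GATES))
    for _ in range(rng.binomial(2 * len(gates), p)):
        g = rng.integers(len(gates))
        gates[g][1 + rng.integers(2)] = rng.integers(n_inputs + g)  # earlier nodes only
    return (n_inputs, [tuple(g) for g in gates], out)

def fitness(circuit, X_bits, y):
    return balanced_accuracy_score(y, predict(circuit, X_bits))

def evolve(X_tr, y_tr, X_val, y_val, n_gates=100, lam=4, p=0.02,
           max_gens=2000, patience=200, seed=0):
    rng = np.random.default_rng(seed)
    parent = random_circuit(X_tr.shape[1], n_gates, rng)
    parent_fit = fitness(parent, X_tr, y_tr)
    best, best_val, stale = parent, fitness(parent, X_val, y_val), 0
    for _ in range(max_gens):
        kids = [mutate(parent, p, rng) for _ in range(lam)]
        fits = [fitness(k, X_tr, y_tr) for k in kids]
        i = int(np.argmax(fits))
        if fits[i] >= parent_fit:              # ties accepted: neutral drift
            parent, parent_fit = kids[i], fits[i]
        val = fitness(parent, X_val, y_val)    # validation picks the returned solution
        if val > best_val:
            best, best_val, stale = parent, val, 0
        else:
            stale += 1
            if stale >= patience:              # stop on a validation plateau
                break
    return best
```

In this toy setup, the returned gate list plays the role of the evolved "sea of gates"; translating each gate into a Verilog assign statement would loosely mirror the paper's RTL-generation step.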
Key Findings
- Accuracy across 33 datasets: AutoGluon XGBoost averages 81% accuracy; tiny classifiers average 78% (second highest). In tenfold cross-validation, tiny classifiers show low variance in their accuracy distribution, indicating robustness relative to XGBoost.
- Comparison to MLPs: the best non-quantized NAS MLP achieves ~83% average accuracy; its 2-bit quantized version matches tiny classifiers. The smallest non-quantized NAS MLP averages ~80%, dropping to ~75% when 2-bit quantized. Thus, tiny classifiers perform on par with the 2-bit quantized best MLP and exceed the 2-bit quantized smallest MLP.
- ASIC (45 nm) synthesis (standalone blocks, 1.1 V, 1 GHz, including I/O buffers): tiny classifiers consume 0.04–0.97 mW and use 11–426 NAND2-equivalent gates across designs. For the blood/led baselines: the 2-bit quantized MLP consumes 34–38 mW (≈87–119× higher power) with ≈171–278× larger area than the tiny classifiers; XGBoost power is ≈3.9× (blood) and ≈8.0× (led) higher, and its area is ≈8.0× (blood) and ≈18.0× (led) larger.
- FlexIC (0.8 µm) full-chip implementations (Extended Data Table 2):
  • Tiny classifier (blood): 0.54 mm², 0.32 mW, 350 kHz, 150 NAND2-eq. gates.
  • Tiny classifier (led): 0.37 mm², 0.25 mW, 440 kHz, 105 NAND2-eq. gates.
  • XGBoost (blood): 5.4 mm², 4.12 mW, 165 kHz, 1,520 NAND2-eq. gates.
  • XGBoost (led): 27.74 mm², 18.6 mW, 130 kHz, 7,780 NAND2-eq. gates.
  Result: tiny classifiers are 10× (blood) to 75× (led) smaller, consume ~13× (blood) to ~75× (led) less power, and run 2–3× faster than XGBoost (see the check after this list).
- Yield: tiny classifier FlexICs demonstrate 6× higher yield than XGBoost FlexICs on the blood dataset, implying substantially lower unit cost.
- Area scalability: tiny classifier area is similar between the binary and multiclass cases (the led tiny classifier is even smaller than the blood one), whereas XGBoost area grows substantially with the number of classes (led ≈5× blood).
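As a quick check, the headline FlexIC ratios follow directly from the per-design figures quoted above:

$$
\frac{5.4}{0.54} = 10\times,\quad \frac{27.74}{0.37} \approx 75\times \ \text{(area)};\qquad
\frac{4.12}{0.32} \approx 13\times,\quad \frac{18.6}{0.25} \approx 74\times \ \text{(power)}.
$$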
Discussion
The results demonstrate that evolving small combinational circuits directly from tabular data can achieve predictive accuracy competitive with state-of-the-art methods while drastically reducing hardware resources and energy. By operating in the space of Boolean functions (akin to decision trees) and using graph-based genetic programming, the approach avoids reliance on predefined ML models or accelerators and can escape local minima that may hinder gradient-based methods. Hardware evaluations confirm orders-of-magnitude savings in area and power versus MLPs and significant gains versus XGBoost, together with higher clock rates and markedly improved yield in flexible electronics. This directly addresses the goal of efficient, accurate classification for resource-constrained, always-on, and near-sensor deployments, particularly in applications such as smart packaging where programmability is not essential and low-cost customization is valuable. The stable accuracy distribution and minimal area variation across class counts further support the robustness and scalability of the approach for tabular tasks.
Conclusion
The paper introduces auto tiny classifiers, an evolutionary methodology that directly synthesizes classifier circuits (≤300 gates) from tabular data, auto-generating RTL suitable for ASIC and flexible-electronics implementation. Across 33 datasets, tiny classifiers attain accuracy close to top baselines and match 2-bit quantized best MLPs, while significantly reducing area and power. ASIC synthesis (45 nm) and FlexIC implementations (0.8 µm) show 8–18× (ASIC) and 10–75× (FlexIC) area reductions, 4–8× (ASIC) and 13–75× (FlexIC) power reductions versus ML baselines, with 2–3× higher clock speeds and 6× higher yield on FlexICs. Future directions include extending the evolutionary framework beyond tabular data (e.g., time-series via recurrent graph-based GP), exploring multiobjective optimization to co-optimize accuracy, gate count, and power, and integrating tiny classifiers as tightly or loosely coupled accelerators in near-sensor and IoT systems.
Limitations
- Hardware baselines (XGBoost and the 2-bit quantized smallest MLP) were implemented and synthesized in detail for only two datasets (blood, led) because of the manual design effort involved; broader hardware comparisons across all 33 datasets were not performed.
- The evolutionary fitness optimizes balanced accuracy only; power and area are not included in the objective. Multiobjective extensions are discussed but not explored experimentally.
- The input encoding strategy and bit width (2 or 4 bits) can influence accuracy-resource trade-offs; only the best result per dataset across the selected encodings is reported.
- Code and data artifacts for full replication are available upon reasonable request rather than fully open-sourced, potentially limiting immediate reproducibility.