
Physics
Machine learning of high dimensional data on a noisy quantum processor
E. Peters, J. Caldeira, et al.
Explore the cutting edge of data analysis with quantum kernel methods! This study by Evan Peters and colleagues implements a quantum kernel classifier on high-dimensional cosmological data with Google's Sycamore processor, reaching classification accuracy on par with noiseless simulation.
~3 min • Beginner • English
Introduction
The study investigates whether quantum kernel methods (QKMs) can learn effectively from real, high-dimensional classical datasets on noisy intermediate-scale quantum (NISQ) hardware. The motivation stems from the potential of QKMs to map input data into high-dimensional Hilbert spaces whose inner products define powerful kernels for support vector machines, while practical questions about scalability, noise robustness, and circuit design on real devices remain open. The authors extend QKM-based supervised learning to as many as 17 hardware qubits with only nearest-neighbor connectivity on Google’s Sycamore processor, targeting 67-dimensional supernova data (PLAsTiCC) without dimensionality reduction. They aim to design a feature-map circuit that yields sufficiently large kernel magnitudes to suppress shot noise, to incorporate error mitigation tailored to kernel estimation, and to demonstrate hardware performance comparable to noiseless simulation, thereby assessing the practicality and robustness of QKMs on NISQ devices.
Literature Review
The paper situates QKMs within prior theoretical and experimental work: recent results show potential quantum speedups for specific classes of data and use kernels to quantify the computational power of data in quantum ML. Prior experiments largely relied on artificial or heavily preprocessed data, few qubits, or connectivity requirements unsuited to NISQ hardware, though multi-qubit applications have recently appeared in high-energy physics. Various feature maps for QKMs have been proposed, and hardware-efficient variational circuits have informed the circuit design choices here. Classical SVMs and the kernel trick are foundational, with the radial basis function (RBF) kernel serving as a standard baseline. The authors reference benchmarking methods for hardware fidelity and note software frameworks such as Cirq and TensorFlow Quantum that support simulation and exploration of quantum ML. They also cite proposals and numerical validations suggesting potential quantum advantage on engineered datasets using QKMs, motivating further investigation into noisy kernels and real-data performance.
Methodology
- Problem and model: Supervised binary classification with an SVM trained on a quantum kernel. An input x ∈ R^d is mapped to the quantum state |ψ(x)〉 = U(x)|0〉 by a non-variational feature-map circuit. The kernel is k(x_i, x_j) = |〈ψ(x_i)|ψ(x_j)〉|^2, estimated by running U(x_i) followed by U†(x_j) on |0〉 and sampling the probability of the all-zeros bitstring (a minimal Cirq sketch of this estimate follows this list).
- Dataset: PLAsTiCC training data (simulated astronomical time series from the Rubin Observatory) restricted to two classes (type II and type Ia supernovae). The time series (six wavelength bands) are converted into fixed-size feature vectors through statistical summaries: counts; min/max/mean/median/std/skew of flux and flux error; sums and skew of the flux/flux-error ratio and of flux times squared flux ratio; mean and maximum time between measurements; host-galaxy redshifts (spectroscopic and photometric); sky position; and the first two Fourier coefficients per band plus kurtosis and skewness, for a total of 67 features per object. Preprocessing converts lognormal-distributed spectral inputs to log scale and normalizes all inputs to [-1, 1] (a preprocessing sketch follows this list); no dimensionality reduction is applied.
- Circuit ansatz: A hardware-efficient structure alternating layers of local single-qubit rotations with native √iSWAP entangling gates on Sycamore. Each encoding block consists of H followed by three rotations (Rz, Ry, Rz), each parameterized by a different input feature; the total number of parameterized gates matches the 67 input dimensions. Circuit width and depth can vary independently of the feature dimension, and nearest-neighbor connectivity suffices. The ansatz yields large inner products (median kernel magnitude ≥ 1e-1), enabling low statistical error in the kernel estimates.
- Kernel estimation under noise: For each pair (x_i, x_j), run R repetitions of U(x_i)U(x_j)†, tally the count ν_0 of all-zeros outcomes, and estimate K̂_ij = ν_0/R. Noise (gate infidelity, readout error) causes K to deviate from the ideal K_t. Error mitigation targeted at kernel estimation includes: (i) scheduling entangling layers in parallel; (ii) readout error mitigation via a perturbative correction that approximates P(0…0) with polynomial overhead; and (iii) choosing enough shots per circuit (R = 5000) to control sampling error.
- Training/evaluation: Build the m×m kernel matrix over the training set T, then train an SVM with decision function f(x) = sign(Σ_i α_i y_i k(x_i, x) + b). The hyperparameter C is chosen by leave-one-out cross-validation on T to maximize the mean LOOCV score, then fixed for test evaluation (a sketch of this training loop follows this list). Training complexity scales as O(m^2); classifying v test points requires m·v kernel evaluations, though the count can be reduced when few support vectors are used (not the case here).
- Dataset selection for hardware: Learning curves from noiseless Cirq simulations guided the choice of training set size to balance generalization and compute cost. For hardware, the authors generated a 1000×1000 simulated kernel matrix, repeatedly performed 4-fold cross-validation on a size-280 subset, and selected m=210 training and v=70 test points from folds with validation accuracy closest to the mean across trials/folds, to avoid overstating performance due to sampling variance.
- Hardware platform and runs: Google Sycamore superconducting processor with 23 active qubits and grid connectivity; single-qubit Pauli gates with >99% randomized-benchmarking fidelity; native √iSWAP entangling gates with typical cross-entropy-benchmarking fidelities >97%. Experiments were run for n ∈ {10, 14, 17} qubits on the same 67-dimensional data with balanced classes, on qubit lines chosen heuristically from calibration data. Parallel execution of entangling layers improved performance, while the implemented readout error correction did not consistently improve classification accuracy. Each circuit used R = 5000 shots; the roughly 1.83×10^8 circuit executions per qubit count took about 16 hours of processor time each.
- Postprocessing and baselines: Noiseless simulations provided baseline learning curves and comparisons against a classical SVM with an RBF kernel (γ optimized via adaptive grid search).
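A minimal sketch of the preprocessing described above, assuming the inputs arrive as a NumPy array with 67 feature columns; the split into lognormal columns and the function name are illustrative rather than the paper's exact pipeline.

```python
# Hypothetical preprocessing sketch (column split and names are illustrative).
import numpy as np


def preprocess(features, lognormal_cols):
    """Log-transform lognormal-distributed columns, then rescale each of the
    67 features to the interval [-1, 1]."""
    X = np.asarray(features, dtype=float).copy()
    X[:, lognormal_cols] = np.log(X[:, lognormal_cols])   # spectral inputs -> log scale
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)                # guard against constant columns
    return 2.0 * (X - lo) / span - 1.0                    # per-feature map to [-1, 1]
```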
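Below is a minimal Cirq sketch of the kernel-estimation circuit described above: one encoding block of H and Rz/Ry/Rz rotations per qubit, a nearest-neighbor √iSWAP entangling layer, the inverse circuit for the second data point, and the all-zeros frequency as the kernel estimate. The feature-to-gate assignment, number of blocks, and entangling pattern are illustrative rather than the exact Sycamore circuit, and a simulator stands in for the hardware.

```python
# Illustrative sketch, not the authors' exact circuit: a hardware-efficient
# feature map and a sampled estimate of one kernel entry
# k(x_i, x_j) = |<psi(x_j)|psi(x_i)>|^2 from the all-zeros probability.
import numpy as np
import cirq


def encoding_circuit(qubits, x):
    """One encoding block: H then Rz, Ry, Rz per qubit (angles taken from the
    features), followed by a nearest-neighbor sqrt-iSWAP entangling layer."""
    circuit = cirq.Circuit()
    angles = iter(np.resize(x, 3 * len(qubits)))  # assumption: 3 angles per qubit
    for q in qubits:
        circuit.append([cirq.H(q),
                        cirq.rz(next(angles))(q),
                        cirq.ry(next(angles))(q),
                        cirq.rz(next(angles))(q)])
    for a, b in zip(qubits[:-1], qubits[1:]):
        circuit.append((cirq.ISWAP ** 0.5).on(a, b))
    return circuit


def kernel_entry(x_i, x_j, qubits, repetitions=5000):
    """Run U(x_i) followed by U(x_j)^dagger and return the observed frequency
    of the all-zeros bitstring."""
    circuit = encoding_circuit(qubits, x_i) + cirq.inverse(encoding_circuit(qubits, x_j))
    circuit.append(cirq.measure(*qubits, key="m"))
    result = cirq.Simulator().run(circuit, repetitions=repetitions)
    return result.histogram(key="m").get(0, 0) / repetitions


qubits = cirq.LineQubit.range(10)                 # n = 10 qubit example
x_i, x_j = np.random.uniform(-1, 1, (2, 67))      # two normalized 67-feature vectors
print(kernel_entry(x_i, x_j, qubits))
```

Building the full m×m training kernel (and the v×m test block) amounts to looping this estimate over all pairs of data points.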
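Given those sampled kernel matrices, the training step maps onto a precomputed-kernel SVM; the sketch below shows the leave-one-out selection of C described above. The candidate C grid and variable names are placeholders, not values taken from the paper.

```python
# Sketch of SVM training on a precomputed quantum kernel with LOOCV over C.
# K_train is the m x m training kernel; K_test is v x m with
# K_test[i, j] = k(x_test_i, x_train_j). The C grid below is illustrative.
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def fit_quantum_kernel_svm(K_train, y_train, C_grid=(0.1, 1.0, 10.0, 100.0)):
    """Choose C by mean leave-one-out accuracy on the training kernel,
    then refit on the full training set."""
    scores = {
        C: cross_val_score(SVC(kernel="precomputed", C=C),
                           K_train, y_train, cv=LeaveOneOut()).mean()
        for C in C_grid
    }
    best_C = max(scores, key=scores.get)
    return SVC(kernel="precomputed", C=best_C).fit(K_train, y_train), best_C


# Usage, with kernel matrices estimated as in the previous sketch:
# model, C = fit_quantum_kernel_svm(K_train, y_train)
# test_accuracy = model.score(K_test, y_test)
# n_support_vectors = model.support_.size
```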
Key Findings
- Feasibility and scale: Implemented a quantum kernel SVM on real, high-dimensional data (67 features) using up to 17 qubits with only nearest-neighbor connectivity on Google’s Sycamore processor.
- Accuracy: Hardware test accuracies were competitive with noiseless simulations across n = 10, 14, 17 qubits, and performance was not limited by qubit count. No explicit accuracy percentages are quoted here, but the hardware results closely tracked the noiseless baseline and clearly outperformed random guessing.
- Noise robustness: Although limited circuit fidelity suppressed the observed bitstring probabilities (and thus the kernel entries) by about 50–70%, the classifier maintained reasonable accuracy. The SVM decision function is invariant under a global rescaling K → rK (a short derivation follows this list), which contributes to this robustness.
- Kernel magnitudes and sampling: The circuit ansatz produced large kernel magnitudes (median ≥ 1e-1), enabling low relative statistical error with R = 5000 shots per kernel element.
- Support vectors: A large fraction of training points were selected as support vectors: ~87% in noiseless simulations and ~95% on hardware, consistent with complex decision boundaries and, on hardware, additional noise in K.
- Resource usage: Approximately 1.83×10^8 circuit executions and ~16 hours of processor time per qubit-count setting; R = 5000 repetitions per circuit were sufficient to control sampling error.
- Error mitigation observations: Parallelizing entangling layers improved performance. Implemented readout error correction did not reliably improve classification performance in this task.
- Classical comparison: Noiseless quantum kernel performance on 17 qubits was competitive with a classical SVM using an optimized RBF kernel on the same data subsets; for n=10,14, simulated test accuracies were statistically indistinguishable from optimized RBF performance.
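The rescaling invariance invoked above follows from standard SVM duality. A short derivation sketch (not quoted from the paper), using the decision function f(x) = sign(Σ_i α_i y_i k(x_i, x) + b) from the Methodology section:

```latex
% Sketch: why a uniform suppression K -> rK (r > 0) leaves the trained
% classifier unchanged once C is re-tuned (as the LOOCV selection of C does).
% Soft-margin dual with the scaled kernel:
\[
  \max_{\alpha}\; \sum_i \alpha_i
    - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j \,(r K_{ij})
  \qquad \text{s.t. } 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0.
\]
% Substituting beta_i = r * alpha_i rescales the objective by the positive
% constant 1/r (which does not move the maximizer) and recovers the original
% dual with box constraint 0 <= beta_i <= rC.  The decision function
\[
  f(x) = \operatorname{sign}\Bigl(\sum_i \alpha_i y_i\, r\, k(x_i, x) + b\Bigr)
       = \operatorname{sign}\Bigl(\sum_i \beta_i y_i\, k(x_i, x) + b\Bigr)
\]
% is therefore unchanged: the global suppression of measured kernel values is
% absorbed into the effective regularization parameter.
```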
Discussion
The results show that quantum kernel methods can process real, high-dimensional data on a NISQ device without dimensionality reduction and achieve test accuracies comparable to noiseless simulations, addressing concerns about scalability and noise sensitivity. Designing a feature-map circuit that yields sufficiently large kernel values proved crucial to controlling shot noise, while the SVM’s invariance to kernel rescaling and the use of tailored error mitigation contributed to robustness against hardware-induced suppression of probabilities. The observation that many training points become support vectors indicates complex decision boundaries and suggests that noise may inflate the effective model capacity on hardware. The experiments demonstrate that tens of qubits can be effectively utilized for classification, motivating further theoretical analysis of noisy quantum kernels, the role of kernel magnitude distributions, shot budgets, and strategies to manage overfitting and generalization in the presence of noise.
Conclusion
This work demonstrates an intermediate-scale implementation of quantum kernel classification on a real scientific dataset using up to 17 qubits, with performance on hardware comparable to noiseless simulations despite device noise and without quantum error correction. The proposed hardware-efficient feature map accommodates high-dimensional inputs and maintains resolvable kernel magnitudes, enabling effective SVM training. While not a demonstration of quantum advantage, the results suggest that QKMs may achieve strong classification performance on near-term devices. Future directions include: (i) theoretical development of noisy kernel analysis, including the interplay of kernel magnitude distributions and shot complexity; (ii) identifying natural datasets whose correlations are inherently difficult to represent classically (e.g., quantum many-body data near criticality, outputs from quantum sensing/communication, or data from quantum solutions of linear/nonlinear differential equations); (iii) leveraging and extending software frameworks like TensorFlow Quantum to explore larger scales and different feature maps; and (iv) pursuing empirical demonstrations of quantum advantage on appropriately structured datasets.
Limitations
- Hardware noise and lack of quantum error correction limit circuit fidelities, suppressing measured kernel values by ~50–70%.
- Readout error mitigation did not consistently improve classification accuracy for this task.
- A large fraction of support vectors (≈87% in simulation, ≈95% on hardware) suggests complex boundaries and potential sensitivity to noise, increasing inference cost.
- Training complexity scales as O(m^2), constraining feasible training-set sizes and total experiment time; each qubit-count setting required ~1.83×10^8 circuit executions and ~16 hours of processor time.
- Limited dataset sizes for hardware runs prevented robust characterization of generalization error (e.g., via extensive bootstrapping).
- The circuits used are not designed to demonstrate quantum advantage; classical hardness of the employed kernel was not established.
- Device qubit availability constrained experiments to linear chains of up to 17 contiguous qubits, potentially limiting exploration of circuit topologies.