
Understanding quantum machine learning also requires rethinking generalization
E. Gil-Fuster, J. Eisert, and C. Bravo-Prieto
Elies Gil-Fuster, Jens Eisert, and Carlos Bravo-Prieto examine the generalization behavior of quantum machine learning models. Their systematic randomization experiments challenge the conventional, uniform-bound-based understanding of generalization and expose the memorization capacity of quantum neural networks.
Introduction
Quantum computing holds promise for solving computational problems beyond classical capabilities. A key area of exploration is quantum machine learning (QML), which aims to leverage quantum devices for machine learning tasks. Parameterized quantum circuits (PQCs), also known as quantum neural networks (QNNs), are central to this field. While quantum advantages have been proven for specific, fine-tuned QML problems, these advantages often presuppose full-scale quantum computers. Research on PQCs focuses on expressivity, trainability, and generalization; generalization, which ensures good performance on unseen data, is paramount. The classical understanding of generalization, largely based on Vapnik's work, has recently been challenged: studies on large-scale classical neural networks show that conventional complexity measures such as the VC dimension and Rademacher complexity are insufficient to explain their generalization success. These overparameterized networks, with far more trainable parameters than the dimensions of the data they are trained on, defy traditional generalization bounds. This work investigates whether the randomization tests that revealed the limitations of uniform generalization bounds in the classical setting yield analogous outcomes for near-term QML models.
Literature Review
Classical machine learning theory, largely based on Vapnik's work, focused on uniform generalization bounds. These bounds, built on complexity measures such as the VC dimension or the Rademacher complexity, apply uniformly to all hypotheses within a function family. However, the seminal randomization experiments of Zhang et al. demonstrated that this approach fails to explain the exceptional generalization of large-scale deep learning models: these models can fit their training data perfectly, even when the labels are randomized, yet achieve excellent generalization performance on real tasks. This prompted a reassessment of the conventional understanding of generalization in classical machine learning. Existing generalization bounds in quantum machine learning have largely focused on uniform variants, mirroring the classical canon before these studies. This raises the question of whether analogous behavior is observed in quantum models, especially given the relatively small scale of current QML models compared to their classical counterparts.
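To make the notion concrete, a uniform bound of the Rademacher type has the following standard form (textbook learning-theory notation, included for context rather than quoted from the paper): with probability at least 1 − δ over an i.i.d. sample S of size n,

```latex
% Generalization gap and a Rademacher-type uniform bound (standard notation).
\mathrm{gen}(h) \;=\; R(h) - \widehat{R}_S(h),
\qquad
R(h) \;\le\; \widehat{R}_S(h) \;+\; 2\,\mathfrak{R}_n(\mathcal{H})
  \;+\; \sqrt{\tfrac{\log(1/\delta)}{2n}}
\quad \text{simultaneously for all } h \in \mathcal{H}.
```

Because the bound must hold for every hypothesis at once, a class whose members can fit random labels forces the complexity term to be large, rendering the bound vacuous; this is precisely the logic the randomization tests exploit.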
Methodology
The study employs randomization tests from non-parametric statistics to investigate generalization in QML. The experiments focus on a quantum convolutional neural network (QCNN) architecture, an adaptation of classical convolutional neural networks to the quantum setting that is well-suited to tasks involving spatial patterns, and a quantum phase classification task. The core of the methodology is training QNNs on datasets with varying degrees of randomization: (a) random labels replacing the true labels; (b) partial label corruption, mixing real and random labels; and (c) random quantum states replacing the original states. In each case, the model's ability to fit the training data and its generalization performance are evaluated via the generalization gap, the difference between the expected risk (performance on the entire data distribution) and the empirical risk (performance on the training data).

The experiments use the generalized cluster Hamiltonian to generate quantum states, which are classified according to their symmetry-protected topological phases. The QCNN is trained with a custom loss function that selects the measurement outcome with the lowest probability, a strategy found to yield good generalization in prior studies. Training employs CMA-ES (Covariance Matrix Adaptation Evolution Strategy), a derivative-free optimization algorithm; a sketch of the overall protocol is given below.

Analytical results complement the empirical findings. The study formally proves that quantum circuits can perfectly fit arbitrary labels to quantum states under specific conditions. This is shown through the concept of finite sample expressivity, which establishes sufficient conditions for quantum circuits to memorize arbitrary data; verifying that these conditions hold is cast as a semidefinite programming problem.
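A minimal sketch of the randomization protocol and the CMA-ES training loop, assuming a generic model: `qcnn_predict` and the data handling are hypothetical placeholders standing in for the QCNN and the cluster-Hamiltonian states, not the authors' code. The optimizer loop uses the open-source `cma` package.

```python
import numpy as np
import cma  # pip install cma; reference CMA-ES implementation

# Hypothetical stand-in for the QCNN: any map (params, state) -> binary label.
# The circuit details are abstracted away; only the protocol is shown.
def qcnn_predict(params, state):
    return int(np.dot(params[: state.size], state) > 0)

def randomize(states, labels, mode, noise_level=1.0, rng=None):
    """Build the three randomized training sets used in the tests:
    (a) fully random labels, (b) partially corrupted labels,
    (c) random input states with the original labels."""
    rng = rng or np.random.default_rng(0)
    states, labels = states.copy(), labels.copy()
    if mode == "random_labels":            # experiment (a)
        labels = rng.integers(0, 2, size=labels.shape)
    elif mode == "corrupted_labels":       # experiment (b)
        flip = rng.random(labels.shape) < noise_level
        labels[flip] = rng.integers(0, 2, size=int(flip.sum()))
    elif mode == "random_states":          # experiment (c)
        states = rng.normal(size=states.shape)  # placeholder for random states
    return states, labels

def empirical_risk(params, states, labels):
    preds = np.array([qcnn_predict(params, s) for s in states])
    return float(np.mean(preds != labels))

def train_cma(states, labels, n_params, sigma0=0.5):
    """Derivative-free training with CMA-ES, as in the paper."""
    es = cma.CMAEvolutionStrategy(np.zeros(n_params), sigma0)
    while not es.stop():
        candidates = es.ask()
        es.tell(candidates,
                [empirical_risk(np.asarray(c), states, labels) for c in candidates])
    return np.asarray(es.result.xbest)

# Generalization gap: expected risk (estimated on held-out data) minus
# empirical risk on the (possibly randomized) training set.
def generalization_gap(params, train_set, test_set):
    return empirical_risk(params, *test_set) - empirical_risk(params, *train_set)
```

Derivative-free search is a natural fit here because the loss of a PQC is accessed only through circuit evaluations; sweeping `noise_level` from 0 to 1 in mode (b) reproduces the shape of the partial-corruption experiment.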
Key Findings
The randomization experiments reveal that QNNs can accurately fit random labels and random quantum states. Observation 1 shows that existing QML models can perfectly fit random labels to quantum states; Observation 2, that they can accurately fit partially corrupted labels; and Observation 3, that they can fit labels to fully random quantum states. Together these observations highlight the memorization capability of QNNs. For random labels, the generalization gap is close to the maximum attainable: the models fit the random training data almost perfectly while necessarily performing at chance level on unseen data. For partially corrupted labels, the test error increases steadily with the noise level, showing that QNNs extract the remaining signal from noisy data while simultaneously memorizing the noise. For random states, the generalization gap follows a trend similar to that for random labels, indicating that the QNN's memorization ability is unaffected by the absence of local correlations in the input. These findings imply that uniform generalization bounds are loose for current QML models.

The analytical results formally establish the finite sample expressivity of quantum circuits and PQCs under certain conditions. Theorem 1 shows that quantum circuits can fit arbitrary labels to a set of quantum states provided the Gram matrix of their inner products is well-conditioned. Theorem 2 extends this to PQCs under a distinguishability condition on the input states; this condition is met when approximate states can be efficiently prepared using PQCs, allowing the construction of a well-conditioned Gram matrix. These analytical results corroborate the empirical finding that QNNs can memorize random data; a numerical illustration of the Gram-matrix argument follows.
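The mechanism behind Theorem 1 can be illustrated numerically. The sketch below is our illustration, not the paper's construction: it uses the fidelity-type overlap matrix K with entries K_ij = |⟨ψ_i|ψ_j⟩|², which for random states is close to the identity and hence well-conditioned, and solves K α = y to obtain an observable that reproduces arbitrary labels exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
n_qubits, n_samples = 6, 20
dim = 2 ** n_qubits

# Random pure states as columns (stand-ins for the phase-classification states).
psi = rng.normal(size=(dim, n_samples)) + 1j * rng.normal(size=(dim, n_samples))
psi /= np.linalg.norm(psi, axis=0)

# Overlap matrix K_ij = |<psi_i|psi_j>|^2; nearly the identity for random states.
K = np.abs(psi.conj().T @ psi) ** 2
print(f"condition number of K: {np.linalg.cond(K):.2f}")

# Arbitrary labels to memorize.
y = rng.choice([-1.0, 1.0], size=n_samples)

# Well-conditioned K => the linear system K @ alpha = y has a stable solution.
alpha = np.linalg.solve(K, y)

# Observable M = sum_j alpha_j |psi_j><psi_j| reproduces the labels exactly:
# <psi_i| M |psi_i> = sum_j alpha_j |<psi_i|psi_j>|^2 = (K @ alpha)_i = y_i.
M = (psi * alpha) @ psi.conj().T
fitted = np.real(np.einsum("di,dk,ki->i", psi.conj(), M, psi))
print("max label error:", np.max(np.abs(fitted - y)))  # ~1e-12
```

The same computation breaks down when the states become nearly parallel: K turns ill-conditioned and the solved coefficients blow up, mirroring the role of the well-conditioning assumption in the theorem.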
Discussion
The results challenge the prevailing understanding of generalization in QML and suggest that current approaches based on uniform generalization bounds are insufficient. The study does not disprove uniform generalization bounds; rather, it demonstrates that they are vacuous for the tested models in the studied regimes. The ability of even a relatively small model like the QCNN to memorize random data implies that uniform bounds are likely to be at least as loose for larger, more expressive models. The findings underscore the need for alternative approaches, potentially focusing on non-uniform measures such as training convergence time, sharpness of the loss minima, and data robustness. The parallel with earlier work on large classical neural networks, and the fact that these quite different learning models display the same behavior under randomization, suggests that the inadequacy of uniform bounds is not an artifact of scale but a feature shared by classical and quantum learning models.
Conclusion
The paper demonstrates that current approaches to understanding generalization in quantum machine learning need revisiting. The ability of QNNs to memorize random data means that uniform generalization bounds provide only trivial guarantees, highlighting the need for new methods to understand and predict generalization in QML that go beyond traditional complexity measures. Future work should explore non-uniform generalization measures and investigate the role of symmetries and equivariance in QML.
Limitations
The numerical experiments are limited to a specific architecture (QCNN) and a specific task (quantum phase classification). While this is a state-of-the-art setting for generalization studies in QML, the results may not generalize to all QML models and tasks. The analytical results rely on the distinguishability condition, a requirement for efficiently preparing approximate quantum states using PQCs. The practical applicability of this condition may vary depending on the specifics of the input quantum states and approximation protocols.