Efficient quantum state tomography with convolutional neural networks

T. Schmale, M. Reh, and M. Gärttner

This research introduces a quantum state tomography scheme that uses convolutional neural networks (CNNs) to reconstruct quantum states from measurement data. Developed by Tobias Schmale, Moritz Reh, and Martin Gärttner, the method achieves higher reconstruction fidelity and lower observable estimation errors than standard techniques such as maximum likelihood estimation, particularly when measurement data are limited, making state reconstruction more efficient.

Introduction
Quantum state tomography (QST) aims to reconstruct an unknown quantum state from measurement data, but faces the curse of dimensionality: both experimental sample requirements and classical post-processing typically scale exponentially with system size. The authors articulate four desirable properties for QST schemes: (i) sub-exponential data requirements; (ii) sub-exponential classical post-processing; (iii) observable universality (the ability to estimate arbitrary observables once tomography is done); and (iv) state universality (independence from purity or specific state classes). Existing methods trade off among these goals: maximum likelihood estimation (MLE) is computationally expensive; Bayesian methods ease data needs but can be costly computationally; shadow tomography and entanglement detection sacrifice observable universality; and variational approaches such as matrix-product-state (MPS) tomography, compressed sensing, or permutationally invariant (PI) tomography restrict the class of reconstructable states. Neural-network quantum states (NQS) have emerged as powerful variational ansätze with strong representational capacity, including volume-law entanglement, and have been used for neural-network quantum state tomography (NN-QST). Motivated by results showing that CNNs can efficiently encode volume-law entanglement in pure states, this work explores CNN architectures for NN-QST of mixed states, using the POVM formalism to learn a state's measurement-outcome probability distribution directly from data. The goals are sub-exponential scaling, applicability to mixed states, and quantitative benchmarking against standard techniques such as MLE across experimentally relevant scenarios to clarify strengths and limitations.
Literature Review
The paper reviews the landscape of QST methods: classical MLE (Lvovsky 2004) scales poorly with system size; Bayesian approaches can mitigate data requirements at computational expense; entanglement detection and shadow tomography sacrifice observable universality; variational methods such as MPS tomography, compressed sensing, and PI tomography restrict the search space to low-entanglement, low-rank, or permutationally invariant states, achieving efficiency at the cost of state universality. Neural-network-based approaches (NQS based on restricted Boltzmann machines, recurrent neural networks, autoregressive models, and attention/Transformer architectures) have demonstrated strong expressivity and success on both synthetic and experimental data (Rydberg atom arrays, trapped ions, optics). CNNs have theoretical support for efficiently encoding volume-law entanglement and for generalizing tensor-network ideas, motivating their application here. Prior NN-QST with POVMs has been explored (e.g., by Carrasquilla et al., using RNNs and attention-based models), but experimental applications have been limited to few qubits, and a systematic comparison with standard methods such as MLE for mixed states and larger systems has been lacking.
Methodology
Overview: The authors propose an NN-QST scheme that learns the probability distribution P(a) over outcomes a of an informationally complete (IC) POVM directly from measurement data, using convolutional neural networks as variational models. This avoids an explicit density-matrix parameterization and leverages standard probabilistic machine-learning techniques on real-valued outputs.

POVM formalism (a minimal numerical sketch of this bookkeeping is given below):
- The quantum state ρ is represented by its Born-rule probabilities P(a) = Tr[ρ M_a] for an IC POVM {M_a}. Informational completeness means the {M_a} span the space of Hermitian operators, so ρ and all observables can be reconstructed from P(a).
- Inversion: ρ = Σ_{a,b} P(a) (T⁻¹)_{ab} M_b, where T_{ab} = Tr[M_a M_b] is the operator overlap matrix. Expectation values follow as Tr[ρ O] = Σ_a O_a P(a), with O_a = Σ_b (T⁻¹)_{ab} Tr[M_b O].
- For N-qubit systems the POVM factorizes over sites: M_a = M_{a_1} ⊗ ⋯ ⊗ M_{a_N}. For local observables, or observables composed of few Pauli strings, O_a can be computed efficiently thanks to this factorization.
- Practical IC measurement: randomly chosen single-qubit Pauli x, y, z measurements yield an overcomplete POVM with 6 outcomes per qubit. The authors compress it to the IC Pauli-4 POVM by grouping the three spin-down outcomes: M_0, M_1, M_2 are the projectors onto spin-up along x, y, z (weighted by the 1/3 probability of each basis choice), and M_3 = 1 − M_0 − M_1 − M_2. Each qubit then has 4 outcomes, and a measurement record is a string a = (a_1, …, a_N).

Neural network modeling and training:
- A neural network P_θ(a) approximates the POVM outcome distribution. Training maximizes the likelihood of the data (equivalently, minimizes the cross-entropy between the empirical outcome frequencies and P_θ) using the Adam optimizer, which empirically outperforms simpler optimizers.
- Note: enforcing positivity of ρ is not straightforward in the POVM-probability parametrization without incurring exponential costs. The authors proceed without explicit positivity constraints; violations of physical constraints (e.g., Tr[ρ²] ≤ 1) were observed only rarely in practice.

CNN architectures (toy forward-pass and sampling sketches are given below):
- Standard CNN (1D/2D): the input is a one-hot representation of the N single-site outcomes (shape: batch × N × 4). The network applies L convolutional layers with kernel size K, using periodic (circular) or open boundary conditions to match the system, and outputs an unnormalized scalar proportional to P_NN(a). Two options exist for the final layer: a dense layer or a product-output layer; with the latter, the normalization must be estimated by Monte Carlo at every training step by sampling uniformly random POVM outcomes.
- Autoregressive CNN (ARCNN, 1D): implements the factorization P(a) = Π_i P(a_i | a_{<i}). The final layer outputs 4 conditional probabilities per site via a softmax. This yields exact normalization and enables exact ancestral sampling without Markov chains (N forward passes per sample). The ARCNN is empirically easier to train and more stable, but its direct generalization to higher dimensions is nontrivial.

Expressivity and correlation range:
- The typical maximum correlation distance the models can capture is d_max^CNN = (K − 1) L for the CNN with product output and d_max^ARCNN = (K − 1) L + 1 for the ARCNN.
- Intuition: each convolution layer propagates dependencies by K − 1 sites, and the ARCNN's conditional structure extends the range by one further site. For systems with long-range correlations, the architecture is chosen so that d_max is comparable to the linear system size ℓ (for example, K = 5 and L = 4 give d_max = 16 for a 16-site chain) while keeping the parameter count polynomial.

Computational complexity and parameter scaling:
- With f = N features per layer and K, L = O(√N), the number of convolution parameters scales as L·K·f² = O(N³), and the number of training epochs required scales with the parameter count, i.e., O(N³).
- One iteration of the full MLE baseline scales as O((4^N)³) in a naïve implementation; faster MLE variants exist, but overall NN-QST shows favorable scaling.
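To make the POVM bookkeeping concrete, here is a minimal numpy sketch (illustrative only, not the authors' implementation; all function names are ours) that builds the single-qubit Pauli-4 POVM, its overlap matrix T, and recovers observable expectation values from the outcome probabilities.

```python
import numpy as np

# Pauli matrices and the single-qubit identity.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def proj_up(pauli):
    """Projector onto the +1 eigenstate of a Pauli operator."""
    return (I2 + pauli) / 2

# Pauli-4 POVM: each measurement basis is chosen with probability 1/3,
# and the three "spin-down" outcomes are grouped into a single element.
M = [proj_up(sx) / 3, proj_up(sy) / 3, proj_up(sz) / 3]
M.append(I2 - sum(M))                        # M_3 = 1 - M_0 - M_1 - M_2
M = np.array(M)                              # shape (4, 2, 2)

# Overlap matrix T_ab = Tr[M_a M_b]; invertibility reflects informational completeness.
T = np.einsum('aij,bji->ab', M, M).real
T_inv = np.linalg.inv(T)

def povm_probs(rho):
    """Born-rule probabilities P(a) = Tr[rho M_a]."""
    return np.einsum('ij,aji->a', rho, M).real

def expectation(P, O):
    """Tr[rho O] = sum_a P(a) O_a, with O_a = sum_b (T^-1)_ab Tr[M_b O]."""
    traces = np.einsum('bij,ji->b', M, O).real
    return P @ (T_inv @ traces)

# Single-qubit check: the |+x> state should give <sigma_x> = 1 and <sigma_z> = 0.
rho = proj_up(sx)
P = povm_probs(rho)
print(expectation(P, sx), expectation(P, sz))
```

For many qubits the same inversion applies site by site, since both the POVM and T factorize over the tensor product.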
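As an architectural illustration, the following toy numpy sketch (our own simplification, not the published code) implements the forward pass of a 1D standard CNN with circular convolutions, one-hot input, tanh activations, and an exponential "product" output, and shows how K and L set the correlation range d_max = (K − 1)L.

```python
import numpy as np

rng = np.random.default_rng(1)

def circular_conv1d(x, w):
    """1D convolution with periodic boundary conditions.
    x: (N, f_in), w: (K, f_in, f_out) -> output of shape (N, f_out)."""
    out = np.zeros((x.shape[0], w.shape[2]))
    for k in range(w.shape[0]):
        out += np.roll(x, -k, axis=0) @ w[k]
    return out

def cnn_log_prob_unnormalized(a, weights):
    """Unnormalized log P_NN(a) for a standard (non-autoregressive) CNN:
    one-hot input, L tanh convolution layers, exp-of-sum ("product") output."""
    x = np.eye(4)[a]                         # one-hot outcome string, shape (N, 4)
    for w in weights:
        x = np.tanh(circular_conv1d(x, w))
    return x.sum()                           # P_NN(a) proportional to exp(sum of final activations)

# Toy instantiation: N = 16 sites, f = 16 features, L = 4 layers, kernel K = 5,
# so the captured correlation range is d_max = (K - 1) * L = 16, about the system size.
N, f, L, K = 16, 16, 4, 5
weights = [0.1 * rng.normal(size=(K, 4 if i == 0 else f, f)) for i in range(L)]
a = rng.integers(0, 4, size=N)               # a random Pauli-4 outcome string
print(cnn_log_prob_unnormalized(a, weights))

# Normalizing this model requires a Monte Carlo estimate over uniformly drawn
# outcome strings at every training step (see text); the ARCNN avoids this.
```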
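To illustrate why the autoregressive factorization permits exact, Markov-chain-free sampling, here is a small stand-alone sketch (again ours, with `conditional_probs` as a deterministic stand-in for one forward pass of a trained ARCNN): each of the N sites is drawn in turn, conditioned on the outcomes already sampled, and the same conditionals give the log-likelihood used in training.

```python
import numpy as np

rng = np.random.default_rng(2)

def conditional_probs(prefix, n_outcomes=4):
    """Stand-in for one ARCNN forward pass, returning P(a_i | a_<i).

    A trained network would map the one-hot encoded prefix through masked
    convolutions to a softmax over the 4 POVM outcomes; this dummy version
    merely biases against outcomes that already occurred, so the ancestral
    sampling and log-likelihood logic below are runnable.
    """
    counts = np.bincount(np.asarray(prefix, dtype=int), minlength=n_outcomes)
    logits = -0.5 * counts.astype(float)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def ancestral_sample(n_sites, n_outcomes=4):
    """Draw one outcome string a = (a_1, ..., a_N) with N sequential passes."""
    sample = []
    for _ in range(n_sites):
        p = conditional_probs(sample, n_outcomes)     # P(a_i | a_<i)
        sample.append(int(rng.choice(n_outcomes, p=p)))
    return sample

def log_prob(sample):
    """log P(a) = sum_i log P(a_i | a_<i); its negative, averaged over the
    dataset, is the cross-entropy loss minimized during training."""
    return sum(np.log(conditional_probs(sample[:i])[a_i])
               for i, a_i in enumerate(sample))

a = ancestral_sample(n_sites=16)
print(a, log_prob(a))
```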
Benchmarking protocol and metrics (helper-function sketches for these metrics are given below):
- Synthetic datasets: the target ρ is computed exactly, or via Monte Carlo wave-function simulation for open systems; the exact POVM distribution is obtained from it, and N_s samples are drawn (e.g., 10³–10⁵ for 16 qubits). Networks are trained on these samples to obtain P_NN(a). Where feasible (small N), MLE is performed on the same data to obtain ρ̂_MLE and P_MLE(a).
- Metrics:
  • Classical infidelity D = 1 − Σ_a √(P_est(a) P_truth(a)). D_NN is compared to D_MLE via the ratio D_NN/D_MLE (values below 1 favor the network).
  • For larger systems where MLE is infeasible: the root-mean-square (RMS) error of observables, RMS = √⟨(O_est − O_truth)²⟩, comparing estimates from the raw dataset with estimates from NN-generated samples via the ratio RMS_NN/RMS_Data (values below 1 favor the network). Large numbers of NN samples (e.g., 500k for N = 16) are drawn to suppress Monte Carlo noise.

Systems studied:
- 1D transverse-field Ising model (TFIM) ground states with periodic boundaries, H = −J Σ_i σ_i^x σ_{i+1}^x − B Σ_i σ_i^z, at criticality (J/B = 1). A standard CNN with translation invariance is used.
- 2D TFIM ground states on 4×4 lattices with periodic boundaries. A standard CNN without enforced symmetries is used, for fairness with the later examples.
- A 16-site long-range interacting ion-chain ground state with 3% added dephasing noise (open boundaries), i.e., a mixed target state carrying roughly 97% weight on the ground state |Ψ_0⟩ and the remaining ~3% on excited states. An ARCNN is used. Higher-order correlators C_n, built from powers of Pauli-Z two-point functions, are evaluated, and the results are also compared to local MLE applied to reduced subsystems, both on the raw and on NN-enhanced datasets.
- 4×4 dissipative TFIM steady states under a Lindblad master equation with spontaneous decay (Monte Carlo wave-function simulations with 1000 trajectories provide the target). A standard CNN with a dense output layer is used. The correlation length ξ is studied across a dissipative phase transition, comparing raw-sampling and NN-enhanced estimates, including the effect of self-averaging when summing over many correlators.

Implementation notes:
- Adam optimizer; tanh activations; learning rate 10⁻³; features per layer set to N; exponential output; at most 2000 training epochs; MLE limited to 100 iterations. Hyperparameters for each figure (e.g., L and K chosen to reach the desired d_max) are summarized in the paper.
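For completeness, the two figures of merit are straightforward to compute; the helpers below are an illustrative sketch (our own, with made-up example numbers), not code from the paper.

```python
import numpy as np

def classical_infidelity(p_est, p_truth):
    """D = 1 - sum_a sqrt(P_est(a) * P_truth(a)); zero for identical distributions."""
    p_est, p_truth = np.asarray(p_est, float), np.asarray(p_truth, float)
    return 1.0 - np.sum(np.sqrt(p_est * p_truth))

def rms_error(o_est, o_truth):
    """RMS = sqrt(<(O_est - O_truth)^2>) over a set of observables."""
    o_est, o_truth = np.asarray(o_est, float), np.asarray(o_truth, float)
    return np.sqrt(np.mean((o_est - o_truth) ** 2))

# Ratios below 1 favor the neural network:
#   D_NN / D_MLE        (small systems, where full MLE is feasible)
#   RMS_NN / RMS_Data   (larger systems: NN-generated samples vs. the raw dataset)
p_truth = np.array([0.4, 0.3, 0.2, 0.1])       # made-up target distribution
p_nn = np.array([0.38, 0.32, 0.19, 0.11])      # made-up model estimate
print(classical_infidelity(p_nn, p_truth))
```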
Key Findings
- CNN-based NN-QST achieves sub-exponential classical processing and strong expressive power with a parameter count scaling as ~O(N³), enabling practical tomography beyond small system sizes.
- 1D TFIM (critical, small N where MLE is feasible): the CNN reduces the classical infidelity relative to MLE by factors of roughly 2–5, depending on the dataset size N_s; the advantage shrinks as N_s increases, consistent with MLE's asymptotic optimality in the infinite-data limit.
- 2D TFIM (4×4): large sample sets generated from the learned model significantly reduce the variance of local-observable estimates compared to direct evaluation on the raw dataset. RMS_NN/RMS_Data < 1 across coupling regimes, with larger gains for smaller datasets and for states closer to product states (smaller J/|B|).
- Noisy long-range ion chain (N = 16, 3% dephasing): the ARCNN reduces the RMS errors of higher-order correlators C_n, enabling reliable estimation up to three orders higher in n than achievable from the raw dataset at fixed N_s. Applying local MLE to the raw data or to NN-enhanced datasets reduces, but does not eliminate, the NN advantage; the ARCNN remains competitive at substantially lower computational cost.
- Dissipative 4×4 TFIM steady states: the network captures the dissipative phase transition, showing a clear peak in the correlation length ξ at |B|/J ≈ 2. For large |B|, the CNN trades reduced variance for a small bias, yielding significantly smaller RMS errors overall compared to raw sampling; the bias diminishes with larger datasets. When observables involve sums over many similar terms (self-averaging), the NN advantage is reduced, because the network's bias does not benefit from variance reduction via averaging.
- Expressivity is quantified by d_max = (K − 1)L for the CNN and (K − 1)L + 1 for the ARCNN, guiding architecture choices; for the benchmarks, architectures with d_max ≈ linear system size gave good performance without overfitting.
- Positivity violations of inferred quantities were very rare; in some cases the network corrected unphysical estimates present in the raw data, thanks to its reduced statistical error.
- Overall, across ground and steady states, CNN/ARCNN NN-QST outperforms MLE (when data-limited) or direct sampling in terms of fidelity or RMS error, particularly for small datasets and local observables, while preserving scalability.
Discussion
The study demonstrates that learning an IC-POVM outcome distribution with CNN architectures provides an effective and scalable route to QST. By operating in probability space, the method interfaces directly with experimental data and leverages mature probabilistic ML training. The benchmarks address the central question of whether NN-QST can provide practical advantages over standard methods: in small systems with limited data, CNNs achieve lower classical infidelity than MLE; in larger systems where MLE is infeasible, NN-enhanced sampling substantially reduces RMS errors of local observables compared to raw datasets. The ARCNN’s exact normalization and sampling make it especially robust and expressive for 1D systems. The results also clarify limitations: advantages decrease with large datasets as MLE becomes optimal; NN bias can emerge, especially when observables aggregate many terms that self-average, reducing the relative benefit; and explicit positivity constraints are absent although violations are rare. Importantly, the method supports mixed states and a broad class of observables (observable universality post-learning), and its performance benefits from architectural advances in ML. These findings underscore NN-QST’s relevance for contemporary quantum simulators, enabling more accurate observable estimation with fewer experimental shots and manageable classical resources.
Conclusion
The paper introduces and validates a CNN-based NN-QST framework that learns IC-POVM outcome distributions to reconstruct quantum states and estimate observables efficiently. With polynomial parameter scaling and favorable training, the approach achieves high-fidelity reconstructions and significantly reduced observable estimation errors versus MLE (in data-limited regimes) and versus direct sampling (in larger systems), across pure and mixed (noisy, dissipative) states. The ARCNN variant offers exact normalization and sampling and excels in 1D settings. The work quantifies expressivity via a correlation range bound and highlights the bias–variance trade-offs that govern performance. Future directions include: integrating or enforcing physical constraints (e.g., positivity) within the POVM-based framework; extending autoregressive ideas to higher-dimensional systems; systematic comparisons to shadow tomography and other advanced estimators; exploring more expressive architectures to expand the representable state class; and applying the method to real experimental datasets to validate generalization across platforms.
Limitations
- Positivity of the reconstructed density matrix is not explicitly enforced in the POVM-probability parametrization; while violations were rare, they can occur.
- Network-induced bias can affect certain observables, particularly when aggregating many terms whose self-averaging already reduces the raw-sampling variance; this reduces the NN advantage.
- The advantage diminishes with increasing dataset size; in the infinite-data limit, full-parameter MLE becomes optimal.
- The standard CNN requires Markov-chain sampling (risking correlated samples); the ARCNN avoids this but is presently limited to 1D geometries.
- Expressivity is limited by d_max = (K − 1)L (or (K − 1)L + 1 for the ARCNN); an insufficient d_max can miss long-range correlations, while increasing it raises the risk of overfitting and the computational cost.
- Training can become unreliable for very strongly correlated states given very small datasets (e.g., large J/|B| regimes), reducing the gains.
- Benchmarking uses synthetic datasets; real experimental noise and artifacts may introduce additional challenges.
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny