Learning inverse kinematics using neural computational primitives on neuromorphic hardware

Engineering and Technology

J. Zhao, M. Monforte, et al.

This research showcases a groundbreaking online motor control system powered by a hardware spiking neural network (SNN). Conducted by Jingyue Zhao, Marco Monforte, Giacomo Indiveri, Chiara Bartolozzi, and Elisa Donati, the SNN achieves an impressive 97.93% accuracy in learning the inverse kinematics of a robotic arm, paving the way for neuromorphic computing in real-world applications.
Introduction

The study addresses how to achieve robust, low-latency, and low-power inverse kinematics control on neuromorphic hardware despite variability, limited precision, and constraints of mixed-signal spiking processors. Within the broader context of neuromorphic engineering for embedded autonomous systems, motor control lags behind sensing, perception, and decision modules. The research question is whether brain-inspired computational primitives—triplet-STDP learning, basal ganglia-like disinhibition, and cooperative-competitive (WTA) circuits—can be leveraged to learn and execute inverse kinematics on hardware to coordinate multiple joints for target reaching and trajectory tracking. The purpose is to develop a hardware-deployable SNN controller that learns the mapping from Cartesian end-effector positions to joint configurations and executes online control with minimal latency and power, moving toward end-to-end neuromorphic robotic platforms.

Literature Review

Classical inverse kinematics methods rely on analytical solutions (often unavailable for high-DoF systems) or numerical solvers (e.g., Jacobian-based, IpOpt), which can be computationally intensive, require calibration, and lack adaptability. Learning-based approaches provide model-free adaptation to plant non-idealities. Prior SNN work includes STDP-based directional control for end-effector motion, spiking reinforcement learning to map muscle lengths to activation for fixed 2D targets, and hierarchical motor action composition; however, many remain in simulation with floating-point parameters not directly transferable to constrained hardware. NEF-based implementations on Loihi and Neurogrid use neuron ensembles with error-driven learning and have demonstrated online inverse kinematics and force control, but often offload non-spiking processing, require large parameter counts and high firing rates, and exhibit long inference times (hundreds of ms to seconds). Recent comparisons between online NEF learning and offline SGD training highlight faster convergence for online methods. A gap remains in deploying multi-joint high-level controllers on neuromorphic hardware with ultra-low latency and power, using biologically inspired primitives and hardware-compliant parameterizations.

Methodology

System overview: A mixed-signal neuromorphic processor (DYNAP-SE1) runs a spiking neural network (SNN) that learns and executes inverse kinematics for a 2-DoF iCub robot arm (shoulder pitch θ1 and elbow θ2) in simulation (iCubSim). A Spartan-6 FPGA provides event-driven interfacing: it encodes CPU-provided rates into Poisson spike trains and routes spikes between the CPU/FPGA and DYNAP-SE1; decoded outputs are sent to the iCub low-level controllers.

Data generation and discretization: Training data are generated via motor babbling, uniformly sampling the joint space to produce end-effector Cartesian positions. Because the resulting end-effector distribution is non-uniform, the Cartesian space is normalized and rotated with PCA to maximize variance along each axis, then discretized per axis into N-quantiles (N=8), yielding non-uniform partitions that reduce discretization error. Input positions (x, y) and outputs (θ1, θ2) are encoded by one-hot population codes.

Network architecture: Input populations x and y (N neurons each) project to a hidden Cartesian layer (hiddenCartesian) of N^2 neurons. To suppress noisy co-activation from multiple excitatory inputs, a basal ganglia-inspired disinhibition mechanism is used: a y-gate layer tonically inhibits columns of hiddenCartesian; active y neurons inhibit their corresponding y-gate neuron, disinhibiting the target column so that only the neuron at the intersection of the x-driven row and the disinhibited column fires. A hidden joint layer (hiddenJoint, N^2 neurons) encodes joint configurations. During training, the θ1 and θ2 populations project to hiddenJoint via excitatory and disinhibitory (θ2-gate) pathways to enforce selective postsynaptic activity matching the teaching signal. During inference, hiddenCartesian drives hiddenJoint through the learned excitatory connections. A soft WTA in hiddenJoint, implemented with a global inhibitory population (exc:inh ≈ 4:1), selects a single configuration among the multiple valid solutions.

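The PCA rotation plus quantile discretization can be sketched in numpy as below. This is an illustrative reconstruction, not the authors' code; the function names and the synthetic babbling data are hypothetical.

```python
import numpy as np

N = 8  # bins per axis, as in the paper

def discretize_workspace(positions, n_bins=N):
    """PCA-rotate 2D end-effector positions, then split each rotated
    axis into n_bins quantile-based (equal-count) partitions."""
    # Center and rotate onto principal axes to maximize per-axis variance.
    centered = positions - positions.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotated = centered @ vt.T
    # Non-uniform bin edges: interior N-quantiles per axis.
    edges = [np.quantile(rotated[:, d], np.linspace(0, 1, n_bins + 1)[1:-1])
             for d in range(rotated.shape[1])]
    # Integer bin codes in [0, n_bins) per axis.
    codes = np.stack([np.digitize(rotated[:, d], edges[d])
                      for d in range(rotated.shape[1])], axis=1)
    return codes

def one_hot(code, n_bins=N):
    """One-hot population code for a single axis bin index."""
    v = np.zeros(n_bins)
    v[code] = 1.0
    return v

# Example: synthetic positions clustered unevenly in the workspace.
rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 2)) * [0.3, 0.1]
codes = discretize_workspace(pts)
# Quantile bins give roughly equal occupancy per bin on each axis.
counts = np.bincount(codes[:, 0], minlength=N)
```

Because the edges are quantiles of the data itself, dense regions of the workspace get narrower bins, which is what reduces discretization error relative to uniform partitions.
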
Output θ1 and θ2 populations decode the hiddenJoint winner into joint angles via one-hot decoding. Total neuron count: 176 during training and 184 during testing (with the inhibitory population).

Learning: Triplet-STDP (minimal model) governs synaptic plasticity from hiddenCartesian to hiddenJoint. Three exponentially decaying traces (one presynaptic, r1; two postsynaptic, o1 and o2) drive the weight updates: a pre-spike triggers LTD proportional to o1 and the current weight; a post-spike triggers LTP proportional to r1·o2 and the distance to Wmax. Learning runs computer-in-the-loop: pre/post spikes are streamed to the CPU, traces are computed there, and floating-point weights are updated online. After each 400 ms sample, weights are discretized to meet hardware constraints via (i) thresholding by W_thr to filter weak or noisy potentiation, (ii) binarization to 0/1 connectivity, and (iii) fusion to preserve previously learned connections across samples. Samples are presented in random order (N^2 = 64 samples), each followed by a 400 ms cooling interval, for a total training time of ~51.2 s. Only connections exceeding W_thr are applied on-chip; this discretization acts as pruning and yields a sparse, hardware-compliant connectivity matrix.

Control and decoding: At run time, for each target (x, y), the x and y populations are stimulated (y slightly earlier to open the gate), hiddenCartesian activates the target neuron, the learned connections drive hiddenJoint, the WTA selects a single joint configuration, and the θ1/θ2 outputs are decoded continuously into joint commands. A command is applied only if the Euclidean distance between the current and commanded joint vectors exceeds a 0.5° threshold, avoiding redundant actuation. Layer-wise convergence latencies measured during target transitions: input→hiddenCartesian ~13 ms; hiddenCartesian→hiddenJoint ~16 ms; hiddenJoint→output ~3 ms.

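A minimal sketch of the triplet-STDP update and the threshold/binarize/fuse discretization described above. All constants here are illustrative placeholders; the paper's time constants, learning amplitudes, and W_thr value are not reproduced.

```python
import numpy as np

# Illustrative constants (NOT the paper's values).
TAU_R1, TAU_O1, TAU_O2 = 0.017, 0.034, 0.114   # trace time constants (s)
A_LTD, A_LTP, W_MAX, W_THR = 0.01, 0.02, 1.0, 0.5

def decay(trace, dt, tau):
    """Exponential decay of a spike trace over an interval dt."""
    return trace * np.exp(-dt / tau)

def on_pre_spike(w, o1):
    """Pre-spike: LTD proportional to post trace o1 and current weight."""
    return max(0.0, w - A_LTD * o1 * w)

def on_post_spike(w, r1, o2):
    """Post-spike: LTP proportional to r1*o2 and distance to W_MAX."""
    return min(W_MAX, w + A_LTP * r1 * o2 * (W_MAX - w))

def discretize(w_float, w_bin_prev):
    """After each sample: threshold weak weights, binarize to 0/1,
    and fuse with the previously learned binary connectivity."""
    w_bin = (w_float >= W_THR).astype(int)
    return np.maximum(w_bin, w_bin_prev)   # fusion keeps earlier connections
```

The soft bounds (LTD scaled by the current weight, LTP by the remaining headroom to W_MAX) keep the floating-point weights in range before binarization; fusion via element-wise maximum is one simple way to realize "preserve prior learned connections across samples".
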
Power estimation: On-chip power is estimated from event counts using an operation-wise energy model, P = Σ_n r_n · (E_spike + E_enc + N_cores · (E_br + E_rt) + N_cam-match · E_pulse), with energy constants from DYNAP-SE1 characterization. Layer-wise and phase-wise power are computed; hiddenJoint_inh contributes significantly during inference because of its many postsynaptic targets.

Baselines and comparisons: A classical nonlinear optimizer (IpOpt) within the iCub Cartesian control module serves as the baseline. Inference times are also compared against SNN inverse kinematics implementations on Loihi reported in prior work, considering network latency/convergence and firing rate as a proxy for power.
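
The operation-wise model is a straightforward sum over neurons: each neuron's firing rate multiplies the energy of the operations its spikes trigger. The function below is a sketch with placeholder energy constants, not DYNAP-SE1 characterization values.

```python
def chip_power(rates, n_cores, n_cam_matches,
               e_spike=1e-12, e_enc=1e-12, e_br=1e-12,
               e_rt=1e-12, e_pulse=1e-12):
    """Estimate on-chip power as
    P = sum_n r_n * (E_spike + E_enc + N_cores*(E_br + E_rt)
                     + N_cam_match_n * E_pulse),
    where rates[n] is neuron n's firing rate (Hz), n_cores is the
    number of cores each spike is broadcast/routed to, and
    n_cam_matches[n] counts the synaptic (CAM) matches its spikes
    trigger. Energy constants are illustrative placeholders (J)."""
    per_spike = e_spike + e_enc + n_cores * (e_br + e_rt)
    return sum(r * (per_spike + m * e_pulse)
               for r, m in zip(rates, n_cam_matches))
```

This structure also explains why hiddenJoint_inh dominates inference power: a neuron with a broad fan-out has a large CAM-match count m, so its term grows even at modest firing rates.
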

Key Findings
  • Functional performance: The trained solver achieves 97.93% control accuracy on a 12-point continuous target-reaching trajectory, with average on-chip network latency of 33.96 ms and end-to-end system latency of 102.1 ms. Estimated on-chip power during inference is 26.92 μW.
  • Transient dynamics: During target transitions, layer-wise latencies are ~13 ms (input→hiddenCartesian), ~16 ms (hiddenCartesian→hiddenJoint), and ~3 ms (hiddenJoint→output), totaling ~32 ms; overall command update latency is ~102 ms due to system interfacing.
  • Disinhibition efficacy: Basal ganglia-inspired disinhibition yields selective, stable firing patterns in hiddenCartesian and hiddenJoint during training and inference, enabling robust triplet-STDP learning of a sparse, structured weight matrix. Replacing disinhibition with direct excitation produces chaotic activity, noisy weights, and failed learning.
  • Speed-accuracy trade-off: Stronger hiddenCartesian→hiddenJoint synapses reduce network latency (down to 14.44 ms average) but can reduce accuracy due to WTA instability and multiple winners; weaker synapses increase accuracy stability but slow transitions. A balanced weight setting gives 97.93% accuracy at 33.96 ms average network latency.
  • Baseline comparisons: IpOpt function latency averages 114.57 ms (range 99–156 ms) on laptop CPU. Under the same discretized target grid, IpOpt fails to reach 4 closely spaced target positions (samples 2–5), while the SNN reaches all targets. Compared to Loihi-based SNN solvers (best-case 400 ms convergence with ~200k parameters; others 2.6–3.8 s), the proposed system is faster. The proposed SNN exhibits low firing rates (~1.4 Hz global: 184 neurons, ~16% active at ~52 Hz), suggesting better energy efficiency than kHz-rate Loihi implementations.
  • Power details: Inference power dominated by the hiddenJoint inhibitory population (>62% of total) due to its broad fan-out. Training phase average power is lower (3.46 μW) due to cooling intervals and no inhibitory WTA activity.
  • Training efficiency: Full supervised training with 64 samples takes ~51.2 s. Weight discretization and fusion improve robustness; fusion can raise training-set connectivity accuracy to ~99.69%.
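
The WTA selection and its speed-accuracy trade-off can be illustrated with a toy rate-based model of shared inhibition. This is a didactic sketch with arbitrary parameters, not the hardware circuit: each unit gets its own drive plus self-excitation and is suppressed by pooled inhibition, standing in for the global inhibitory population in hiddenJoint.

```python
import numpy as np

def soft_wta(drive, alpha=1.2, beta=0.5, dt=0.1, steps=200):
    """Toy rate-based soft winner-take-all. Each unit integrates its
    input drive plus self-excitation (alpha) minus pooled inhibition
    (beta * total activity). Stronger excitatory coupling speeds up
    selection but, as in the reported trade-off, can destabilize the
    competition and leave multiple winners active."""
    a = np.array(drive, dtype=float)
    for _ in range(steps):
        inh = beta * a.sum()
        a = a + dt * (-a + np.maximum(0.0, drive + alpha * a - inh))
    return a

# Two nearby candidate joint configurations (drives 0.5 vs 0.45):
# shared inhibition resolves the near-tie to a single winner.
act = soft_wta(np.array([0.2, 0.5, 0.45, 0.1]))
winner = int(np.argmax(act))
```

With these parameters the only stable fixed point has a single active unit: any state with two active units amplifies their drive difference until the weaker one is silenced, which is the mechanism the paper relies on to pick one configuration among multiple inverse kinematics solutions.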

Discussion

The results demonstrate that biologically inspired computational primitives—disinhibition, triplet-STDP, and cooperative-competitive circuits—enable a neuromorphic SNN to learn and execute inverse kinematics for multi-joint coordination directly on mixed-signal hardware. The approach achieves robust online control with low latency and ultralow estimated power despite hardware variability and limited precision. Layer-wise analysis shows rapid convergence consistent with real-time demands. Compared to a classical optimizer constrained to the same discretization, the SNN attains higher task completion with comparable or lower latency, and it outperforms previously reported neuromorphic inverse kinematics solvers in speed while operating at much lower firing rates, indicating superior energy efficiency. The findings validate the hypothesis that brain-inspired primitives can resolve multi-solution mapping via WTA and disinhibition, yielding sparse, transferable connectivity that maps Cartesian targets to joint configurations reliably on hardware. This advances the state of end-to-end neuromorphic control and highlights practical pathways to deploy adaptive, low-power SNN controllers in robotics.

Conclusion

This work presents a hardware-deployed SNN inverse kinematics solver on DYNAP-SE1 that uses basal ganglia-like disinhibition, triplet-STDP, and WTA mechanisms to coordinate two joints of the iCub arm in continuous target reaching. The controller achieves 97.93% accuracy, ~34 ms network latency, ~102 ms system latency, and ~26.9 μW estimated inference power, while training completes in ~51 s for 64 samples. Compared to a classical solver (IpOpt) and prior neuromorphic SNN implementations, the approach provides faster convergence, full target coverage under the same discretization, and lower firing activity indicative of better energy efficiency. Future directions include scaling to higher DoF and 3D spaces with more neurons and populations; improving encoding resolution and topology flexibility; moving toward fully spiking end-to-end pipelines with event sensors and spiking low-level controllers; on-chip learning with memristive synapses; modular toolchains and faster I/O interfaces; and adaptive mechanisms to tune the latency/accuracy/power trade-off on the fly.

Limitations
  • Task scope: Demonstrated on a simplified 2-DoF planar reaching problem with discretized spaces; accuracy is bounded by encoding resolution and discretization granularity.
  • System latency: Overall latency includes significant interface overhead (data conversion and transfer), yielding 100–170 ms system delays beyond intrinsic SNN convergence (14.44–170.37 ms depending on weights and recurrence).
  • Power measurement: On-chip power is estimated from event counts and operation energy models; full-system power (including FPGA/CPU) not measured and dominates in the prototype setup.
  • Comparability: Lack of standardized benchmarks, metrics, and hardware-reported power figures in the literature limits direct quantitative comparisons across platforms and tasks.
  • Hardware constraints: DYNAP-SE1 imposes binary weights and limited connectivity/precision, necessitating computer-in-the-loop training and discretization; multiple winners in output populations can cause errors; iCubSim actuation dynamics (0.3 ms to 1.8 s) can bottleneck control updates.
  • Scalability challenges: Scaling to higher DoF causes quadratic growth of the hidden layers; more flexible processors and encoding schemes are needed to contain resource and latency costs. Hardware mismatch remains a factor and may require more neurons and power to maintain robustness.