Engineering and Technology
Smart insect-computer hybrid robots empowered with enhanced obstacle avoidance capabilities using onboard monocular camera
R. Li, Q. Lin, et al.
This paper introduces a navigation algorithm with an integrated obstacle avoidance module for insect-computer hybrid robots, using only an onboard monocular camera. By applying a deep learning-based monocular depth estimation model, the system raises the point-to-point navigation success rate from 6.7% to 73.3%. The research was conducted by Rui Li, Qifeng Lin, Phuoc Thanh Tran-Ngoc, Duc Long Le, and Hirotaka Sato.
Introduction
The study addresses the challenge of enabling insect-computer hybrid robots to navigate environments cluttered with complex obstacles. Although insects can use their antennae to sense and avoid obstacles, this sensing has a short range, often requiring direct contact, and can be disrupted by the electrical stimulation used for navigation control. Such conflicts can trap the robot near obstacles and hinder progress. To improve adaptability to rugged terrain, additional perception is required, but payload constraints preclude heavier sensors such as LiDAR. Monocular RGB cameras suit hybrid robots because of their small size and low power draw, yet existing monocular depth estimation models trained on datasets such as KITTI and Cityscapes do not generalize to the unique low-height, insect-scale viewpoint. Furthermore, converting depth maps into reliable obstacle avoidance commands for hybrid robots remains unresolved. This work proposes a monocular-camera-based navigation algorithm with integrated obstacle avoidance, supported by a new dataset captured from a small robot’s viewpoint and a lightweight depth-to-command method, aiming to anticipate obstacles and avoid risk zones before entrapment.
Literature Review
Prior studies have developed electrical stimulation protocols to control insect locomotion, inducing behaviors such as jumping in locusts, flight initiation and steering in beetles and bees, and forward and turning motions in cockroaches via cerci and antenna stimulation. Early navigation algorithms based on these protocols remain limited, particularly in complex terrain. Insects’ antennae can guide obstacle avoidance, but range and reliability degrade during controlled navigation. Camera-equipped cyborg insects have been demonstrated for guidance tasks, yet onboard visual processing for obstacle avoidance has been underexplored. Monocular depth estimation has advanced with deep learning, including supervised and unsupervised/self-supervised methods; the latter eases data collection by removing the need for ground-truth depth. Many works show monocular depth aiding obstacle avoidance in drones and vehicles, but generalization to insect-scale perspectives is poor due to mismatched training data (e.g., KITTI, Cityscapes). This motivates collecting insect-view datasets and devising depth-to-command strategies suitable for biologically driven, collision-tolerant platforms.
Methodology
Insect platform: Madagascar hissing cockroaches (5.7±0.6 cm) were used for their strong carrying capacity and robust climbing abilities. Animals were kept under controlled lab conditions with food and water. Following established procedures, electrodes were implanted to stimulate cerci for forward motion, unilateral antennae for turning, and an abdominal ground electrode; electrodes were fixed with beeswax. Preparation took about 15 minutes per robot.
Hardware: The onboard electronics comprise an ESP32-CAM module (OV2640 sensor, ESP32-S MCU) for 320×240 RGB imaging with WiFi transmission to a workstation, and a custom stimulation module. The stimulation module uses an MSP432P4011 microcontroller, a CC1352 Bluetooth module, and an AD5504 signal generator, providing four output channels (0–12 V); navigation used stimuli below 3 V. Both modules are powered by a 3.7 V, 180 mAh Li-Po battery. Commands are delivered from the workstation via Bluetooth.
Monocular depth estimation model: An unsupervised/self-supervised framework jointly trains a depth network and a pose network using monocular video sequences. Given a target frame and adjacent reference frame(s), the pose network (ResNet-50 backbone) predicts relative camera motion, and the depth network (as in Godard et al.) predicts per-pixel depth for the target. The target is reconstructed by sampling pixels from the reference image using predicted depth and pose; training minimizes the photometric reconstruction loss, with auto-masking and an added scale consistency loss to maintain scale. Training ran on an NVIDIA GeForce RTX 3090 (24 GB).
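For reference, a common formulation of these losses in this line of work (following Monodepth2-style photometric terms and a scale-consistency term) is sketched below; the weights α and λ_s shown are typical values, not ones confirmed by the paper. Here I_t is the target frame, I_s the reference frame, D_t the predicted target depth, T_{t→s} the predicted relative pose, K the camera intrinsics, and V the set of valid (auto-masked) pixels.

```latex
% View synthesis: reproject target pixels into the reference frame and sample it.
\hat{I}_t(p) = I_s\big\langle \operatorname{proj}\!\big(D_t(p),\, T_{t\to s},\, K\big) \big\rangle

% Photometric reconstruction loss (SSIM + L1), with \alpha typically 0.85.
\mathcal{L}_{p} = \frac{\alpha}{2}\big(1 - \operatorname{SSIM}(I_t, \hat{I}_t)\big)
               + (1-\alpha)\,\big\lVert I_t - \hat{I}_t \big\rVert_1

% Scale (geometry) consistency between the warped target depth D_t^{s} and the
% reference depth D_s, averaged over the valid pixel set V.
\mathcal{L}_{s} = \frac{1}{|V|} \sum_{p \in V}
                  \frac{\lvert D_t^{s}(p) - D_s(p) \rvert}{D_t^{s}(p) + D_s(p)}

% Total objective; \lambda_s weights the consistency term.
\mathcal{L} = \mathcal{L}_{p} + \lambda_{s}\, \mathcal{L}_{s}
```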
SmallRobot Dataset: To address the viewpoint mismatch, the authors collected a new dataset from a small robot’s perspective using an ESP32-CAM mounted on a hand-movable tray powered by a power bank. Images (320×240) were transmitted via WiFi to a laptop and recorded at 0.01 s intervals.
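A minimal sketch of such a capture loop, assuming the ESP32-CAM exposes an MJPEG-over-HTTP stream readable by OpenCV; the stream URL, transport, and output directory are assumptions, since the paper only specifies 320×240 frames sent over WiFi and recorded every 0.01 s.

```python
import os
import time

import cv2

OUT_DIR = "smallrobot_dataset"
STREAM_URL = "http://192.168.4.1:81/stream"   # hypothetical ESP32-CAM MJPEG endpoint

os.makedirs(OUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(STREAM_URL)
frame_id = 0
try:
    while cap.isOpened():
        ok, frame = cap.read()                # 320x240 frame from the robot's viewpoint
        if not ok:
            break
        cv2.imwrite(os.path.join(OUT_DIR, f"{frame_id:06d}.png"), frame)
        frame_id += 1
        time.sleep(0.01)                      # ~0.01 s between recorded frames
finally:
    cap.release()
```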
Obstacle avoidance module: From the predicted depth map, a 40×40 central region is examined; its minimum depth serves as the estimated obstacle distance. If this falls below a threshold, obstacle avoidance activates. For command generation, a 40×320 region spanning the full image width is processed with width-dependent weights for three candidate commands: Left Turn (weights highest at the left, decreasing toward the right), Right Turn (the mirrored weighting), and Go Forward (weights highest in the center, decreasing toward both sides). A weighted sum is computed per command and passed through a SoftMax to select the output command. Left Turn stimulates the right antenna to induce a left rotation; Right Turn stimulates the left antenna; Go Forward stimulates the cerci.
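A minimal sketch of this depth-to-command mapping, assuming a NumPy depth map at the camera resolution. The region sizes follow the description above, but the trigger threshold (relative depth units) and the linear/triangular weight profiles are illustrative assumptions.

```python
import numpy as np

def depth_to_command(depth, trigger_threshold=0.5):
    """Map a predicted depth map (H x W, e.g. 240 x 320) to a stimulation command."""
    h, w = depth.shape
    cy, cx = h // 2, w // 2

    # Trigger: the minimum depth in the central 40x40 patch approximates the
    # distance to the nearest obstacle directly ahead.
    if depth[cy - 20:cy + 20, cx - 20:cx + 20].min() >= trigger_threshold:
        return None  # no nearby obstacle; leave control to the general module

    # Command region: a 40-row band spanning the full image width.
    column_depth = depth[cy - 20:cy + 20, :].mean(axis=0)   # average depth per column

    # Width-dependent weights: Left Turn favours open space on the left,
    # Right Turn mirrors it, Go Forward favours open space in the centre.
    x = np.linspace(0.0, 1.0, w)
    weights = {
        "Left Turn":  1.0 - x,
        "Right Turn": x,
        "Go Forward": 1.0 - 2.0 * np.abs(x - 0.5),
    }

    # Weighted sum per candidate command, then SoftMax to pick the output.
    scores = np.array([(column_depth * wgt).sum() for wgt in weights.values()])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return list(weights)[int(np.argmax(probs))]
```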
Navigation experiment and algorithm: Experiments used a start-to-goal task with an intervening obstacle featuring a three-sided corner prone to trapping. A 3D motion capture system tracked robot position and orientation, providing data via cable to the workstation. Images from the robot were streamed to the workstation over WiFi; navigation and avoidance commands were sent back via BLE to the stimulation module. The navigation controller includes a general navigation module and the obstacle avoidance module. If the robot is not at the destination and the avoidance trigger is off, the general module checks whether the robot’s heading aligns with the goal direction; it issues Go Forward if aligned, otherwise a steering command to realign followed by Go Forward. If the avoidance trigger is on, the obstacle avoidance module’s command takes priority. The study compared performance with the avoidance module disabled versus enabled, across repeated trials using three robots.
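A minimal sketch of the controller's priority logic described above, assuming 2D positions and heading from the motion capture system; the heading tolerance, goal radius, sign convention, and function names are illustrative, not taken from the paper.

```python
import math

HEADING_TOLERANCE_DEG = 20.0   # assumed alignment tolerance
GOAL_RADIUS_M = 0.05           # assumed arrival radius

def navigation_step(robot_pos, robot_heading_deg, goal_pos, avoidance_command):
    """One control-loop iteration.

    `avoidance_command` is the obstacle avoidance module's output (e.g. from a
    function like depth_to_command above), or None when the trigger is off.
    """
    dx, dy = goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1]
    if math.hypot(dx, dy) < GOAL_RADIUS_M:
        return "Stop"                      # destination reached

    # The avoidance module's command takes priority over goal seeking.
    if avoidance_command is not None:
        return avoidance_command           # "Left Turn" / "Right Turn" / "Go Forward"

    # General navigation: realign the heading with the goal bearing, then advance.
    goal_bearing = math.degrees(math.atan2(dy, dx))
    error = (goal_bearing - robot_heading_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    if abs(error) <= HEADING_TOLERANCE_DEG:
        return "Go Forward"
    # Sign convention: positive error means the goal lies counter-clockwise of the heading.
    return "Left Turn" if error > 0 else "Right Turn"
```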
Key Findings
- Integrating the obstacle avoidance module into the navigation algorithm increased the point-to-point navigation success rate from 6.7% to 73.3% (1 of 15 versus 11 of 15 trials per condition across three robots).
- Risk zone analysis: Without obstacle avoidance, robots entered the risk zone in up to 93.3% of attempts; with obstacle avoidance, only 40% of attempts entered the risk zone. Of those entering the risk zone, 33.3% escaped with the avoidance module versus 0% without.
- Conflict mitigation: Without avoidance, the general navigation module often forced motion toward the goal, conflicting with obstacle geometry and trapping the robot. With avoidance, the algorithm prioritized evasion, correcting direction early and reducing entrapment.
- Depth estimation generalization: A model trained on the KITTI dataset produced poor depth predictions for insect-perspective images, whereas the model trained on the SmallRobot Dataset yielded high-quality, sharp-edge depth maps suitable for navigation.
- Depth-to-command mapping: The proposed weighted-sum and SoftMax method generated intuitive commands (left, right, forward) consistent with human judgment, effectively steering robots away from obstacles based on depth distribution without explicit object edge detection.
Discussion
The results demonstrate that monocular-camera-based perception combined with a tailored depth-to-command strategy enables insect-computer hybrid robots to anticipate obstacles and avoid entering trapping configurations, directly addressing the core challenge of reliable navigation in cluttered terrains. The substantial improvement in success rate and reduced incidence of risk-zone entry show that early obstacle detection and prioritized avoidance prevent command–obstacle conflicts typical of antenna-only sensing or goal-only navigation. By training an unsupervised depth model on an insect-scale dataset (SmallRobot), the work overcomes generalization issues that hindered use of conventional datasets, enabling effective depth predictions from a low-height viewpoint. The weighted-sum control mapping leverages the cockroach platform’s collision tolerance to simplify computation by focusing on depth distribution trends rather than precise contour extraction, aligning with the constraints of limited onboard resources. Collectively, these findings validate the feasibility of lightweight vision-based obstacle avoidance for biohybrid platforms and extend the operational envelope of cyborg insects in complex environments.
Conclusion
This work presents the first automatic navigation algorithm for insect-computer hybrid robots that integrates monocular-camera-based obstacle avoidance. Contributions include: (1) an obstacle avoidance module that transforms depth maps into robust steering/forward commands via weighted sums and SoftMax, (2) an unsupervised monocular depth estimation pipeline trained on the newly collected SmallRobot Dataset tailored to insect-perspective imagery, and (3) a complete navigation framework that prioritizes avoidance to prevent entrapment. Experiments show a marked increase in navigation success (6.7% to 73.3%) and reduced exposure to risk zones. Future work will miniaturize the control hardware and integrate imaging and stimulation into a single module, deploy an ultra-lightweight depth model onboard to reduce wireless transmission and energy consumption, and develop mechanical structures that ensure and maintain a horizontal camera mounting.
Limitations
- Processing is offloaded to a workstation via WiFi due to current microcontroller storage and compute limitations, increasing energy consumption and latency.
- The microcontroller hardware is relatively unwieldy; integration of imaging and stimulation is pending.
- Camera mounting requires careful horizontal alignment; maintaining this in practical deployments is nontrivial.
- Experiments were conducted in controlled environments with a specific obstacle configuration and motion capture system; generalization to diverse outdoor or uninstrumented settings was not evaluated within this study.