Title:
3D object detection based on frustum-fusion for embedded systems.
Authors:
Huang, Bin (huangb@whut.edu.cn), Wang, Gaopeng (300081@whut.edu.cn), Wei, Xiaoxu (wxx2014@whut.edu.cn), Wang, Ziting (313663@whut.edu.cn)
Source:
Machine Vision & Applications. Mar 2026, Vol. 37 Issue 2, p1-20. 20p.
Database:
Academic Search Index

Abstract:
In complex dynamic environments, the autonomous navigation of unmanned ground vehicles (UGVs) relies heavily on real-time perception executed on embedded platforms. However, existing multimodal perception methods that fuse camera and LiDAR data are often constrained by limited compute budgets and by spatiotemporal asynchrony between the sensors, making it difficult to balance real-time performance against perception accuracy. To address these challenges, we propose a multimodal fusion perception algorithm optimized for embedded platforms. To cope with constrained resources, we design a lightweight co-processing framework. On the 2D side, the algorithm integrates Partial Convolution (PConv) and a Convolutional Gated Linear Unit (CGLUSE) into the YOLOv8 architecture, improving detection accuracy while significantly accelerating inference and producing high-quality instance segmentation masks. These masks are then lifted into 3D to construct frustum-constrained regions. For 3D perception, we introduce a two-stage filtering mechanism: LiDAR point clouds are mapped into the camera coordinate system, and RGB semantics are used to prune redundant points, substantially reducing the computational burden of subsequent 3D processing. To resolve the spatiotemporal asynchrony between camera and LiDAR, we develop a GPS–PPS-based hardware frequency-division synchronization scheme that enables high-precision time alignment. In addition, we adopt a 3D object detection framework that combines a CenterPoint detection head with a Dynamic Sparse Voxel Transformer (DSVT) and an improved EIoU loss, further enhancing accuracy and robustness on point clouds. Experiments show that the proposed algorithm achieves an end-to-end latency below 73 ms on a Jetson Orin embedded platform, while keeping CPU and GPU utilization under 36% and 52%, respectively. These results demonstrate an efficient solution for real-time perception and autonomous navigation of UGVs in dynamic scenarios. [ABSTRACT FROM AUTHOR]
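
To illustrate the two-stage filtering described in the abstract, the following Python sketch projects LiDAR points into the camera frame and keeps only those whose projection falls inside a 2D instance mask (the frustum constraint). It assumes a pinhole camera model with known intrinsics K and a LiDAR-to-camera extrinsic transform; all function and variable names are illustrative and are not taken from the authors' implementation.

    import numpy as np

    def frustum_filter(points_lidar, T_lidar_to_cam, K, mask):
        """Keep LiDAR points whose image projection lies inside an instance mask.

        points_lidar   : (N, 3) array of LiDAR points (x, y, z).
        T_lidar_to_cam : (4, 4) extrinsic transform from LiDAR to camera frame.
        K              : (3, 3) camera intrinsic matrix.
        mask           : (H, W) boolean instance-segmentation mask.
        """
        # Stage 1: transform points into the camera coordinate system and
        # discard everything behind the image plane.
        pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
        pts_cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
        pts_cam = pts_cam[pts_cam[:, 2] > 0.0]

        # Stage 2: project onto the image plane and keep only the points
        # that land inside the 2D instance mask (the frustum constraint).
        uv = (K @ pts_cam.T).T
        uv = uv[:, :2] / uv[:, 2:3]
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        h, w = mask.shape
        in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        keep = np.zeros(pts_cam.shape[0], dtype=bool)
        keep[in_image] = mask[v[in_image], u[in_image]]
        return pts_cam[keep]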
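
The abstract does not give the form of the improved EIoU loss; for reference only, a plain-Python sketch of the standard EIoU loss for axis-aligned 2D boxes is shown below. The paper's adaptation to 3D/BEV boxes may differ, so treat this as the baseline formulation rather than the authors' method.

    def eiou_loss(pred, target, eps=1e-7):
        """Standard EIoU loss for boxes in (x1, y1, x2, y2) format:
        1 - IoU + center_dist^2 / diag^2 + dw^2 / cw^2 + dh^2 / ch^2,
        where (cw, ch) is the size of the smallest enclosing box.
        """
        # Intersection and IoU
        ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
        ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
        area_t = (target[2] - target[0]) * (target[3] - target[1])
        iou = inter / (area_p + area_t - inter + eps)

        # Smallest enclosing box and its diagonal
        cw = max(pred[2], target[2]) - min(pred[0], target[0])
        ch = max(pred[3], target[3]) - min(pred[1], target[1])
        diag2 = cw ** 2 + ch ** 2 + eps

        # Center-distance penalty
        pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
        tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
        center2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

        # Width and height penalties
        dw2 = ((pred[2] - pred[0]) - (target[2] - target[0])) ** 2
        dh2 = ((pred[3] - pred[1]) - (target[3] - target[1])) ** 2

        return 1.0 - iou + center2 / diag2 + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps)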