Point-Based Fusion for Multimodal 3D Detection in Autonomous Driving

Xinxin Liu, Bin Ye

Abstract

Cameras and LiDAR are the key sensor modalities in self-driving vehicles. They provide complementary information, which makes them natural candidates for sensor fusion. However, directly merging multi-sensor data through point projection can lose information to quantization, and reconciling the different data formats of multiple sensors remains a challenge. To address these issues, we propose a new fusion method that combines continuous convolution, point-pooling, and a learned MLP. Our approach attaches the segmentation mask to the raw LiDAR points rather than to projected points, thereby avoiding quantization loss. We run a neighbor search on the points and retrieve the corresponding semantic features from the image, then concatenate the image and LiDAR data. Continuous convolution, point-pooling, and a learned MLP are then applied to obtain the fused output. The pooling and aggregation operations, as extensions of convolution, are designed specifically to handle the disparities in data formats. The detection network has two stages: the first generates preliminary proposals and segmentation features; the second refines the fusion result with the segmentation mask to produce the final prediction. The method aims at precise 3D object detection by enriching LiDAR point data with semantic features from images, and the segmentation sub-algorithm can be swapped as needed. Extensive experiments on the KITTI dataset demonstrate the effectiveness of the approach, which achieves high precision and robust performance in 3D object detection.
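The fusion steps summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 3x4 projection matrix `P`, the bilinear sampling, the brute-force neighbor search, the two-layer MLP weights, and the max-pooling over neighbors are all assumptions chosen for illustration. The sketch only shows the general idea of keeping raw 3D points, looking up image features at continuous (unrounded) pixel locations, gathering neighbors, and aggregating with a shared MLP.

```python
import numpy as np

def project_to_image(points, P):
    """Project (N, 3) points to continuous pixel coordinates (u, v).
    P is an assumed 3x4 camera projection matrix."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    uvw = homo @ P.T                                           # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(feat, uv):
    """Sample an (H, W, C) feature map at continuous (u, v) locations,
    so pixel coordinates are never rounded to integers."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.clip(np.floor(u).astype(int), 0, W - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, H - 2)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u0 + 1] * du
    bot = feat[v0 + 1, u0] * (1 - du) + feat[v0 + 1, u0 + 1] * du
    return top * (1 - dv) + bot * dv

def knn_indices(points, k):
    """Brute-force k-nearest-neighbor search in 3D (each point includes itself)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, :k]

def fuse(points, sem_feat, P, W1, b1, W2, b2, k=4):
    """Hypothetical fusion step: per-neighbor MLP over [image feature, 3D offset],
    then max-pooling over neighbors, concatenated with the raw point."""
    uv = project_to_image(points, P)
    img_feat = bilinear_sample(sem_feat, uv)                   # (N, C)
    idx = knn_indices(points, k)                               # (N, k)
    offsets = points[idx] - points[:, None, :]                 # (N, k, 3)
    neigh = np.concatenate([img_feat[idx], offsets], axis=-1)  # (N, k, C + 3)
    hidden = np.maximum(neigh @ W1 + b1, 0.0)                  # ReLU layer
    out = hidden @ W2 + b2                                     # (N, k, D)
    pooled = out.max(axis=1)                                   # pool over neighbors
    return np.concatenate([points, pooled], axis=1)            # (N, 3 + D)

# Toy usage with random data; all sizes and values are arbitrary.
rng = np.random.default_rng(0)
points = rng.uniform([-2.0, -1.0, 5.0], [2.0, 1.0, 20.0], size=(10, 3))
P = np.array([[100.0, 0.0, 50.0, 0.0],
              [0.0, 100.0, 40.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
sem_feat = rng.random((80, 100, 4))                            # (H, W, C) semantic features
W1, b1 = rng.standard_normal((7, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 6)), np.zeros(6)
fused = fuse(points, sem_feat, P, W1, b1, W2, b2, k=4)
```

In this sketch the image features are attached to the original 3D coordinates, so the point cloud itself is never snapped to a pixel or voxel grid. A real system would likely replace the brute-force neighbor search with a spatial index and learn the MLP weights end to end.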

Authors

Xinxin Liu
Bin Ye (Primary Contact), yebin@cumt.edu.cn
Liu, X., & Ye, B. (2024). Point-Based Fusion for Multimodal 3D Detection in Autonomous Driving. International Journal of Advanced Science and Computer Applications, 3(2). https://doi.org/10.47679/ijasca.v3i2.101
