Object 3D Perception via Camera-Radar Cross-Modality Learning for Autonomous Driving

Wang, Yizhou

Files

Wang_washington_0250E_24947.pdf (37.87 MB)

Date

2023-01-21

Abstract

Autonomous and assisted driving have become increasingly feasible thanks to recent advances in machine learning and deep learning. Because an autonomous vehicle is usually a multi-sensor platform, it is crucial not only to exploit the information from each sensor but also to fuse that information effectively, so that the sensors compensate for one another's limitations under different driving scenarios. In this dissertation, we focus on two common and cost-efficient sensors on an autonomous vehicle, camera and radar, and aim to build an accurate and robust object perception strategy for autonomous or assisted driving. Specifically, the camera captures rich semantic information, while radar provides reliable range and speed estimates. With a camera and radar together, we can therefore potentially achieve accurate and reliable object perception, including 3D object detection and 3D object tracking. The camera, however, is not a robust sensor under severe conditions such as weak or strong lighting and bad weather, whereas radar remains comparatively reliable in most harsh environments, even though its capability for semantic understanding of scene content is quite limited. It is therefore critical to develop a system that can also rely purely on radar, in the form of radio frequency (RF) images, for semantic object detection in scenarios where the camera cannot provide reliable information. To achieve these goals, we start by collecting a new camera-radar dataset covering various driving scenarios, named the CRUW dataset, which includes camera, radar, and LiDAR sensors. We adopt the RF image as our radar data representation for better exploitation of the radar data. We make the CRUW dataset public and set up a benchmark, named ROD2021, to help the community further develop related algorithms.
As for the algorithms, we first develop camera-radar cross-modality supervision algorithms for object 3D perception. These include a camera-only (CO) object detection and 3D localization system, which localizes detected objects in 3D in the camera coordinates, and a camera-radar fusion (CRF) framework, which takes advantage of the accurate ranging results from the radar to obtain more reliable 3D object detection. Next, to achieve radar-only object detection, we propose a radar object detection network (RODNet) that takes only RF image sequences as input and estimates object confidence maps (ConfMaps). Moreover, to accomplish radar-based multi-object tracking, we further propose the RadarMOT framework, which jointly predicts object detections and radar instance features for tracking. After exploring these radar-only object perception tasks, we propose a camera-radar cross-modality check pipeline for cases where perception results are available from both the camera and radar modalities. The pipeline has three stages. First, we align detections between the camera and radar through a proposed camera-radar bilateral coordinate projection (BCP). Second, for the aligned detections, we refine the alignment to achieve geometrical consistency. Third, for unaligned detections, we introduce an alignment verification stage that considers temporal continuity. Overall, the contributions of this dissertation can be summarized as follows:
- An accurate and robust object perception system via camera-radar cross-modality learning for autonomous or assisted driving applications.
- A new dataset, named CRUW, containing synchronized camera-radar frames, which can serve as a valuable resource for camera-radar cross-modality research.
- A novel radar object detection network, called RODNet, for robust object detection in severe driving scenarios, which can be used for autonomous or assisted driving under adverse conditions without camera or LiDAR information.
- A radar multi-object tracking (RadarMOT) method that can reliably track objects with deep features from the RF images.
- A reliable camera-radar cross-modality check pipeline that can accurately detect objects when both camera and radar are present, considering geometrical consistency and temporal continuity.
In conclusion, this dissertation explores a "camera+radar" solution for object 3D perception in autonomous driving applications. The overall system is robust and is also accurate and cost-efficient compared with camera- or LiDAR-based solutions. Potentially, our solution could become a low-cost and portable autonomous or assisted driving solution in the future.