(Translated by https://www.hiragana.jp/)
Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng,
Yu Zhang, Jianmin Ji, and Yanyong Zhang* 
* The corresponding author.School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China {dyf0202, zxrr, glyou, elonwu, starlet, zkdly, cxmeng, pengjieb}@mail.ustc.edu.cn, {yuzhang, jianmin, yanyongz}@ustc.edu.cn.
Abstract

Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-to-fine framework, we first design a spherical descriptor TERRA for 3-DoF rotation initialization with no prior knowledge. To further optimize, we present JEEP for the joint estimation of extrinsic and pose, integrating geometric and motion information to overcome factors affecting the point cloud registration. Finally, the LiDAR poses optimized by the hierarchical optimization module are input to time synchronization module to produce the ultimate calibration results, including the time offset. To verify the effectiveness, we conduct extensive experiments on eight datasets, where 16 diverse types of LiDARs in total and dozens of calibration tasks are tested. In the challenging tasks, the calibration errors can still be controlled within 5cm5𝑐𝑚5cm5 italic_c italic_m and 1superscript11^{\circ}1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT with a high success rate.

Index Terms:
Calibration, LiDAR, descriptor, hand-eye, and SLAM.
publicationid: pubid: 0000–0000/00$00.00 © 2021 IEEE

I Introduction

In recent years, the field of 3D vision technology has witnessed remarkable advancements, leading to the widespread adoption of depth sensors such as LiDARs for their ability to obtain accurate depth information [1, 2]. As the demand for enhanced performance continues to grow, there is a notable trend toward equipping autonomous systems, like self-driving vehicles and robotics, with multiple LiDARs. This approach brings numerous benefits, including improved robustness in self-localization and mapping, as well as the ability to prevent system crashes resulting from a single hardware failure [3]. Moreover, the utilization of multiple LiDARs enables the generation of a denser point cloud with a wider field of view (FoV), thereby significantly enhancing the overall perception performance [4]. Furthermore, by carrying various types of LiDARs, the entire perception system can leverage the unique characteristics of different LiDARs to enhance the performance. For example, conventional mechanical LiDARs provide a 360superscript360360^{\circ}360 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT horizontal FoV, ensuring uniform spatial point cloud distribution [5]. Additionally, solid-state LiDARs are becoming increasingly popular in autonomous driving applications, offering denser point clouds within a limited FoV at a relatively lower cost [2, 6, 7].

To effectively utilize the benefits of multiple LiDAR, it is crucial to establish a common coordinate system by accurately transforming the data from their respective individual coordinate systems. This transformation between different LiDAR coordinate systems is parameterized by the extrinsic parameters, and the process of finding the optimal parameters is known as calibration, which plays a pivotal role in ensuring aligned and synchronized data from each sensor.

Refer to caption
Figure 1: The illustration of seven types of LiDAR in the simulated dataset. Mechanical LiDARs exhibit varying densities, solid-state LiDARs have different FOV, and mechanical and solid-state LiDARs have distinctly different point cloud distributions. This diversity presents new challenges for LiDAR calibration. This paper aims to introduce a universal, sensor-agnostic calibration method.

Calibration is a widely researched topic, and we mainly focus on the calibration of LiDARs in this paper.

Existing LiDAR calibration methods typically employ a coarse-to-fine framework to solve for the calibration parameters progressively. Finding an appropriate initial value is a prerequisite for obtaining precise parameters, which can be done by utilizing Hand-Eye method [8] or prior knowledge of specific scenarios [9]. After the initialization step, it’s commonly to perform a calibration refinement process to further improve the accuracy of calibration parameters. Specifically, the fundamental principle of the refinement step is to identify elements that can be correlated between LiDARs, such as similar features in the point cloud [10, 11] or their similar motion trajectories [12, 13]. Therefore, the refinement step is typically modeled as either a point cloud registration problem or a trajectory alignment problem. It is worth noting that the existing methods are usually tailored for specific LiDAR type, i.e., leveraging the prior knowledge of point cloud distribution or designing corresponding odometry module for pose estimation.

Although multi-LiDAR calibration has been extensively researched, it becomes increasingly challenging when it comes to the calibration between LiDARs with different types as shown in Fig. 1. First, the FoV of different LiDARs varies significantly. For instance, the Tele111https://www.livoxtech.com/tele-15 from Livox has only 14.516.2superscript14.5superscript16.214.5^{\circ}*16.2^{\circ}14.5 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT ∗ 16.2 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT FoV, whereas the mechanical LiDAR222https://www.robosense.ai/en/rslidar/RS-Helios has a 36030superscript360superscript30360^{\circ}*30^{\circ}360 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT ∗ 30 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT FoV. Additionally, due to the different scanning methods of LiDAR, the distribution of point clouds generated by different LiDARs for the same area also differ. Lastly, the overlap between two LiDARs may be limited, as a primary motivation for employing multiple LiDARs is to expand the FoV of the agent. These factors present new challenges to the coarse-to-fine LiDAR calibration framework. The different distribution and density of point clouds make it challenging to develop a universal initialization method that can consistently produce qualified initial values for any two LiDARs. Additionally, the minimal overlap and sparsity of the point cloud contribute to instability in the results during further precise extrinsic parameter refinement, even if the calibration converges.

To effectively address the aforementioned challenges, we propose a new calibration approach for multiple LiDARs of any types. To this end, the issue of parameter initialization needs to be addressed first. According to the experiments in Sec. VII-D2, we highlight the importance of initializing rotation parameters rather than translation parameters, given the paper’s concentration on single-platform (one vehicle or one robot) sensor calibration. Therefore, we first introduce a point cloud descriptor for rotation initialization, namely TERRA, i.e., a Three degree of freedom (DoF) Embedding of point cloud for Robust Rotation Alignment. We carefully design a distance function between multiple TERRAs to overcome the impact of translation on rotation estimation robustly. After initialization, to address poor registration accuracy caused by the sparsity of the point cloud and scene degeneracy, a method called JEEP for the Joint Estimation of Extrinsic and Poses is proposed to obtain optimized, usable extrinsics by integrating geometric and motion information. To further refine these parameters with all sequence data from LiDARs, we implement a straightforward multi-LiDAR SLAM approach and utilize the hierarchical optimization to optimize the pose of each point cloud frame, subsequently enhancing the accuracy of the extrinsics estimation. In this way, we also estimate the offset between the data reception time of the two LiDARs by a proposed time synchronization module.

We conduct experiments on eight datasets, including both simulated and real-world environments with 16 diverse types of LiDARs in total and dozens of LiDAR pairs. Results show our rotation initialization and stepwise refinement methods have strong universality for different calibration tasks. We also conduct comprehensive ablation experiments on the key parameters involved in the methods. For typical calibration tasks for mechanical LiDARs, the calibration error can be controlled within 1cm1𝑐𝑚1cm1 italic_c italic_m and 0.1superscript0.10.1^{\circ}0.1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and in tasks involving challenging factors such as partial overlap of point clouds, calibration errors can still be controlled within 5cm5𝑐𝑚5cm5 italic_c italic_m and 1superscript11^{\circ}1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT with a high success rate.

In summary, our main contributions are as follows:

  • We propose a novel method for universal LiDAR calibration, regardless of the specific type or model of LiDAR.

  • We design a point cloud descriptor namely TERRA, i.e., a Three-DoF Embedding of point cloud for Robust Rotation Alignment, to perform the rotation parameters initialization between any two LiDARs effectively.

  • To estimate the extrinsic parameters accurately, we introduce a module called JEEP, named after the Joint Estimation of Extrinsic and Poses, integrating geometric and motion information to overcome factors affecting the point cloud registration. Following that, we utilize the hierarchical optimization module and time synchronization module to further enhance the accuracy of extrinsics and time offset estimation.

  • To fully verify the effectiveness of our method, we conduct extensive experiments on eight datasets, including both simulated and real-world environments, where 16 diverse types of LiDARs in total and dozens of LiDAR pairs are tested. The code and related dataset will be open sourced on https://github.com/yjsx/ULC.

II Related Work

II-A LiDAR-LiDAR Calibration

In this section, we primarily discuss the existing calibration methods for LiDAR-LiDAR. We also briefly introduce the calibration between other sensors due to the similarities among calibration tasks. We divide these methods into two groups: geometry-based methods and motion-based methods.

II-A1 Geometry-based methods

Multi-LiDAR calibration is essentially a process to estimate the relative positions of two LiDARs, typically addressed through point cloud registration [14]. Point cloud registration is a widely studied problem, including classic methods like point-point ICP [15], NDT [16], point-plane ICP [17], and generalized-ICP [18] etc. However, the accuracy of point cloud registration is influenced by numerous factors, such as the initial transformation, scene degeneracy, and point cloud bias and noise [19, 20, 21]. In calibration tasks where the initial values are missing and point cloud distribution varies, these factors become challenges in the solving process. Leveraging the typical characteristics of road scenes, CROON [9] initializes the extrinsic parameters by aligning the ground plane, followed by using an octree-based refinement method to optimize the LiDARs’ pose. Jiao et al. [10] and Ma et al. [11] carefully select the environment for calibration and introduce an automatic multi-LiDAR extrinsic calibration algorithm using corner planes. Brahayam et al. [22] also focus on street scenes, selecting the ground and commonly seen poles as reference objects. By matching and aligning, they solve the 6-DoF parameters in a stepwise manner. GMMCalib [23] implements a GMM-based joint registration algorithm to enhance the robustness of the calibration process.

In multi-LiDAR calibration tasks, a challenge arises when the overlapping areas between LiDARs are minimal, leading to insufficient constraints for point cloud registration. Nie et al. [24] developed a novel surface normal estimation method for two mechanically tilted placed LiDARs taking account of the uneven distribution of point cloud density and the edge information of planes. LiDAR-Link [25] employs a large-FoV LiDAR as a “LiDAR-Bridge” to connect other small-FoV LiDARs, subsequently introducing the LiDAR-Align method to collect point clouds across multiple scenes and utilize the Generalized-ICP in parallel spaces to obtain the extrinsics. For small-FoV solid-state LiDAR calibration, the challenge intensifies as there may be no overlapping areas between the LiDARs. To address this problem, MLCC [26, 27] generates overlapping areas by constructing a local map of each LiDAR and formulate the multiple LiDAR extrinsic calibration into a LiDAR bundle adjustment (BA) problem. Heng et al. [28] also employs the approach of constructing maps to calibrate LiDARs and millimeter-wave radars.

II-A2 Motion-based methods

Since calibration tasks typically involve sensors rigidly connected, the relative motion between two sensors can be established using the hand-eye method [29, 30]. Das et al. [12][13] estimate the motion trajectories of each sensor separately and recover the initial rotation and translation by aligning these trajectories, subsequently optimizing them using dual quaternion-based hand-eye calibration. Taylor et al. [31] propose a universal method for LiDARs, cameras, GPS, and IMU that sequentially solves for time offsets, rotational offsets, and translational offsets based on motion information. Hand-Eye calibration is also a method used to assign initial values to extrinsic parameters [10, 32, 8], which typically requires specific trajectories and sufficient accuracy.

In this paper, we emphasize and solely focus on the rotation initialization. In the refinement phase, we integrate the advantages of geometry-based and motion-based methods to overcome the challenges affecting point cloud registration. Furthermore, through backend optimization, we enhance the calibration accuracy.

II-B Point Cloud Descriptor

Extracting descriptor from point cloud is a common method for transforming point clouds into feature spaces, with numerous applications such as place recognition and pose estimation. We primarily focus on descriptors generated through projection and categorized them based on the projection method into two types: BEV (Bird’s Eye View) projection-based methods and spherical projection-based methods.

II-B1 BEV Projection Based Methods

Projecting point clouds onto the BEV plane is a common approach in descriptor generation. He et al. propose M2DP [33], which is generated by projecting a 3D point cloud to multiple 2D planes with density signatures and generating the descriptor of the 3D point cloud using the signatures. Scancontext [34] partitions the point cloud into bins on the x-y plane based on both azimuthal and radial directions. This process creates a two-dimensional matrix to facilitate global position recognition by finding the optimal rotation for rotational invariance. Their sequel, ScanContext++ [35], introduced a new spatial division strategy to achieve lateral invariance and utilized a sub-descriptor to speed up loop retrieval, showing promising results in urban environments. More recently, some methods [36, 37] use neural networks to generate BEV features and conduct end-to-end training to achieve place recognition.

II-B2 Spherical Projection Based Methods

PHASER [38] projects both source and target point cloud onto the sphere and performs a spherical Fourier transform to find the rotation that maximizes the spherical correlation. Similarly, Han et al. propose DiLO [39] implementing LiDAR odometry based on the spherical range image (SRI). Shan et al. [40] utilize the traditional BoW algorithm in visual place recognition for LiDAR-based global localization. This approach involves a projection of high-resolution LiDAR point cloud intensity data into a spherical visual format, facilitating the extraction of features based on ORB [41]. SphereVLAD++ [42] processes a query LiDAR scan by converting it into a panorama by spherical projection and encodes local orientation-equivalent features via spherical convolution for descriptor generation. OverlapNet [43] takes spherical projections of 3D LiDAR data as input and estimates the overlap between two range images by calculating all possible differences for each pixel. Bernreiteret et al. [44] employ spherical CNNs [45],  [46],  [47] to learn a multi-modal descriptor from spherical and visual images.

In this paper, we opt to project the point cloud onto the sphere, generating the descriptor TERRA, for robust rotation alignment. Compared to other methods, we employ a novel sampling method, diverging from the conventional angle-based sampling, which endows our descriptor with isotropy. This characteristic is crucial for 3-DoF rotation estimation. What’s more, with the help of a carefully designed distance calculation function, it can robustly estimate the rotation even in the presence of translational disturbances.

II-C Multi-LiDAR SLAM

LiDAR SLAM has seen substantial development in recent years. Inspired by LOAM [1], numerous variants improve it from different perspectives, including enhancing accuracy [48, 49, 50], feature extraction [51, 52], and feature selection [53, 54]. Compared to single-LiDAR SLAM or LiDAR-IMU fusion SLAM, multi-LiDAR SLAM is relatively a nascent field. M-LOAM [8] proposes a complete and robust solution for uncertainty-aware multi-LiDAR extrinsic calibration and SLAM. Lin et al. [3] develop a decentralized framework for simultaneous calibration, localization and mapping with multiple LiDARs to remove the single point of failure problem. Traj-LO [55, 56] regards the data of LiDAR as streaming points continuously captured at high frequency and applies the continuous-time trajectory estimation into multi-LiDAR SLAM. Locus 2.0 [57] employs a novel normals-based G-ICP formulation, an adaptive voxel grid filter and a sliding-window map approach to reduce memory consumption in multi-LiDAR SLAM. It has been successfully deployed on heterogeneous robotic platforms in the DARPA Subterranean Challenge.

In this paper, multi-LiDAR is not the primary issue to be addressed. However, to achieve precise extrinsic estimation, we implement a simple multi-LiDAR SLAM to provide the poses of each frame and optimize them using hierarchical optimization. Subsequently, the final calibration results and the time offsets between the LiDARs are outputted through time synchronization.

III Preliminaries

TABLE I: Nomenclature
Notation Description Type
𝐩𝐩\mathbf{p}bold_p Point 3superscript3\mathbb{R}^{3}blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT
𝒮atsuperscriptsubscript𝒮𝑎𝑡{}^{t}\mathcal{S}_{a}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT Scan from sensor “a𝑎aitalic_a” at time t𝑡titalic_t {3}superscript3\{\mathbb{R}^{3}\}{ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT }
atsuperscriptsubscript𝑎𝑡{}^{t}\mathcal{M}_{a}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT Map stacked by scans from “a𝑎aitalic_a” at time t𝑡titalic_t {3}superscript3\{\mathbb{R}^{3}\}{ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT }
𝐏atsuperscriptsubscript𝐏𝑎𝑡{}^{t}\mathbf{P}_{a}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT Pose of sensor “a𝑎aitalic_a” at time t𝑡titalic_t SE(3)𝑆𝐸3SE(3)italic_S italic_E ( 3 )
𝐓at+1tsuperscriptsubscriptsubscript𝐓𝑎𝑡1𝑡{}^{t}_{t+1}\mathbf{T}_{a}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT bold_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT Transformation of “a𝑎aitalic_a” between t𝑡titalic_t and t+1𝑡1t+1italic_t + 1 SE(3)𝑆𝐸3SE(3)italic_S italic_E ( 3 )
𝐂𝐂\mathbf{C}bold_C Extrinsic parameters between “a𝑎aitalic_a” and “b𝑏bitalic_b SE(3)𝑆𝐸3SE(3)italic_S italic_E ( 3 )
𝐂superscript𝐂\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT Result of calibration SE(3)𝑆𝐸3SE(3)italic_S italic_E ( 3 )
𝐂superscript𝐂\mathbf{C}^{{}^{\prime}}bold_C start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT Perturbation of calibration, i.e., 𝐂=𝐂𝐂superscript𝐂superscript𝐂𝐂\mathbf{C}^{*}=\mathbf{C}^{{}^{\prime}}\cdot\mathbf{C}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = bold_C start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT ⋅ bold_C SE(3)𝑆𝐸3SE(3)italic_S italic_E ( 3 )
𝔽𝔽\mathbb{F}blackboard_F TERRA, descriptor for rotation initializaiton Nsuperscript𝑁\mathbb{R}^{N}blackboard_R start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT

The objective of calibration is to estimate the extrinsic parameters between two independent sensors, which can transform the data from each sensor into the same coordinate system. This is the foundation for many downstream tasks. We refer to one of these sensors as the “base” sensor, denoted by a𝑎aitalic_a, and the other as the “target” sensor, denoted by b𝑏bitalic_b. The related notations are listed in Tab. I, where SE(3)𝑆𝐸3SE(3)italic_S italic_E ( 3 ) denotes a rigid transformation, which means 𝐏,𝐓,𝐂𝐏𝐓𝐂\mathbf{P},\mathbf{T},\mathbf{C}bold_P , bold_T , bold_C are composed of a rotation matrix 𝐑SO(3)𝐑𝑆𝑂3\mathbf{R}\in SO(3)bold_R ∈ italic_S italic_O ( 3 ) and a translation vector 𝐭3𝐭superscript3\mathbf{t}\in\mathbb{R}^{3}bold_t ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT. The relationships between the notations are detailed in the following three sections. In the following, we describe the processes of point cloud registration, LiDAR-based SLAM, and hand-eye calibration with the notations mentioned above.

III-A Point Cloud Registration

The pose transformations between point clouds are typically estimated through point cloud registration. As specified in [58], given two point clouds 𝒫0,𝒫1subscript𝒫0subscript𝒫1\mathcal{P}_{0},\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, the goal of registration is to estimate the transformation matrix 𝐓superscript𝐓\mathbf{T}^{*}bold_T start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT to align 𝒫1subscript𝒫1\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT with 𝒫0subscript𝒫0\mathcal{P}_{0}caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. The Iterative Closest Point (ICP) algorithm [15] is the most common solution to the registration problem, where 𝐓superscript𝐓\mathbf{T}^{*}bold_T start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT can be calculated by

𝐓superscript𝐓\displaystyle\mathbf{T}^{*}bold_T start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =argmin𝐓d(𝒫0,𝐓(𝒫1))2,absentsubscript𝐓superscriptnorm𝑑subscript𝒫0𝐓subscript𝒫12\displaystyle=\mathop{\arg\min}_{\mathbf{T}}\ \ \|d(\mathcal{P}_{0},\mathbf{T}% (\mathcal{P}_{1}))\|^{2},= start_BIGOP roman_arg roman_min end_BIGOP start_POSTSUBSCRIPT bold_T end_POSTSUBSCRIPT ∥ italic_d ( caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_T ( caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (1)

where d(,)𝑑d(\cdot,\,\cdot)italic_d ( ⋅ , ⋅ ) denotes the loss function to measure the distance between the two inputs and 𝐓(𝒫)𝐓𝒫\mathbf{T}(\mathcal{P})bold_T ( caligraphic_P ) means each point in 𝒫𝒫\mathcal{P}caligraphic_P is translated and rotated to form a new point cloud, i.e., {𝐑𝐩+𝐭|for𝐩in𝒫}conditional-set𝐑𝐩𝐭𝑓𝑜𝑟𝐩𝑖𝑛𝒫\{\mathbf{R}\mathbf{p}+\mathbf{t}|for~{}\mathbf{p}~{}in~{}\mathcal{P}\}{ bold_Rp + bold_t | italic_f italic_o italic_r bold_p italic_i italic_n caligraphic_P }.

In this paper, we utilize a more robust point-to-plane ICP algorithm [17] due to the sparsity of the LiDAR point cloud, which results in the point-to-point ICP lacking sufficient correct correspondences to work properly. For two frame of point cloud 𝒫0,𝒫1subscript𝒫0subscript𝒫1\mathcal{P}_{0},\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, the d(𝒫0,𝐓(𝒫1))𝑑subscript𝒫0𝐓subscript𝒫1d(\mathcal{P}_{0},\mathbf{T}(\mathcal{P}_{1}))italic_d ( caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_T ( caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ) of point-to-plane ICP is defined as:

d(𝒫0,𝐓(𝒫1))=p𝒫1(𝐓(𝐩)p0)𝐧0,𝑑subscript𝒫0𝐓subscript𝒫1subscript𝑝subscript𝒫1𝐓𝐩subscript𝑝0subscript𝐧0d(\mathcal{P}_{0},\mathbf{T}(\mathcal{P}_{1}))=\sum_{p\in\mathcal{P}_{1}}(% \mathbf{T}(\mathbf{p})-p_{0})~{}\mathbf{n}_{0},italic_d ( caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_T ( caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ) = ∑ start_POSTSUBSCRIPT italic_p ∈ caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_T ( bold_p ) - italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) bold_n start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , (2)

where p0subscript𝑝0p_{0}italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is any point on the corresponding plane and 𝐧asubscript𝐧𝑎\mathbf{n}_{a}bold_n start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT is the unit normal vector of the plane. The optimal pose estimation can be derived by solving the non-linear equation through Gauss-Newton method [59]. The Jacobian matrix can be solved by the chain rule as follows:

𝐉dsubscript𝐉𝑑\displaystyle\mathbf{J}_{d}bold_J start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT =d(𝒫0,𝐓(𝒫1))𝐓𝐩𝐓𝐩δξabsent𝑑subscript𝒫0𝐓subscript𝒫1𝐓𝐩𝐓𝐩𝛿𝜉\displaystyle=\frac{\partial\,d(\mathcal{P}_{0},\mathbf{T}(\mathcal{P}_{1}))}{% \partial\mathbf{T}\mathbf{p}}\frac{\partial\,\mathbf{T}\mathbf{p}}{\partial% \delta\xi}= divide start_ARG ∂ italic_d ( caligraphic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_T ( caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ) end_ARG start_ARG ∂ bold_Tp end_ARG divide start_ARG ∂ bold_Tp end_ARG start_ARG ∂ italic_δ italic_ξ end_ARG (3)
=𝐧0𝐉p,absentsubscript𝐧0subscript𝐉𝑝\displaystyle=\mathbf{n}_{0}\mathbf{J}_{p},= bold_n start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT bold_J start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ,

where 𝐉psubscript𝐉𝑝\mathbf{J}_{p}bold_J start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT can be estimated by applying left perturbation model with δξ𝔰𝔢(3)𝛿𝜉𝔰𝔢3\delta\xi\in\mathfrak{se}(3)italic_δ italic_ξ ∈ fraktur_s fraktur_e ( 3 ):

𝐉p=𝐓𝐩δξsubscript𝐉𝑝𝐓𝐩𝛿𝜉\displaystyle\mathbf{J}_{p}=\frac{\partial\,\mathbf{T}\mathbf{p}}{\partial% \delta\xi}bold_J start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT = divide start_ARG ∂ bold_Tp end_ARG start_ARG ∂ italic_δ italic_ξ end_ARG =limδξ0exp(δξ)𝐓𝐩𝐓𝐩δξabsentsubscript𝛿𝜉0𝑒𝑥𝑝𝛿𝜉𝐓𝐩𝐓𝐩𝛿𝜉\displaystyle=\lim_{\delta\xi\rightarrow 0}\frac{exp(\delta\xi)\mathbf{T}% \mathbf{p}-\mathbf{T}\mathbf{p}}{\delta\xi}= roman_lim start_POSTSUBSCRIPT italic_δ italic_ξ → 0 end_POSTSUBSCRIPT divide start_ARG italic_e italic_x italic_p ( italic_δ italic_ξ ) bold_Tp - bold_Tp end_ARG start_ARG italic_δ italic_ξ end_ARG (4)
=[𝐈3×3[𝐓𝐩]×𝟎1×3𝟎1×3],absentmatrixsubscript𝐈33subscriptdelimited-[]𝐓𝐩subscript013subscript013\displaystyle=\begin{bmatrix}\mathbf{I}_{3\times 3}&-[\mathbf{T}\mathbf{p}]_{% \times}\\ \mathbf{0}_{1\times 3}&\mathbf{0}_{1\times 3}\end{bmatrix},= [ start_ARG start_ROW start_CELL bold_I start_POSTSUBSCRIPT 3 × 3 end_POSTSUBSCRIPT end_CELL start_CELL - [ bold_Tp ] start_POSTSUBSCRIPT × end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_0 start_POSTSUBSCRIPT 1 × 3 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT 1 × 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

where [𝐓𝐩]×subscriptdelimited-[]𝐓𝐩[\mathbf{T}\mathbf{p}]_{\times}[ bold_Tp ] start_POSTSUBSCRIPT × end_POSTSUBSCRIPT means the skew symmetric matrix of the transformed point. It is important to note that the accuracy of point cloud registration is often influenced by various factors such as the initial transformation, scene degeneracy, and point cloud bias and noise [19, 20, 21]. This paper primarily addresses registration failures caused by the first two factors.

III-B SLAM

In this paper, we also use some concepts commonly used in the SLAM, i.e., simultaneous localization and mapping, which plays a vital role in the navigation of autonomous systems, such as autonomous carscite and robots. The problem that SLAM aims to solve is to estimate the position of a sensor when each frame of data is received. This position is then used to build a map using the sensor data. In specific, for LiDAR-based SLAM, a sequence of point clouds {0𝒮,1𝒮,,t𝒮}\{^{0}\mathcal{S},^{1}\mathcal{S},\dots,^{t}\mathcal{S}\}{ start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT caligraphic_S , start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT caligraphic_S , … , start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_S } is received at consecutive time intervals, and the poses 𝐏𝐏\mathbf{P}bold_P at each time is estimated using point cloud registration. The map is then constructed as follow:

t=i[0,t]𝐏i(i𝒮),{}^{t}\mathcal{M}=\bigcup_{i\in[0,t]}~{}^{i}\mathbf{P}(^{i}\mathcal{S}),start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT caligraphic_M = ⋃ start_POSTSUBSCRIPT italic_i ∈ [ 0 , italic_t ] end_POSTSUBSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_P ( start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT caligraphic_S ) , (5)

where the pose is calculated by

𝐏i=i0𝐓=10𝐓𝐓21ii1𝐓.subscriptsuperscript0𝑖superscript𝐏𝑖𝐓subscriptsuperscript01𝐓superscriptsubscript𝐓21subscriptsuperscript𝑖1𝑖𝐓{}^{i}\mathbf{P}=^{0}_{i}\mathbf{T}=^{0}_{1}\mathbf{T}~{}^{1}_{2}\mathbf{T}% \dots^{i-1}_{i}\mathbf{T}.start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_P = start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_T = start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_T start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_T … start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_T .

III-C Hand-Eye Method

Hand-eye calibration is originally a method of calibrating the end-effector of the robot and the camera fixed on it. The position changes 𝐀SE(3)𝐀𝑆𝐸3\mathbf{A}\in SE(3)bold_A ∈ italic_S italic_E ( 3 ) of end-effector are obtained through robot kinematics, meanwhile the corresponding position changes 𝐁SE(3)𝐁𝑆𝐸3\mathbf{B}\in SE(3)bold_B ∈ italic_S italic_E ( 3 ) of the camera are obtained through pattern pose estimation. The transformation 𝐀𝐀\mathbf{A}bold_A and 𝐁𝐁\mathbf{B}bold_B conform to the following equation:

𝐀𝐗=𝐗𝐁,𝐀𝐗𝐗𝐁\mathbf{A~{}X}=\mathbf{X~{}B},bold_A bold_X = bold_X bold_B , (6)

where the transformation between the end-effector and the camera is denoted as 𝐗𝐗\mathbf{X}bold_X.

In our calibration task, since the base sensor and the target sensor are also rigidly connected, their motion conforms to the following equation as well:

tt+1𝐓a𝐂=𝐂𝐓bt+1t.^{t}_{t+1}\mathbf{T}_{a}~{}\mathbf{C}=\mathbf{C}~{}^{t}_{t+1}\mathbf{T}_{b}.start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT bold_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT bold_C = bold_C start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT bold_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT . (7)

Please note that in the above equation, the 𝐓at+1t,t+1t𝐓b{}^{t}_{t+1}\mathbf{T}_{a},^{t}_{t+1}\mathbf{T}_{b}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT bold_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT bold_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT and 𝐂𝐂\mathbf{C}bold_C are all unknown variables, which need to be calculated by the registration mentioned before.

IV System Overview

The objective of this study is to calibrate two LiDARs without any prior knowledge of the initial extrinsic parameters, the point cloud distribution, or the field of view. To address this challenge, we design a universal LiDAR calibration method consisting of four main steps as shown in Fig. 2. Firstly, we develop a versatile rotation initialization method based on the proposed TERRA which is detailed in Sec. V. Secondly, building upon the initial rotation, we use JEEP to jointly estimate the extrinsic parameters and poses with consecutive frames, yielding refined extrinsic parameters. The method is elaborated in Sec. VI-A. Finally, we implement a hierarchical optimization that not only reduces the calibration errors to the millimeter-level but also achieves time synchronization of the LiDARs, as described in Sec. VI-B and Sec. 9.

Refer to caption
Figure 2: The pipeline of the universal LiDAR calibration involving four core steps, i.e., TERRA based rotation initialization, JEEP for joint estimation of extrinsic and poses, hierarchical optimization based pose refinement and time synchronization for enhancing the accuracy of extrinsic parameters and estimating the time offset.

V Rotation Initialization

In this section, we propose a novel method for initializing the rotation part of the extrinsic parameter to ensure the convergence of subsequent steps. It is important to note that, when calibrating sensors on a single mobile platform, i.e., a car or a robot, the translation offset typically does not result in a considerable distance between point pairs which should be matched during registration. This is because the distance between sensors is limited by the size of the mobile platform, and we will conduct experiments to demonstrate it in Sec. VII-D2. However, the rotation offset can create a significant distance, potentially leading to registration failure. To address this problem, we simplify the initialization problem by focusing solely on estimating the 3-DoF rotation part. We will begin by presenting the newly proposed TERRA, a Three-DoF Embedding of point cloud for Robust Rotation Alignment. Subsequently, we will explain how TERRA is utilized to assess rotation and introduce a well-designed method to calculate the distance between two TERRAs to enhance the robustness. Finally, a two-step search method is designed to quickly identify the most suitable initial rotation value. For clarity, we employ the term “descriptor” interchangeably with “embedding” in this paper, following common usage.

V-A Design of TERRA

Extracting embeddings or descriptors is a common task, often used in place recognition or pose estimation tasks [60, 43, 61, 62]. However, these tasks often require the rotation invariance of the descriptors, in order to improve the algorithm’s robustness in dealing with diverse data. This requirement conflicts with our need to initialize rotation, which demands accurately reflecting changes in angles. A classic rotation-aware descriptor is scan context [34], which partitions the point cloud into bins on the xy𝑥𝑦x-yitalic_x - italic_y plane according to both azimuthal and radial directions. Each bin is encoded with the maximum height of the segmented point cloud in the bin, i.e., the max value of z𝑧zitalic_z. In the end, a matrix is extracted as the descriptor to represent the point cloud as shown in Fig. 3 and the rows and columns of the matrix represent different azimuths and radii, respectively. Shifting the columns of the original matrix to create the new matrix represents rotating the point cloud around the z𝑧zitalic_z axis, which enables comparing the similarity of descriptors at different angles and thus get the most suitable rotation. However, this method can only search for the optimal angle in 1-DoF, while the 3-DoF estimation is necessary for the task of multi LiDAR calibration.

To solve this problem, inspired by globe in a gimbal, which can rotate on three axes to display the earth, we project the point cloud onto a sphere to generate descriptors instead of projecting onto the xy𝑥𝑦x-yitalic_x - italic_y plane as previously done with the scan context. The sphere, being a 3D structure, intuitively supports 3-DoF rotation, similar to a gimbal. Additionally, we will not lose any information during the projection, that is, a position on the sphere corresponds to one and only one point in the point cloud. This is because the imaging principle of LiDAR dictates that the laser is unable to penetrate obstacles.

The next point of consideration is the selection of the most suitable data structure to represent the sphere. For spherical data, a readily conceivable technique is the application of the Mercator projection [63], which is a type of cartographic mapping method signifying a sphere’s surface into a plane. Upon application of Mercator projection, a 2D matrix can serve as the data structure for TERRA. However, on one hand, this method lacks isotropy, subject to considerable distortion toward the poles, resulting in a misrepresentation of relative size and distance of land masses as seen in world maps. On the other hand, the only reasonable rotation for world maps is around the Earth’s axis. In fact, Gauss’s Theorema Egregium [64] proves that a sphere’s surface cannot be represented on a plane without distortion. Based on the discussion, we can summarize that the optimal data structure should encompass two fundamental characteristics: (1) Isotropy, which implies that the descriptor should remain unbiased regardless of the position of the points when encoding them. (2) The capability to effectively perform rotational operations for comparing similarities under different rotations.

Refer to caption
Figure 3: Instructions for the generation of scan context. By projecting the point cloud onto the xy𝑥𝑦x-yitalic_x - italic_y plane and dividing it into bins based on azimuth and radius, a two-dimensional descriptor is generated.

To meet the above requirements, we propose TERRA, a Three-DoF Embedding of point cloud for Robust Rotation Alignment. We conduct three steps to convert a frame of point cloud into the corresponding TERRA as shown in Fig. 4, i.e., spherical sampling, projection, and populating. Specifically, we employ a uniform spherical sampling method known as the Fibonacci lattice [65]. This method uniformly distributes N𝑁Nitalic_N points across the surface of a sphere with each point approximately covering the same area, which demonstrates the high isotropy of TERRA. The Fibonacci lattice sampling approach is detailed in Alg. 1, which takes an integer N𝑁Nitalic_N as input and returns a spherical point cloud, denoted as \mathcal{FL}caligraphic_F caligraphic_L, consisting of N𝑁Nitalic_N uniformly distributed points over the sphere.

Refer to caption
Figure 4: Instructions for the generation of TERRA. The process is divided into three steps: sampling the sphere using the Fibonacci lattice, projecting the point cloud from LiDAR onto the unit sphere, and populating the values into TERRA based on the correspondences.
Refer to caption
Figure 5: Instructions for rotation initialization. Each LiDAR frame has a TERRA extracted from it. Then the descriptor of one LiDAR is fixed, while rotating the other to find the optimal match. The rotation corresponding to the optimal match is the result of the rotation initialization. Different colors represent different values and the TERRA is a vector of length N𝑁Nitalic_N.

After preparing the point cloud \mathcal{FL}caligraphic_F caligraphic_L, we can encode any given LiDAR frame to TERRA, which is a vector of length N𝑁Nitalic_N. In specific, the LiDAR frame is first normalized, i.e., projecting each point onto a unit sphere. Then, the closest point in \mathcal{FL}caligraphic_F caligraphic_L for each point is efficiently searched by KD-tree [66]. LiDAR points corresponding to the same point in the \mathcal{FL}caligraphic_F caligraphic_L are considered the smallest unit to extract TERRA, which means that the “characteristic” of these points forms one of the N𝑁Nitalic_N values in the TERRA. In this paper, we use the minimum distance from these covered points to the origin as the “characteristic” to populate TERRA, aiming to minimize the effects of occlusion. Finally, a descriptor of length N𝑁Nitalic_N is generated, as illustrated in the Fig. 4.

0:  The number of points: N𝑁Nitalic_N
0:  Sphere Points: \mathcal{FL}caligraphic_F caligraphic_L
1:  ϕ5+12italic-ϕ512\phi\leftarrow\frac{\sqrt{5}+1}{2}italic_ϕ ← divide start_ARG square-root start_ARG 5 end_ARG + 1 end_ARG start_ARG 2 end_ARG (golden ratio)
2:  \mathcal{FL}\leftarrow\emptysetcaligraphic_F caligraphic_L ← ∅ 
3:  for i=0𝑖0i=0italic_i = 0 to N1𝑁1N-1italic_N - 1 do
4:     y12iN1𝑦12𝑖𝑁1y\leftarrow 1-\frac{2i}{N-1}italic_y ← 1 - divide start_ARG 2 italic_i end_ARG start_ARG italic_N - 1 end_ARG
5:     radius1y2𝑟𝑎𝑑𝑖𝑢𝑠1superscript𝑦2radius\leftarrow\sqrt{1-y^{2}}italic_r italic_a italic_d italic_i italic_u italic_s ← square-root start_ARG 1 - italic_y start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG
6:     θ2πiϕ𝜃2𝜋𝑖italic-ϕ\theta\leftarrow\frac{2\pi i}{\phi}italic_θ ← divide start_ARG 2 italic_π italic_i end_ARG start_ARG italic_ϕ end_ARG
7:     xradius×cos(θ)𝑥𝑟𝑎𝑑𝑖𝑢𝑠𝜃x\leftarrow radius\times\cos(\theta)italic_x ← italic_r italic_a italic_d italic_i italic_u italic_s × roman_cos ( italic_θ )
8:     zradius×sin(θ)𝑧𝑟𝑎𝑑𝑖𝑢𝑠𝜃z\leftarrow radius\times\sin(\theta)italic_z ← italic_r italic_a italic_d italic_i italic_u italic_s × roman_sin ( italic_θ )
9:     Append point (x,y,z)𝑥𝑦𝑧(x,y,z)( italic_x , italic_y , italic_z ) to \mathcal{FL}caligraphic_F caligraphic_L
10:  end for
Algorithm 1 Fibonacci lattice

V-B Rotation of the TERRA

After preparing the TERRA for each LiDAR frame, we need to design a rotation operation to search for the optimal rotation required. For the reason that each value in TERRA corresponds one-to-one with points in \mathcal{FL}caligraphic_F caligraphic_L. Therefore, rotating the point cloud \mathcal{FL}caligraphic_F caligraphic_L is equivalent to rotating TERRA.

Specifically, we rotate the point cloud \mathcal{FL}caligraphic_F caligraphic_L by a certain rotation R, and the resulting rotated point cloud is denoted as superscript\mathcal{FL}^{\prime}caligraphic_F caligraphic_L start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. Similarly, KD-tree is employed to establish a one-to-one correspondence between point pairs in the rotated point cloud superscript\mathcal{FL}^{\prime}caligraphic_F caligraphic_L start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and the original \mathcal{FL}caligraphic_F caligraphic_L. In this case, each pair consists of two points with different indices. By taking the value from the original index of TERRA and filling it into the corresponding point’s index position, we create a rotated TERRA. Essentially, we’re reorganizing the original TERRA values according to the new index alignment post-rotation, thus forming a rotated version of TERRA. Fig. 5 shows the process of \mathcal{FL}caligraphic_F caligraphic_L rotation and rotated TERRA generation.

For a descriptor TERRA of length N𝑁Nitalic_N, the rotation process entails a time complexity of O(N)𝑂𝑁O(N)italic_O ( italic_N ) for rotating the point cloud \mathcal{FL}caligraphic_F caligraphic_L, and O(NlogN)𝑂𝑁𝑙𝑜𝑔𝑁O(NlogN)italic_O ( italic_N italic_l italic_o italic_g italic_N ) for matching points using a KD-tree, followed by an additional O(N)𝑂𝑁O(N)italic_O ( italic_N ) for generating the updated descriptor. Thus, the total time for performing a rotation is O(NlogN)𝑂𝑁𝑙𝑜𝑔𝑁O(NlogN)italic_O ( italic_N italic_l italic_o italic_g italic_N ), offering a substantial efficiency over sequentially rotating a LiDAR frame and then generating a new TERRA. Additionally, since each point independently searches for its match, the entire rotation process can be further accelerated through parallel computing.

V-C Distance between TERRA

We evaluate the distance between TERRAs to identify the most suitable rotation. The small distance between the TERRA indicates that the landmarks projected to sphere are in the correct corresponding relationship, meaning the correct rotation has been identified. An effective distance function typically needs to exhibit two key qualities. On one hand, it should show a monotonically increasing or decreasing trend within a local area, thus enabling the use of gradient descent to find the optimal solution. On the other hand, having fewer local optima in the function’s domain is beneficial as it facilitates the more efficient identification of the global optimum.

When the origins of two LiDARs coincide perfectly, the angles between landmarks projected onto the two unit spheres remain constant, because the positions of the two unit spheres are identical. In this ideal scenario, comparing TERRA to estimate rotation is theoretically correct. However, this assumption is not feasible in reality, as the translation between two LiDARs causes shifts in the projection positions of objects. As illustrated in Fig. 6, we abstract this effect into a task where a big circle (landmarks) colors two small circles (the unit sphere). In Fig. 6(a), the black dot, representing landmarks in the same direction as the translation, do not shift in projection position, whereas the red dot, representing landmarks perpendicular to the translation direction, show the maximum shift.

To address the impact of translation, we draw inspiration from the phenomenon that distant scenery moves slower in the field of view when traveling by train, which indicates that distant landmarks should have smaller shifts in their projection positions. To test this hypothesis, we change the radius of the big circle and quantitatively calculate the coloring differences between the two small circles as shown in Fig. 6(b). The results show that as the radius of the great circle increases, the coloring differences decrease.

Based on the observations, we define a mask 𝐌=[mi]i=1N𝐌superscriptsubscriptdelimited-[]subscript𝑚𝑖𝑖1𝑁\mathbf{M}=\left[m_{i}\right]_{i=1}^{N}bold_M = [ italic_m start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT to indicate which positions are suitable for distance calculation for two TERRA 𝔽1subscript𝔽1\mathbb{F}_{1}blackboard_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and 𝔽2subscript𝔽2\mathbb{F}_{2}blackboard_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. misubscript𝑚𝑖m_{i}italic_m start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is defined as:

mi={1,if (𝔽1i>d1)(𝔽2i>d1)(|𝔽1i𝔽2i|d2)0,otherwise,subscript𝑚𝑖cases1if superscriptsubscript𝔽1𝑖subscript𝑑1superscriptsubscript𝔽2𝑖subscript𝑑1superscriptsubscript𝔽1𝑖superscriptsubscript𝔽2𝑖subscript𝑑20otherwisem_{i}=\begin{cases}1,&\text{if }(\mathbb{F}_{1}^{i}>d_{1})\land(\mathbb{F}_{2}% ^{i}>d_{1})\land(|\mathbb{F}_{1}^{i}-\mathbb{F}_{2}^{i}|\leq d_{2})\\ 0,&\text{otherwise}\end{cases},italic_m start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = { start_ROW start_CELL 1 , end_CELL start_CELL if ( blackboard_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT > italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ∧ ( blackboard_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT > italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ∧ ( | blackboard_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT - blackboard_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT | ≤ italic_d start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL 0 , end_CELL start_CELL otherwise end_CELL end_ROW , (8)

where 𝔽1isuperscriptsubscript𝔽1𝑖\mathbb{F}_{1}^{i}blackboard_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT is the ith𝑖𝑡i-thitalic_i - italic_t italic_h position of 𝔽1subscript𝔽1\mathbb{F}_{1}blackboard_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and we use the parameter d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to filter out the distant landmarks from the LiDARs to reduce the interference caused by translation between LiDARs and d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is set to 20m20𝑚20m20 italic_m in the practice. The parameter d2subscript𝑑2d_{2}italic_d start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is used to select positions that may belong to the same object for subsequent calculations. d2subscript𝑑2d_{2}italic_d start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT can be adjusted according to the actual distance between LiDARs. In our experiments, it is set to 5m5𝑚5m5 italic_m based on the typical size of vehicles.

Therefore, we define the distance calculation function as follows:

D(F1,F2)𝐷subscript𝐹1subscript𝐹2\displaystyle D(F_{1},F_{2})italic_D ( italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) =(𝔽1𝔽2)𝐌T1𝐌1α,absentsubscriptnormsubscript𝔽1subscript𝔽2superscript𝐌𝑇1superscriptsubscriptnorm𝐌1𝛼\displaystyle=\frac{||(\mathbb{F}_{1}-\mathbb{F}_{2})\mathbf{M}^{T}||_{1}}{||% \mathbf{M}||_{1}^{\alpha}},= divide start_ARG | | ( blackboard_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - blackboard_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) bold_M start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT | | start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG | | bold_M | | start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT end_ARG , (9)

where α𝛼\alphaitalic_α set to 2 is used to increase the influence of valid number on the distance, preventing scenarios with few correspondences but also minimal distances. For instance, even if there’s only one valid position with a short distance, the distance remains small. The smaller distance indicates greater similarity between the two TERRA, representing that the rotation is more likely to be close to the ground truth.

Refer to caption
(a) The projection positions of different landmarks. The projection of the black dot shows no shift, while the red dot exhibit the maximum shift.
Refer to caption
(b) Comparing the impact of landmarks at different distances. As the radius of the large circle increases, the color differences gradually decrease while the radius of the smaller circle remains constant.
Figure 6: The diagram illustrates the impact of translation between two LiDARs on the projected positions of landmarks. Landmarks are represented by the large circle, while the unit spheres used to generate TERRA are depicted as small circles.

V-D Two-step Search Method

Despite the minor time expenditure for a single rotation, traversing all feasible rotations still demands an unacceptable amount of time, given the vastness of the rotation space. Therefore, we need to devise a search strategy that efficiently identifies the optimal rotation.

In the first step, we start by checking if the initial rotation will result in an overlap between the two point clouds using Eq. (8), that is, whether the length of mask 𝐌𝐌\mathbf{M}bold_M is not zero. If there is no overlap, the rotation space will be traversed with a significantly large step, e.g., 30superscript3030^{\circ}30 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, until a suitable initial rotation that produces an overlap area is found. This step is time-efficient. Starting from this initial rotation, we explore the surrounding rotation space with a large step size, e.g., 10superscript1010^{\circ}10 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, sampling among all rotations that lead to an overlap area and scoring each rotation using Eq. (9). The top ten rotations with the smallest distances will be input into the next stage.

In the second stage, building on the groundwork laid by the first, a heuristic search is employed to find improved solutions in the vicinity of each discrete rotation identified earlier. Specifically, in each area, we explore with a step size of 1superscript11^{\circ}1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, moving along the path of decreasing distance as calculated in Eq. (9), until a local optimum is reached.

VI Stepwise Refinement with Data Sequences

After obtaining a appropriate initial rotation value, we first integrate hand-eye calibration, using the initial frames of the sequences to obtain optimized extrinsic parameters. Next, to mitigate the impact of scene degeneracy, we employ a hierarchical optimization approach, continuously optimizing the extrinsic parameters during the multi-LiDAR mapping process. Finally, we provide a method for synchronizing the timing between LiDARs.

VI-A Joint Estimation of Extrinsic and Pose

As previously mentioned in Sec. III, we model the calibration problem as a point cloud registration problem. Through scan-to-scan registration [9], we form the most direct calibration method as follows:

𝐂=argmin𝐓d(t𝒮a,𝐓(t𝒮b))2,\mathbf{C}^{*}=\mathop{\arg\min}_{\mathbf{T}}\ \ \|d(^{t}\mathcal{S}_{a},% \mathbf{T}(^{t}\mathcal{S}_{b}))\|^{2},\\ bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = start_BIGOP roman_arg roman_min end_BIGOP start_POSTSUBSCRIPT bold_T end_POSTSUBSCRIPT ∥ italic_d ( start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , bold_T ( start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (10)

where each calibration takes the scans 𝒮at and t𝒮bsuperscriptsubscript𝒮𝑎𝑡superscript and 𝑡subscript𝒮𝑏{}^{t}\mathcal{S}_{a}\text{ and }^{t}\mathcal{S}_{b}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT from two LiDARs at the same time t𝑡titalic_t as input, and outputs the transformation 𝐂superscript𝐂\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT between them.

However, since only a single frame of data provides constraints for estimating transformation, the impact of noise and scene degeneracy becomes significant [8]. To mitigate these disadvantages, we propose JEEP for Joint Estimation of Extrinsic and Pose. By utilizing the motion principle of sensors connected by rigid links, we utilize multiple frames from both LiDARs to refine extrinsic parameters, while estimating the pose at different moments.

Refer to caption
Figure 7: Instructions for JEEP. For point cloud pairs input at the same time, their transformation relative to the map, i.e., poses, and the transformation between them, i.e., extrinsic, are iteratively solved until either convergence is achieved or a convergence failure is indicated by the number of iterations.

In detail, JEEP is divided into two phases: pose estimation phase and calibration phase as shown in Fig. 7. During the pose estimation phase, we utilize the equation of hand-eye calibration Eq. (7), combining the extrinsic parameters and the motion of the base LiDAR a𝑎aitalic_a to derive the motion of the target LiDAR b𝑏bitalic_b. This means that we can define the pose estimation problem as:

t𝐏a=\displaystyle^{t}\mathbf{P}^{*}_{a}=start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT bold_P start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = argmin𝐓(d(t1a,𝐓(t𝒮a))2+\displaystyle\mathop{\arg\min}_{\mathbf{T}}(\ \ \|d(^{t-1}\mathcal{M}_{a},% \mathbf{T}(^{t}\mathcal{S}_{a}))\|^{2}+start_BIGOP roman_arg roman_min end_BIGOP start_POSTSUBSCRIPT bold_T end_POSTSUBSCRIPT ( ∥ italic_d ( start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , bold_T ( start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + (11)
d(t1b,(t1𝐂1𝐓𝐂t1)(t𝒮b))2),\displaystyle\|d(^{t-1}\mathcal{M}_{b},(^{t-1}\mathbf{C}^{*-1}~{}\mathbf{T}~{}% ^{t-1}\mathbf{C}^{*})(^{t}\mathcal{S}_{b}))\|^{2}),∥ italic_d ( start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , ( start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ - 1 end_POSTSUPERSCRIPT bold_T start_FLOATSUPERSCRIPT italic_t - 1 end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ( start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ,

where at1 and t1bsuperscriptsubscript𝑎𝑡1superscript and 𝑡1subscript𝑏{}^{t-1}\mathcal{M}_{a}\text{ and }^{t-1}\mathcal{M}_{b}start_FLOATSUPERSCRIPT italic_t - 1 end_FLOATSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT are the maps built according to Eq. (5), and the pose of LiDAR b𝑏bitalic_b 𝐏btsuperscriptsubscriptsuperscript𝐏𝑏𝑡{}^{t}\mathbf{P}^{*}_{b}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_P start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT is replaced by 𝐂1t1𝐏at𝐂t1superscriptsuperscript𝐂absent1𝑡1superscriptsubscriptsuperscript𝐏𝑎𝑡superscriptsuperscript𝐂𝑡1{}^{t-1}\mathbf{C}^{*-1}~{}^{t}\mathbf{P}^{*}_{a}~{}^{t-1}\mathbf{C}^{*}start_FLOATSUPERSCRIPT italic_t - 1 end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ - 1 end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_P start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT start_FLOATSUPERSCRIPT italic_t - 1 end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. At startup, the map uses the first frame as a substitute, i.e., a0=0𝒮asuperscript0superscriptsubscript𝑎0subscript𝒮𝑎{}^{0}\mathcal{M}_{a}=^{0}\mathcal{S}_{a}start_FLOATSUPERSCRIPT 0 end_FLOATSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and b0=0𝒮bsuperscript0superscriptsubscript𝑏0subscript𝒮𝑏{}^{0}\mathcal{M}_{b}=^{0}\mathcal{S}_{b}start_FLOATSUPERSCRIPT 0 end_FLOATSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT = start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT. The rotation part of C0superscriptsuperscript𝐶0{}^{0}C^{*}start_FLOATSUPERSCRIPT 0 end_FLOATSUPERSCRIPT italic_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is initialized using TERRA, and the translation part is set to (0,0,0)000(0,0,0)( 0 , 0 , 0 ). The Jacobian matrix for the first part is presented in Sec. III, and the second part is as follows:

𝐉psubscript𝐉𝑝\displaystyle\mathbf{J}_{p}bold_J start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT =𝐂1𝐓𝐂𝐩δξ=limδξ0𝐂1exp(δξ)𝐓𝐂𝐩𝐂1𝐓𝐂𝐩δξabsentsuperscript𝐂1𝐓𝐂𝐩𝛿𝜉subscript𝛿𝜉0superscript𝐂1𝑒𝑥𝑝𝛿𝜉𝐓𝐂𝐩superscript𝐂1𝐓𝐂𝐩𝛿𝜉\displaystyle=\frac{\partial\,\mathbf{C}^{-1}\mathbf{T}\mathbf{C}\mathbf{p}}{% \partial\delta\xi}=\lim_{\delta\xi\rightarrow 0}\frac{\mathbf{C}^{-1}exp(% \delta\xi)\mathbf{T}\mathbf{C}\mathbf{p}-\mathbf{C}^{-1}\mathbf{T}\mathbf{C}% \mathbf{p}}{\delta\xi}= divide start_ARG ∂ bold_C start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_TCp end_ARG start_ARG ∂ italic_δ italic_ξ end_ARG = roman_lim start_POSTSUBSCRIPT italic_δ italic_ξ → 0 end_POSTSUBSCRIPT divide start_ARG bold_C start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_e italic_x italic_p ( italic_δ italic_ξ ) bold_TCp - bold_C start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_TCp end_ARG start_ARG italic_δ italic_ξ end_ARG (12)
=[𝐑3×3𝐂1𝟎1×3][𝐈3×3[𝐑𝐓𝐑𝐂𝐩+𝐑𝐓𝐭𝐂+𝐭𝐓]×𝟎1×3𝟎1×3],absentmatrixsubscriptsuperscript𝐑superscript𝐂133subscript013matrixsubscript𝐈33subscriptdelimited-[]superscript𝐑𝐓superscript𝐑𝐂𝐩superscript𝐑𝐓superscript𝐭𝐂superscript𝐭𝐓subscript013subscript013\displaystyle=\begin{bmatrix}\mathbf{R}^{\mathbf{C}^{-1}}_{3\times 3}\\ \mathbf{0}_{1\times 3}\end{bmatrix}\begin{bmatrix}\mathbf{I}_{3\times 3}&-[% \mathbf{R}^{\mathbf{T}}\mathbf{R}^{\mathbf{C}}\mathbf{p}+\mathbf{R}^{\mathbf{T% }}\mathbf{t}^{\mathbf{C}}+\mathbf{t}^{\mathbf{T}}]_{\times}\\ \mathbf{0}_{1\times 3}&\mathbf{0}_{1\times 3}\end{bmatrix},= [ start_ARG start_ROW start_CELL bold_R start_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 3 × 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_0 start_POSTSUBSCRIPT 1 × 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL bold_I start_POSTSUBSCRIPT 3 × 3 end_POSTSUBSCRIPT end_CELL start_CELL - [ bold_R start_POSTSUPERSCRIPT bold_T end_POSTSUPERSCRIPT bold_R start_POSTSUPERSCRIPT bold_C end_POSTSUPERSCRIPT bold_p + bold_R start_POSTSUPERSCRIPT bold_T end_POSTSUPERSCRIPT bold_t start_POSTSUPERSCRIPT bold_C end_POSTSUPERSCRIPT + bold_t start_POSTSUPERSCRIPT bold_T end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT × end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_0 start_POSTSUBSCRIPT 1 × 3 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT 1 × 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

where 𝐑Csuperscript𝐑𝐶\mathbf{R}^{C}bold_R start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT is the rotation of 𝐂t1superscript𝐂𝑡1{}^{t-1}\mathbf{C}start_FLOATSUPERSCRIPT italic_t - 1 end_FLOATSUPERSCRIPT bold_C, and 𝐑𝐓superscript𝐑𝐓\mathbf{R}^{\mathbf{T}}bold_R start_POSTSUPERSCRIPT bold_T end_POSTSUPERSCRIPT, 𝐭𝐓superscript𝐭𝐓\mathbf{t}^{\mathbf{T}}bold_t start_POSTSUPERSCRIPT bold_T end_POSTSUPERSCRIPT are the rotation and translation of 𝐓𝐓\mathbf{T}bold_T.

In the calibration phase, we employ the following objective function to align two maps:

t𝐂=argmin𝐓d(ta,𝐓(tb))2.^{t}\mathbf{C}^{*}=\mathop{\arg\min}_{\mathbf{T}}\ \ \|d(^{t}\mathcal{M}_{a},% \mathbf{T}(^{t}\mathcal{M}_{b}))\|^{2}.\\ start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = start_BIGOP roman_arg roman_min end_BIGOP start_POSTSUBSCRIPT bold_T end_POSTSUBSCRIPT ∥ italic_d ( start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , bold_T ( start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT caligraphic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (13)

We iteratively execute these two processes until both 𝐂tsuperscriptsuperscript𝐂𝑡{}^{t}\mathbf{C}^{*}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and 𝐏atsuperscriptsubscriptsuperscript𝐏𝑎𝑡{}^{t}\mathbf{P}^{*}_{a}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_P start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT stabilize and converge. During this process, when 𝐂tsuperscriptsuperscript𝐂𝑡{}^{t}\mathbf{C}^{*}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is inaccurate, estimating 𝐏atsuperscriptsubscriptsuperscript𝐏𝑎𝑡{}^{t}\mathbf{P}^{*}_{a}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT bold_P start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT is difficult to converge due to the incorrect position of 𝒮btsuperscriptsubscript𝒮𝑏𝑡{}^{t}\mathcal{S}_{b}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT. Additionally, using a local map for the calibration introduces more constraints, making the point cloud registration in Eq. 13 more accurate than in Eq. 10. Exceeding a certain number of iterations will be considered a failure of JEEP.

However, repetitively executing this procedure for every frame yields little benefit and is impractical. On one hand, the iterative computation of each frame cannot meet real-time requirements, and the continuous growth in map size will further intensify this issue. On the other hand, due to the use of a map-to-map registration for calibration as specified in Eq. (13), the cumulative error in pose estimation inevitably leads to map deformation, which negatively impacts the accuracy of the calibration. Therefore, we restrict its execution to the initial phase of the data sequence as illustrated in Fig. 2, where it proves to be cost-effective and highly accurate. After JEEP, we develop a straightforward multi-LiDAR SLAM method using only the pose estimation phase to provide poses for subsequent steps.

VI-B Pose Refinement by Hierarchical Optimization

Refer to caption
Figure 8: Hierarchical optimization for pose refinement. Each LiDAR constructs local maps at different levels and optimizes the pose through local LiDAR BA with in the same window.

Despite using multi-frame data and incorporating motion constraints in JEEP, calibrating in a single environment remains risky due to the impact of scene degeneracy on point cloud registration accuracy [67, 8]. Collecting data across different environments by SLAM to jointly produce extrinsic parameters is advisable. However, errors in motion estimation can lead to incorrect calibration results, as analyzed in the previous section. Therefore, we employ a hierarchical optimization module to optimize the poses generated by multi-LiDAR SLAM, aiming to reduce pose errors and use all data collectively to estimate the extrinsics.

In the field of SLAM, a common method used for backend optimization involves using factor graphs to optimize all variables together, achieving global consistency. However, this approach typically optimizes only at the level of poses, overlooking the constraints introduced by the point cloud itself. Inspired by HBA [68], we employ a similar hierarchical optimization architecture here to achieve more accurate poses for each data frame from both LiDARs. Specifically, as shown in Fig. 8, we first construct a hierarchical structure for each LiDAR’s data by building submaps at various levels. For each LiDAR, scans are grouped according to a window size w𝑤witalic_w. A local LiDAR bundle adjustment (BA) [69] is performed in each local window using the initial pose estimated by the multi-LiDAR SLAM, optimizing the relative pose between each frame and the first frame in this window. Scans within the same window with the poses optimized by BA, form a local submap which serves as a frame in the next layer. This process can be repeatedly performed from the lower layer to the upper layer. Subsequently, the hierarchical structures of the two LiDARs are linked through the registration of different layer submaps. Finally, the poses of all frames and submaps are linked into a factor graph and solved using the Levenberg-Marquardt method with GTSAM [70].

In addition, since the BA process within each submap is independent, it is suitable to use parallel computing to accelerate the process. To achieve that, we maintain a thread pool and a task queue. When each submap needs to be stitched, the task is added to the task queue and the idle thread will retrieve tasks from the task queue.

VI-C Time Synchronization

Refer to caption
Figure 9: The estimated extrinsic is disturbed by motion due to time offset. We identify the optimal extrinsics by determining the time offset.

After JEEP and hierarchical optimization, each LiDAR frame’s pose is as accurate as possible. However, as illustrated in Fig. 9, in the absence of time synchronization for the LiDAR pairs, the pose between two aligned frames from two LiDARs, denoted as 𝐂ri=i𝐏a1𝐏bisuperscript𝑖superscriptsuperscript𝐂𝑟𝑖superscriptsubscript𝐏𝑎1superscriptsubscript𝐏𝑏𝑖{}^{i}\mathbf{C}^{r}=^{i}\mathbf{P}_{a}^{-1}~{}^{i}\mathbf{P}_{b}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT here, is not the actual extrinsic parameters, denoted as 𝐂isuperscriptsuperscript𝐂𝑖{}^{i}\mathbf{C}^{*}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, between the LiDARs. Therefore, we propose a method for time synchronization to estimate the time offset between different LiDARs.

It’s worth noting that we assume a consistent frame rate for the LiDAR, meaning the time offset is a constant value. Moreover, between adjacent frames from the same LiDAR, the motion status remains consistent, allowing us to obtain the position at any given moment through interpolation.

More specifically, we hypothesize the time offset between two LiDARs as δt𝛿𝑡\delta titalic_δ italic_t and denote the interval of LiDAR capturing data as tfsubscript𝑡𝑓t_{f}italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT. The ratio between δt𝛿𝑡\delta titalic_δ italic_t and tfsubscript𝑡𝑓t_{f}italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT is denoted as α=δttf𝛼𝛿𝑡subscript𝑡𝑓\alpha=\frac{\delta t}{t_{f}}italic_α = divide start_ARG italic_δ italic_t end_ARG start_ARG italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT end_ARG. The pose bias caused by the time offset is denoted as:

𝐓bδi=α𝐏b1i1𝐏bi,superscriptsuperscriptsubscript𝐓𝑏𝛿𝑖𝛼superscriptsuperscriptsubscript𝐏𝑏1𝑖1superscriptsubscript𝐏𝑏𝑖{}^{i}\mathbf{T}_{b}^{\delta}=\alpha~{}^{i-1}\mathbf{P}_{b}^{-1}~{}^{i}\mathbf% {P}_{b},\\ start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ end_POSTSUPERSCRIPT = italic_α start_FLOATSUPERSCRIPT italic_i - 1 end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , (14)

where α𝛼\alphaitalic_α means interpolating the pose. The translation part is linear interpolation under constant velocity assumption, while the rotation part is spherical linear interpolation (Slerp) [71]. Following that, the triangular relationship in the Fig. 9 can be expressed as:

𝐂i𝐓bδi=i𝐂r.superscript𝑖superscriptsuperscript𝐂𝑖superscriptsuperscriptsubscript𝐓𝑏𝛿𝑖superscript𝐂𝑟{}^{i}\mathbf{C}^{*}~{}^{i}\mathbf{T}_{b}^{\delta}=^{i}\mathbf{C}^{r}.start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ end_POSTSUPERSCRIPT = start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT . (15)

Substituting 𝐓bδisuperscriptsuperscriptsubscript𝐓𝑏𝛿𝑖{}^{i}\mathbf{T}_{b}^{\delta}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ end_POSTSUPERSCRIPT and 𝐂irsubscriptsuperscript𝐂𝑟𝑖\mathbf{C}^{r}_{i}bold_C start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT into Eq. 15, we get

𝐂i=i𝐏a1𝐏bi(α𝐏b1i1𝐏bi)1,superscript𝑖superscriptsuperscript𝐂𝑖superscriptsubscript𝐏𝑎1superscriptsubscript𝐏𝑏𝑖superscript𝛼superscriptsuperscriptsubscript𝐏𝑏1𝑖1superscriptsubscript𝐏𝑏𝑖1{}^{i}\mathbf{C}^{*}=^{i}\mathbf{P}_{a}^{-1}~{}^{i}\mathbf{P}_{b}(\alpha~{}^{i% -1}\mathbf{P}_{b}^{-1}~{}^{i}\mathbf{P}_{b})^{-1},start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_α start_FLOATSUPERSCRIPT italic_i - 1 end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , (16)

where 𝐂isuperscriptsuperscript𝐂𝑖{}^{i}\mathbf{C}^{*}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is a function of α𝛼\alphaitalic_α. We model the process of finding 𝐂superscript𝐂\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and α𝛼\alphaitalic_α as minimizing the variance of 𝐂superscript𝐂\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, as 𝐂superscript𝐂\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is considered constant over short periods once the time offset interference is excluded. The calculation process is as follows:

αsuperscript𝛼\displaystyle\alpha^{*}italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =argminαVar(|α),absentsubscript𝛼𝑉𝑎𝑟conditionalsuperscript𝛼\displaystyle=\mathop{\arg\min}_{\alpha}Var(\mathbb{C}^{*}|\alpha),= start_BIGOP roman_arg roman_min end_BIGOP start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT italic_V italic_a italic_r ( blackboard_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | italic_α ) , (17)
𝐂superscript𝐂\displaystyle\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =Mean(|α),absent𝑀𝑒𝑎𝑛conditionalsuperscriptsuperscript𝛼\displaystyle=Mean(\mathbb{C}^{*}|\alpha^{*}),= italic_M italic_e italic_a italic_n ( blackboard_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) , (18)

where ={1𝐂,2𝐂,,i𝐂}\mathbb{C}^{*}=\{^{1}\mathbf{C}^{*},^{2}\mathbf{C}^{*},\cdots,^{i}\mathbf{C}^{% *}\}blackboard_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = { start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , ⋯ , start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT }, and the 𝐂superscript𝐂\mathbf{C}^{*}bold_C start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is the final output of the whole algorithm.

VII Experiments

To fully verify the effectiveness of our method, we conduct comprehensive experiments using both simulated and real-world datasets to evaluate the performance of the proposed methods compared to existing approaches. We first introduce the datasets used in our experiments in Sec. VII-A. In Sec. VII-B, we describe the quantitative evaluation metrics used in the subsequent experiments. In Sec. VII-C, the comparison between TERRA and five common methods demonstrates that our method provides significant robustness for diverse LiDAR pairs. Quantitative experiments are conducted on both simulated and real-world datasets, along with essential ablation studies. In Sec. VII-D, we test our extrinsic refinement methods JEEP, the hierarchical optimization module and the time synchronization module against two open-source calibration methods as well as the direct scan-to-scan method. Since the last two modules need to work together, we abbreviate them as HOTS. Please note that all registration processes are implemented using the widely adopted point-to-plane ICP method [17]. The parameters and their values used in our method are as shown in the Tab. II. Except for the ablation studies, all experiments are conducted according to this setup.

TABLE II: Parameters and Their Values Used in Our Method
Module Description Value
TERRA N𝑁Nitalic_N, sphere sampling number 10000
d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, used in distance calculation 20
d2subscript𝑑2d_{2}italic_d start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, used in distance calculation 5
First-stage search stride 10
Second-stage search stride 1
JEEP Input frame number to JEEP 5
Failure judgment threshold 10
Hierarchical optimization w𝑤witalic_w, submap window size 5
Layer number 3

VII-A Datasets

Refer to caption
(a) COLORFUL
Refer to caption
(b) Campus-SS
Refer to caption
(c) Campus-BS
Refer to caption
(d) USTC FLICAR
Refer to caption
(e) TIERS
Refer to caption
(f) PandaSet
Refer to caption
(g) A2D2
Refer to caption
(h) UrbanNav
Figure 10: The platform and point clouds of utilized datasets. Three datasets are self-collected, five are publicly available; one is a simulated dataset, and seven are real-world datasets. These encompass 16 diverse types of LiDAR. For each dataset, the upper figure represents the actual position of the LiDARs, and the lower one illustrates the calibrated point clouds from LiDARs.
TABLE III: Datasets for Evaluating the Proposed Method. 16 Different Types of LiDAR are Involved.
Dataset LiDAR Type Self-Collected
Spinning Solid-State
COLORFUL LiDAR-{16, 32, 64} Livox-{Horizon, Mid40, Avia, Tele} Simulated
Campus-SS RS-Helios RS-LiDAR-M1, Livox-{Mid70, Avia} Realistic
Campus-BS RS-{Bpearl×\times×2, Helios×\times×2} / Realistic
USTC FLICAR [72] Ouster OS0-128, Velodyne-{VLP-32C,HDL-32E} Livox Avia Realistic
TIERS [73] VLP-16, Ouster-{OS1-64, OS0-128} Livox-{Horizon, Avia} Realistic
PandaSet  [74] Pandar64 PandarGT Realistic
A2D2 [75] VLP-16×\times×5 / Realistic
UrbanNav [76] Velodyne-{VLP16, HDL-32E}, Lslidar C16 / Realistic

To demonstrate the versatility of our method, a variety of real-world and simulated datasets are utilized, encompassing 16 diverse types of LiDARs. The datasets used in our experiments are listed in Tab. III. The calibrated point clouds and the platforms are shown in Fig. 10. A brief description of each dataset is provided below:

  • COLORFUL: We collected a multi-LiDAR dataset in CARLA [77], an open urban driving simulator. This dataset is captured using three mechanical LiDARs and four different Livox LiDARs333https://www.livoxtech.com/, each displaying data in various colors as illustrated in Fig. 10(a), hence the name “colorful”. The dataset covers a duration of 120 seconds, with each sensor operating at a rate of 20 Hz. The simulated environment provides accurate relative extrinsic parameters among the sensors, which serve as the ground truth for the calibration task. Therefore, the quantitative experiments described below are primarily performed using this dataset.

  • Campus-SS We collected this dataset using a low-speed vehicle carrying 4 LiDARs on the campus of USTC. This dataset is used to investigate the performance of algorithms in calibrating various solid-state LiDARs, such as Livox models and the RS-LiDAR-M1444https://www.robosense.ai/en/rslidar/RS-LiDAR-M1. For this reason, we have named it Campus-SS. The ground truth for this dataset is established by manually providing initial values for point cloud registration, performing the calculation five times, and taking the average. The calibration scenes are manually selected to be suitable for calibration tasks, ensuring sufficient constraints in all directions.

  • Campus-BS We collected the dataset on an autonomous logistic vehicle on USTC campus, equipped with two 32-beam LiDARs (RS-Helios555https://www.robosense.ai/en/rslidar/RS-Helios) and two special LiDARs for near-field blind-spots detection (RS-Bpearl666https://www.robosense.ai/en/rslidar/RS-Bpearl). We use this dataset to investigate the performance related to blind-spots LiDARs. For this reason, we have named it Campus-BS. The method for obtaining the ground truth is the same as Campus-SS dataset.

  • USTC FLICAR [72]: This dataset, collected by Wang et al., used an aerial platform on a bucket truck carrying four LiDARs for aerial work. It’s important to highlight that this dataset features two vertically positioned LiDARs with a reduced overlapping FoV, introducing challenges for calibration. Meanwhile, this dataset features movement trajectories distinct from those found in autonomous driving datasets, including vertical movements.

  • TIERS [73]: The TIERS dataset, collected by Li et al., encompasses data captured by different types of LiDARs, including spinning LiDARs with varying resolutions (16, 64, and 128 beams) and solid-state LiDARs with two different scan patterns. The dataset offers a diverse collection of environments, including indoor, outdoor, urban roads, forests, and various other scenes.

  • PandaSet [74]: This dataset was collected by Xiao et al., equipped with one mechanical spinning LiDAR (Pandar64) and one forward-facing LiDAR (PandarGT) for autonomous driving scenarios.

  • A2D2 [75]: The A2D2 dataset was collected by Geyer et al. with five 16-beam LiDARs on highways, country roads, and cities in the south of Germany. This dataset represents a practical vehicle sensor layout strategy, utilizing five LiDARs to provide comprehensive surround sensor coverage.

  • UrbanNav [76]: The UrbanNav dataset, collected by Hsu et al. in Hong Kong with 3 mechanical LiDARs, focuses on urban canyon scenarios. In our experiments, it is primarily used to test the performance of online calibration in these typical urban driving situations.

VII-B Metrics

VII-B1 Accuracy

To measure the difference between the calibration results and the ground truth, we define two metrics to quantify the errors in the rotation and translation components.

The rotation error is calculated using the following formula with the reference to [78, 79]:

erot=arccos((trace(Rgt1Rest)1)/2),subscript𝑒𝑟𝑜𝑡𝑡𝑟𝑎𝑐𝑒superscriptsubscriptR𝑔𝑡1subscriptR𝑒𝑠𝑡12e_{rot}=\arccos((trace(\textbf{R}_{gt}^{-1}\textbf{R}_{est})-1)/2),italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT = roman_arccos ( ( italic_t italic_r italic_a italic_c italic_e ( R start_POSTSUBSCRIPT italic_g italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT R start_POSTSUBSCRIPT italic_e italic_s italic_t end_POSTSUBSCRIPT ) - 1 ) / 2 ) , (19)

where RestsubscriptR𝑒𝑠𝑡\textbf{R}_{est}R start_POSTSUBSCRIPT italic_e italic_s italic_t end_POSTSUBSCRIPT and RgtsubscriptR𝑔𝑡\textbf{R}_{gt}R start_POSTSUBSCRIPT italic_g italic_t end_POSTSUBSCRIPT respectively represent the estimated and ground truth values of the rotation. This formula calculates the angle between two rotation matrices in radians. The smaller the angle, the more accurate the estimation of rotation is.

The translation error is calculated using the formula that calculates the Euclidean distance in meters between two sensor positions, as follows:

etrans=testtgt2,subscript𝑒𝑡𝑟𝑎𝑛𝑠subscriptnormsubscriptt𝑒𝑠𝑡subscriptt𝑔𝑡2e_{trans}=\|\textbf{t}_{est}-\textbf{t}_{gt}\|_{2},italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT = ∥ t start_POSTSUBSCRIPT italic_e italic_s italic_t end_POSTSUBSCRIPT - t start_POSTSUBSCRIPT italic_g italic_t end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , (20)

where testsubscriptt𝑒𝑠𝑡\textbf{t}_{est}t start_POSTSUBSCRIPT italic_e italic_s italic_t end_POSTSUBSCRIPT and tgtsubscriptt𝑔𝑡\textbf{t}_{gt}t start_POSTSUBSCRIPT italic_g italic_t end_POSTSUBSCRIPT respectively represent the estimated and ground truth values of the translation. The smaller the distance, the more accurate the estimation of translation is.

VII-B2 Success rate

When calibration failures, the rotation and translation errors can be significantly large, indicating only the failure of this particular instance, not the accuracy of the algorithm, so averaging all errors is not an effective method for comprehensively evaluating the calibration algorithm. For better evaluation, we define “calibration success” to filter samples for computing accuracy and use success rate, abbreviated as SR𝑆𝑅SRitalic_S italic_R, to measure the robustness of algorithms. For rotation initialization, we define success as the rotation errors less than 10superscript1010^{\circ}10 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, which is. For the final outcome, success is defined as rotation errors less than 1superscript11^{\circ}1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and translation errors less than 10cm10𝑐𝑚10~{}cm10 italic_c italic_m. The SR𝑆𝑅SRitalic_S italic_R is calculated as the number of successful samples divided by the total number of samples.

VII-C Study of Rotation Initialization

Refer to caption
Figure 11: The rotation error and success rate of rotation initialization tasks. We select eight representative LiDAR pairs from the COLORFUL dataset for calibration and test six methods. The x-axis of the graph represents the LiDAR types involved in the eight calibration tasks. The box plot indicates rotation error, while the line graph represents the success rate.

In this section, we conduct experiments to evaluate the accuracy of the proposed TERRA for rotation initialization. We compare its performance with several commonly used methods, including the motion-based Hand-Eye method [8], the feature-based FPFH method with Teaser++ [80, 81], the geometric-based CROON method [9], and the PHASER method [38], which also project the LiDAR frame onto the sphere for global registration. In addition, we enhance the classic scancontext [34]. Specifically, we extract and align the largest planes considered as the ground from the LiDAR scans first, to estimate the roll and pitch angles. The detail of plane alignment can be found in [9]. Subsequently, we leverage scancontext [34] to calculate the yaw, thereby accomplishing the 3-DoF rotation initialization. This method is denoted as scancontext* for comparison.

We conduct comprehensive tests on the proposed rotation initialization method, e.g., TERRA, starting with quantitative assessments of eight various LiDAR pairs within COLORFUL dataset, and comparing the results with five commonly used algorithms. Subsequently, the method is tested across various real-world datasets, encompassing different scenes and sensor types, demonstrating the versatility of TERRA. Finally, we perform ablation studies on some of the key parameters.

VII-C1 Quantitative experiment in the simulated dataset

Given that the COLORFUL dataset encompasses the most diverse sensor types and precise ground truth, we select eight typical calibration tasks to test TERRA. The difficulty of the tasks increases from left to right in Fig. 11.

The experimental results demonstrate that our method achieves the highest accuracy among the six methods tested. The Hand-Eye method requires sufficient motion, which cannot be satisfied in our randomly picked sequences, leading to the algorithm failing to converge. The FPFH-based method fails in almost all tasks due to the different distributions of point clouds in different LiDARs, which makes it difficult to use the same set of parameters to extract similar features locally. For the same reasons, PHASER performs poorly in calibrations related to solid-state LiDARs. Similarly, the geometry-based CROON method performs well in mechanical-mechanical and solid-solid initialization but fails in cross-modal tasks. The scancontext* achieves a precision comparable to TERRA, but it requires a substantial ground area in the LiDARs’ FoV to align roll and pitch. This condition is satisfied in the sensor layout of the COLORFUL dataset but may not be fulfilled in all actual scenarios, as demonstrated in the subsequent section.

In contrast, the proposed TERRA demonstrates robustness across all calibration tasks. For simple tasks, the success rate approaches 100%percent100100\%100 %, and even in the two most challenging tasks, the success rate exceeds 60%percent6060\%60 %. In terms of accuracy, for the left four tasks, the variances of TERRA’s results is very small, indicating the stability of TERRA across different environments. Notably, for the 64-Horizon calibration task, although the variance is small, the mean error is not near zero. This phenomenon is caused by the search step and TERRA’s resolution. We also consider this an excellent completion of the initialization task, as this error can be further reduced through subsequent refinement step. Overall, our experimental results demonstrate the superiority of our TERRA method, which can provide accurate and reliable rotation initialization for a wide range of LiDAR pairs. Part of the schematic diagram for TERRA is shown in Fig. 12. For clearer viewing, the TERRA for the first LiDAR is represented by the outer sphere, while the second is illustrated by the inner sphere.

Refer to caption
(a) 64-32
Refer to caption
(b) Horizon-Tele
Refer to caption
(c) 32-Mid40
Refer to caption
(d) 16-Tele
Figure 12: The example of the TERRA in COLORFUL dataset. The TERRA for the first LiDAR is represented by the outer sphere, while the second is illustrated by the inner sphere.
Dataset Calibration Target Scancontext* TERRA
erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT(\downarrow SR𝑆𝑅SRitalic_S italic_R(%) \uparrow erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT(\downarrow SR𝑆𝑅SRitalic_S italic_R(%) \uparrow
Campus1 RS-LiDAR-M1 LIVOX-AVIA / 0 1.144 100
Campus2 RS-Helios(top) RS-Helios(back) 7.491 46.46 1.521 100
RS-Helios(top) RS-Bpearl(right) / 0 2.243 100
USTC FLICAR Ouster OS0-128 VLP-32C / 0 3.178 63.54
TIERS LIVOX Horizon LIVOX AVIA 2.643 66.67 2.018 100
OS-1 LIVOX AVIA 4.84 21.73 1.918 100
OS-0 OS-1 4.008 67.57 0.5819 100
Pandaset Pandar64 PandarGT 5.399 92.40 1.438 93.67
A2D2 frontcenter frontleft / 0 8.726 61.37
UrbanNav VLP16 Lslidar C16 / 0 3.785 67.78
  • * For A2D2 dataset, the “frontcenter” and “frontleft” is the positions of the LiDARs, which are both VLP-16.

TABLE IV: The rotation initial result of TERRA in the real-world dataset.

VII-C2 Quantitative experiment in the real world

To verify the algorithm’s performance in real-world scenarios, we select ten representative LiDAR pairs from seven datasets as shown in Tab. IV. In this section, we compare our method with the best-performing method, Scancontext*, in last part.

Due to the variety of installation methods of LiDARs, the assumption of a large planar surface in LiDAR scan does not hold, resulting in the complete failure of Scancontext* in some scenarios, with a success rate of zero. In contrast, the proposed TERRA still achieves stable and successful rotational initialization across most of scenarios without making any assumptions about the environment. In most tasks, the rotation error is within 5 cm. In the A2D2 dataset, due to severe distortions of the point cloud caused by motion, the error is relatively larger, but the success rate still exceeds 60%percent6060\%60 %. What’s more, it’s worth noting that TERRA requires the input of two point clouds to have overlapping areas, but this does not mean that only LiDARs with overlapping areas can be calibrated. Similar to the approach [26] designed for non-overlapping LiDAR calibration, if a map is constructed for each LiDAR and then the overlapping local maps are input into the TERRA, it can still work effectively.

Fig. 13 presents several challenging tasks. The LiDAR pairs showcased in USTC FLICAR, A2D2, and UrbanNav are all positioned with the LiDARs tilted, resulting in minimal overlapping areas between the two LiDARs. Additionally, the unique hemispherical FoV of the RS-Bpearl in the Campus-BS dataset presents significant challenges.

VII-C3 Ablation study

In this section, we test some key parameters affecting the performance of TERRA, including the translation distance between LiDARs, the number of spherical sampling points, and the stride during the search.

The impact of translation: As TERRA can only initialize rotation, the translation component introduces an error into the distance calculation in Eq. (9). Within the COLORFUL dataset, for the same calibration tasks, i.e., 64-32 and 64-Horizon, we introduce translations ranging from 0 to 8 meters in both the direction parallel and perpendicular to the vehicle’s movement, thereby collecting 9 additional data sequences. The 0-meter translation implies the same location of the two LiDARs, which is impractical in reality but feasible in a simulation environment. Additionally, we conduct comparative experiments by selecting different values for d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT in the Eq. (8) to mitigate translation effects. The quantitative experimental results are illustrated in the Fig. 14. With a fixed d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, the success rate of initialization decreases as the translation distance increases. When the translation distance is fixed, choosing d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT values of 0m0𝑚0m0 italic_m and 10m10𝑚10m10 italic_m results in significantly poorer algorithm performance due to the disruption caused by nearby objects, compared to other values. Conversely, opting for larger values, such as 50m50𝑚50m50 italic_m or 60m60𝑚60m60 italic_m, the threshold becomes too high, reducing the number of available values in the distance calculation and thereby weakening the algorithm. After comprehensive evaluation, we fix d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT at 20m20𝑚20m20 italic_m for our algorithm.

Refer to caption
(a) Campus-BS (top-right)
Refer to caption
(b) USTC FLICAR
Refer to caption
(c) A2D2
Refer to caption
(d) UrbanNav
Figure 13: The challenging example of the TERRA in the real-world dataset. The TERRA for the first LiDAR is represented by the outer sphere, while the second is illustrated by the inner sphere. For UrbanNav, both spheres are the same size.
Refer to caption
(a) 64-32, parallel
Refer to caption
(b) 64-32, perpendicular
Refer to caption
(c) 64-Horizon, parallel
Refer to caption
(d) 64-Horizon, perpendicular
Figure 14: The impact of translation on the success rate of rotation initialization and how the d1subscript𝑑1d_{1}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT value mitigates this impact.
TABLE V: The impact of sampling number N𝑁Nitalic_N on the TERRA’s precision
N𝑁Nitalic_N 100 500 1000 3000 10000
Resolution() 18.98 8.70 6.10 3.55 2.76
64-32 erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT(\downarrow 6.204 2.953 2.176 2.061 0.935
SR𝑆𝑅SRitalic_S italic_R(%) \uparrow 13.08 69.4 100 100 100
64-Horizon erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT(\downarrow 4.978 6.469 4.835 3.542 2.743
SR𝑆𝑅SRitalic_S italic_R(%) \uparrow 5.70 98.65 99.66 100 100
64-Tele erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT(\downarrow \ \ \ 5.183 4.995
SR𝑆𝑅SRitalic_S italic_R(%) \uparrow 0 0 0 26.17 45.6
Refer to caption
(a) 64-32-Eq. (9)
Refer to caption
(b) 64-Tele-Eq. (9)
Refer to caption
(c) 64-32-ED
Refer to caption
(d) 64-Tele-ED
Figure 15: Rotations and their corresponding distances. The axes represent the Euler angles for rotation around the x, y, and z axes, respectively. The opacity signifies the value of the distance, where less opacity indicates a smaller distance. The red point represents the location of the ground truth. ED is an abbreviation for Euclidean Distance.

The impact of sampling number: The sample number N𝑁Nitalic_N of Fibonacci lattice determines the resolution of TERRA. We use 64-32, 64-Horizon and 64-Tele calibration tasks as examples to illustrate the performance of TERRA at varying resolutions. As demonstrated in Tab. V, the precision decreases as the resolution reduces. However, because the 32-beam LiDAR and Horizon has a wider FoV, it can still achieve acceptable success rate even at low resolutions. Meanwhile, Tele has only a 14.5×16.2superscript14.5superscript16.214.5^{\circ}\times 16.2^{\circ}14.5 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT × 16.2 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT FoV. When TERRA’s resolution exceeds 16.2superscript16.216.2^{\circ}16.2 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, it means that there is only one valid data in the entire descriptor, which will understandably lead to the algorithm’s failure.

The impact of search stride: Similar to the sampling number, the search stride also has an impact on performance. It’s worth noting that the stride we talk here is the stride in the first stage, not the heuristic search stride in the second stage. The reason being, in the second stage, we only use a small stride such as 1superscript11^{\circ}1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT to search, which is still not particularly time-consuming. However, in the first stage, the stride will increase the search time cubically, so a stride as large as possible is essential. Yet, an exceedingly large stride may result in a failure to fall into the locally convex space close to the ground truth during traversal, leading to a convergence to incorrect results.

As shown in Fig. 15(a) and Fig. 15(b), we traverse the rotation space with a resolution of 10superscript1010^{\circ}10 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and calculate the corresponding distance for each rotation. Opacity signifies the value of the distance, with lesser opacity indicating a smaller distance. The small FoV of the Tele results in a small locally convex space, while in contrast, the locally convex space of the 64-32 is relatively large. In summary, for LiDARs like Tele, a large step size could likely cause the algorithm to miss local convex space, leading to failure. Therefore, in our algorithm, we use a 10superscript1010^{\circ}10 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT step size, which is smaller than Tele’s FoV.

The impact of distance function: As depicted in Fig. 15, we compare the distance from Eq. 9 and the Euclidean distance (ED). The two tasks tested perfectly demonstrate the two beneficial properties of our designed distance function. Firstly, as shown in Fig. 15(a) and Fig. 15(c), our distance function has a clear gradient near the ground truth, which facilitates faster search. Additionally, for 64-Tele, although there are some local optima in the solution space shown in Fig. 15(b), the ground truth remains the global optimum. However, using ED makes the global optimum far from the ground truth, which is the worst case.

TABLE VI: Rotation Error(erot()e_{rot}(^{\circ})italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT ( start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT )), Translation Error(etrans(cm)subscript𝑒𝑡𝑟𝑎𝑛𝑠𝑐𝑚e_{trans}(cm)italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT ( italic_c italic_m )) and Success Rate(SR(%)SR(\%)italic_S italic_R ( % )) of Different Approaches.
The initial value is ground truth with 2m2𝑚2m2 italic_m and 5superscript55^{\circ}5 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT translation and rotation disturbance.
Rotational Error and Translation Error are only recorded when the experiment is successful (SR𝑆𝑅SRitalic_S italic_R>0).
Target CROON [9] scan-scan JEEP MLCC [26] HOTS
erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT \downarrow etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT \downarrow SR𝑆𝑅SRitalic_S italic_R  \uparrow erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT \downarrow etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT \downarrow SR𝑆𝑅SRitalic_S italic_R \uparrow erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT \downarrow etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT \downarrow SR𝑆𝑅SRitalic_S italic_R \uparrow erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT \downarrow etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT \downarrow SR𝑆𝑅SRitalic_S italic_R \uparrow erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT \downarrow etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT \downarrow SR𝑆𝑅SRitalic_S italic_R \uparrow
LiDAR-64
LiDAR-32
0.032 1.269 25.63 0.176 4.911 20.84 0.007 0.229 100 0.007 0.127 100 0.006 0.144 100
COLORFUL
LiDAR-32
LiDAR-16
0.110 2.338 26.39 0.215 4.885 22.81 0.074 1.378 100 0.130 1.820 100 0.027 0.197 100
LIVOX-Horizon
LIVOX-Tele
0.199 5.779 0.830 0.133 5.351 1.740 0.061 4.486 49.12 0.026 6.048 100 0.030 1.786 100
LiDAR-64
LIVOX-Horizon
0.122 5.386 14.24 0.135 4.141 8.932 0.010 0.793 98.84 0.008 1.640 100 0.009 1.374 100
LiDAR-32
LIVOX-Avia
0.425 5.699 1.044 0.275 5.852 6.148 0.089 3.079 91.10 0.041 2.862 100 0.025 0.926 100
LiDAR-64
LIVOX-Tele
0.215 5.290 0.573 0.164 5.824 1.082 0.079 4.060 55.89 0.031 6.482 100 0.032 2.356 100
Campus-SS
RS-LiDAR-M1
LIVOX-AVIA
0.313 7.609 10.4 0.534 6.962 1.301 0.205 3.003 95.12 0.159 2.277 100 0.234 0.461 100
Campus-BS
RS-Helios(top)
RS-Helios(back)
/ / 0 / / 0 0.325 2.623 96.34 0.284 2.215 100 0.349 1.555 100
RS-Helios(top)
RS-Bpearl(right)
/ / 0 0.815 7.524 0.975 0.820 7.285 50.19 0.428 3.46 100 0.6914 4.635 100
USTC FLICAR
Ouster OS0-128
VLP-32C
/ / 0 / / 0 0.791 4.854 41.25 0.505 6.521 100 0.5123 4.151 100
TIERS
LIVOX Horizon
LIVOX AVIA
0.3867 7.021 13.56 0.433 6.601 8.861 0.307 8.051 88.69 / / 0 0.117 3.656 100
OS-1
LIVOX AVIA
0.2255 6.83 8.108 0.804 7.158 3.436 0.201 3.032 97.50 0.150 5.109 100 0.103 1.248 100
OS-0
OS-1
0.9093 5.983 25.41 0.776 6.866 5.606 0.051 1.787 100 / / 0 0.1815 4.196 100
Pandaset
Pandar64
PandarGT
0.142 4.318 15.69 0.069 5.32 4.167 0.143 3.018 100 0.025 1.355 100 0.1258 2.046 100
A2D2
frontcenter
frontleft
/ / 0 / / 0 0.721 3.025 30.02 0.094 5.697 100 0.7225 2.9 100
UrbanNav
HDL-32E
Lslidar C16
0.9301 9.256 0.4926 0.772 2.708 0.218 0.748 9.215 32 / / 0 0.3709 5.903 100

VII-D Study of Refinement

In this section, we conduct experiments to evaluate the accuracy of the proposed JEEP and HOTS for extrinsic refinement. We compare their performance with several commonly used methods, including CROON [9] and MLCC [26]. In addition to this, we also conduct tests on the direct scan-scan method. It is noteworthy that the classic LiDAR calibration algorithm M-LOAM [8], which can only calibrate mechanical LiDARs, is not suitable for our task setting, and therefore, it has not been included in our comparison.

VII-D1 Quantitative experiment

As indicated in the Tab. VI, we evaluate the extrinsic refinement methods on six types of LiDAR pairs of COLORFUL dataset and ten types of real-world datasets. We perturb the ground truth to serve as the input initial values, including a 5superscript55^{\circ}5 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT rotation and a 2m2𝑚2m2 italic_m translation. Among the five methods, CROON and scan-scan requires single point cloud pairs as input. For JEEP, the number of input frames is set to five, with registration parameters consistent with those in scan-scan. Lastly, MLCC and HOTS utilize the entire sequence to generate the final result. Specifically, MLCC and HOTS require the trajectories of each LiDAR as input, which we estimate using the Multi-LiDAR SLAM described in Sec. VI-A, where the required extrinsic parameters are substituted with the ground truth disturbed by the average error of JEEP on the corresponding dataset.

As shown in the table on the left, due to the poor initial values, as well as the distribution of the point cloud and the degeneracy of the scene, both CROON and the direct scan-to-scan methods have low success rates, failing entirely on some challenging tasks. In contrast, after introducing the constraints of motion, the success rate of JEEP significantly improves. Even in the challenging tasks such as Campus-BS, USTC FLICAR, A2D2, and UrbanNav, we maintain a success rate of over 30%. The calibrated point clouds for these four tasks are displayed in Fig. 16.

With the use of all available data and backend optimization, MLCC shows enhanced performance. Benefiting from the hierarchical optimization design, our approach ensures better global consistency of the map, resulting in more accurate poses and extrinsic parameters.

Refer to caption
(a) Campus-BS (top-right)
Refer to caption
(b) USTC FLICAR
Refer to caption
(c) A2D2
Refer to caption
(d) UrbanNav
Figure 16: The example of the difficult calibration targets in the real-world dataset.

VII-D2 Ablation study

In this section, we test the impact of initial value on JEEP, the impact of the number of frames input into JEEP and the impact of time synchronization.

The impact of initial value: To test the robustness of JEEP under various initial conditions, we sample rotation disturbances from 00 to 12.512.512.512.5 degrees and translation disturbances from 00 to 5555 meters, conducting 6×6666\times 66 × 6 experiments for each LiDAR-pair. The heatmaps of rotation errors, translation errors, and success rates, as shown in Fig. 17, which demonstrate that JEEP is highly robust to inaccurate initial values, achieving an 80%percent8080\%80 % success rate even with disturbances of 10101010 degrees and 5555 meters. Moreover, it is crucial to note that this experiment demonstrates that within a 5-meter range, translation disturbance doesn’t significantly impact the final results. However, when rotation disturbance exceeded 10superscript1010^{\circ}10 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, noticeable declines are observed in the performance of the refinement process. This phenomenon validates the correctness and necessity of performing only rotation initialization for single-platform calibration.

Refer to caption
(a) 64-32 erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT
Refer to caption
(b) 64-32 etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT
Refer to caption
(c) 64-32 SR𝑆𝑅SRitalic_S italic_R
Refer to caption
(d) 64-Horizon erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT
Refer to caption
(e) 64-Horizon etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT
Refer to caption
(f) 64-Horizon SR𝑆𝑅SRitalic_S italic_R
Figure 17: Calibration results of JEEP under different rotation and translation disturbance in COLORFUL dataset.

The impact of the frame number: The number of frames input into JEEP is an interesting parameter. As analyzed in Sec. VI-A, too few frames fail to provide sufficient constrains, while too many frames introduce cumulative errors of pose estimation to extrinsic calibration. The disturbance remain consistent with our main experiment, i.e., 2 meters and 5 degrees. We adjust the frame number from 2 to 12 to investigate the effects of varying the number of input frames. The rotational and translational errors and successful rates are shown in the Fig. 18. For tasks like 64-32 calibration, where the point cloud distribution is already uniform, fewer frames can achieve good results, and increasing the number of frames is not beneficial. However, for tasks like 64-Horizon and 64-Tele calibration, which inherently lack sufficient constraints, a certain number of frames is necessary to achieve satisfactory results.

Refer to caption
(a) SR𝑆𝑅SRitalic_S italic_R
Refer to caption
(b) etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT
Refer to caption
(c) erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT
Figure 18: Calibration results of JEEP for different input frame number in COLORFUL dataset.

The impact of time synchronization: To validate the effectiveness of the time synchronization introduced in Sec. 9, we reduce the frequency of the data in COLORFUL dataset from 20Hz20𝐻𝑧20Hz20 italic_H italic_z to 10Hz10𝐻𝑧10Hz10 italic_H italic_z. In specific, we remove data from two LiDARs at different times to maintain a 50ms50𝑚𝑠50ms50 italic_m italic_s interval between them. It is important to note that for 10Hz10𝐻𝑧10Hz10 italic_H italic_z data, 50ms50𝑚𝑠50ms50 italic_m italic_s represents the maximum achievable time offset. We repeat the experiments on JEEP and HOTS according to the previous experimental setup, and the results are displayed in the Tab. VII. Due to the lack of time synchronization, errors caused by motion are introduced to calibration, resulting in JEEP’s success rate being significantly inferior to those previously shown in Tab. VI. In fact, these successful cases, accounting for about 20%percent2020\%20 % of the total, are attributed to the fact that approximately 20%percent2020\%20 % of the data collection period involves minimal movement. Therefore, the specific values of erotsubscript𝑒𝑟𝑜𝑡e_{rot}italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT and etranssubscript𝑒𝑡𝑟𝑎𝑛𝑠e_{trans}italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT are not meaningful in explaining the calibration performance in this context. Alternatively, we use extrinsic parameters with disturbances of 1superscript11^{\circ}1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and 10cm10𝑐𝑚10cm10 italic_c italic_m for the multi-LiDAR SLAM, and further optimize the output pose using hierarchical optimization.

After applying hierarchical optimization, a noticeable improvement is achieved, yielding results comparable to previous ones. Furthermore, the estimated time offsets are close to the ground truth of 50ms50𝑚𝑠50ms50 italic_m italic_s, demonstrating that our method can achieve millisecond-level time synchronization accuracy. Fig. 19 illustrates the relationship between the sampled time offset and extrinsic variance, translation error, and rotation error. It is observed that both translation and rotation errors reach their minimum values around 50ms50𝑚𝑠-50ms- 50 italic_m italic_s, and the variance shows a positive correlation with them. This indicates the validity of using variance to infer the true extrinsic parameters.

TABLE VII: Experiments on dataset with 50ms time offset.
Target 64-32 64-Horizon
JEEP erot()e_{rot}(^{\circ})italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT ( start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT ) \downarrow 0.052 0.033
etrans(cm)subscript𝑒𝑡𝑟𝑎𝑛𝑠𝑐𝑚e_{trans}(cm)italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT ( italic_c italic_m ) \downarrow 0.738 0.894
SR𝑆𝑅SRitalic_S italic_R(%) \uparrow 20.69 20.23
HOTS erot()e_{rot}(^{\circ})italic_e start_POSTSUBSCRIPT italic_r italic_o italic_t end_POSTSUBSCRIPT ( start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT ) \downarrow 0.061 0.022
etrans(cm)subscript𝑒𝑡𝑟𝑎𝑛𝑠𝑐𝑚e_{trans}(cm)italic_e start_POSTSUBSCRIPT italic_t italic_r italic_a italic_n italic_s end_POSTSUBSCRIPT ( italic_c italic_m ) \downarrow 5.508 3.148
SR𝑆𝑅SRitalic_S italic_R(%) \uparrow 100 100
Time Offset(ms) -37 -42
Refer to caption
(a) 64-32
Refer to caption
(b) 64-Horizon
Figure 19: The relationship between the sampled time offset and extrinsic variance. The ground truth time offset is 50ms50𝑚𝑠-50ms- 50 italic_m italic_s.

VIII Discussion and Future Work

In this section, we discuss the limitations of the proposed methods and list some potential future work.

VIII-A TERRA

The previous experiments confirm the robustness of TERRA in various scenarios and tasks. However, TERRA is not a descriptor that can be universally applied across all scenarios. In the design of TERRA, we enhance its robustness against translation between two LiDARs by filtering nearby objects. This implies that in indoor environments where the translation between two LiDARs is comparable to the room size, TERRA is likely to underperform. Additionally, for other calibration tasks, such as roadside LiDARs calibration [82, 83], where the translation is on the same scale as the scanning radius of the LiDARs, TERRA also fails to function properly.

However, these limitations also provide new inspiration. The generation process of TERRA fundamentally involves uniformly down-sampling the point cloud by angle, which means that TERRA does not lose the 3D structural information of the point cloud. Currently, TERRA is designed for estimating the rotation between two LiDARs, but with the appropriate loss design, TERRA still has the potential application on 6-DoF pose estimation.

VIII-B Refinement

In the first step of refinement, we employ JEEP effectively addressing registration failures caused by minimal overlapping areas or sparsity of point clouds by alternately estimating extrinsic parameters and poses. This method also offers the advantage of determining the success of calibration by monitoring the iteration count. Specifically, when the calibration parameters closely approximate the ground truth, the pose estimation process using data from both LiDARs typically converges easily. Furthermore, the accuracy of the incrementally constructed local map, facilitated by precise pose estimation, promotes convergence in the calibration step. Ideally, a single cycle of pose estimation and calibration may suffice. In this case, we can affirm the success of the calibration and advance to the subsequent phase of accelerated multi-LiDAR SLAM.

Although calibrating during motion offers the aforementioned benefits, the effectiveness is undoubtedly influenced by the motion despite the used backend optimization steps. Firstly, the obvious effect is that motion causes distortion in the point cloud, which is detrimental to point cloud registration. Additionally, due to imperfect timestamp alignment, the estimated extrinsic parameters include components of motion as shown in Fig. 9. Although we reduce this effect through time synchronization steps, our practice of interpolating motion trajectories based on the assumption of the constant velocity is still not precise. A direct solution is to feed the algorithm with point clouds collected in a stationary state from different locations, similar to [24, 25]. However, this approach still requires a fixed calibration data collection step, unlike our approach, which is applicable to any data segment. Furthermore, we intend to adopt the continuous-time SLAM approach [55, 49] in the future, treating point cloud sequences as a continuous point stream rather than temporally discrete frame, which allows for more precise modeling of continuous-time trajectories.

IX Conclusion

In this paper, we design a universal LiDAR calibration method including four core steps, resolving the calibration challenges of initialization and refinement brought by different FoV and point cloud distributions due to the variety of LiDAR types. Firstly, we design a universal spherical descriptor TERRA to initialize the rotation part of the extrinsics without any prior knowledge. Following this, a method called JEEP, for the joint estimation of extrinsic and poses, is proposed to achieve optimized, usable extrinsics. To further refine these parameters with additional data, we implement a straightforward multi-LiDAR SLAM approach and utilize the hierarchical optimization to optimize the pose of each point cloud frame, subsequently enhancing the estimation of the extrinsics and time offsets. We believe our method offers a universal solution for LiDAR calibration, unaffected by the calibration environment or the type of LiDAR. This will provide a convenient and precise calibration tool for downstream tasks, allowing them to focus more on their primary objectives without being influenced by preliminary tasks.

References

  • [1] J. Zhang and S. Singh, “LOAM: Lidar Odometry and Mapping in Real-time.” in Robotics: Science and Systems, vol. 2, no. 9, 2014, pp. 1–9.
  • [2] J. Lin and F. Zhang, “Loam Livox: A fast, Robust, High-Precision LiDAR Odometry and Mapping Package for LiDARs of Small FoV.” in International Conference on Robotics and Automation (ICRA).   IEEE, 2020, pp. 3126–3131.
  • [3] J. Lin, X. Liu, and F. Zhang, “A Decentralized Framework for Simultaneous Calibration, Localization and Mapping with Multiple LiDARs,” in International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 4870–4877.
  • [4] H. Hu, Z. Liu, S. Chitlangia, A. Agnihotri, and D. Zhao, “Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving,” in Conference on Computer Vision and Pattern Recognition (CVPR).   IEEE, 2022, pp. 2550–2559.
  • [5] C. Glennie and D. D. Lichti, “Static calibration and analysis of the velodyne hdl-64e s2 for high accuracy mobile scanning,” Remote sensing, vol. 2, no. 6, pp. 1610–1624, 2010.
  • [6] J. Lin, C. Zheng, W. Xu, and F. Zhang, “R2live: A robust, real-time, lidar-inertial-visual tightly-coupled state estimator and mapping,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7469–7476, 2021.
  • [7] J. Lin and F. Zhang, “R3live: A robust, real-time, rgb-colored, lidar-inertial-visual tightly-coupled state estimation and mapping package,” arXiv preprint arXiv:2109.07982, 2021.
  • [8] J. Jiao, H. Ye, Y. Zhu, and M. Liu, “Robust odometry and mapping for multi-lidar systems with online extrinsic calibration,” IEEE Transactions on Robotics, vol. 38, no. 1, pp. 351–371, 2021.
  • [9] P. Wei, G. Yan, Y. Li, K. Fang, W. Liu, X. Cai, and J. Yang, “Croon: Automatic multi-lidar calibration and refinement method in road scene,” arXiv preprint arXiv:2203.03182, 2022.
  • [10] J. Jiao, Q. Liao, Y. Zhu, T. Liu, Y. Yu, R. Fan, L. Wang, and M. Liu, “A novel dual-lidar calibration algorithm using planar surfaces,” in 2019 IEEE Intelligent Vehicles Symposium (IV).   IEEE, 2019, pp. 1499–1504.
  • [11] F. Ma, S. Wang, and M. Liu, “An automatic multi-lidar extrinsic calibration algorithm using corner planes,” in 2022 IEEE International Conference on Robotics and Biomimetics (ROBIO).   IEEE, 2022, pp. 235–240.
  • [12] S. Das, N. Mahabadi, A. Djikic, C. Nassir, S. Chatterjee, and M. Fallon, “Extrinsic calibration and verification of multiple non-overlapping field of view lidar sensors,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 919–925.
  • [13] S. Das, L. af Klinteberg, M. Fallon, and S. Chatterjee, “Observability-aware online multi-lidar extrinsic calibration,” IEEE Robotics and Automation Letters, vol. 8, no. 5, pp. 2860–2867, 2023.
  • [14] Z. Gong, C. Wen, C. Wang, and J. Li, “A target-free automatic self-calibration approach for multibeam laser scanners,” IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 1, pp. 238–240, 2017.
  • [15] P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor fusion IV: control paradigms and data structures, vol. 1611.   Spie, 1992, pp. 586–606.
  • [16] P. Biber and W. Straßer, “The normal distributions transform: A new approach to laser scan matching,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), vol. 3.   IEEE, 2003, pp. 2743–2748.
  • [17] Y. Chen and G. Medioni, “Object modelling by registration of multiple range images,” Image and vision computing, vol. 10, no. 3, pp. 145–155, 1992.
  • [18] A. Segal, D. Haehnel, and S. Thrun, “Generalized-icp.” in Robotics: science and systems, vol. 2, no. 4.   Seattle, WA, 2009, p. 435.
  • [19] A. Censi, “An accurate closed-form estimate of icp’s covariance,” in Proceedings 2007 IEEE international conference on robotics and automation.   IEEE, 2007, pp. 3167–3172.
  • [20] M. Brossard, S. Bonnabel, and A. Barrau, “A new approach to 3d icp covariance estimation,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 744–751, 2020.
  • [21] A. De Maio and S. Lacroix, “Deep bayesian icp covariance estimation,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 6519–6525.
  • [22] B. Ponton, M. Ferri, L. König, and M. Bartels, “Efficient extrinsic calibration of multi-sensor 3d lidar systems for autonomous vehicles using static objects information,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 6285–6292.
  • [23] I. Tahiraj, F. Fent, P. Hafemann, E. Ye, and M. Lienkamp, “Gmmcalib: Extrinsic calibration of lidar sensors using gmm-based joint registration,” arXiv preprint arXiv:2404.03427, 2024.
  • [24] M. Nie, W. Shi, W. Fan, and H. Xiang, “Automatic extrinsic calibration of dual lidars with adaptive surface normal estimation,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–11, 2022.
  • [25] J. R. Xu, S. Huang, S. Qiu, L. Zhao, W. Yu, M. Fang, M. Wang, and R. Li, “Lidar-link: Observability-aware probabilistic plane-based extrinsic calibration for non-overlapping solid-state lidars,” IEEE Robotics and Automation Letters, 2024.
  • [26] X. Liu and F. Zhang, “Extrinsic calibration of multiple lidars of small fov in targetless environments,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2036–2043, 2021.
  • [27] X. Liu, C. Yuan, and F. Zhang, “Targetless extrinsic calibration of multiple small fov lidars and cameras using adaptive voxelization,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2022.
  • [28] L. Heng, “Automatic targetless extrinsic calibration of multiple 3d lidars and radars,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 10 669–10 675.
  • [29] R. Horaud and F. Dornaika, “Hand-eye calibration,” The international journal of robotics research, vol. 14, no. 3, pp. 195–210, 1995.
  • [30] K. Hausman, J. Preiss, G. S. Sukhatme, and S. Weiss, “Observability-aware trajectory optimization for self-calibration with application to uavs,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1770–1777, 2017.
  • [31] Z. Taylor and J. Nieto, “Motion-based calibration of multimodal sensor extrinsics and timing offset estimation,” IEEE Transactions on Robotics, vol. 32, no. 5, pp. 1215–1229, 2016.
  • [32] J. Jiao, Y. Yu, Q. Liao, H. Ye, R. Fan, and M. Liu, “Automatic calibration of multiple 3d lidars in urban environments,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2019, pp. 15–20.
  • [33] L. He, X. Wang, and H. Zhang, “M2dp: A novel 3d point cloud descriptor and its application in loop closure detection,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2016, pp. 231–237.
  • [34] G. Kim and A. Kim, “Scan context: Egocentric spatial descriptor for place recognition within 3d point cloud map,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2018, pp. 4802–4809.
  • [35] G. Kim, S. Choi, and A. Kim, “Scan context++: Structural place recognition robust to rotation and lateral variations in urban environments,” IEEE Transactions on Robotics, vol. 38, no. 3, pp. 1856–1874, 2021.
  • [36] L. Luo, S. Zheng, Y. Li, Y. Fan, B. Yu, S.-Y. Cao, J. Li, and H.-L. Shen, “Bevplace: Learning lidar-based place recognition using bird’s eye view images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8700–8709.
  • [37] C. Fu, L. Li, L. Peng, Y. Ma, X. Zhao, and Y. Liu, “Overlapnetvlad: A coarse-to-fine framework for lidar-based place recognition,” arXiv preprint arXiv:2303.06881, 2023.
  • [38] L. Bernreiter, L. Ott, J. Nieto, R. Siegwart, and C. Cadena, “Phaser: A robust and correspondence-free global pointcloud registration,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 855–862, 2021.
  • [39] S.-J. Han, J. Kang, K.-W. Min, and J. Choi, “Dilo: Direct light detection and ranging odometry based on spherical range images for autonomous driving,” ETRI journal, vol. 43, no. 4, pp. 603–616, 2021.
  • [40] T. Shan, B. Englot, F. Duarte, C. Ratti, and D. Rus, “Robust place recognition using an imaging lidar,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 5469–5475.
  • [41] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in 2011 International conference on computer vision.   Ieee, 2011, pp. 2564–2571.
  • [42] S. Zhao, P. Yin, G. Yi, and S. Scherer, “Spherevlad++: Attention-based and signal-enhanced viewpoint invariant descriptor,” IEEE Robotics and Automation Letters, vol. 8, no. 1, pp. 256–263, 2022.
  • [43] X. Chen, T. Läbe, A. Milioto, T. Röhling, O. Vysotska, A. Haag, J. Behley, and C. Stachniss, “Overlapnet: Loop closing for lidar-based slam,” arXiv preprint arXiv:2105.11344, 2021.
  • [44] L. Bernreiter, L. Ott, J. Nieto, R. Siegwart, and C. Cadena, “Spherical multi-modal place recognition for heterogeneous sensor systems,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 1743–1750.
  • [45] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, “Spherical cnns,” arXiv preprint arXiv:1801.10130, 2018.
  • [46] C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Daniilidis, “Learning so (3) equivariant representations with spherical cnns,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 52–68.
  • [47] N. Perraudin, M. Defferrard, T. Kacprzak, and R. Sgier, “Deepsphere: Efficient spherical convolutional neural network with healpix sampling for cosmological applications,” Astronomy and Computing, vol. 27, pp. 130–146, 2019.
  • [48] H. Wang, C. Wang, C.-L. Chen, and L. Xie, “F-loam: Fast lidar odometry and mapping,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2021, pp. 4390–4396.
  • [49] P. Dellenbach, J.-E. Deschaud, B. Jacquet, and F. Goulette, “Ct-icp: Real-time elastic lidar odometry with loop closure,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 5580–5586.
  • [50] I. Vizzo, T. Guadagnino, B. Mersch, L. Wiesmann, J. Behley, and C. Stachniss, “Kiss-icp: In defense of point-to-point icp–simple, accurate, and robust registration if done the right way,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 1029–1036, 2023.
  • [51] Y. Pan, P. Xiao, Y. He, Z. Shao, and Z. Li, “Mulls: Versatile lidar slam via multi-metric linear least square,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 11 633–11 640.
  • [52] T. Shan and B. Englot, “Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2018, pp. 4758–4765.
  • [53] Y. Duan, J. Peng, Y. Zhang, J. Ji, and Y. Zhang, “Pfilter: Building persistent maps through feature filtering for fast and accurate lidar-based slam,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 11 087–11 093.
  • [54] J. Jiao, Y. Zhu, H. Ye, H. Huang, P. Yun, L. Jiang, L. Wang, and M. Liu, “Greedy-based feature selection for efficient lidar slam,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 5222–5228.
  • [55] X. Zheng and J. Zhu, “Traj-lo: In defense of lidar-only odometry using an effective continuous-time trajectory,” IEEE Robotics and Automation Letters, 2024.
  • [56] ——, “Traj-lio: A resilient multi-lidar multi-imu state estimator through sparse gaussian process,” arXiv preprint arXiv:2402.09189, 2024.
  • [57] A. Reinke, M. Palieri, B. Morrell, Y. Chang, K. Ebadi, L. Carlone, and A.-A. Agha-Mohammadi, “Locus 2.0: Robust and computationally efficient lidar odometry for real-time 3d mapping,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9043–9050, 2022.
  • [58] X. Huang, G. Mei, J. Zhang, and R. Abbas, “A comprehensive survey on point cloud registration,” arXiv preprint arXiv:2103.02690, 2021.
  • [59] J. E. Dennis Jr and R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations.   SIAM, 1996.
  • [60] C. Meng, Y. Duan, C. He, D. Wang, X. Fan, and Y. Zhang, “mmplace: Robust place recognition with intermediate frequency signal of low-cost single-chip millimeter wave radar,” arXiv preprint arXiv:2403.04703, 2024.
  • [61] C. Yuan, J. Lin, Z. Zou, X. Hong, and F. Zhang, “Std: Stable triangle descriptor for 3d place recognition,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 1897–1903.
  • [62] S. Fu, Y. Duan, Y. Li, C. Meng, Y. Wang, J. Ji, and Y. Zhang, “Crplace: Camera-radar fusion with bev representation for place recognition,” arXiv preprint arXiv:2403.15183, 2024.
  • [63] V. J. Katz, “A history of mathematics: An introduction,” (No Title), 1998.
  • [64] C. F. Gauss, Disquisitiones generales circa superficies curvas.   Typis Dieterichianis, 1828, vol. 1.
  • [65] Á. González, “Measurement of areas on a sphere using fibonacci and latitude–longitude lattices,” Mathematical Geosciences, vol. 42, pp. 49–64, 2010.
  • [66] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, vol. 18, no. 9, pp. 509–517, 1975.
  • [67] J. Zhang, M. Kaess, and S. Singh, “On degeneracy of optimization-based state estimation problems,” in 2016 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2016, pp. 809–816.
  • [68] X. Liu, Z. Liu, F. Kong, and F. Zhang, “Large-scale lidar consistent mapping using hierarchical lidar bundle adjustment,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1523–1530, 2023.
  • [69] Z. Liu and F. Zhang, “Balm: Bundle adjustment for lidar mapping,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3184–3191, 2021.
  • [70] F. Dellaert, M. Kaess et al., “Factor graphs for robot perception,” Foundations and Trends® in Robotics, vol. 6, no. 1-2, pp. 1–139, 2017.
  • [71] E. B. Dam, M. Koch, and M. Lillholm, Quaternions, interpolation and animation.   Citeseer, 1998, vol. 2.
  • [72] Z. Wang, Y. Liu, Y. Duan, X. Li, X. Zhang, J. Ji, E. Dong, and Y. Zhang, “Ustc flicar: A sensors fusion dataset of lidar-inertial-camera for heavy-duty autonomous aerial work robots,” The International Journal of Robotics Research, vol. 42, no. 11, pp. 1015–1047, 2023.
  • [73] Q. Li, X. Yu, J. P. Queralta, and T. Westerlund, “Multi-modal lidar dataset for benchmarking general-purpose localization and mapping algorithms,” arXiv preprint arXiv:2203.03454, 2022.
  • [74] P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang et al., “Pandaset: Advanced sensor suite dataset for autonomous driving,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC).   IEEE, 2021, pp. 3095–3101.
  • [75] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn, T. Fernandez, M. Jänicke, S. Mirashi, C. Savani, M. Sturm, O. Vorobiov, M. Oelker, S. Garreis, and P. Schuberth, “A2D2: Audi Autonomous Driving Dataset,” 2020. [Online]. Available: https://www.a2d2.audi
  • [76] L.-T. Hsu, N. Kubo, W. Wen, W. Chen, Z. Liu, T. Suzuki, and J. Meguro, “Urbannav: An open-sourced multisensory dataset for benchmarking positioning algorithms designed for urban areas,” in Proceedings of the 34th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2021), 2021, pp. 226–256.
  • [77] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
  • [78] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic et al., “Benchmarking 6dof outdoor visual localization in changing conditions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8601–8610.
  • [79] R. Hartley, J. Trumpf, Y. Dai, and H. Li, “Rotation averaging,” International journal of computer vision, vol. 103, no. 3, pp. 267–305, 2013.
  • [80] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (fpfh) for 3d registration,” in 2009 IEEE international conference on robotics and automation.   IEEE, 2009, pp. 3212–3217.
  • [81] H. Yang, J. Shi, and L. Carlone, “Teaser: Fast and certifiable point cloud registration,” IEEE Transactions on Robotics, vol. 37, no. 2, pp. 314–333, 2020.
  • [82] Y. Li, Y. Wang, C. Meng, Y. Duan, J. Ji, Y. Zhang, and Y. Zhang, “Farfusion: A practical roadside radar-camera fusion system for far-range perception,” IEEE Robotics and Automation Letters, 2024.
  • [83] H. Ren, S. Zhang, S. Li, Y. Li, X. Li, J. Ji, Y. Zhang, and Y. Zhang, “Trajmatch: Toward automatic spatio-temporal calibration for roadside lidars through trajectory matching,” IEEE Transactions on Intelligent Transportation Systems, 2023.