Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng,
Yu Zhang, Jianmin Ji, and Yanyong Zhang* * The corresponding author.School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China {dyf0202, zxrr, glyou, elonwu, starlet, zkdly, cxmeng, pengjieb}@mail.ustc.edu.cn, {yuzhang, jianmin, yanyongz}@ustc.edu.cn.

Abstract

Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-to-fine framework, we first design a spherical descriptor TERRA for 3-DoF rotation initialization with no prior knowledge. To further optimize, we present JEEP for the joint estimation of extrinsic and pose, integrating geometric and motion information to overcome factors affecting the point cloud registration. Finally, the LiDAR poses optimized by the hierarchical optimization module are input to time synchronization module to produce the ultimate calibration results, including the time offset. To verify the effectiveness, we conduct extensive experiments on eight datasets, where 16 diverse types of LiDARs in total and dozens of calibration tasks are tested. In the challenging tasks, the calibration errors can still be controlled within $5cm$ and $1^{\circ}$ with a high success rate.

Index Terms:

Calibration, LiDAR, descriptor, hand-eye, and SLAM.

I Introduction

In recent years, the field of 3D vision technology has witnessed remarkable advancements, leading to the widespread adoption of depth sensors such as LiDARs for their ability to obtain accurate depth information [1, 2]. As the demand for enhanced performance continues to grow, there is a notable trend toward equipping autonomous systems, like self-driving vehicles and robotics, with multiple LiDARs. This approach brings numerous benefits, including improved robustness in self-localization and mapping, as well as the ability to prevent system crashes resulting from a single hardware failure [3]. Moreover, the utilization of multiple LiDARs enables the generation of a denser point cloud with a wider field of view (FoV), thereby significantly enhancing the overall perception performance [4]. Furthermore, by carrying various types of LiDARs, the entire perception system can leverage the unique characteristics of different LiDARs to enhance the performance. For example, conventional mechanical LiDARs provide a $360^{\circ}$ horizontal FoV, ensuring uniform spatial point cloud distribution [5]. Additionally, solid-state LiDARs are becoming increasingly popular in autonomous driving applications, offering denser point clouds within a limited FoV at a relatively lower cost [2, 6, 7].

To effectively utilize the benefits of multiple LiDAR, it is crucial to establish a common coordinate system by accurately transforming the data from their respective individual coordinate systems. This transformation between different LiDAR coordinate systems is parameterized by the extrinsic parameters, and the process of finding the optimal parameters is known as calibration, which plays a pivotal role in ensuring aligned and synchronized data from each sensor.

Refer to caption — Figure 1: The illustration of seven types of LiDAR in the simulated dataset. Mechanical LiDARs exhibit varying densities, solid-state LiDARs have different FOV, and mechanical and solid-state LiDARs have distinctly different point cloud distributions. This diversity presents new challenges for LiDAR calibration. This paper aims to introduce a universal, sensor-agnostic calibration method.

Calibration is a widely researched topic, and we mainly focus on the calibration of LiDARs in this paper.

Existing LiDAR calibration methods typically employ a coarse-to-fine framework to solve for the calibration parameters progressively. Finding an appropriate initial value is a prerequisite for obtaining precise parameters, which can be done by utilizing Hand-Eye method [8] or prior knowledge of specific scenarios [9]. After the initialization step, it’s commonly to perform a calibration refinement process to further improve the accuracy of calibration parameters. Specifically, the fundamental principle of the refinement step is to identify elements that can be correlated between LiDARs, such as similar features in the point cloud [10, 11] or their similar motion trajectories [12, 13]. Therefore, the refinement step is typically modeled as either a point cloud registration problem or a trajectory alignment problem. It is worth noting that the existing methods are usually tailored for specific LiDAR type, i.e., leveraging the prior knowledge of point cloud distribution or designing corresponding odometry module for pose estimation.

Although multi-LiDAR calibration has been extensively researched, it becomes increasingly challenging when it comes to the calibration between LiDARs with different types as shown in Fig. 1. First, the FoV of different LiDARs varies significantly. For instance, the Tele¹¹1https://www.livoxtech.com/tele-15 from Livox has only $14.5^{\circ}*16.2^{\circ}$ FoV, whereas the mechanical LiDAR²²2https://www.robosense.ai/en/rslidar/RS-Helios has a $360^{\circ}*30^{\circ}$ FoV. Additionally, due to the different scanning methods of LiDAR, the distribution of point clouds generated by different LiDARs for the same area also differ. Lastly, the overlap between two LiDARs may be limited, as a primary motivation for employing multiple LiDARs is to expand the FoV of the agent. These factors present new challenges to the coarse-to-fine LiDAR calibration framework. The different distribution and density of point clouds make it challenging to develop a universal initialization method that can consistently produce qualified initial values for any two LiDARs. Additionally, the minimal overlap and sparsity of the point cloud contribute to instability in the results during further precise extrinsic parameter refinement, even if the calibration converges.

To effectively address the aforementioned challenges, we propose a new calibration approach for multiple LiDARs of any types. To this end, the issue of parameter initialization needs to be addressed first. According to the experiments in Sec. VII-D2, we highlight the importance of initializing rotation parameters rather than translation parameters, given the paper’s concentration on single-platform (one vehicle or one robot) sensor calibration. Therefore, we first introduce a point cloud descriptor for rotation initialization, namely TERRA, i.e., a Three degree of freedom (DoF) Embedding of point cloud for Robust Rotation Alignment. We carefully design a distance function between multiple TERRAs to overcome the impact of translation on rotation estimation robustly. After initialization, to address poor registration accuracy caused by the sparsity of the point cloud and scene degeneracy, a method called JEEP for the Joint Estimation of Extrinsic and Poses is proposed to obtain optimized, usable extrinsics by integrating geometric and motion information. To further refine these parameters with all sequence data from LiDARs, we implement a straightforward multi-LiDAR SLAM approach and utilize the hierarchical optimization to optimize the pose of each point cloud frame, subsequently enhancing the accuracy of the extrinsics estimation. In this way, we also estimate the offset between the data reception time of the two LiDARs by a proposed time synchronization module.

We conduct experiments on eight datasets, including both simulated and real-world environments with 16 diverse types of LiDARs in total and dozens of LiDAR pairs. Results show our rotation initialization and stepwise refinement methods have strong universality for different calibration tasks. We also conduct comprehensive ablation experiments on the key parameters involved in the methods. For typical calibration tasks for mechanical LiDARs, the calibration error can be controlled within $1cm$ and $0.1^{\circ}$ and in tasks involving challenging factors such as partial overlap of point clouds, calibration errors can still be controlled within $5cm$ and $1^{\circ}$ with a high success rate.

In summary, our main contributions are as follows:

•

We propose a novel method for universal LiDAR calibration, regardless of the specific type or model of LiDAR.
•

We design a point cloud descriptor namely TERRA, i.e., a Three-DoF Embedding of point cloud for Robust Rotation Alignment, to perform the rotation parameters initialization between any two LiDARs effectively.
•

To estimate the extrinsic parameters accurately, we introduce a module called JEEP, named after the Joint Estimation of Extrinsic and Poses, integrating geometric and motion information to overcome factors affecting the point cloud registration. Following that, we utilize the hierarchical optimization module and time synchronization module to further enhance the accuracy of extrinsics and time offset estimation.
•

To fully verify the effectiveness of our method, we conduct extensive experiments on eight datasets, including both simulated and real-world environments, where 16 diverse types of LiDARs in total and dozens of LiDAR pairs are tested. The code and related dataset will be open sourced on https://github.com/yjsx/ULC.

II Related Work

II-A LiDAR-LiDAR Calibration

In this section, we primarily discuss the existing calibration methods for LiDAR-LiDAR. We also briefly introduce the calibration between other sensors due to the similarities among calibration tasks. We divide these methods into two groups: geometry-based methods and motion-based methods.

II-A1 Geometry-based methods

Multi-LiDAR calibration is essentially a process to estimate the relative positions of two LiDARs, typically addressed through point cloud registration [14]. Point cloud registration is a widely studied problem, including classic methods like point-point ICP [15], NDT [16], point-plane ICP [17], and generalized-ICP [18] etc. However, the accuracy of point cloud registration is influenced by numerous factors, such as the initial transformation, scene degeneracy, and point cloud bias and noise [19, 20, 21]. In calibration tasks where the initial values are missing and point cloud distribution varies, these factors become challenges in the solving process. Leveraging the typical characteristics of road scenes, CROON [9] initializes the extrinsic parameters by aligning the ground plane, followed by using an octree-based refinement method to optimize the LiDARs’ pose. Jiao et al. [10] and Ma et al. [11] carefully select the environment for calibration and introduce an automatic multi-LiDAR extrinsic calibration algorithm using corner planes. Brahayam et al. [22] also focus on street scenes, selecting the ground and commonly seen poles as reference objects. By matching and aligning, they solve the 6-DoF parameters in a stepwise manner. GMMCalib [23] implements a GMM-based joint registration algorithm to enhance the robustness of the calibration process.

In multi-LiDAR calibration tasks, a challenge arises when the overlapping areas between LiDARs are minimal, leading to insufficient constraints for point cloud registration. Nie et al. [24] developed a novel surface normal estimation method for two mechanically tilted placed LiDARs taking account of the uneven distribution of point cloud density and the edge information of planes. LiDAR-Link [25] employs a large-FoV LiDAR as a “LiDAR-Bridge” to connect other small-FoV LiDARs, subsequently introducing the LiDAR-Align method to collect point clouds across multiple scenes and utilize the Generalized-ICP in parallel spaces to obtain the extrinsics. For small-FoV solid-state LiDAR calibration, the challenge intensifies as there may be no overlapping areas between the LiDARs. To address this problem, MLCC [26, 27] generates overlapping areas by constructing a local map of each LiDAR and formulate the multiple LiDAR extrinsic calibration into a LiDAR bundle adjustment (BA) problem. Heng et al. [28] also employs the approach of constructing maps to calibrate LiDARs and millimeter-wave radars.

II-A2 Motion-based methods

Since calibration tasks typically involve sensors rigidly connected, the relative motion between two sensors can be established using the hand-eye method [29, 30]. Das et al. [12][13] estimate the motion trajectories of each sensor separately and recover the initial rotation and translation by aligning these trajectories, subsequently optimizing them using dual quaternion-based hand-eye calibration. Taylor et al. [31] propose a universal method for LiDARs, cameras, GPS, and IMU that sequentially solves for time offsets, rotational offsets, and translational offsets based on motion information. Hand-Eye calibration is also a method used to assign initial values to extrinsic parameters [10, 32, 8], which typically requires specific trajectories and sufficient accuracy.

In this paper, we emphasize and solely focus on the rotation initialization. In the refinement phase, we integrate the advantages of geometry-based and motion-based methods to overcome the challenges affecting point cloud registration. Furthermore, through backend optimization, we enhance the calibration accuracy.

II-B Point Cloud Descriptor

Extracting descriptor from point cloud is a common method for transforming point clouds into feature spaces, with numerous applications such as place recognition and pose estimation. We primarily focus on descriptors generated through projection and categorized them based on the projection method into two types: BEV (Bird’s Eye View) projection-based methods and spherical projection-based methods.

II-B1 BEV Projection Based Methods

Projecting point clouds onto the BEV plane is a common approach in descriptor generation. He et al. propose M2DP [33], which is generated by projecting a 3D point cloud to multiple 2D planes with density signatures and generating the descriptor of the 3D point cloud using the signatures. Scancontext [34] partitions the point cloud into bins on the x-y plane based on both azimuthal and radial directions. This process creates a two-dimensional matrix to facilitate global position recognition by finding the optimal rotation for rotational invariance. Their sequel, ScanContext++ [35], introduced a new spatial division strategy to achieve lateral invariance and utilized a sub-descriptor to speed up loop retrieval, showing promising results in urban environments. More recently, some methods [36, 37] use neural networks to generate BEV features and conduct end-to-end training to achieve place recognition.

II-B2 Spherical Projection Based Methods

PHASER [38] projects both source and target point cloud onto the sphere and performs a spherical Fourier transform to find the rotation that maximizes the spherical correlation. Similarly, Han et al. propose DiLO [39] implementing LiDAR odometry based on the spherical range image (SRI). Shan et al. [40] utilize the traditional BoW algorithm in visual place recognition for LiDAR-based global localization. This approach involves a projection of high-resolution LiDAR point cloud intensity data into a spherical visual format, facilitating the extraction of features based on ORB [41]. SphereVLAD++ [42] processes a query LiDAR scan by converting it into a panorama by spherical projection and encodes local orientation-equivalent features via spherical convolution for descriptor generation. OverlapNet [43] takes spherical projections of 3D LiDAR data as input and estimates the overlap between two range images by calculating all possible differences for each pixel. Bernreiteret et al. [44] employ spherical CNNs [45], [46], [47] to learn a multi-modal descriptor from spherical and visual images.

In this paper, we opt to project the point cloud onto the sphere, generating the descriptor TERRA, for robust rotation alignment. Compared to other methods, we employ a novel sampling method, diverging from the conventional angle-based sampling, which endows our descriptor with isotropy. This characteristic is crucial for 3-DoF rotation estimation. What’s more, with the help of a carefully designed distance calculation function, it can robustly estimate the rotation even in the presence of translational disturbances.

II-C Multi-LiDAR SLAM

LiDAR SLAM has seen substantial development in recent years. Inspired by LOAM [1], numerous variants improve it from different perspectives, including enhancing accuracy [48, 49, 50], feature extraction [51, 52], and feature selection [53, 54]. Compared to single-LiDAR SLAM or LiDAR-IMU fusion SLAM, multi-LiDAR SLAM is relatively a nascent field. M-LOAM [8] proposes a complete and robust solution for uncertainty-aware multi-LiDAR extrinsic calibration and SLAM. Lin et al. [3] develop a decentralized framework for simultaneous calibration, localization and mapping with multiple LiDARs to remove the single point of failure problem. Traj-LO [55, 56] regards the data of LiDAR as streaming points continuously captured at high frequency and applies the continuous-time trajectory estimation into multi-LiDAR SLAM. Locus 2.0 [57] employs a novel normals-based G-ICP formulation, an adaptive voxel grid filter and a sliding-window map approach to reduce memory consumption in multi-LiDAR SLAM. It has been successfully deployed on heterogeneous robotic platforms in the DARPA Subterranean Challenge.

In this paper, multi-LiDAR is not the primary issue to be addressed. However, to achieve precise extrinsic estimation, we implement a simple multi-LiDAR SLAM to provide the poses of each frame and optimize them using hierarchical optimization. Subsequently, the final calibration results and the time offsets between the LiDARs are outputted through time synchronization.

III Preliminaries

TABLE I: Nomenclature

Notation	Description	Type
$\mathbf{p}$	Point	$\mathbb{R}^{3}$
${}^{t}\mathcal{S}_{a}$	Scan from sensor “ $a$ ” at time $t$	$\{\mathbb{R}^{3}\}$
${}^{t}\mathcal{M}_{a}$	Map stacked by scans from “ $a$ ” at time $t$	$\{\mathbb{R}^{3}\}$
${}^{t}\mathbf{P}_{a}$	Pose of sensor “ $a$ ” at time $t$	$SE(3)$
${}^{t}_{t+1}\mathbf{T}_{a}$	Transformation of “ $a$ ” between $t$ and $t+1$	$SE(3)$
$\mathbf{C}$	Extrinsic parameters between “ $a$ ” and “ $b$ ”	$SE(3)$
$\mathbf{C}^{*}$	Result of calibration	$SE(3)$
$\mathbf{C}^{{}^{\prime}}$	Perturbation of calibration, i.e., $\mathbf{C}^{*}=\mathbf{C}^{{}^{\prime}}\cdot\mathbf{C}$	$SE(3)$
$\mathbb{F}$	TERRA, descriptor for rotation initializaiton	$\mathbb{R}^{N}$

The objective of calibration is to estimate the extrinsic parameters between two independent sensors, which can transform the data from each sensor into the same coordinate system. This is the foundation for many downstream tasks. We refer to one of these sensors as the “base” sensor, denoted by $a$ , and the other as the “target” sensor, denoted by $b$ . The related notations are listed in Tab. I, where $SE(3)$ denotes a rigid transformation, which means $\mathbf{P},\mathbf{T},\mathbf{C}$ are composed of a rotation matrix $\mathbf{R}\in SO(3)$ and a translation vector $\mathbf{t}\in\mathbb{R}^{3}$ . The relationships between the notations are detailed in the following three sections. In the following, we describe the processes of point cloud registration, LiDAR-based SLAM, and hand-eye calibration with the notations mentioned above.

III-A Point Cloud Registration

The pose transformations between point clouds are typically estimated through point cloud registration. As specified in [58], given two point clouds $\mathcal{P}_{0},\mathcal{P}_{1}$ , the goal of registration is to estimate the transformation matrix $\mathbf{T}^{*}$ to align $\mathcal{P}_{1}$ with $\mathcal{P}_{0}$ . The Iterative Closest Point (ICP) algorithm [15] is the most common solution to the registration problem, where $\mathbf{T}^{*}$ can be calculated by

\displaystyle\mathbf{T}^{*}

\displaystyle=\mathop{\arg\min}_{\mathbf{T}}\ \ \|d(\mathcal{P}_{0},\mathbf{T}% (\mathcal{P}_{1}))\|^{2},

(1)

where $d(\cdot,\,\cdot)$ denotes the loss function to measure the distance between the two inputs and $\mathbf{T}(\mathcal{P})$ means each point in $\mathcal{P}$ is translated and rotated to form a new point cloud, i.e., $\{\mathbf{R}\mathbf{p}+\mathbf{t}|for~{}\mathbf{p}~{}in~{}\mathcal{P}\}$ .

In this paper, we utilize a more robust point-to-plane ICP algorithm [17] due to the sparsity of the LiDAR point cloud, which results in the point-to-point ICP lacking sufficient correct correspondences to work properly. For two frame of point cloud $\mathcal{P}_{0},\mathcal{P}_{1}$ , the $d(\mathcal{P}_{0},\mathbf{T}(\mathcal{P}_{1}))$ of point-to-plane ICP is defined as:

d(\mathcal{P}_{0},\mathbf{T}(\mathcal{P}_{1}))=\sum_{p\in\mathcal{P}_{1}}(% \mathbf{T}(\mathbf{p})-p_{0})~{}\mathbf{n}_{0},

(2)

where $p_{0}$ is any point on the corresponding plane and $\mathbf{n}_{a}$ is the unit normal vector of the plane. The optimal pose estimation can be derived by solving the non-linear equation through Gauss-Newton method [59]. The Jacobian matrix can be solved by the chain rule as follows:

	$\displaystyle\mathbf{J}_{d}$	$\displaystyle=\frac{\partial\,d(\mathcal{P}_{0},\mathbf{T}(\mathcal{P}_{1}))}{% \partial\mathbf{T}\mathbf{p}}\frac{\partial\,\mathbf{T}\mathbf{p}}{\partial% \delta\xi}$		(3)
		$\displaystyle=\mathbf{n}_{0}\mathbf{J}_{p},$		(3)

where $\mathbf{J}_{p}$ can be estimated by applying left perturbation model with $\delta\xi\in\mathfrak{se}(3)$ :

	$\displaystyle\mathbf{J}_{p}=\frac{\partial\,\mathbf{T}\mathbf{p}}{\partial% \delta\xi}$	$\displaystyle=\lim_{\delta\xi\rightarrow 0}\frac{exp(\delta\xi)\mathbf{T}% \mathbf{p}-\mathbf{T}\mathbf{p}}{\delta\xi}$		(4)
		$\displaystyle=\begin{bmatrix}\mathbf{I}_{3\times 3}&-[\mathbf{T}\mathbf{p}]_{% \times}\\ \mathbf{0}_{1\times 3}&\mathbf{0}_{1\times 3}\end{bmatrix},$		(4)

where $[\mathbf{T}\mathbf{p}]_{\times}$ means the skew symmetric matrix of the transformed point. It is important to note that the accuracy of point cloud registration is often influenced by various factors such as the initial transformation, scene degeneracy, and point cloud bias and noise [19, 20, 21]. This paper primarily addresses registration failures caused by the first two factors.

III-B SLAM

In this paper, we also use some concepts commonly used in the SLAM, i.e., simultaneous localization and mapping, which plays a vital role in the navigation of autonomous systems, such as autonomous carscite and robots. The problem that SLAM aims to solve is to estimate the position of a sensor when each frame of data is received. This position is then used to build a map using the sensor data. In specific, for LiDAR-based SLAM, a sequence of point clouds $\{^{0}\mathcal{S},^{1}\mathcal{S},\dots,^{t}\mathcal{S}\}$ is received at consecutive time intervals, and the poses $\mathbf{P}$ at each time is estimated using point cloud registration. The map is then constructed as follow:

{}^{t}\mathcal{M}=\bigcup_{i\in[0,t]}~{}^{i}\mathbf{P}(^{i}\mathcal{S}),

(5)

where the pose is calculated by

{}^{i}\mathbf{P}=^{0}_{i}\mathbf{T}=^{0}_{1}\mathbf{T}~{}^{1}_{2}\mathbf{T}% \dots^{i-1}_{i}\mathbf{T}.

III-C Hand-Eye Method

Hand-eye calibration is originally a method of calibrating the end-effector of the robot and the camera fixed on it. The position changes $\mathbf{A}\in SE(3)$ of end-effector are obtained through robot kinematics, meanwhile the corresponding position changes $\mathbf{B}\in SE(3)$ of the camera are obtained through pattern pose estimation. The transformation $\mathbf{A}$ and $\mathbf{B}$ conform to the following equation:

\mathbf{A~{}X}=\mathbf{X~{}B},

(6)

where the transformation between the end-effector and the camera is denoted as $\mathbf{X}$ .

In our calibration task, since the base sensor and the target sensor are also rigidly connected, their motion conforms to the following equation as well:

^{t}_{t+1}\mathbf{T}_{a}~{}\mathbf{C}=\mathbf{C}~{}^{t}_{t+1}\mathbf{T}_{b}.

(7)

Please note that in the above equation, the ${}^{t}_{t+1}\mathbf{T}_{a},^{t}_{t+1}\mathbf{T}_{b}$ and $\mathbf{C}$ are all unknown variables, which need to be calculated by the registration mentioned before.

IV System Overview

The objective of this study is to calibrate two LiDARs without any prior knowledge of the initial extrinsic parameters, the point cloud distribution, or the field of view. To address this challenge, we design a universal LiDAR calibration method consisting of four main steps as shown in Fig. 2. Firstly, we develop a versatile rotation initialization method based on the proposed TERRA which is detailed in Sec. V. Secondly, building upon the initial rotation, we use JEEP to jointly estimate the extrinsic parameters and poses with consecutive frames, yielding refined extrinsic parameters. The method is elaborated in Sec. VI-A. Finally, we implement a hierarchical optimization that not only reduces the calibration errors to the millimeter-level but also achieves time synchronization of the LiDARs, as described in Sec. VI-B and Sec. 9.

V Rotation Initialization

In this section, we propose a novel method for initializing the rotation part of the extrinsic parameter to ensure the convergence of subsequent steps. It is important to note that, when calibrating sensors on a single mobile platform, i.e., a car or a robot, the translation offset typically does not result in a considerable distance between point pairs which should be matched during registration. This is because the distance between sensors is limited by the size of the mobile platform, and we will conduct experiments to demonstrate it in Sec. VII-D2. However, the rotation offset can create a significant distance, potentially leading to registration failure. To address this problem, we simplify the initialization problem by focusing solely on estimating the 3-DoF rotation part. We will begin by presenting the newly proposed TERRA, a Three-DoF Embedding of point cloud for Robust Rotation Alignment. Subsequently, we will explain how TERRA is utilized to assess rotation and introduce a well-designed method to calculate the distance between two TERRAs to enhance the robustness. Finally, a two-step search method is designed to quickly identify the most suitable initial rotation value. For clarity, we employ the term “descriptor” interchangeably with “embedding” in this paper, following common usage.

V-A Design of TERRA

Extracting embeddings or descriptors is a common task, often used in place recognition or pose estimation tasks [60, 43, 61, 62]. However, these tasks often require the rotation invariance of the descriptors, in order to improve the algorithm’s robustness in dealing with diverse data. This requirement conflicts with our need to initialize rotation, which demands accurately reflecting changes in angles. A classic rotation-aware descriptor is scan context [34], which partitions the point cloud into bins on the $x-y$ plane according to both azimuthal and radial directions. Each bin is encoded with the maximum height of the segmented point cloud in the bin, i.e., the max value of $z$ . In the end, a matrix is extracted as the descriptor to represent the point cloud as shown in Fig. 3 and the rows and columns of the matrix represent different azimuths and radii, respectively. Shifting the columns of the original matrix to create the new matrix represents rotating the point cloud around the $z$ axis, which enables comparing the similarity of descriptors at different angles and thus get the most suitable rotation. However, this method can only search for the optimal angle in 1-DoF, while the 3-DoF estimation is necessary for the task of multi LiDAR calibration.

To solve this problem, inspired by globe in a gimbal, which can rotate on three axes to display the earth, we project the point cloud onto a sphere to generate descriptors instead of projecting onto the $x-y$ plane as previously done with the scan context. The sphere, being a 3D structure, intuitively supports 3-DoF rotation, similar to a gimbal. Additionally, we will not lose any information during the projection, that is, a position on the sphere corresponds to one and only one point in the point cloud. This is because the imaging principle of LiDAR dictates that the laser is unable to penetrate obstacles.

The next point of consideration is the selection of the most suitable data structure to represent the sphere. For spherical data, a readily conceivable technique is the application of the Mercator projection [63], which is a type of cartographic mapping method signifying a sphere’s surface into a plane. Upon application of Mercator projection, a 2D matrix can serve as the data structure for TERRA. However, on one hand, this method lacks isotropy, subject to considerable distortion toward the poles, resulting in a misrepresentation of relative size and distance of land masses as seen in world maps. On the other hand, the only reasonable rotation for world maps is around the Earth’s axis. In fact, Gauss’s Theorema Egregium [64] proves that a sphere’s surface cannot be represented on a plane without distortion. Based on the discussion, we can summarize that the optimal data structure should encompass two fundamental characteristics: (1) Isotropy, which implies that the descriptor should remain unbiased regardless of the position of the points when encoding them. (2) The capability to effectively perform rotational operations for comparing similarities under different rotations.

To meet the above requirements, we propose TERRA, a Three-DoF Embedding of point cloud for Robust Rotation Alignment. We conduct three steps to convert a frame of point cloud into the corresponding TERRA as shown in Fig. 4, i.e., spherical sampling, projection, and populating. Specifically, we employ a uniform spherical sampling method known as the Fibonacci lattice [65]. This method uniformly distributes $N$ points across the surface of a sphere with each point approximately covering the same area, which demonstrates the high isotropy of TERRA. The Fibonacci lattice sampling approach is detailed in Alg. 1, which takes an integer $N$ as input and returns a spherical point cloud, denoted as $\mathcal{FL}$ , consisting of $N$ uniformly distributed points over the sphere.

After preparing the point cloud $\mathcal{FL}$ , we can encode any given LiDAR frame to TERRA, which is a vector of length $N$ . In specific, the LiDAR frame is first normalized, i.e., projecting each point onto a unit sphere. Then, the closest point in $\mathcal{FL}$ for each point is efficiently searched by KD-tree [66]. LiDAR points corresponding to the same point in the $\mathcal{FL}$ are considered the smallest unit to extract TERRA, which means that the “characteristic” of these points forms one of the $N$ values in the TERRA. In this paper, we use the minimum distance from these covered points to the origin as the “characteristic” to populate TERRA, aiming to minimize the effects of occlusion. Finally, a descriptor of length $N$ is generated, as illustrated in the Fig. 4.

0: The number of points:

N

0: Sphere Points:

\mathcal{FL}

\phi\leftarrow\frac{\sqrt{5}+1}{2}

(golden ratio)

\mathcal{FL}\leftarrow\emptyset

3: for

i=0

N-1

y\leftarrow 1-\frac{2i}{N-1}

radius\leftarrow\sqrt{1-y^{2}}

\theta\leftarrow\frac{2\pi i}{\phi}

x\leftarrow radius\times\cos(\theta)

z\leftarrow radius\times\sin(\theta)

9: Append point

(x,y,z)

\mathcal{FL}

10: end for

Algorithm 1 Fibonacci lattice

V-B Rotation of the TERRA

After preparing the TERRA for each LiDAR frame, we need to design a rotation operation to search for the optimal rotation required. For the reason that each value in TERRA corresponds one-to-one with points in $\mathcal{FL}$ . Therefore, rotating the point cloud $\mathcal{FL}$ is equivalent to rotating TERRA.

Specifically, we rotate the point cloud $\mathcal{FL}$ by a certain rotation R, and the resulting rotated point cloud is denoted as $\mathcal{FL}^{\prime}$ . Similarly, KD-tree is employed to establish a one-to-one correspondence between point pairs in the rotated point cloud $\mathcal{FL}^{\prime}$ and the original $\mathcal{FL}$ . In this case, each pair consists of two points with different indices. By taking the value from the original index of TERRA and filling it into the corresponding point’s index position, we create a rotated TERRA. Essentially, we’re reorganizing the original TERRA values according to the new index alignment post-rotation, thus forming a rotated version of TERRA. Fig. 5 shows the process of $\mathcal{FL}$ rotation and rotated TERRA generation.

For a descriptor TERRA of length $N$ , the rotation process entails a time complexity of $O(N)$ for rotating the point cloud $\mathcal{FL}$ , and $O(NlogN)$ for matching points using a KD-tree, followed by an additional $O(N)$ for generating the updated descriptor. Thus, the total time for performing a rotation is $O(NlogN)$ , offering a substantial efficiency over sequentially rotating a LiDAR frame and then generating a new TERRA. Additionally, since each point independently searches for its match, the entire rotation process can be further accelerated through parallel computing.

V-C Distance between TERRA

We evaluate the distance between TERRAs to identify the most suitable rotation. The small distance between the TERRA indicates that the landmarks projected to sphere are in the correct corresponding relationship, meaning the correct rotation has been identified. An effective distance function typically needs to exhibit two key qualities. On one hand, it should show a monotonically increasing or decreasing trend within a local area, thus enabling the use of gradient descent to find the optimal solution. On the other hand, having fewer local optima in the function’s domain is beneficial as it facilitates the more efficient identification of the global optimum.

When the origins of two LiDARs coincide perfectly, the angles between landmarks projected onto the two unit spheres remain constant, because the positions of the two unit spheres are identical. In this ideal scenario, comparing TERRA to estimate rotation is theoretically correct. However, this assumption is not feasible in reality, as the translation between two LiDARs causes shifts in the projection positions of objects. As illustrated in Fig. 6, we abstract this effect into a task where a big circle (landmarks) colors two small circles (the unit sphere). In Fig. 6(a), the black dot, representing landmarks in the same direction as the translation, do not shift in projection position, whereas the red dot, representing landmarks perpendicular to the translation direction, show the maximum shift.

To address the impact of translation, we draw inspiration from the phenomenon that distant scenery moves slower in the field of view when traveling by train, which indicates that distant landmarks should have smaller shifts in their projection positions. To test this hypothesis, we change the radius of the big circle and quantitatively calculate the coloring differences between the two small circles as shown in Fig. 6(b). The results show that as the radius of the great circle increases, the coloring differences decrease.

Based on the observations, we define a mask $\mathbf{M}=\left[m_{i}\right]_{i=1}^{N}$ to indicate which positions are suitable for distance calculation for two TERRA $\mathbb{F}_{1}$ and $\mathbb{F}_{2}$ . $m_{i}$ is defined as:

m_{i}=\begin{cases}1,&\text{if }(\mathbb{F}_{1}^{i}>d_{1})\land(\mathbb{F}_{2}% ^{i}>d_{1})\land(|\mathbb{F}_{1}^{i}-\mathbb{F}_{2}^{i}|\leq d_{2})\\ 0,&\text{otherwise}\end{cases},

(8)

where $\mathbb{F}_{1}^{i}$ is the $i-th$ position of $\mathbb{F}_{1}$ and we use the parameter $d_{1}$ to filter out the distant landmarks from the LiDARs to reduce the interference caused by translation between LiDARs and $d_{1}$ is set to $20m$ in the practice. The parameter $d_{2}$ is used to select positions that may belong to the same object for subsequent calculations. $d_{2}$ can be adjusted according to the actual distance between LiDARs. In our experiments, it is set to $5m$ based on the typical size of vehicles.

Therefore, we define the distance calculation function as follows:

\displaystyle D(F_{1},F_{2})

\displaystyle=\frac{||(\mathbb{F}_{1}-\mathbb{F}_{2})\mathbf{M}^{T}||_{1}}{||% \mathbf{M}||_{1}^{\alpha}},

(9)

where $\alpha$ set to 2 is used to increase the influence of valid number on the distance, preventing scenarios with few correspondences but also minimal distances. For instance, even if there’s only one valid position with a short distance, the distance remains small. The smaller distance indicates greater similarity between the two TERRA, representing that the rotation is more likely to be close to the ground truth.

V-D Two-step Search Method

Despite the minor time expenditure for a single rotation, traversing all feasible rotations still demands an unacceptable amount of time, given the vastness of the rotation space. Therefore, we need to devise a search strategy that efficiently identifies the optimal rotation.

In the first step, we start by checking if the initial rotation will result in an overlap between the two point clouds using Eq. (8), that is, whether the length of mask $\mathbf{M}$ is not zero. If there is no overlap, the rotation space will be traversed with a significantly large step, e.g., $30^{\circ}$ , until a suitable initial rotation that produces an overlap area is found. This step is time-efficient. Starting from this initial rotation, we explore the surrounding rotation space with a large step size, e.g., $10^{\circ}$ , sampling among all rotations that lead to an overlap area and scoring each rotation using Eq. (9). The top ten rotations with the smallest distances will be input into the next stage.

In the second stage, building on the groundwork laid by the first, a heuristic search is employed to find improved solutions in the vicinity of each discrete rotation identified earlier. Specifically, in each area, we explore with a step size of $1^{\circ}$ , moving along the path of decreasing distance as calculated in Eq. (9), until a local optimum is reached.

VI Stepwise Refinement with Data Sequences

After obtaining a appropriate initial rotation value, we first integrate hand-eye calibration, using the initial frames of the sequences to obtain optimized extrinsic parameters. Next, to mitigate the impact of scene degeneracy, we employ a hierarchical optimization approach, continuously optimizing the extrinsic parameters during the multi-LiDAR mapping process. Finally, we provide a method for synchronizing the timing between LiDARs.

VI-A Joint Estimation of Extrinsic and Pose

As previously mentioned in Sec. III, we model the calibration problem as a point cloud registration problem. Through scan-to-scan registration [9], we form the most direct calibration method as follows:

\mathbf{C}^{*}=\mathop{\arg\min}_{\mathbf{T}}\ \ \|d(^{t}\mathcal{S}_{a},% \mathbf{T}(^{t}\mathcal{S}_{b}))\|^{2},\\

(10)

where each calibration takes the scans ${}^{t}\mathcal{S}_{a}\text{ and }^{t}\mathcal{S}_{b}$ from two LiDARs at the same time $t$ as input, and outputs the transformation $\mathbf{C}^{*}$ between them.

However, since only a single frame of data provides constraints for estimating transformation, the impact of noise and scene degeneracy becomes significant [8]. To mitigate these disadvantages, we propose JEEP for Joint Estimation of Extrinsic and Pose. By utilizing the motion principle of sensors connected by rigid links, we utilize multiple frames from both LiDARs to refine extrinsic parameters, while estimating the pose at different moments.

In detail, JEEP is divided into two phases: pose estimation phase and calibration phase as shown in Fig. 7. During the pose estimation phase, we utilize the equation of hand-eye calibration Eq. (7), combining the extrinsic parameters and the motion of the base LiDAR $a$ to derive the motion of the target LiDAR $b$ . This means that we can define the pose estimation problem as:

	$\displaystyle^{t}\mathbf{P}^{*}_{a}=$	$\displaystyle\mathop{\arg\min}_{\mathbf{T}}(\ \ \\|d(^{t-1}\mathcal{M}_{a},% \mathbf{T}(^{t}\mathcal{S}_{a}))\\|^{2}+$		(11)
		$\displaystyle\\|d(^{t-1}\mathcal{M}_{b},(^{t-1}\mathbf{C}^{-1}~{}\mathbf{T}~{}% ^{t-1}\mathbf{C}^{})(^{t}\mathcal{S}_{b}))\\|^{2}),$		(11)

where ${}^{t-1}\mathcal{M}_{a}\text{ and }^{t-1}\mathcal{M}_{b}$ are the maps built according to Eq. (5), and the pose of LiDAR $b$ ${}^{t}\mathbf{P}^{*}_{b}$ is replaced by ${}^{t-1}\mathbf{C}^{*-1}~{}^{t}\mathbf{P}^{*}_{a}~{}^{t-1}\mathbf{C}^{*}$ . At startup, the map uses the first frame as a substitute, i.e., ${}^{0}\mathcal{M}_{a}=^{0}\mathcal{S}_{a}$ and ${}^{0}\mathcal{M}_{b}=^{0}\mathcal{S}_{b}$ . The rotation part of ${}^{0}C^{*}$ is initialized using TERRA, and the translation part is set to $(0,0,0)$ . The Jacobian matrix for the first part is presented in Sec. III, and the second part is as follows:

	$\displaystyle\mathbf{J}_{p}$	$\displaystyle=\frac{\partial\,\mathbf{C}^{-1}\mathbf{T}\mathbf{C}\mathbf{p}}{% \partial\delta\xi}=\lim_{\delta\xi\rightarrow 0}\frac{\mathbf{C}^{-1}exp(% \delta\xi)\mathbf{T}\mathbf{C}\mathbf{p}-\mathbf{C}^{-1}\mathbf{T}\mathbf{C}% \mathbf{p}}{\delta\xi}$		(12)
		$\displaystyle=\begin{bmatrix}\mathbf{R}^{\mathbf{C}^{-1}}_{3\times 3}\\ \mathbf{0}_{1\times 3}\end{bmatrix}\begin{bmatrix}\mathbf{I}_{3\times 3}&-[% \mathbf{R}^{\mathbf{T}}\mathbf{R}^{\mathbf{C}}\mathbf{p}+\mathbf{R}^{\mathbf{T% }}\mathbf{t}^{\mathbf{C}}+\mathbf{t}^{\mathbf{T}}]_{\times}\\ \mathbf{0}_{1\times 3}&\mathbf{0}_{1\times 3}\end{bmatrix},$		(12)

where $\mathbf{R}^{C}$ is the rotation of ${}^{t-1}\mathbf{C}$ , and $\mathbf{R}^{\mathbf{T}}$ , $\mathbf{t}^{\mathbf{T}}$ are the rotation and translation of $\mathbf{T}$ .

In the calibration phase, we employ the following objective function to align two maps:

^{t}\mathbf{C}^{*}=\mathop{\arg\min}_{\mathbf{T}}\ \ \|d(^{t}\mathcal{M}_{a},% \mathbf{T}(^{t}\mathcal{M}_{b}))\|^{2}.\\

(13)

We iteratively execute these two processes until both ${}^{t}\mathbf{C}^{*}$ and ${}^{t}\mathbf{P}^{*}_{a}$ stabilize and converge. During this process, when ${}^{t}\mathbf{C}^{*}$ is inaccurate, estimating ${}^{t}\mathbf{P}^{*}_{a}$ is difficult to converge due to the incorrect position of ${}^{t}\mathcal{S}_{b}$ . Additionally, using a local map for the calibration introduces more constraints, making the point cloud registration in Eq. 13 more accurate than in Eq. 10. Exceeding a certain number of iterations will be considered a failure of JEEP.

However, repetitively executing this procedure for every frame yields little benefit and is impractical. On one hand, the iterative computation of each frame cannot meet real-time requirements, and the continuous growth in map size will further intensify this issue. On the other hand, due to the use of a map-to-map registration for calibration as specified in Eq. (13), the cumulative error in pose estimation inevitably leads to map deformation, which negatively impacts the accuracy of the calibration. Therefore, we restrict its execution to the initial phase of the data sequence as illustrated in Fig. 2, where it proves to be cost-effective and highly accurate. After JEEP, we develop a straightforward multi-LiDAR SLAM method using only the pose estimation phase to provide poses for subsequent steps.

VI-B Pose Refinement by Hierarchical Optimization

Despite using multi-frame data and incorporating motion constraints in JEEP, calibrating in a single environment remains risky due to the impact of scene degeneracy on point cloud registration accuracy [67, 8]. Collecting data across different environments by SLAM to jointly produce extrinsic parameters is advisable. However, errors in motion estimation can lead to incorrect calibration results, as analyzed in the previous section. Therefore, we employ a hierarchical optimization module to optimize the poses generated by multi-LiDAR SLAM, aiming to reduce pose errors and use all data collectively to estimate the extrinsics.

In the field of SLAM, a common method used for backend optimization involves using factor graphs to optimize all variables together, achieving global consistency. However, this approach typically optimizes only at the level of poses, overlooking the constraints introduced by the point cloud itself. Inspired by HBA [68], we employ a similar hierarchical optimization architecture here to achieve more accurate poses for each data frame from both LiDARs. Specifically, as shown in Fig. 8, we first construct a hierarchical structure for each LiDAR’s data by building submaps at various levels. For each LiDAR, scans are grouped according to a window size $w$ . A local LiDAR bundle adjustment (BA) [69] is performed in each local window using the initial pose estimated by the multi-LiDAR SLAM, optimizing the relative pose between each frame and the first frame in this window. Scans within the same window with the poses optimized by BA, form a local submap which serves as a frame in the next layer. This process can be repeatedly performed from the lower layer to the upper layer. Subsequently, the hierarchical structures of the two LiDARs are linked through the registration of different layer submaps. Finally, the poses of all frames and submaps are linked into a factor graph and solved using the Levenberg-Marquardt method with GTSAM [70].

In addition, since the BA process within each submap is independent, it is suitable to use parallel computing to accelerate the process. To achieve that, we maintain a thread pool and a task queue. When each submap needs to be stitched, the task is added to the task queue and the idle thread will retrieve tasks from the task queue.

VI-C Time Synchronization

After JEEP and hierarchical optimization, each LiDAR frame’s pose is as accurate as possible. However, as illustrated in Fig. 9, in the absence of time synchronization for the LiDAR pairs, the pose between two aligned frames from two LiDARs, denoted as ${}^{i}\mathbf{C}^{r}=^{i}\mathbf{P}_{a}^{-1}~{}^{i}\mathbf{P}_{b}$ here, is not the actual extrinsic parameters, denoted as ${}^{i}\mathbf{C}^{*}$ , between the LiDARs. Therefore, we propose a method for time synchronization to estimate the time offset between different LiDARs.

It’s worth noting that we assume a consistent frame rate for the LiDAR, meaning the time offset is a constant value. Moreover, between adjacent frames from the same LiDAR, the motion status remains consistent, allowing us to obtain the position at any given moment through interpolation.

More specifically, we hypothesize the time offset between two LiDARs as $\delta t$ and denote the interval of LiDAR capturing data as $t_{f}$ . The ratio between $\delta t$ and $t_{f}$ is denoted as $\alpha=\frac{\delta t}{t_{f}}$ . The pose bias caused by the time offset is denoted as:

{}^{i}\mathbf{T}_{b}^{\delta}=\alpha~{}^{i-1}\mathbf{P}_{b}^{-1}~{}^{i}\mathbf% {P}_{b},\\

(14)

where $\alpha$ means interpolating the pose. The translation part is linear interpolation under constant velocity assumption, while the rotation part is spherical linear interpolation (Slerp) [71]. Following that, the triangular relationship in the Fig. 9 can be expressed as:

{}^{i}\mathbf{C}^{*}~{}^{i}\mathbf{T}_{b}^{\delta}=^{i}\mathbf{C}^{r}.

(15)

Substituting ${}^{i}\mathbf{T}_{b}^{\delta}$ and $\mathbf{C}^{r}_{i}$ into Eq. 15, we get

{}^{i}\mathbf{C}^{*}=^{i}\mathbf{P}_{a}^{-1}~{}^{i}\mathbf{P}_{b}(\alpha~{}^{i% -1}\mathbf{P}_{b}^{-1}~{}^{i}\mathbf{P}_{b})^{-1},

(16)

where ${}^{i}\mathbf{C}^{*}$ is a function of $\alpha$ . We model the process of finding $\mathbf{C}^{*}$ and $\alpha$ as minimizing the variance of $\mathbf{C}^{*}$ , as $\mathbf{C}^{*}$ is considered constant over short periods once the time offset interference is excluded. The calculation process is as follows:

	$\displaystyle\alpha^{*}$	$\displaystyle=\mathop{\arg\min}_{\alpha}Var(\mathbb{C}^{*}\|\alpha),$		(17)
	$\displaystyle\mathbf{C}^{*}$	$\displaystyle=Mean(\mathbb{C}^{}\|\alpha^{}),$		(18)

where $\mathbb{C}^{*}=\{^{1}\mathbf{C}^{*},^{2}\mathbf{C}^{*},\cdots,^{i}\mathbf{C}^{% *}\}$ , and the $\mathbf{C}^{*}$ is the final output of the whole algorithm.

VII Experiments

To fully verify the effectiveness of our method, we conduct comprehensive experiments using both simulated and real-world datasets to evaluate the performance of the proposed methods compared to existing approaches. We first introduce the datasets used in our experiments in Sec. VII-A. In Sec. VII-B, we describe the quantitative evaluation metrics used in the subsequent experiments. In Sec. VII-C, the comparison between TERRA and five common methods demonstrates that our method provides significant robustness for diverse LiDAR pairs. Quantitative experiments are conducted on both simulated and real-world datasets, along with essential ablation studies. In Sec. VII-D, we test our extrinsic refinement methods JEEP, the hierarchical optimization module and the time synchronization module against two open-source calibration methods as well as the direct scan-to-scan method. Since the last two modules need to work together, we abbreviate them as HOTS. Please note that all registration processes are implemented using the widely adopted point-to-plane ICP method [17]. The parameters and their values used in our method are as shown in the Tab. II. Except for the ablation studies, all experiments are conducted according to this setup.

TABLE II: Parameters and Their Values Used in Our Method

Module	Description	Value
TERRA	$N$ , sphere sampling number	10000
	$d_{1}$ , used in distance calculation	20
	$d_{2}$ , used in distance calculation	5
	First-stage search stride	10
	Second-stage search stride	1
JEEP	Input frame number to JEEP	5
JEEP	Failure judgment threshold	10
Hierarchical optimization	$w$ , submap window size	5
Hierarchical optimization	Layer number	3

VII-A Datasets

TABLE III: Datasets for Evaluating the Proposed Method. 16 Different Types of LiDAR are Involved.

Dataset	LiDAR		Type	Self-Collected
Dataset	Spinning	Solid-State	Type	Self-Collected
COLORFUL	LiDAR-{16, 32, 64}	Livox-{Horizon, Mid40, Avia, Tele}	Simulated	✓
Campus-SS	RS-Helios	RS-LiDAR-M1, Livox-{Mid70, Avia}	Realistic	✓
Campus-BS	RS-{Bpearl $\times$ 2, Helios $\times$ 2}	/	Realistic	✓
USTC FLICAR [72]	Ouster OS0-128, Velodyne-{VLP-32C,HDL-32E}	Livox Avia	Realistic	✗
TIERS [73]	VLP-16, Ouster-{OS1-64, OS0-128}	Livox-{Horizon, Avia}	Realistic	✗
PandaSet [74]	Pandar64	PandarGT	Realistic	✗
A2D2 [75]	VLP-16 $\times$ 5	/	Realistic	✗
UrbanNav [76]	Velodyne-{VLP16, HDL-32E}, Lslidar C16	/	Realistic	✗

To demonstrate the versatility of our method, a variety of real-world and simulated datasets are utilized, encompassing 16 diverse types of LiDARs. The datasets used in our experiments are listed in Tab. III. The calibrated point clouds and the platforms are shown in Fig. 10. A brief description of each dataset is provided below:

•

COLORFUL: We collected a multi-LiDAR dataset in CARLA [77], an open urban driving simulator. This dataset is captured using three mechanical LiDARs and four different Livox LiDARs³³3https://www.livoxtech.com/, each displaying data in various colors as illustrated in Fig. 10(a), hence the name “colorful”. The dataset covers a duration of 120 seconds, with each sensor operating at a rate of 20 Hz. The simulated environment provides accurate relative extrinsic parameters among the sensors, which serve as the ground truth for the calibration task. Therefore, the quantitative experiments described below are primarily performed using this dataset.
•

Campus-SS We collected this dataset using a low-speed vehicle carrying 4 LiDARs on the campus of USTC. This dataset is used to investigate the performance of algorithms in calibrating various solid-state LiDARs, such as Livox models and the RS-LiDAR-M1⁴⁴4https://www.robosense.ai/en/rslidar/RS-LiDAR-M1. For this reason, we have named it Campus-SS. The ground truth for this dataset is established by manually providing initial values for point cloud registration, performing the calculation five times, and taking the average. The calibration scenes are manually selected to be suitable for calibration tasks, ensuring sufficient constraints in all directions.
•

Campus-BS We collected the dataset on an autonomous logistic vehicle on USTC campus, equipped with two 32-beam LiDARs (RS-Helios⁵⁵5https://www.robosense.ai/en/rslidar/RS-Helios) and two special LiDARs for near-field blind-spots detection (RS-Bpearl⁶⁶6https://www.robosense.ai/en/rslidar/RS-Bpearl). We use this dataset to investigate the performance related to blind-spots LiDARs. For this reason, we have named it Campus-BS. The method for obtaining the ground truth is the same as Campus-SS dataset.
•

USTC FLICAR [72]: This dataset, collected by Wang et al., used an aerial platform on a bucket truck carrying four LiDARs for aerial work. It’s important to highlight that this dataset features two vertically positioned LiDARs with a reduced overlapping FoV, introducing challenges for calibration. Meanwhile, this dataset features movement trajectories distinct from those found in autonomous driving datasets, including vertical movements.
•

TIERS [73]: The TIERS dataset, collected by Li et al., encompasses data captured by different types of LiDARs, including spinning LiDARs with varying resolutions (16, 64, and 128 beams) and solid-state LiDARs with two different scan patterns. The dataset offers a diverse collection of environments, including indoor, outdoor, urban roads, forests, and various other scenes.
•

PandaSet [74]: This dataset was collected by Xiao et al., equipped with one mechanical spinning LiDAR (Pandar64) and one forward-facing LiDAR (PandarGT) for autonomous driving scenarios.
•

A2D2 [75]: The A2D2 dataset was collected by Geyer et al. with five 16-beam LiDARs on highways, country roads, and cities in the south of Germany. This dataset represents a practical vehicle sensor layout strategy, utilizing five LiDARs to provide comprehensive surround sensor coverage.
•

UrbanNav [76]: The UrbanNav dataset, collected by Hsu et al. in Hong Kong with 3 mechanical LiDARs, focuses on urban canyon scenarios. In our experiments, it is primarily used to test the performance of online calibration in these typical urban driving situations.

VII-B Metrics

VII-B1 Accuracy

To measure the difference between the calibration results and the ground truth, we define two metrics to quantify the errors in the rotation and translation components.

The rotation error is calculated using the following formula with the reference to [78, 79]:

e_{rot}=\arccos((trace(\textbf{R}_{gt}^{-1}\textbf{R}_{est})-1)/2),

(19)

where $\textbf{R}_{est}$ and $\textbf{R}_{gt}$ respectively represent the estimated and ground truth values of the rotation. This formula calculates the angle between two rotation matrices in radians. The smaller the angle, the more accurate the estimation of rotation is.

The translation error is calculated using the formula that calculates the Euclidean distance in meters between two sensor positions, as follows:

e_{trans}=\|\textbf{t}_{est}-\textbf{t}_{gt}\|_{2},

(20)

where $\textbf{t}_{est}$ and $\textbf{t}_{gt}$ respectively represent the estimated and ground truth values of the translation. The smaller the distance, the more accurate the estimation of translation is.

VII-B2 Success rate

When calibration failures, the rotation and translation errors can be significantly large, indicating only the failure of this particular instance, not the accuracy of the algorithm, so averaging all errors is not an effective method for comprehensively evaluating the calibration algorithm. For better evaluation, we define “calibration success” to filter samples for computing accuracy and use success rate, abbreviated as $SR$ , to measure the robustness of algorithms. For rotation initialization, we define success as the rotation errors less than $10^{\circ}$ , which is. For the final outcome, success is defined as rotation errors less than $1^{\circ}$ and translation errors less than $10~{}cm$ . The $SR$ is calculated as the number of successful samples divided by the total number of samples.

VII-C Study of Rotation Initialization

In this section, we conduct experiments to evaluate the accuracy of the proposed TERRA for rotation initialization. We compare its performance with several commonly used methods, including the motion-based Hand-Eye method [8], the feature-based FPFH method with Teaser++ [80, 81], the geometric-based CROON method [9], and the PHASER method [38], which also project the LiDAR frame onto the sphere for global registration. In addition, we enhance the classic scancontext [34]. Specifically, we extract and align the largest planes considered as the ground from the LiDAR scans first, to estimate the roll and pitch angles. The detail of plane alignment can be found in [9]. Subsequently, we leverage scancontext [34] to calculate the yaw, thereby accomplishing the 3-DoF rotation initialization. This method is denoted as scancontext* for comparison.

We conduct comprehensive tests on the proposed rotation initialization method, e.g., TERRA, starting with quantitative assessments of eight various LiDAR pairs within COLORFUL dataset, and comparing the results with five commonly used algorithms. Subsequently, the method is tested across various real-world datasets, encompassing different scenes and sensor types, demonstrating the versatility of TERRA. Finally, we perform ablation studies on some of the key parameters.

VII-C1 Quantitative experiment in the simulated dataset

Given that the COLORFUL dataset encompasses the most diverse sensor types and precise ground truth, we select eight typical calibration tasks to test TERRA. The difficulty of the tasks increases from left to right in Fig. 11.

The experimental results demonstrate that our method achieves the highest accuracy among the six methods tested. The Hand-Eye method requires sufficient motion, which cannot be satisfied in our randomly picked sequences, leading to the algorithm failing to converge. The FPFH-based method fails in almost all tasks due to the different distributions of point clouds in different LiDARs, which makes it difficult to use the same set of parameters to extract similar features locally. For the same reasons, PHASER performs poorly in calibrations related to solid-state LiDARs. Similarly, the geometry-based CROON method performs well in mechanical-mechanical and solid-solid initialization but fails in cross-modal tasks. The scancontext* achieves a precision comparable to TERRA, but it requires a substantial ground area in the LiDARs’ FoV to align roll and pitch. This condition is satisfied in the sensor layout of the COLORFUL dataset but may not be fulfilled in all actual scenarios, as demonstrated in the subsequent section.

In contrast, the proposed TERRA demonstrates robustness across all calibration tasks. For simple tasks, the success rate approaches $100\%$ , and even in the two most challenging tasks, the success rate exceeds $60\%$ . In terms of accuracy, for the left four tasks, the variances of TERRA’s results is very small, indicating the stability of TERRA across different environments. Notably, for the 64-Horizon calibration task, although the variance is small, the mean error is not near zero. This phenomenon is caused by the search step and TERRA’s resolution. We also consider this an excellent completion of the initialization task, as this error can be further reduced through subsequent refinement step. Overall, our experimental results demonstrate the superiority of our TERRA method, which can provide accurate and reliable rotation initialization for a wide range of LiDAR pairs. Part of the schematic diagram for TERRA is shown in Fig. 12. For clearer viewing, the TERRA for the first LiDAR is represented by the outer sphere, while the second is illustrated by the inner sphere.

Dataset	Calibration Target	Scancontext*		TERRA
Dataset	Calibration Target	$e_{rot}$ (^∘) $\downarrow$	$SR$ (%) $\uparrow$	$e_{rot}$ (^∘) $\downarrow$	$SR$ (%) $\uparrow$
Campus1	RS-LiDAR-M1 LIVOX-AVIA	/	0	1.144	100
Campus2	RS-Helios(top) RS-Helios(back)	7.491	46.46	1.521	100
Campus2	RS-Helios(top) RS-Bpearl(right)	/	0	2.243	100
USTC FLICAR	Ouster OS0-128 VLP-32C	/	0	3.178	63.54
TIERS	LIVOX Horizon LIVOX AVIA	2.643	66.67	2.018	100
	OS-1 LIVOX AVIA	4.84	21.73	1.918	100
	OS-0 OS-1	4.008	67.57	0.5819	100
Pandaset	Pandar64 PandarGT	5.399	92.40	1.438	93.67
A2D2	frontcenter frontleft	/	0	8.726	61.37
UrbanNav	VLP16 Lslidar C16	/	0	3.785	67.78

•

* For A2D2 dataset, the “frontcenter” and “frontleft” is the positions of the LiDARs, which are both VLP-16.

TABLE IV: The rotation initial result of TERRA in the real-world dataset.

VII-C2 Quantitative experiment in the real world

To verify the algorithm’s performance in real-world scenarios, we select ten representative LiDAR pairs from seven datasets as shown in Tab. IV. In this section, we compare our method with the best-performing method, Scancontext*, in last part.

Due to the variety of installation methods of LiDARs, the assumption of a large planar surface in LiDAR scan does not hold, resulting in the complete failure of Scancontext* in some scenarios, with a success rate of zero. In contrast, the proposed TERRA still achieves stable and successful rotational initialization across most of scenarios without making any assumptions about the environment. In most tasks, the rotation error is within 5 cm. In the A2D2 dataset, due to severe distortions of the point cloud caused by motion, the error is relatively larger, but the success rate still exceeds $60\%$ . What’s more, it’s worth noting that TERRA requires the input of two point clouds to have overlapping areas, but this does not mean that only LiDARs with overlapping areas can be calibrated. Similar to the approach [26] designed for non-overlapping LiDAR calibration, if a map is constructed for each LiDAR and then the overlapping local maps are input into the TERRA, it can still work effectively.

Fig. 13 presents several challenging tasks. The LiDAR pairs showcased in USTC FLICAR, A2D2, and UrbanNav are all positioned with the LiDARs tilted, resulting in minimal overlapping areas between the two LiDARs. Additionally, the unique hemispherical FoV of the RS-Bpearl in the Campus-BS dataset presents significant challenges.

VII-C3 Ablation study

In this section, we test some key parameters affecting the performance of TERRA, including the translation distance between LiDARs, the number of spherical sampling points, and the stride during the search.

The impact of translation: As TERRA can only initialize rotation, the translation component introduces an error into the distance calculation in Eq. (9). Within the COLORFUL dataset, for the same calibration tasks, i.e., 64-32 and 64-Horizon, we introduce translations ranging from 0 to 8 meters in both the direction parallel and perpendicular to the vehicle’s movement, thereby collecting 9 additional data sequences. The 0-meter translation implies the same location of the two LiDARs, which is impractical in reality but feasible in a simulation environment. Additionally, we conduct comparative experiments by selecting different values for $d_{1}$ in the Eq. (8) to mitigate translation effects. The quantitative experimental results are illustrated in the Fig. 14. With a fixed $d_{1}$ , the success rate of initialization decreases as the translation distance increases. When the translation distance is fixed, choosing $d_{1}$ values of $0m$ and $10m$ results in significantly poorer algorithm performance due to the disruption caused by nearby objects, compared to other values. Conversely, opting for larger values, such as $50m$ or $60m$ , the threshold becomes too high, reducing the number of available values in the distance calculation and thereby weakening the algorithm. After comprehensive evaluation, we fix $d_{1}$ at $20m$ for our algorithm.

TABLE V: The impact of sampling number

N

on the TERRA’s precision

$N$		100	500	1000	3000	10000
Resolution(^∘)		18.98	8.70	6.10	3.55	2.76
64-32	$e_{rot}$ (^∘) $\downarrow$	6.204	2.953	2.176	2.061	0.935
64-32	$SR$ (%) $\uparrow$	13.08	69.4	100	100	100
64-Horizon	$e_{rot}$ (^∘) $\downarrow$	4.978	6.469	4.835	3.542	2.743
64-Horizon	$SR$ (%) $\uparrow$	5.70	98.65	99.66	100	100
64-Tele	$e_{rot}$ (^∘) $\downarrow$	\	\	\	5.183	4.995
64-Tele	$SR$ (%) $\uparrow$	0	0	0	26.17	45.6

The impact of sampling number: The sample number $N$ of Fibonacci lattice determines the resolution of TERRA. We use 64-32, 64-Horizon and 64-Tele calibration tasks as examples to illustrate the performance of TERRA at varying resolutions. As demonstrated in Tab. V, the precision decreases as the resolution reduces. However, because the 32-beam LiDAR and Horizon has a wider FoV, it can still achieve acceptable success rate even at low resolutions. Meanwhile, Tele has only a $14.5^{\circ}\times 16.2^{\circ}$ FoV. When TERRA’s resolution exceeds $16.2^{\circ}$ , it means that there is only one valid data in the entire descriptor, which will understandably lead to the algorithm’s failure.

The impact of search stride: Similar to the sampling number, the search stride also has an impact on performance. It’s worth noting that the stride we talk here is the stride in the first stage, not the heuristic search stride in the second stage. The reason being, in the second stage, we only use a small stride such as $1^{\circ}$ to search, which is still not particularly time-consuming. However, in the first stage, the stride will increase the search time cubically, so a stride as large as possible is essential. Yet, an exceedingly large stride may result in a failure to fall into the locally convex space close to the ground truth during traversal, leading to a convergence to incorrect results.

As shown in Fig. 15(a) and Fig. 15(b), we traverse the rotation space with a resolution of $10^{\circ}$ and calculate the corresponding distance for each rotation. Opacity signifies the value of the distance, with lesser opacity indicating a smaller distance. The small FoV of the Tele results in a small locally convex space, while in contrast, the locally convex space of the 64-32 is relatively large. In summary, for LiDARs like Tele, a large step size could likely cause the algorithm to miss local convex space, leading to failure. Therefore, in our algorithm, we use a $10^{\circ}$ step size, which is smaller than Tele’s FoV.

The impact of distance function: As depicted in Fig. 15, we compare the distance from Eq. 9 and the Euclidean distance (ED). The two tasks tested perfectly demonstrate the two beneficial properties of our designed distance function. Firstly, as shown in Fig. 15(a) and Fig. 15(c), our distance function has a clear gradient near the ground truth, which facilitates faster search. Additionally, for 64-Tele, although there are some local optima in the solution space shown in Fig. 15(b), the ground truth remains the global optimum. However, using ED makes the global optimum far from the ground truth, which is the worst case.

TABLE VI: Rotation Error(

e_{rot}(^{\circ})

), Translation Error(

e_{trans}(cm)

) and Success Rate(

SR(\%)

) of Different Approaches.
The initial value is ground truth with

2m

and

5^{\circ}

translation and rotation disturbance.
Rotational Error and Translation Error are only recorded when the experiment is successful (

SR

>0).

Target

CROON [9]

scan-scan

JEEP

MLCC [26]

HOTS

e_{rot}

\downarrow

e_{trans}

\downarrow

SR

\uparrow

e_{rot}

\downarrow

e_{trans}

\downarrow

SR

\uparrow

e_{rot}

\downarrow

e_{trans}

\downarrow

SR

\uparrow

e_{rot}

\downarrow

e_{trans}

\downarrow

SR

\uparrow

e_{rot}

\downarrow

e_{trans}

\downarrow

SR

\uparrow

LiDAR-64

LiDAR-32

0.032

1.269

25.63

0.176

4.911

20.84

0.007

0.229

100

0.007

0.127

100

0.006

0.144

100

COLORFUL

LiDAR-32

LiDAR-16

0.110

2.338

26.39

0.215

4.885

22.81

0.074

1.378

100

0.130

1.820

100

0.027

0.197

100

LIVOX-Horizon

LIVOX-Tele

0.199

5.779

0.830

0.133

5.351

1.740

0.061

4.486

49.12

0.026

6.048

100

0.030

1.786

100

LiDAR-64

LIVOX-Horizon

0.122

5.386

14.24

0.135

4.141

8.932

0.010

0.793

98.84

0.008

1.640

100

0.009

1.374

100

LiDAR-32

LIVOX-Avia

0.425

5.699

1.044

0.275

5.852

6.148

0.089

3.079

91.10

0.041

2.862

100

0.025

0.926

100

LiDAR-64

LIVOX-Tele

0.215

5.290

0.573

0.164

5.824

1.082

0.079

4.060

55.89

0.031

6.482

100

0.032

2.356

100

Campus-SS

RS-LiDAR-M1

LIVOX-AVIA

0.313

7.609

10.4

0.534

6.962

1.301

0.205

3.003

95.12

0.159

2.277

100

0.234

0.461

100

Campus-BS

RS-Helios(top)

RS-Helios(back)

0.325

2.623

96.34

0.284

2.215

100

0.349

1.555

100

RS-Helios(top)

RS-Bpearl(right)

0.815

7.524

0.975

0.820

7.285

50.19

0.428

3.46

100

0.6914

4.635

100

USTC FLICAR

Ouster OS0-128

VLP-32C

0.791

4.854

41.25

0.505

6.521

100

0.5123

4.151

100

TIERS

LIVOX Horizon

LIVOX AVIA

0.3867

7.021

13.56

0.433

6.601

8.861

0.307

8.051

88.69

0.117

3.656

100

OS-1

LIVOX AVIA

0.2255

6.83

8.108

0.804

7.158

3.436

0.201

3.032

97.50

0.150

5.109

100

0.103

1.248

100

OS-0

OS-1

0.9093

5.983

25.41

0.776

6.866

5.606

0.051

1.787

100

0.1815

4.196

100

Pandaset

Pandar64

PandarGT

0.142

4.318

15.69

0.069

5.32

4.167

0.143

3.018

100

0.025

1.355

100

0.1258

2.046

100

A2D2

frontcenter

frontleft

0.721

3.025

30.02

0.094

5.697

100

0.7225

2.9

100

UrbanNav

HDL-32E

Lslidar C16

0.9301

9.256

0.4926

0.772

2.708

0.218

0.748

9.215

0.3709

5.903

100

VII-D Study of Refinement

In this section, we conduct experiments to evaluate the accuracy of the proposed JEEP and HOTS for extrinsic refinement. We compare their performance with several commonly used methods, including CROON [9] and MLCC [26]. In addition to this, we also conduct tests on the direct scan-scan method. It is noteworthy that the classic LiDAR calibration algorithm M-LOAM [8], which can only calibrate mechanical LiDARs, is not suitable for our task setting, and therefore, it has not been included in our comparison.

VII-D1 Quantitative experiment

As indicated in the Tab. VI, we evaluate the extrinsic refinement methods on six types of LiDAR pairs of COLORFUL dataset and ten types of real-world datasets. We perturb the ground truth to serve as the input initial values, including a $5^{\circ}$ rotation and a $2m$ translation. Among the five methods, CROON and scan-scan requires single point cloud pairs as input. For JEEP, the number of input frames is set to five, with registration parameters consistent with those in scan-scan. Lastly, MLCC and HOTS utilize the entire sequence to generate the final result. Specifically, MLCC and HOTS require the trajectories of each LiDAR as input, which we estimate using the Multi-LiDAR SLAM described in Sec. VI-A, where the required extrinsic parameters are substituted with the ground truth disturbed by the average error of JEEP on the corresponding dataset.

As shown in the table on the left, due to the poor initial values, as well as the distribution of the point cloud and the degeneracy of the scene, both CROON and the direct scan-to-scan methods have low success rates, failing entirely on some challenging tasks. In contrast, after introducing the constraints of motion, the success rate of JEEP significantly improves. Even in the challenging tasks such as Campus-BS, USTC FLICAR, A2D2, and UrbanNav, we maintain a success rate of over 30%. The calibrated point clouds for these four tasks are displayed in Fig. 16.

With the use of all available data and backend optimization, MLCC shows enhanced performance. Benefiting from the hierarchical optimization design, our approach ensures better global consistency of the map, resulting in more accurate poses and extrinsic parameters.

VII-D2 Ablation study

In this section, we test the impact of initial value on JEEP, the impact of the number of frames input into JEEP and the impact of time synchronization.

The impact of initial value: To test the robustness of JEEP under various initial conditions, we sample rotation disturbances from $0$ to $12.5$ degrees and translation disturbances from $0$ to $5$ meters, conducting $6\times 6$ experiments for each LiDAR-pair. The heatmaps of rotation errors, translation errors, and success rates, as shown in Fig. 17, which demonstrate that JEEP is highly robust to inaccurate initial values, achieving an $80\%$ success rate even with disturbances of $10$ degrees and $5$ meters. Moreover, it is crucial to note that this experiment demonstrates that within a 5-meter range, translation disturbance doesn’t significantly impact the final results. However, when rotation disturbance exceeded $10^{\circ}$ , noticeable declines are observed in the performance of the refinement process. This phenomenon validates the correctness and necessity of performing only rotation initialization for single-platform calibration.

The impact of the frame number: The number of frames input into JEEP is an interesting parameter. As analyzed in Sec. VI-A, too few frames fail to provide sufficient constrains, while too many frames introduce cumulative errors of pose estimation to extrinsic calibration. The disturbance remain consistent with our main experiment, i.e., 2 meters and 5 degrees. We adjust the frame number from 2 to 12 to investigate the effects of varying the number of input frames. The rotational and translational errors and successful rates are shown in the Fig. 18. For tasks like 64-32 calibration, where the point cloud distribution is already uniform, fewer frames can achieve good results, and increasing the number of frames is not beneficial. However, for tasks like 64-Horizon and 64-Tele calibration, which inherently lack sufficient constraints, a certain number of frames is necessary to achieve satisfactory results.

The impact of time synchronization: To validate the effectiveness of the time synchronization introduced in Sec. 9, we reduce the frequency of the data in COLORFUL dataset from $20Hz$ to $10Hz$ . In specific, we remove data from two LiDARs at different times to maintain a $50ms$ interval between them. It is important to note that for $10Hz$ data, $50ms$ represents the maximum achievable time offset. We repeat the experiments on JEEP and HOTS according to the previous experimental setup, and the results are displayed in the Tab. VII. Due to the lack of time synchronization, errors caused by motion are introduced to calibration, resulting in JEEP’s success rate being significantly inferior to those previously shown in Tab. VI. In fact, these successful cases, accounting for about $20\%$ of the total, are attributed to the fact that approximately $20\%$ of the data collection period involves minimal movement. Therefore, the specific values of $e_{rot}$ and $e_{trans}$ are not meaningful in explaining the calibration performance in this context. Alternatively, we use extrinsic parameters with disturbances of $1^{\circ}$ and $10cm$ for the multi-LiDAR SLAM, and further optimize the output pose using hierarchical optimization.

After applying hierarchical optimization, a noticeable improvement is achieved, yielding results comparable to previous ones. Furthermore, the estimated time offsets are close to the ground truth of $50ms$ , demonstrating that our method can achieve millisecond-level time synchronization accuracy. Fig. 19 illustrates the relationship between the sampled time offset and extrinsic variance, translation error, and rotation error. It is observed that both translation and rotation errors reach their minimum values around $-50ms$ , and the variance shows a positive correlation with them. This indicates the validity of using variance to infer the true extrinsic parameters.

TABLE VII: Experiments on dataset with 50ms time offset.

Target		64-32	64-Horizon
JEEP	$e_{rot}(^{\circ})$ $\downarrow$	0.052	0.033
	$e_{trans}(cm)$ $\downarrow$	0.738	0.894
	$SR$ (%) $\uparrow$	20.69	20.23
HOTS	$e_{rot}(^{\circ})$ $\downarrow$	0.061	0.022
	$e_{trans}(cm)$ $\downarrow$	5.508	3.148
	$SR$ (%) $\uparrow$	100	100
Time Offset(ms)		-37	-42

VIII Discussion and Future Work

In this section, we discuss the limitations of the proposed methods and list some potential future work.

VIII-A TERRA

The previous experiments confirm the robustness of TERRA in various scenarios and tasks. However, TERRA is not a descriptor that can be universally applied across all scenarios. In the design of TERRA, we enhance its robustness against translation between two LiDARs by filtering nearby objects. This implies that in indoor environments where the translation between two LiDARs is comparable to the room size, TERRA is likely to underperform. Additionally, for other calibration tasks, such as roadside LiDARs calibration [82, 83], where the translation is on the same scale as the scanning radius of the LiDARs, TERRA also fails to function properly.

However, these limitations also provide new inspiration. The generation process of TERRA fundamentally involves uniformly down-sampling the point cloud by angle, which means that TERRA does not lose the 3D structural information of the point cloud. Currently, TERRA is designed for estimating the rotation between two LiDARs, but with the appropriate loss design, TERRA still has the potential application on 6-DoF pose estimation.

VIII-B Refinement

In the first step of refinement, we employ JEEP effectively addressing registration failures caused by minimal overlapping areas or sparsity of point clouds by alternately estimating extrinsic parameters and poses. This method also offers the advantage of determining the success of calibration by monitoring the iteration count. Specifically, when the calibration parameters closely approximate the ground truth, the pose estimation process using data from both LiDARs typically converges easily. Furthermore, the accuracy of the incrementally constructed local map, facilitated by precise pose estimation, promotes convergence in the calibration step. Ideally, a single cycle of pose estimation and calibration may suffice. In this case, we can affirm the success of the calibration and advance to the subsequent phase of accelerated multi-LiDAR SLAM.

Although calibrating during motion offers the aforementioned benefits, the effectiveness is undoubtedly influenced by the motion despite the used backend optimization steps. Firstly, the obvious effect is that motion causes distortion in the point cloud, which is detrimental to point cloud registration. Additionally, due to imperfect timestamp alignment, the estimated extrinsic parameters include components of motion as shown in Fig. 9. Although we reduce this effect through time synchronization steps, our practice of interpolating motion trajectories based on the assumption of the constant velocity is still not precise. A direct solution is to feed the algorithm with point clouds collected in a stationary state from different locations, similar to [24, 25]. However, this approach still requires a fixed calibration data collection step, unlike our approach, which is applicable to any data segment. Furthermore, we intend to adopt the continuous-time SLAM approach [55, 49] in the future, treating point cloud sequences as a continuous point stream rather than temporally discrete frame, which allows for more precise modeling of continuous-time trajectories.

IX Conclusion

In this paper, we design a universal LiDAR calibration method including four core steps, resolving the calibration challenges of initialization and refinement brought by different FoV and point cloud distributions due to the variety of LiDAR types. Firstly, we design a universal spherical descriptor TERRA to initialize the rotation part of the extrinsics without any prior knowledge. Following this, a method called JEEP, for the joint estimation of extrinsic and poses, is proposed to achieve optimized, usable extrinsics. To further refine these parameters with additional data, we implement a straightforward multi-LiDAR SLAM approach and utilize the hierarchical optimization to optimize the pose of each point cloud frame, subsequently enhancing the estimation of the extrinsics and time offsets. We believe our method offers a universal solution for LiDAR calibration, unaffected by the calibration environment or the type of LiDAR. This will provide a convenient and precise calibration tool for downstream tasks, allowing them to focus more on their primary objectives without being influenced by preliminary tasks.

References

[1] J. Zhang and S. Singh, “LOAM: Lidar Odometry and Mapping in Real-time.” in Robotics: Science and Systems, vol. 2, no. 9, 2014, pp. 1–9.
[2] J. Lin and F. Zhang, “Loam Livox: A fast, Robust, High-Precision LiDAR Odometry and Mapping Package for LiDARs of Small FoV.” in International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3126–3131.
[3] J. Lin, X. Liu, and F. Zhang, “A Decentralized Framework for Simultaneous Calibration, Localization and Mapping with Multiple LiDARs,” in International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 4870–4877.
[4] H. Hu, Z. Liu, S. Chitlangia, A. Agnihotri, and D. Zhao, “Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving,” in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022, pp. 2550–2559.
[5] C. Glennie and D. D. Lichti, “Static calibration and analysis of the velodyne hdl-64e s2 for high accuracy mobile scanning,” Remote sensing, vol. 2, no. 6, pp. 1610–1624, 2010.
[6] J. Lin, C. Zheng, W. Xu, and F. Zhang, “R²live: A robust, real-time, lidar-inertial-visual tightly-coupled state estimator and mapping,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7469–7476, 2021.
[7] J. Lin and F. Zhang, “R3live: A robust, real-time, rgb-colored, lidar-inertial-visual tightly-coupled state estimation and mapping package,” arXiv preprint arXiv:2109.07982, 2021.
[8] J. Jiao, H. Ye, Y. Zhu, and M. Liu, “Robust odometry and mapping for multi-lidar systems with online extrinsic calibration,” IEEE Transactions on Robotics, vol. 38, no. 1, pp. 351–371, 2021.
[9] P. Wei, G. Yan, Y. Li, K. Fang, W. Liu, X. Cai, and J. Yang, “Croon: Automatic multi-lidar calibration and refinement method in road scene,” arXiv preprint arXiv:2203.03182, 2022.
[10] J. Jiao, Q. Liao, Y. Zhu, T. Liu, Y. Yu, R. Fan, L. Wang, and M. Liu, “A novel dual-lidar calibration algorithm using planar surfaces,” in 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 1499–1504.
[11] F. Ma, S. Wang, and M. Liu, “An automatic multi-lidar extrinsic calibration algorithm using corner planes,” in 2022 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2022, pp. 235–240.
[12] S. Das, N. Mahabadi, A. Djikic, C. Nassir, S. Chatterjee, and M. Fallon, “Extrinsic calibration and verification of multiple non-overlapping field of view lidar sensors,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 919–925.
[13] S. Das, L. af Klinteberg, M. Fallon, and S. Chatterjee, “Observability-aware online multi-lidar extrinsic calibration,” IEEE Robotics and Automation Letters, vol. 8, no. 5, pp. 2860–2867, 2023.
[14] Z. Gong, C. Wen, C. Wang, and J. Li, “A target-free automatic self-calibration approach for multibeam laser scanners,” IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 1, pp. 238–240, 2017.
[15] P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor fusion IV: control paradigms and data structures, vol. 1611. Spie, 1992, pp. 586–606.
[16] P. Biber and W. Straßer, “The normal distributions transform: A new approach to laser scan matching,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), vol. 3. IEEE, 2003, pp. 2743–2748.
[17] Y. Chen and G. Medioni, “Object modelling by registration of multiple range images,” Image and vision computing, vol. 10, no. 3, pp. 145–155, 1992.
[18] A. Segal, D. Haehnel, and S. Thrun, “Generalized-icp.” in Robotics: science and systems, vol. 2, no. 4. Seattle, WA, 2009, p. 435.
[19] A. Censi, “An accurate closed-form estimate of icp’s covariance,” in Proceedings 2007 IEEE international conference on robotics and automation. IEEE, 2007, pp. 3167–3172.
[20] M. Brossard, S. Bonnabel, and A. Barrau, “A new approach to 3d icp covariance estimation,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 744–751, 2020.
[21] A. De Maio and S. Lacroix, “Deep bayesian icp covariance estimation,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 6519–6525.
[22] B. Ponton, M. Ferri, L. König, and M. Bartels, “Efficient extrinsic calibration of multi-sensor 3d lidar systems for autonomous vehicles using static objects information,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 6285–6292.
[23] I. Tahiraj, F. Fent, P. Hafemann, E. Ye, and M. Lienkamp, “Gmmcalib: Extrinsic calibration of lidar sensors using gmm-based joint registration,” arXiv preprint arXiv:2404.03427, 2024.
[24] M. Nie, W. Shi, W. Fan, and H. Xiang, “Automatic extrinsic calibration of dual lidars with adaptive surface normal estimation,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–11, 2022.
[25] J. R. Xu, S. Huang, S. Qiu, L. Zhao, W. Yu, M. Fang, M. Wang, and R. Li, “Lidar-link: Observability-aware probabilistic plane-based extrinsic calibration for non-overlapping solid-state lidars,” IEEE Robotics and Automation Letters, 2024.
[26] X. Liu and F. Zhang, “Extrinsic calibration of multiple lidars of small fov in targetless environments,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2036–2043, 2021.
[27] X. Liu, C. Yuan, and F. Zhang, “Targetless extrinsic calibration of multiple small fov lidars and cameras using adaptive voxelization,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2022.
[28] L. Heng, “Automatic targetless extrinsic calibration of multiple 3d lidars and radars,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 10 669–10 675.
[29] R. Horaud and F. Dornaika, “Hand-eye calibration,” The international journal of robotics research, vol. 14, no. 3, pp. 195–210, 1995.
[30] K. Hausman, J. Preiss, G. S. Sukhatme, and S. Weiss, “Observability-aware trajectory optimization for self-calibration with application to uavs,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1770–1777, 2017.
[31] Z. Taylor and J. Nieto, “Motion-based calibration of multimodal sensor extrinsics and timing offset estimation,” IEEE Transactions on Robotics, vol. 32, no. 5, pp. 1215–1229, 2016.
[32] J. Jiao, Y. Yu, Q. Liao, H. Ye, R. Fan, and M. Liu, “Automatic calibration of multiple 3d lidars in urban environments,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 15–20.
[33] L. He, X. Wang, and H. Zhang, “M2dp: A novel 3d point cloud descriptor and its application in loop closure detection,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 231–237.
[34] G. Kim and A. Kim, “Scan context: Egocentric spatial descriptor for place recognition within 3d point cloud map,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4802–4809.
[35] G. Kim, S. Choi, and A. Kim, “Scan context++: Structural place recognition robust to rotation and lateral variations in urban environments,” IEEE Transactions on Robotics, vol. 38, no. 3, pp. 1856–1874, 2021.
[36] L. Luo, S. Zheng, Y. Li, Y. Fan, B. Yu, S.-Y. Cao, J. Li, and H.-L. Shen, “Bevplace: Learning lidar-based place recognition using bird’s eye view images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8700–8709.
[37] C. Fu, L. Li, L. Peng, Y. Ma, X. Zhao, and Y. Liu, “Overlapnetvlad: A coarse-to-fine framework for lidar-based place recognition,” arXiv preprint arXiv:2303.06881, 2023.
[38] L. Bernreiter, L. Ott, J. Nieto, R. Siegwart, and C. Cadena, “Phaser: A robust and correspondence-free global pointcloud registration,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 855–862, 2021.
[39] S.-J. Han, J. Kang, K.-W. Min, and J. Choi, “Dilo: Direct light detection and ranging odometry based on spherical range images for autonomous driving,” ETRI journal, vol. 43, no. 4, pp. 603–616, 2021.
[40] T. Shan, B. Englot, F. Duarte, C. Ratti, and D. Rus, “Robust place recognition using an imaging lidar,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 5469–5475.
[41] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in 2011 International conference on computer vision. Ieee, 2011, pp. 2564–2571.
[42] S. Zhao, P. Yin, G. Yi, and S. Scherer, “Spherevlad++: Attention-based and signal-enhanced viewpoint invariant descriptor,” IEEE Robotics and Automation Letters, vol. 8, no. 1, pp. 256–263, 2022.
[43] X. Chen, T. Läbe, A. Milioto, T. Röhling, O. Vysotska, A. Haag, J. Behley, and C. Stachniss, “Overlapnet: Loop closing for lidar-based slam,” arXiv preprint arXiv:2105.11344, 2021.
[44] L. Bernreiter, L. Ott, J. Nieto, R. Siegwart, and C. Cadena, “Spherical multi-modal place recognition for heterogeneous sensor systems,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 1743–1750.
[45] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, “Spherical cnns,” arXiv preprint arXiv:1801.10130, 2018.
[46] C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Daniilidis, “Learning so (3) equivariant representations with spherical cnns,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 52–68.
[47] N. Perraudin, M. Defferrard, T. Kacprzak, and R. Sgier, “Deepsphere: Efficient spherical convolutional neural network with healpix sampling for cosmological applications,” Astronomy and Computing, vol. 27, pp. 130–146, 2019.
[48] H. Wang, C. Wang, C.-L. Chen, and L. Xie, “F-loam: Fast lidar odometry and mapping,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 4390–4396.
[49] P. Dellenbach, J.-E. Deschaud, B. Jacquet, and F. Goulette, “Ct-icp: Real-time elastic lidar odometry with loop closure,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 5580–5586.
[50] I. Vizzo, T. Guadagnino, B. Mersch, L. Wiesmann, J. Behley, and C. Stachniss, “Kiss-icp: In defense of point-to-point icp–simple, accurate, and robust registration if done the right way,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 1029–1036, 2023.
[51] Y. Pan, P. Xiao, Y. He, Z. Shao, and Z. Li, “Mulls: Versatile lidar slam via multi-metric linear least square,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 11 633–11 640.
[52] T. Shan and B. Englot, “Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4758–4765.
[53] Y. Duan, J. Peng, Y. Zhang, J. Ji, and Y. Zhang, “Pfilter: Building persistent maps through feature filtering for fast and accurate lidar-based slam,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 11 087–11 093.
[54] J. Jiao, Y. Zhu, H. Ye, H. Huang, P. Yun, L. Jiang, L. Wang, and M. Liu, “Greedy-based feature selection for efficient lidar slam,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 5222–5228.
[55] X. Zheng and J. Zhu, “Traj-lo: In defense of lidar-only odometry using an effective continuous-time trajectory,” IEEE Robotics and Automation Letters, 2024.
[56] ——, “Traj-lio: A resilient multi-lidar multi-imu state estimator through sparse gaussian process,” arXiv preprint arXiv:2402.09189, 2024.
[57] A. Reinke, M. Palieri, B. Morrell, Y. Chang, K. Ebadi, L. Carlone, and A.-A. Agha-Mohammadi, “Locus 2.0: Robust and computationally efficient lidar odometry for real-time 3d mapping,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9043–9050, 2022.
[58] X. Huang, G. Mei, J. Zhang, and R. Abbas, “A comprehensive survey on point cloud registration,” arXiv preprint arXiv:2103.02690, 2021.
[59] J. E. Dennis Jr and R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations. SIAM, 1996.
[60] C. Meng, Y. Duan, C. He, D. Wang, X. Fan, and Y. Zhang, “mmplace: Robust place recognition with intermediate frequency signal of low-cost single-chip millimeter wave radar,” arXiv preprint arXiv:2403.04703, 2024.
[61] C. Yuan, J. Lin, Z. Zou, X. Hong, and F. Zhang, “Std: Stable triangle descriptor for 3d place recognition,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 1897–1903.
[62] S. Fu, Y. Duan, Y. Li, C. Meng, Y. Wang, J. Ji, and Y. Zhang, “Crplace: Camera-radar fusion with bev representation for place recognition,” arXiv preprint arXiv:2403.15183, 2024.
[63] V. J. Katz, “A history of mathematics: An introduction,” (No Title), 1998.
[64] C. F. Gauss, Disquisitiones generales circa superficies curvas. Typis Dieterichianis, 1828, vol. 1.
[65] Á. González, “Measurement of areas on a sphere using fibonacci and latitude–longitude lattices,” Mathematical Geosciences, vol. 42, pp. 49–64, 2010.
[66] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, vol. 18, no. 9, pp. 509–517, 1975.
[67] J. Zhang, M. Kaess, and S. Singh, “On degeneracy of optimization-based state estimation problems,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 809–816.
[68] X. Liu, Z. Liu, F. Kong, and F. Zhang, “Large-scale lidar consistent mapping using hierarchical lidar bundle adjustment,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1523–1530, 2023.
[69] Z. Liu and F. Zhang, “Balm: Bundle adjustment for lidar mapping,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3184–3191, 2021.
[70] F. Dellaert, M. Kaess et al., “Factor graphs for robot perception,” Foundations and Trends® in Robotics, vol. 6, no. 1-2, pp. 1–139, 2017.
[71] E. B. Dam, M. Koch, and M. Lillholm, Quaternions, interpolation and animation. Citeseer, 1998, vol. 2.
[72] Z. Wang, Y. Liu, Y. Duan, X. Li, X. Zhang, J. Ji, E. Dong, and Y. Zhang, “Ustc flicar: A sensors fusion dataset of lidar-inertial-camera for heavy-duty autonomous aerial work robots,” The International Journal of Robotics Research, vol. 42, no. 11, pp. 1015–1047, 2023.
[73] Q. Li, X. Yu, J. P. Queralta, and T. Westerlund, “Multi-modal lidar dataset for benchmarking general-purpose localization and mapping algorithms,” arXiv preprint arXiv:2203.03454, 2022.
[74] P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang et al., “Pandaset: Advanced sensor suite dataset for autonomous driving,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 3095–3101.
[75] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn, T. Fernandez, M. Jänicke, S. Mirashi, C. Savani, M. Sturm, O. Vorobiov, M. Oelker, S. Garreis, and P. Schuberth, “A2D2: Audi Autonomous Driving Dataset,” 2020. [Online]. Available: https://www.a2d2.audi
[76] L.-T. Hsu, N. Kubo, W. Wen, W. Chen, Z. Liu, T. Suzuki, and J. Meguro, “Urbannav: An open-sourced multisensory dataset for benchmarking positioning algorithms designed for urban areas,” in Proceedings of the 34th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2021), 2021, pp. 226–256.
[77] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
[78] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic et al., “Benchmarking 6dof outdoor visual localization in changing conditions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8601–8610.
[79] R. Hartley, J. Trumpf, Y. Dai, and H. Li, “Rotation averaging,” International journal of computer vision, vol. 103, no. 3, pp. 267–305, 2013.
[80] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (fpfh) for 3d registration,” in 2009 IEEE international conference on robotics and automation. IEEE, 2009, pp. 3212–3217.
[81] H. Yang, J. Shi, and L. Carlone, “Teaser: Fast and certifiable point cloud registration,” IEEE Transactions on Robotics, vol. 37, no. 2, pp. 314–333, 2020.
[82] Y. Li, Y. Wang, C. Meng, Y. Duan, J. Ji, Y. Zhang, and Y. Zhang, “Farfusion: A practical roadside radar-camera fusion system for far-range perception,” IEEE Robotics and Automation Letters, 2024.
[83] H. Ren, S. Zhang, S. Li, Y. Li, X. Li, J. Ji, Y. Zhang, and Y. Zhang, “Trajmatch: Toward automatic spatio-temporal calibration for roadside lidars through trajectory matching,” IEEE Transactions on Intelligent Transportation Systems, 2023.