This repository reimplements the line/plane odometry of LIO-SAM (itself based on LOAM) with CUDA. In place of PCL's kdtree, a point cloud hash map on the GPU (inspired by iVox of Faster-LIO) accelerates local map building, 5-neighbour KNN search and non-linear optimization.
Modifications are as follows:
- The CUDA code for the line/plane odometry is in src/cuda_plane_line_odometry.
- To use the CUDA odometry, scan2MapOptimization() in mapOptimization.cpp is replaced with scan2MapOptimizationWithCUDA().
This repository reimplements the line/plane odometry in scan2MapOptimization() of mapOptimization.cpp with CUDA.
On my machine (Orin-NX-8GB, walking_dataset.bag, with OpenMP), the original CPU version performs as follows:
- average cost of extracting surrounding key frames is more than 30ms
- average cost of building local map is about 20ms
- average cost of KNN search and optimization is about 30ms
- average cost of all operations in one frame is about 85ms
This repository replaces pcl's kdtree with a point cloud hash map (inspired by iVox of Faster-LIO) implemented with CUDA.
Meanwhile, other parts of the line/plane odometry (jacobians & residuals etc) are also implemented with CUDA.
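The voxel hash map idea can be sketched on the CPU as below. This is a minimal illustration, not this repository's CUDA implementation: the `VoxelHashMap` class, the 21-bit key packing and the 27-voxel search window are all assumptions made for the example. Points are bucketed by integer voxel index, so adding a point is a single hash lookup (which is what keeps incremental map updates cheap), and a neighbour query only inspects the query's voxel and its 26 neighbours instead of descending a tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { float x, y, z; };

// Hypothetical CPU-side voxel hash map (illustrative names, not the repo's API).
class VoxelHashMap {
public:
    explicit VoxelHashMap(float voxel_size) : inv_size_(1.0f / voxel_size) {
        assert(voxel_size > 0.0f);
    }

    // Incremental update: append the point to the voxel that contains it.
    void insert(const Point& p) {
        voxels_[key(index(p.x), index(p.y), index(p.z))].push_back(p);
    }

    // Up to k nearest stored points, taken from the query's voxel and its 26 neighbours.
    std::vector<Point> knn(const Point& q, std::size_t k) const {
        std::vector<Point> found;
        const int ix = index(q.x), iy = index(q.y), iz = index(q.z);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    const auto it = voxels_.find(key(ix + dx, iy + dy, iz + dz));
                    if (it != voxels_.end())
                        found.insert(found.end(), it->second.begin(), it->second.end());
                }
        // Keep only the n closest candidates, sorted by squared distance to q.
        const std::size_t n = std::min(k, found.size());
        std::partial_sort(found.begin(), found.begin() + n, found.end(),
                          [&q](const Point& a, const Point& b) {
                              return sqDist(a, q) < sqDist(b, q);
                          });
        found.resize(n);
        return found;
    }

private:
    // Integer voxel coordinate along one axis (floor handles negative values).
    int index(float v) const { return static_cast<int>(std::floor(v * inv_size_)); }

    static float sqDist(const Point& a, const Point& b) {
        const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    // Pack three voxel indices into one 64-bit key, 21 bits per axis.
    // Keys are unique for indices in [-2^20, 2^20).
    static std::uint64_t key(int x, int y, int z) {
        const std::uint64_t mask = (1ull << 21) - 1;
        return ((static_cast<std::uint64_t>(x) & mask) << 42) |
               ((static_cast<std::uint64_t>(y) & mask) << 21) |
               (static_cast<std::uint64_t>(z) & mask);
    }

    float inv_size_;
    std::unordered_map<std::uint64_t, std::vector<Point>> voxels_;
};
```

With 1 m voxels, `VoxelHashMap m(1.0f); m.insert({0.1f, 0.1f, 0.1f}); auto nn = m.knn({0.0f, 0.0f, 0.0f}, 5);` returns up to five candidates sorted by distance. Because only adjacent voxels are searched, the result is approximate whenever a true neighbour lies outside that window.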
On my machine (Orin-NX-8GB, walking_dataset.bag), the GPU version implemented by this project performs as follows:
- average cost of extracting surrounding key frames is down to about 2.74ms
- average cost of incrementally updating local map is down to about 1.16ms
- average cost of one 5-neighbour KNN search is down to about 1.40ms
- average cost of all operations in one frame is down to about 21.56ms
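For context on what each 5-neighbour search feeds into: in LOAM-style plane matching, the five nearest map points are typically fitted with a plane and the signed point-to-plane distance becomes the residual. The sketch below shows that step on the CPU under stated assumptions: `fitPlane`, `pointToPlane` and the Cramer's-rule solve are illustrative choices for this example, not this repository's CUDA kernels, and the validity checks and weighting used in practice are omitted.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Determinant of a 3x3 matrix stored row-major.
static double det3(const std::array<double, 9>& m) {
    return m[0] * (m[4] * m[8] - m[5] * m[7])
         - m[1] * (m[3] * m[8] - m[5] * m[6])
         + m[2] * (m[3] * m[7] - m[4] * m[6]);
}

// Least-squares fit of the plane n.p + 1 = 0 to five points via the normal
// equations (A^T A) n = -A^T 1, then rescaled so that |normal| = 1.
// Returns false for degenerate input (e.g. collinear points or a plane
// through the origin, which this parameterisation cannot represent).
bool fitPlane(const std::array<Vec3, 5>& pts, Vec3& normal, double& d) {
    std::array<double, 9> M{};          // A^T A
    double b[3] = {0.0, 0.0, 0.0};      // -A^T 1
    for (const Vec3& p : pts) {
        const double v[3] = {p.x, p.y, p.z};
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c) M[3 * r + c] += v[r] * v[c];
            b[r] -= v[r];
        }
    }
    const double det = det3(M);
    if (std::fabs(det) < 1e-12) return false;

    // Cramer's rule: replace column c of M with b to solve for component c.
    double n[3];
    for (int c = 0; c < 3; ++c) {
        std::array<double, 9> Mc = M;
        for (int r = 0; r < 3; ++r) Mc[3 * r + c] = b[r];
        n[c] = det3(Mc) / det;
    }
    const double norm = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (norm < 1e-12) return false;
    normal = {n[0] / norm, n[1] / norm, n[2] / norm};
    d = 1.0 / norm;
    return true;
}

// Signed distance from p to the plane normal.p + d = 0 (normal has unit length).
double pointToPlane(const Vec3& normal, double d, const Vec3& p) {
    return normal.x * p.x + normal.y * p.y + normal.z * p.z + d;
}
```

For a unit-length normal, the gradient of this residual with respect to the point is the normal itself, which is why the plane fit and the Jacobian are usually computed together.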
The essential dependencies are the same as those of LIO-SAM.
My Orin-NX-8GB's specific environment:
Before building this repo, some CMake variables in src/cuda_plane_line_odometry/CMakeLists.txt need to be modified to fit your environment:
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc) # change it to your path to nvcc
set(CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda) # change it to your CUDA toolkit root directory
set(CMAKE_CUDA_ARCHITECTURES 87) # for example, if your device's compute capability is 6.2, then set this CMAKE variable to 62
# In my Orin-NX-8GB, this CMAKE variable is 87
The basic steps to compile and run this repo are the same as for LIO-SAM.
Timings on Orin-NX-8GB:

Sequence | CPU: extract surrounding key frames | CPU: build kdtree | CPU: one frame | GPU: extract surrounding key frames | GPU: incrementally update hashmap | GPU: one KNN | GPU: one frame | Speed-up |
---|---|---|---|---|---|---|---|---|
Walking | 34.65ms | 20.03ms | 84.95ms | 2.74ms | 1.16ms | 1.40ms | 21.56ms | 3.94x |
Campus (large) | 25.21ms | 19.34ms | 84.75ms | 1.71ms | 1.13ms | 1.49ms | 23.58ms | 3.59x |
2011_09_30_drive_0028 | 68.17ms | 22.04ms | 166.67ms | 11.70ms | 3.97ms | 2.59ms | 54.06ms | 3.08x |
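The speed-up figures are consistent with dividing the CPU time for one frame by the GPU time for one frame. The helper below is a small sketch of that arithmetic; the function name `speedUp` is made up for this example.

```cpp
#include <cassert>

// Speed-up of the GPU pipeline over the CPU pipeline for one frame:
// the ratio of the two per-frame times, both in milliseconds.
double speedUp(double cpu_frame_ms, double gpu_frame_ms) {
    assert(gpu_frame_ms > 0.0);
    return cpu_frame_ms / gpu_frame_ms;
}
```

For the Walking sequence, `speedUp(84.95, 21.56)` evaluates to roughly 3.94, matching the table.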
This repository is a modified version of LIO-SAM, whose line/plane odometry is originally based upon LOAM.
The point cloud hash map on the GPU is inspired by the iVox data structure of Faster-LIO, and draws on kdtree_cuda_builder.h of FLANN.