Kinect Registration
Problem
Description
The Microsoft Kinect is an RGBD camera that ships as a sensor with the
Xbox 360. It effectively provides a coloured point cloud of 3D
points. This project aimed at combining data from multiple Kinects,
that is, at registering the point clouds obtained from them.
This requires that we estimate the rotation and translation that
relates the coordinate frames of the individual Kinects.
Why do this?
We have identified three benefits:-
- Cover a larger area
- Overcome occlusions
- RGBD SLAM
The Kinect is mobile in the third case, but not in the first two.
However, the third case has the added benefit of significant overlap
between consecutive frames.
A mobile Kinect demands that the R & T be estimated in real time.
In this project we have assumed stationary Kinects, so the R & T
estimation is a one-time initialization and need not be done in real
time.
Past Research
Point cloud registration has been studied in the context of matching
laser range scans.
Iterative Closest Points
- A popular algorithm in this field is the Iterative Closest Point
(ICP) algorithm and its variants.
ICP, however, requires large overlap between the point clouds.
Conditional Random Fields
- F. Ramos, D. Fox, and H. Durrant-Whyte.
CRF-matching: Conditional random fields for feature-based scan
matching. In Proc. of Robotics: Science and Systems (RSS), 2007.
The Conditional Random Fields based approach is, however,
computationally expensive and not practical for point clouds with
~160k points. This makes its application to Kinect registration very
difficult.
Image feature matching
- Peter Henry, Michael Krainin, Evan Herbst,
Xiaofeng Ren and Dieter Fox. RGB-D Mapping: Using Depth Cameras for
Dense 3D Modeling of Indoor Environments (Workshop). RSS Workshop on
Advanced Reasoning with Depth Cameras, 2010
Given two point clouds and the corresponding 2D images, one can find
image features such as SIFT and match these in the pixel space itself.
Using a sample consensus approach to choose good matches and
propagating these to matches in the point cloud lets us compute the R
& T required for the registration. A video showing the results of
their research is available here.
FPFH and PFH feature matching
- Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast Point
Feature Histograms (FPFH) for 3D Registration. ICRA, 2009.
Computing FPFH/PFH features in the point cloud itself and matching
them also yields the rigid transformation required.
Approaches Used
Chessboard corners detected by OpenCV
Calibration Target
Though the initial code base could handle more than one Kinect, we
tested with only two at a time.
In the calibration-target based approach, an asymmetric chessboard was
placed such that it was visible to both Kinects.
OpenCV's chessboard detection code was used to detect it in the image
data available from the Kinect.
Since the chessboard was asymmetric, the order in which the chessboard
points were detected was always the same.
This gives us an immediate 1-1 correspondence between image pixels.
The point cloud streamed by the OpenNI Kinect driver has a one-to-one
correspondence with pixel points, with NaN values filling those points
where the Kinect failed to estimate the depth.
The image pixel correspondences across the two Kinects can therefore
easily be propagated to correspondences between 3D points.
With a sufficient number of correspondences we can compute the rigid
transformation required.
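The closed-form solution for the rigid transform from paired 3D points is the standard Kabsch/SVD method. The following is a minimal numpy sketch of that standard solution; the variable names and the synthetic check are our own illustration, not the project's code:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares R, t such that R @ P[i] + t ~= Q[i] (Kabsch/SVD)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Synthetic check: recover a known rotation about z and a translation.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
a = np.pi / 6
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true
R, t = rigid_transform(P, Q)
```

With exact correspondences the recovered R and t match the ground truth to machine precision; with noisy Kinect data the same solve gives the least-squares fit.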
Results with calibration target
Pros
- A very accurate estimate of the transform is obtained - the
chessboard points are detected robustly and accurately (sub-pixel
accuracy).
- Fast
Cons
- Placing the chessboard so as to be visible to all Kinects is
difficult.
- The chessboard occupies only a portion of the field of view.
The result overfits these points, leading to a misalignment of other
points.
Because of error in the Kinect data nothing fits snugly, so features
need to be spread across the entire image.
Image Feature matching
RANSAC-filtered good matches of SIFT keypoints
Following the same approach as used by Dieter Fox et al., we found
SIFT features for the image points obtained.
Using Euclidean distance as the distance metric and RANSAC as the
sample consensus approach, we chose corresponding image points that
are spatially aligned.
Here, spatial alignment was with respect to aligning the
corresponding 3D points.
The least-error transformation was chosen as the desired rigid
transform.
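The sample-consensus step above can be sketched as follows, assuming putative matches are already paired index-to-index; iteration counts, thresholds, and names are our own choices:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares R, t with R @ P[i] + t ~= Q[i] (Kabsch/SVD)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def ransac_rigid(P, Q, iters=300, thresh=0.05, seed=1):
    """Keep the transform that spatially aligns the most putative 3D matches."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)    # minimal sample
        R, t = rigid_transform(P[idx], Q[idx])
        inliers = np.linalg.norm(P @ R.T + t - Q, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return rigid_transform(P[best], Q[best])               # refit on all inliers

# Synthetic matches: 30 correct correspondences, 10 corrupted ones.
rng = np.random.default_rng(0)
P = rng.normal(size=(40, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
Q = P @ R_true.T + t_true
Q[30:] += rng.normal(scale=1.0, size=(10, 3))              # outlier matches
R, t = ransac_rigid(P, Q)
```

The refit on the final inlier set is what makes the returned transform the least-error one over the consistent matches.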
We also experimented with 3D features, namely FPFH and PFH.
Good result using Image features
Bad result using Image features
Pros
- Features are spread throughout the field of view.
Cons
- Does not work for low overlap, as the correspondences are not good
enough.
Plane matching
In an attempt to improve the results, we wanted to use both shape and
color information.
FPFH and PFH features use only shape information, and SIFT based
feature matching uses only color information.
The overall aim was to match patches of the point cloud rather than
mere points. For this matching, both color (e.g. a color histogram)
and shape information could be used.
Since working with general patches is very challenging, we started
with planes, and hence we have titled this approach plane matching.
This approach is novel and involves three steps:-
- Plane extraction
- Plane description
- Plane matching and transform calculation
Plane extraction
The Point Cloud Library implements a RANSAC based plane extraction
algorithm.
This returns the plane that fits the largest number of points in the
supplied point cloud.
However, it does not discriminate based on the distance between the
points.
In other words, we get multiple patches of points lying on a plane
rather than one patch alone.
Treating multiple patches as one plane makes it difficult to rely on
color matching.
Also, not all of these patches may exist in the other Kinect's point
cloud.
Hence, we tried to obtain only one patch per plane.
This was achieved by a recursive implementation of spatial clustering
of points and plane fitting within clusters.
The recursion was required because a plane fit and plane extract
(pulling out the points that lie on that plane) operation in general
breaks the cluster into multiple clusters, some of which could be
planar.
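A compact numpy sketch of RANSAC plane extraction illustrates the problem described above: the winning plane happily collects two patches that are far apart. The data and names are our own synthetic illustration, not PCL's implementation:

```python
import numpy as np

def ransac_plane(pts, iters=200, thresh=0.01, seed=0):
    """Return (n, d) of the plane n.x + d = 0 supported by the most points."""
    rng = np.random.default_rng(seed)
    best_n, best_d, best_count = None, None, -1
    for _ in range(iters):
        a, b, c = pts[rng.choice(len(pts), size=3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-9:
            continue                            # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n @ a
        count = np.sum(np.abs(pts @ n + d) < thresh)
        if count > best_count:
            best_n, best_d, best_count = n, d, count
    return best_n, best_d

# Two patches far apart on z = 0, plus one patch on z = 1: the z = 0 plane
# wins even though its support is split into two spatially separate patches,
# which is exactly why a distance-based clustering step is needed.
rng = np.random.default_rng(0)
patch_a = np.c_[rng.uniform(0, 1, (200, 2)), np.zeros(200)]
patch_b = np.c_[rng.uniform(10, 11, (200, 2)), np.zeros(200)]
patch_c = np.c_[rng.uniform(0, 1, (150, 2)), np.ones(150)]
cloud = np.vstack([patch_a, patch_b, patch_c])
n, d = ransac_plane(cloud)
```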
Plane description
Each plane is described by its color histogram, which is obtained from
the RGB values corresponding to the points that lie on the plane.
The plane is also described by the coefficients that form its equation.
This constitutes the shape part of the description of a plane.
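One possible form of such a descriptor is sketched below; the bin count and the SVD-based plane fit are our own choices, not necessarily the project's:

```python
import numpy as np

def plane_descriptor(points, colors, bins=8):
    """Shape part: unit normal and offset from an SVD plane fit.
    Color part: concatenated, normalized per-channel RGB histogram."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    n = Vt[-1]                       # direction of least variance = plane normal
    d = -n @ centroid                # plane equation: n.x + d = 0
    hist = np.concatenate(
        [np.histogram(colors[:, ch], bins=bins, range=(0, 256))[0]
         for ch in range(3)]).astype(float)
    return n, d, hist / hist.sum()

# Hypothetical patch: points on the plane z = 2, uniformly grey colors.
rng = np.random.default_rng(0)
pts = np.c_[rng.uniform(size=(100, 2)), np.full(100, 2.0)]
cols = np.full((100, 3), 128.0)
n, d, hist = plane_descriptor(pts, cols)
```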
Plane matching and transform calculation
Plane matching - an illustration
The rotation came right
Rotation estimation
Rotation is obtained by aligning the normals.
To make the plane detection robust, we restricted ourselves to planes
having at least 1000 points.
In a general scene we get ~10 good planes, and these can be used for
plane matching and transform calculation.
Since the number of planes is very small, a brute-force search also
works.
We search for the least-error set of 4 corresponding planes and use
these to calculate the transformation.
The error metric is based on the cosine distance between the color
histograms and the angle between the normals after rotation.
Ideally the normals should align exactly, but error in the Kinect data
and in normal estimation leads to an error of ≤ 6 degrees.
We thresholded these errors individually and chose the minimum among
those below the threshold.
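The combined error metric might look like the following sketch; the weights and function names are hypothetical:

```python
import numpy as np

def match_error(hist_a, hist_b, n_a, n_b, R, w_color=1.0, w_angle=1.0):
    """Candidate-pair cost: cosine distance between color histograms plus
    the residual angle (radians) between the rotated normal R @ n_a and n_b."""
    color_err = 1.0 - hist_a @ hist_b / (
        np.linalg.norm(hist_a) * np.linalg.norm(hist_b))
    angle_err = np.arccos(np.clip((R @ n_a) @ n_b, -1.0, 1.0))
    return w_color * color_err + w_angle * angle_err

h = np.array([0.2, 0.3, 0.5])
n = np.array([0.0, 0.0, 1.0])
perfect = match_error(h, h, n, n, np.eye(3))        # identical pair -> ~0
skew = match_error(h, h, n, np.array([0.0, 1.0, 0.0]), np.eye(3))
```

An identical pair scores (near) zero, while normals 90 degrees apart contribute pi/2 to the cost, so wrong pairings are rejected by the individual thresholds described above.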
Translation estimation
Translation required an additional distance metric.
To estimate the translation, the candidate plane correspondences were
used in the following manner:-
- We traversed the set of correspondences and, for each correspondence,
computed the displacement along the aligned normal.
- This displacement was added to the translation vector and subtracted
(vector subtraction) from the similar displacements between the
remaining corresponding planes.
- At the end of this process, the net translation vector was used to
translate the already rotated planes.
The distance based error metric was then defined as the sum of any
residual displacements, along their aligned normals, between
corresponding planes.
This was weighted and added to the original error metric.
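Purely as an assumption on our part, not the project's exact procedure, the per-pair displacements can equivalently be stacked into a small least-squares problem: each corresponding pair constrains t along its aligned normal via n_i · t = d_i. The numbers below are hypothetical:

```python
import numpy as np

# Each corresponding plane pair constrains the translation t along its
# aligned normal: n_i . t = d_i, where d_i is the residual offset between
# the rotated source plane and the target plane.
normals = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.6, 0.8, 0.0]])
t_true = np.array([0.5, -0.2, 1.0])      # hypothetical ground truth
offsets = normals @ t_true               # consistent, noise-free offsets
t, _, rank, _ = np.linalg.lstsq(normals, offsets, rcond=None)
# rank == 3: the normals span three directions, so t is fully determined;
# with only parallel normals the rank drops and t is under-determined.
```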
This process, however, does not ensure that translations for all three
degrees of freedom are known: the correspondences may all end up being
between parallel planes, leaving the translation only partially
determined.
To solve this problem, we ensured that the normals of the planes vary
substantially in direction.
The normals were treated as points on a unit sphere and their
covariance matrix was calculated.
The eigenvalues of this matrix measure the spread of the normals along
three distinct directions (the eigenvectors).
The entire process was carried out only if all three eigenvalues were
greater than a threshold (in our case 0.1).
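The eigenvalue check can be sketched as follows, using the 0.1 threshold from the text; the function name and test data are ours:

```python
import numpy as np

def normals_well_spread(normals, eig_thresh=0.1):
    """True iff the plane normals, viewed as points on the unit sphere,
    spread along three distinct directions (all covariance eigenvalues
    above the threshold)."""
    eigvals = np.linalg.eigvalsh(np.cov(normals.T))
    return bool(np.all(eigvals > eig_thresh))

# Normals along all three axes: well spread. All normals parallel to z:
# degenerate, so the translation would be under-determined.
spread = np.array([[1.0, 0, 0], [-1.0, 0, 0],
                   [0, 1.0, 0], [0, -1.0, 0],
                   [0, 0, 1.0], [0, 0, -1.0]])
parallel = np.array([[0, 0, 1.0], [0, 0, 1.0], [0, 0, -1.0]])
```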
Cons
- The implementation has bugs, and hence the translation is not
correctly determined.
- We need to generalize beyond mere planes.
Finding matching planes along three distinct directions is a tighter
condition than having large overlap, and the large-overlap problem is
already solved by Dieter Fox et al.
- Very slow - the clustering algorithm is implemented by region
growing and is very slow.