Structured Light vs Microsoft Kinect

Posted: 26th February 2012 by hackengineer in Computer Vision

So why use structured light?  It turns out there is a tradeoff between depth resolution and refresh rate.  Structured light can provide lots of depth detail (dependent on camera/projector resolution) but doesn't work well with fast-moving scenes.  Kinect, on the other hand, provides decent depth maps and can handle dynamic scenes.  So if you want high-resolution depth maps of relatively static scenes, SL can be a great option.

Structured Light

Structured Light in action

Structured light uses a set of temporally encoded patterns that are sequentially projected onto the scene.  The animation above illustrates SL in action.  This example shows two pixels of interest.  The yellow pixel is physically located on the background behind the red pixel and is 314 pixels from the left in the captured images.  The red pixel is 392 pixels from the left in the captured images.  Let's say white light is a 1 and black light is a 0.  Follow the example and watch the algorithm decode the structured light.
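Here is a minimal sketch of the decoding step in Python/NumPy (illustrative only, not the actual project code).  Each captured frame is thresholded into 1s and 0s and the bits are stacked up into a per-pixel code; the threshold value and function name are assumptions for the example.

```python
import numpy as np

def decode_patterns(frames, threshold=128):
    """Decode a stack of captured SL frames into a per-pixel Gray-code value.

    frames: list of grayscale images (H x W numpy arrays), one per projected
    pattern, ordered from most-significant bit to least-significant bit.
    threshold: assumed brightness cutoff separating "white" from "black".
    """
    codes = np.zeros(frames[0].shape, dtype=np.uint16)
    for frame in frames:
        bit = (frame > threshold).astype(np.uint16)  # white = 1, black = 0
        codes = (codes << 1) | bit                   # append this frame's bit
    return codes
```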

The resulting 9 bits are in Gray code.  Gray code is used to reduce errors while decoding the captured SL images.  For example, did any of the pixels from the example look grayish?  If the wrong choice were made with a plain binary encoding, a single error in the MSB would result in an error of 2^8 = 256 columns!  Gray code has the great property that only a single bit changes at a time.  Imagine a vertical bar progressing from left to right through the image below: each step changes in only one location.  As a result, if any pixel bit is decoded as the wrong value, the error is limited to +/-1.  The result needs to be translated into binary before depth can be calculated.  The process is straightforward and the details can be found in the code.
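Converting the Gray code back to binary only takes a few shifts and XORs.  A small sketch (again illustrative, not lifted from the project code):

```python
def gray_to_binary(gray):
    """Convert a Gray-code value back to plain binary."""
    binary = gray
    while gray:
        gray >>= 1
        binary ^= gray   # each binary bit is the XOR of all higher Gray bits
    return binary

# Example: the 9-bit Gray code 0b110000000 decodes to 0b100000000 (256).
print(gray_to_binary(0b110000000))  # 256
```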

Once we have decoded every pixel we have all the information we need to calculate depth.  To be clear, we have the column number for each pixel of the captured image (0 = first pixel on the left and 480 = last pixel on the right) and the decoded projector column (calculated above).  To calculate depth we will use simple geometry.  The technique is called depth from disparity and the math is surprisingly simple.   In the illustrations below the box on the left represents the projected set of SL images and the box on the right represents the captured images.  The line running through the middle of the boxes is called a scan line.  As the surface moves closer, the reflected light will be captured by a pixel further to the left on the image sensor.  To calculate this disparity we subtract the decoded projector pixel from the corresponding camera pixel.  Turns out the depth is inversely proportional to the disparity (the difference between the two)!
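As a rough sketch, the depth-from-disparity step looks something like the following.  The focal length and baseline are calibration values that aren't spelled out in this post, so treat them as placeholders:

```python
def depth_from_disparity(camera_col, projector_col, focal_length_px, baseline):
    """Depth is inversely proportional to disparity: z = f * b / d.

    camera_col      -- column of the pixel in the captured image
    projector_col   -- decoded projector column for that pixel
    focal_length_px -- focal length in pixels (placeholder; from calibration
                       or trial and error)
    baseline        -- camera-to-projector spacing (placeholder units)
    """
    disparity = camera_col - projector_col
    if disparity == 0:
        return float('inf')  # zero disparity means the point is at infinity
    return focal_length_px * baseline / disparity
```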

Take a look at the illustration and notice the projector and camera are positioned side by side.  When setting up the hardware it is important to keep them level with each other and spaced apart by a few inches.  This will greatly simplify the math; otherwise a homography will need to be created.

Below is a step-by-step depth calculation for the yellow and red pixels in the example above.  Notice that the yellow pixel depth (z) is larger than the red pixel's.  Physically this means that the yellow pixel is further away from the camera.  The depth will likely not be proportional to the x and y axes; this depends on the focal lengths of both the camera and projector.  To fix this, either calibrate both the camera and projector or tune the result by trial and error.  This project used trial and error, scaling the depth value until the results matched the scene.
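To make the arithmetic concrete, here is a hypothetical run of the numbers.  The camera columns (314 and 392) come from the example above, but the decoded projector columns, focal length, and baseline below are made-up values purely for illustration:

```python
# z = f * b / disparity, with disparity = camera column - decoded projector column
focal_length_px = 600   # assumed focal length in pixels (not from the post)
baseline = 3.0          # assumed camera-projector spacing in inches

yellow_disparity = 314 - 290     # hypothetical decoded projector column 290
red_disparity    = 392 - 340     # hypothetical decoded projector column 340

yellow_z = focal_length_px * baseline / yellow_disparity   # 75.0
red_z    = focal_length_px * baseline / red_disparity      # ~34.6

# yellow_z > red_z, so the yellow pixel lies farther from the camera.
```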

Kinect

Kinect uses a static spatially encoded IR pattern projected onto the scene, similar to a QR code.  The image on the right illustrates the set of dots scattered over the scene.  The pattern is deformed as it falls onto the objects.  The camera then captures an image of the scene and decodes the result.  This method calculates a single depth reading for a group of projected pixels (it takes multiple spatially encoded pixels to map back to unique camera pixels).  As a result there is a loss of depth resolution.  The advantage is that depth can be calculated with only one capture.

While the hardware used in this project was designed for portability and a small form factor, the algorithm parameters can easily be updated for higher-resolution hardware.  In this build, the pico projector is the limiting factor, running at HVGA (480 x 320).


