Thursday, April 27, 2006
Honduras Report
Here's a short video from the Honduras trip: [ fish5.avi ]
Watch it, and you will realize why I haven't gotten any major new results in shape classification this semester. The changes in lighting, the camera jitter, the small size of the fish relative to the image, and the abundance of texture (but not color) on which to segment make extracting fish contours an extremely challenging problem. Color segmentation (e.g. with pyramidal flood-filling) does a terrible job: underwater, most objects are some shade of aquamarine, and the texture varies so much when the fish is seen against the backdrop of the sea floor that the resulting contours are based almost exclusively on local texture patterns.


Background subtraction also does a bad job, since the camera is moving too quickly:

Even motion subtraction, which affine-warps the previous frame to fit the current frame as closely as possible (using the flow field from sparse pyramidal Lucas-Kanade optical flow), doesn't help us find the fish (which should be moving differently from the rest of the image). Due to the large range of scales across the image and the high degree of occlusion, many other pixels in the image stand out besides the fish.
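For the curious, here is a rough sketch of the motion-subtraction idea in Python with NumPy. It is not the code used on these videos: it assumes the sparse optical flow has already produced matched point pairs, fits a single affine map to them by least squares, and scores each point by how far it lands from where that map predicts. The function names and the synthetic points are made up for illustration.

```python
import numpy as np

def fit_affine(prev_pts, curr_pts):
    """Least-squares affine map taking prev_pts onto curr_pts (both N x 2)."""
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    # Homogeneous design matrix: each row is [x, y, 1].
    design = np.hstack([prev_pts, np.ones((len(prev_pts), 1))])
    # Solve design @ params ~= curr_pts; params is 3 x 2.
    params, *_ = np.linalg.lstsq(design, curr_pts, rcond=None)
    return params

def motion_residuals(prev_pts, curr_pts, params):
    """Distance between each tracked point and its affine prediction."""
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    design = np.hstack([prev_pts, np.ones((len(prev_pts), 1))])
    return np.linalg.norm(curr_pts - design @ params, axis=1)

# Synthetic example (made-up data): a 5 x 5 grid of tracked points that all
# shift by (2, -1), except the center point, which also moves by (5, 5).
grid = np.array([[x, y] for y in range(5) for x in range(5)], dtype=float)
moved = grid + np.array([2.0, -1.0])
moved[12] += np.array([5.0, 5.0])

params = fit_affine(grid, moved)
residuals = motion_residuals(grid, moved, params)
print(int(np.argmax(residuals)))  # index of the point that moves "differently"
```

On clean synthetic data the independently moving point stands out. In the real footage, parallax and occlusion give many background points large residuals too, which is exactly the problem described above.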


At this point you may be saying to yourself: "but there aren't any fish in this image." In fact, there are two:

It is only when the motion of the fish is seen that humans can even detect them. Thus, more complicated techniques for motion segmentation are needed, such as:
"Hierarchical Image-Motion Segmentation by Swendsen-Wang Cuts"
http://civs.stat.ucla.edu/Barbu_Research/Motion/index.html
I have also been playing around with some texture metrics for the purposes of segmenting out the sea-bed and for finding and classifying coral. However, for the time being I am giving up on the Honduras videos because they are simply too difficult to process. I will generate shape models from the Cousteau video, where fish are usually against the solid backdrop of the open sea, and where lighting conditions are much more favorable. I also have a Hokuyo laser range finder (http://www.hokuyo-aut.jp/products/urg/urg.htm) to play with for 3D shape analysis--I worked out the math for 3D Procrustean shape analysis while we were in Honduras.
While in Honduras, the main application I worked on was optical flow/visual odometry. I had an implementation in OpenCV before we left; unfortunately we were using a Blackfin DSP, so OpenCV had to be ported. This turned out to be much more difficult than anticipated, even though there is at least one reference on the web claiming to have ported OpenCV to the Blackfin (or at least there was a reference...a Google search for "opencv blackfin" turns up nothing now). The main problem is that the Blackfin has no hardware support for floating-point arithmetic, so all of OpenCV's math functions were extremely slow and had to be changed to use fixed point. Aside from over/underflow problems, some of the algorithms themselves were too slow--pyramidal sparse Lucas-Kanade optical flow was a little too pyramidal for the Blackfin's taste, it seems. In the end, after rewriting many of the image processing functions in OpenCV, the frame rate was down to a blazingly fast 1.5 Hz through the JTAG debugging interface (*sarcasm*). Ironically, when we tried to run the program without the JTAG, the Blackfin operating system complained. The most likely explanation I got from the hardware people on the trip was that the program was too big. *Sigh*
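To give a feel for what "changed to use fixed point" involves, here is a minimal Python sketch of 16-bit fixed-point arithmetic. The Q1.15 format (1 sign bit, 15 fractional bits) and the saturating behavior are assumptions chosen for illustration; the actual port was in C and the post does not record which formats it used.

```python
Q = 15                                   # fractional bits (assumed Q1.15 format)
ONE = 1 << Q                             # fixed-point representation of 1.0
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def saturate(value):
    """Clamp to the 16-bit range instead of wrapping around on overflow."""
    return max(INT16_MIN, min(INT16_MAX, value))

def to_fixed(x):
    """Convert a float in roughly [-1, 1) to Q1.15."""
    return saturate(int(round(x * ONE)))

def to_float(q):
    return q / ONE

def fixed_add(a, b):
    return saturate(a + b)

def fixed_mul(a, b):
    # The raw product has 30 fractional bits; add half an LSB so the
    # shift back down to 15 bits rounds instead of truncating.
    return saturate((a * b + (1 << (Q - 1))) >> Q)

half = to_fixed(0.5)
print(to_float(fixed_mul(half, half)))                      # 0.25
print(to_float(fixed_add(to_fixed(0.75), to_fixed(0.75))))  # saturates just below 1.0
```

The second line of output shows the overflow problem in miniature: 0.75 + 0.75 cannot be represented, so the result clamps just below 1.0. Every intermediate value in an algorithm has to be scaled so that this doesn't happen, without throwing away too many low-order bits.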
Finally, on the last night of the trip, I gave up on OpenCV and hacked together a non-optical-flow-based image registration algorithm which almost, sorta-kinda worked. More processing power (to increase the searchable motion space) and smarter image processing techniques (registering edge images, for example) should improve results, but I haven't had a chance to work on it any more yet.
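The post doesn't spell out that last-night algorithm, so the following is only a guess at the general flavor: an exhaustive search over integer translations that keeps the shift with the smallest mean absolute difference between the overlapping parts of two frames. It is a Python/NumPy sketch with made-up names and a synthetic test image, not the code from the trip.

```python
import numpy as np

def register_translation(prev, curr, max_shift):
    """Return (dy, dx) such that curr[y, x] ~= prev[y - dy, x - dx]."""
    h, w = prev.shape
    best_cost, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows of the two frames under this candidate shift.
            curr_win = curr[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
            prev_win = prev[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            cost = np.mean(np.abs(curr_win.astype(float) - prev_win.astype(float)))
            if cost < best_cost:
                best_cost, best_shift = cost, (dy, dx)
    return best_shift

# Synthetic example: a random "frame" shifted down 3 rows and left 2 columns.
rng = np.random.default_rng(0)
frame_a = rng.random((32, 32))
frame_b = np.roll(frame_a, shift=(3, -2), axis=(0, 1))

shift = register_translation(frame_a, frame_b, max_shift=4)
print(shift)
```

The cost of this search grows with the square of `max_shift`, which is one way to read the remark about needing more processing power to enlarge the searchable motion space.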
That's all for now! Now I remember why I usually don't post to my blog...it takes so much time!
More Fish Art
Forget to re-normalize the log-distance transform of an edge map of a fish, and this is what you get =)


Fall '05 Final Paper
My final paper for Machine Learning last semester:
Probabilistic Procrustean Shape Analysis for Object Recognition
and a short set of slides for my vision class on the final project:
[ ppt slides ]
The task was to generate shape models of two types of fish from training image data, and then to classify new shapes extracted from test image data.
Contours were extracted by hand, using color thresholding to get binary images from which contours could be extracted. Gaussian smoothing and morphological dilation and erosion operations were used to remove noise from the contours. All of this was done using OpenCV from Intel.






The correspondence problem was handled in two steps. First, features were found using peaks in "local shape derivatives."
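The summary above doesn't define "local shape derivatives", so as a stand-in here is a minimal Python/NumPy sketch that uses the turning angle at each vertex of a closed contour (a discrete curvature) and keeps local peaks above a threshold. The function names and the `min_angle` threshold are assumptions for illustration, not the feature detector from the paper.

```python
import numpy as np

def turning_angles(contour):
    """Signed turning angle at each vertex of a closed polygon (N x 2)."""
    pts = np.asarray(contour, dtype=float)
    incoming = pts - np.roll(pts, 1, axis=0)     # edge arriving at each vertex
    outgoing = np.roll(pts, -1, axis=0) - pts    # edge leaving each vertex
    cross = incoming[:, 0] * outgoing[:, 1] - incoming[:, 1] * outgoing[:, 0]
    dot = np.sum(incoming * outgoing, axis=1)
    return np.arctan2(cross, dot)

def feature_indices(contour, min_angle=0.5):
    """Indices of vertices whose |turning angle| is a local peak above min_angle."""
    strength = np.abs(turning_angles(contour))
    prev_s, next_s = np.roll(strength, 1), np.roll(strength, -1)
    is_peak = (strength >= prev_s) & (strength > next_s) & (strength > min_angle)
    return np.flatnonzero(is_peak)

# Toy contour: a square sampled with extra points along each side.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
print(feature_indices(square))  # the four corners
```

On a real fish contour the angles would need smoothing first, since pixel-level noise produces many spurious peaks.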


Next, a probabilistic search algorithm for finding feature correspondences between two contours was designed by considering neighborhoods of local shape at varying scales around each feature, together with feature spacing, and the overall shape of the features.



All the matched contours in the training set were then hand-labeled as belonging to either the "angelfish" or "yellow tang" class, and then mean shapes and tangent-plane principal components were found using Procrustean shape analysis.
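As a rough illustration of the Procrustes step, here is a minimal Python/NumPy sketch for 2D landmarks. It treats each landmark as a complex number, removes translation and scale, and iteratively rotates every shape onto the running mean. It assumes the landmarks are already in correspondence; the function names and toy data are made up, and this is not the code from the paper.

```python
import numpy as np

def preshape(landmarks):
    """Center an N x 2 landmark array and scale it to unit norm (complex form)."""
    pts = np.asarray(landmarks, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]
    z = z - z.mean()
    return z / np.linalg.norm(z)

def align_to(shape, reference):
    """Rotate a preshape so that it lies as close as possible to the reference."""
    rotation = np.vdot(shape, reference)          # sum(conj(shape) * reference)
    return shape * rotation / abs(rotation)

def mean_shape(shapes, iterations=10):
    """Procrustes mean: repeatedly align all shapes to the mean and re-average."""
    preshapes = [preshape(s) for s in shapes]
    mean = preshapes[0]
    for _ in range(iterations):
        aligned = [align_to(z, mean) for z in preshapes]
        mean = np.mean(aligned, axis=0)
        mean = mean / np.linalg.norm(mean)
    return mean, np.array([align_to(z, mean) for z in preshapes])

def similarity(points, angle, scale, shift):
    """Apply a rotation, uniform scale, and translation to N x 2 points."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return scale * points @ rot.T + np.asarray(shift, dtype=float)

# Toy data: the same five-landmark shape in three different poses.
base = np.array([[0, 0], [2, 0], [3, 1], [2, 2], [0, 1]], dtype=float)
copies = [base, similarity(base, 0.7, 2.0, (5, -3)), similarity(base, -1.2, 0.5, (1, 4))]
mean, aligned = mean_shape(copies)
print(np.allclose(aligned, aligned[0]))  # all poses collapse onto one shape
```

The differences between the aligned shapes and the mean are (approximately) the tangent-plane coordinates on which the principal components are computed.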
Effects of principal components on the mean shape:
Angelfish



Yellow Tang



Contours extracted from test data were then feature-matched against the mean shapes of each class and then classified to the shape class with the smallest Mahalanobis distance. In other words, Gaussian models were assumed for each shape class and classification was achieved with a maximum likelihood decision rule.
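The decision rule can be sketched in a few lines of Python with NumPy. This is an illustration under assumptions, not the paper's code: each class is modeled by its mean plus a few leading principal axes and their variances, the Mahalanobis distance is measured only within that subspace, and the training vectors are random numbers standing in for tangent-plane shape coordinates.

```python
import numpy as np

def fit_class(samples, n_components):
    """Gaussian class model: mean, leading principal axes, and their variances."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(x - mean, full_matrices=False)
    variances = singular_values**2 / (len(x) - 1)
    return mean, vt[:n_components], variances[:n_components]

def mahalanobis_sq(v, model):
    """Squared Mahalanobis distance within the model's principal subspace."""
    mean, axes, variances = model
    scores = axes @ (np.asarray(v, dtype=float) - mean)
    return float(np.sum(scores**2 / variances))

def classify(v, models):
    """Pick the class whose model gives the smallest Mahalanobis distance."""
    return min(models, key=lambda name: mahalanobis_sq(v, models[name]))

# Synthetic stand-in data (NOT the fish contours): two well-separated clouds.
rng = np.random.default_rng(0)
training = {
    "angelfish": rng.normal(0.0, 1.0, size=(20, 4)),
    "yellow tang": rng.normal(5.0, 1.0, size=(20, 4)),
}
models = {name: fit_class(x, n_components=2) for name, x in training.items()}
print(classify(training["angelfish"].mean(axis=0), models))
```

Strictly speaking, the Gaussian log-likelihood also contains a log-determinant term for each class covariance; comparing Mahalanobis distances alone matches maximum likelihood only when those terms are about equal across classes.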
Results were nearly perfect for the small dataset we had (10 angelfish and 14 yellow tang contours), but obviously more data and shape categories are needed to truly test the validity of the matching algorithm. The novel part of the algorithm lies in the feature extraction and correspondence methods; tangent-plane PCA on shape models using the Procrustes metric has been done many times before in other domains, although it is a relatively unknown technique to the vision community.
The next steps are to extract shapes from video, and to apply our shape completion techniques to complete partially-occluded shapes in video sequences.