Correspondence techniques start from the assumption, based on the Lambertian reflection model, that the apparent brightness of a surface is independent of the observer's angle of view. From this, a grey value constancy assumption is derived, which states that a change in brightness of a particular image pixel is proportional to a change in its position. This constancy assumption can be extended directly to vector-valued images, such as RGB. The grey value constancy assumption clearly does not hold for surfaces with non-Lambertian behaviour and, therefore, the underlying image representation is crucial when using real image sequences under varying lighting conditions and noise from the imaging device. For the correspondence methods to produce good, temporally coherent results, the representation should be robust to noise, invariant to illumination, and stable with respect to small geometrical deformations. In this paper, we study how different image representation spaces complement each other and how the chosen representations benefit from the combination in terms of both robustness and accuracy. The model used for establishing the correspondences, based on the calculus of variations, is itself considered robust. However, we show that considerable improvements are possible, especially in the case of real image sequences, by using an appropriate image representation. We also show how optimum (or near optimum) parameters, related to each representation space, can be efficiently found.
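
To make the constancy assumption concrete for the stereo case, the following is a generic textbook-style formulation, not necessarily the exact model used in the paper. It assumes that a pixel in the left image and its match in the right image, displaced horizontally by the disparity, have the same value in every channel of the chosen representation. The symbols \( I_{L,k} \), \( I_{R,k} \), \( d \), \( K \), \( \Psi \) and \( \Omega \) are notation introduced here for illustration, and the sign convention of the disparity is an assumption.

\[ I_{L,k}(x, y) = I_{R,k}\big(x + d(x, y),\, y\big), \qquad k = 1, \dots, K \]

A variational method would then look for the disparity field \( d \) that minimises an energy made up of a data term, which penalises violations of the constancy assumption, and a smoothness term weighted by a regularization parameter \( \alpha \):

\[ E(d) = \int_{\Omega} \Psi\Big( \sum_{k=1}^{K} \big( I_{R,k}(x + d, y) - I_{L,k}(x, y) \big)^2 \Big) + \alpha \, \Psi\big( |\nabla d|^2 \big) \, dx \, dy \]

Here \( \Psi \) stands for a robust penalty function, for example \( \Psi(s^2) = \sqrt{s^2 + \epsilon^2} \) with a small \( \epsilon > 0 \). Changing the image representation changes the channels \( I_{L,k} \) and \( I_{R,k} \) that enter the data term, while the rest of the model stays the same.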


...In this paper, we study how combinations of different image representations behave with respect to both illumination errors and noise, ranking the results accordingly. We believe that such information is useful to the part of the vision community that concentrates on applications, such as obstacle detection in vehicle-related scenarios, segmentation, and so on. Although other authors address similar issues, we find their studies to be slightly limited in scope due to a reduced 'test bench', e.g. a small number of test images or image representations. Also, in most cases, the way in which the parameters related to the model(s) have been chosen is not satisfactorily explained. Therefore, the main contribution of our paper is an analysis of the different image representations supported by a more detailed and systematic evaluation methodology. For example, we show how optimum (or near optimum) parameters for the algorithm, related to each representation space, can be found. This is a small but important contribution in the case of real, uncontrolled scenarios. The standard image representation is the RGB space; the others, obtained via image transformations, are: gradient, gradient magnitude, log-derivative, HSV, \( r\phi\theta \), and the phase component of an image filtered using a bank of Gabor filters....
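
As a rough illustration of some of the representations listed above, here is a minimal NumPy sketch. It is not the code used in the paper: the function names, the finite-difference scheme of `np.gradient`, the `eps` guard and the particular spherical convention chosen for \( r\phi\theta \) are all assumptions made for this example. HSV and the Gabor phase representation are left out.

```python
import numpy as np


def gradient(img):
    """Per-channel spatial derivatives of an (H, W, C) float image.

    Returns an (H, W, 2C) array: x-derivatives first, then y-derivatives.
    """
    dy, dx = np.gradient(img, axis=(0, 1))  # axis 0 is rows (y), axis 1 is columns (x)
    return np.concatenate([dx, dy], axis=2)


def gradient_magnitude(img):
    """Per-channel gradient magnitude, shape (H, W, C)."""
    dy, dx = np.gradient(img, axis=(0, 1))
    return np.sqrt(dx ** 2 + dy ** 2)


def log_derivative(img, eps=1e-6):
    """Gradient of the log-image; eps (an assumption) guards against log(0)."""
    return gradient(np.log(img + eps))


def spherical(img):
    """RGB to (r, phi, theta) using one common spherical convention (assumed here).

    r     : length of the RGB vector (carries the intensity)
    phi   : angle between the RGB vector and the B axis
    theta : angle of the (R, G) projection measured from the R axis
    """
    red, green, blue = img[..., 0], img[..., 1], img[..., 2]
    r = np.sqrt(red ** 2 + green ** 2 + blue ** 2)
    phi = np.arctan2(np.sqrt(red ** 2 + green ** 2), blue)
    theta = np.arctan2(green, red)
    return np.stack([r, phi, theta], axis=2)
```

In this sketch, adding a constant offset to the image leaves `gradient` unchanged, while multiplying the image by a constant leaves `log_derivative` (with `eps=0` on strictly positive data) and the two angular channels of `spherical` unchanged. These are the kinds of invariance that make such representations attractive under changing illumination.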

Tested image representation spaces


Published in EURASIP Journal on Advances in Signal Processing.


The following video shows results for the DRIVSCO scene with four different representations. The video was generated with exactly the same parameters as those obtained from the first run of the 5-fold cross-validation experiments. Slightly increasing the weight of the regularization parameter (alpha) would considerably reduce the spurious errors seen on the right-hand side of the disparity maps. However, I want to post the results here 'as they are' so that other scientists, myself included, stay aware of the problems with the algorithm. Often the results are not what one expects when processing complete videos. In the video, the top-left panel displays the images received from the left camera, the top-right panel shows results using the gradient-based image representation, the lower-left panel shows results using normalized RGB, and the lower-right panel shows results using a combination of gradient and phase-based image representations. As can be observed from the video, the image representation space used greatly influences the resulting stereo disparity map. More results can be seen in the paper.

A high-quality version of the video in MP4 format is available from here
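
For readers unfamiliar with the idea, the 5-fold cross-validation mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `error_fn` is a hypothetical callback that would run the stereo method on one image pair with a given parameter value (for example the regularization weight alpha) and return an error score, and the plain search over a list of candidates is a stand-in, not the optimisation procedure used in the paper.

```python
import numpy as np


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle the item indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_items), k)


def cross_validate(error_fn, n_items, candidates, k=5, seed=0):
    """Select a parameter on the training folds, then score it on the held-out fold.

    error_fn(candidate, item_index) -> float is a hypothetical callback.
    Returns one (best_candidate, held_out_error) tuple per fold.
    """
    folds = k_fold_indices(n_items, k, seed)
    results = []
    for i, test_idx in enumerate(folds):
        # All folds except the i-th one form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Mean training error of every candidate parameter value.
        train_err = [np.mean([error_fn(c, t) for t in train_idx]) for c in candidates]
        best = candidates[int(np.argmin(train_err))]
        # Error of the selected candidate on data it was not tuned on.
        test_err = float(np.mean([error_fn(best, t) for t in test_idx]))
        results.append((best, test_err))
    return results
```

Each run yields one parameter choice together with its error on image pairs that were not used for tuning, which gives a more honest picture of how the parameters would behave on unseen scenes than tuning on the full set.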


We have shown that the quality of a disparity map generated by a variational method, under illumination changes and image noise, depends significantly on the type of image representation used. By combining different representations, we have generated and tested 34 different cases and found several complementary spaces that are affected only slightly even under severe illumination errors and image noise. Accuracy differences of 7-fold (without noise) and 10-fold (with noise and illumination errors) were found between the best and worst representations, which highlights the relevance of an appropriate input representation for low-level estimations such as stereo. This gain in accuracy and in robustness to noise can be of critical importance in specific application scenarios with real uncontrolled scenes and not just well-behaved test images (e.g. automatic navigation, advanced robotics, CGI). Amongst the tested combinations, the \( \nabla I \) representation stood out as one of the most accurate and least affected by illumination errors or noise. Combining \( \nabla I \) with PHASE gave the most robust joint representation space amongst those tested. Thus, we can say that the aforementioned representations complement each other. These results were also confirmed in a qualitative evaluation of natural scenes in uncontrolled scenarios.


Download paper in PDF

Author: Jarno Ralli
Jarno Ralli is a computer vision scientist and a programmer.
