We describe a method for resolving ambiguities in low-level disparity calculations in a stereo-vision scheme by using a recurrent mechanism that we call a signal-symbol loop. Due to the local nature of low-level processing, it is not always possible to estimate correct disparity values at this level. Symbolic abstraction of the signal produces robust, high-confidence, multimodal image features which can be used to interpret the scene more accurately and therefore to disambiguate low-level interpretations by biasing the correct disparity. We describe an efficient fusion scheme that allows symbolic- and low-level cues to complement each other, producing denser and more accurate disparity maps than either the low-level or the symbolic-level algorithm can produce independently.
Published in Machine Vision and Applications, 2010
Here is a video from the DRIVSCO project, processed both with and without an external guide. Disparity is calculated with a so-called phase-based method (suitable for real-time FPGA implementation). Left and right images are the images received from the cameras, while "WO guide" refers to results without fusion and "W guide" refers to results obtained using the method described in the paper. As can be observed from the video, the stereo disparity results obtained using fusion are superior in quality. Results and implications are explained in the paper.
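The core idea behind phase-based disparity is that a horizontal shift between the left and right images appears as a phase difference in band-pass (e.g. Gabor) filter responses, so disparity can be read off as phase difference divided by the filter's peak frequency. The following is a minimal one-dimensional sketch of that principle only; the function name and all parameter values are illustrative and not those of the paper's implementation:

```python
import numpy as np

def gabor_phase_disparity(left_row, right_row, omega0=0.5, sigma=4.0):
    """Sketch of the phase-based principle: disparity = phase
    difference of complex Gabor responses / peak frequency omega0.
    Illustrative parameters, not the paper's."""
    x = np.arange(-3 * sigma, 3 * sigma + 1)
    # complex Gabor: Gaussian envelope times complex carrier at omega0
    gabor = np.exp(-x**2 / (2 * sigma**2)) * np.exp(1j * omega0 * x)
    rl = np.convolve(left_row, gabor, mode="same")
    rr = np.convolve(right_row, gabor, mode="same")
    dphi = np.angle(rl * np.conj(rr))  # wrapped phase difference
    return dphi / omega0               # disparity in pixels

# toy example: the right row is the left row shifted by 2 pixels
left = np.sin(0.5 * np.arange(128))
right = np.sin(0.5 * (np.arange(128) - 2))
d = gabor_phase_disparity(left, right)
```

Away from the image borders, `d` recovers the 2-pixel shift. In a real system this is done with a bank of oriented filters at several scales, and phase wrapping limits each filter to shifts smaller than half its wavelength, which is one reason local estimates remain ambiguous and benefit from the fusion described below.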
We have described a fusion process for disambiguating low-level disparity estimations, produced by an algorithm that generates several possible local interpretations of the scene disparity for each image position. Disambiguation is done by using an external disparity map generated by a symbolic-level process, which biases those low-level estimations that are similar to the symbolic ones. Any external disparities for which there is no support in the actual signal are considered unreliable and hence are rejected automatically. Since disambiguation is done at several image scales, a combination of VMP (Voting Mask Propagation) and median-filter scaling increases the spatial support of the external disparity map used at the fusion stage. Thus no explicit segmentation, diffusion, or similar schemes are needed. Other important observations are:
- The scheme delivers more accurate dense disparity estimations by using the middle-level cues. The exact accuracy gain depends on the images used.
- The density is hardly affected by the fusion and remains almost the same, while the error decreases. Equivalently, at a constant error level, a higher density is obtained after the densification and fusion mechanisms.
- All image structures seem to benefit from the fusion, especially edge- and corner-like ones. Homogeneous structures are improved to a lesser extent because the symbolic-level process does not produce any cues for homogeneous areas.
- The proposed fusion scheme particularly enhances estimations produced by a hardware implementation (simulation) with restricted numerical precision, so that the results become more similar to those produced by a software implementation without any numerical restrictions.
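The biasing step described above can be sketched as follows: each pixel carries several candidate disparities with confidences, and candidates that agree with the external symbolic disparity have their confidence boosted before the winner is selected. External values with no nearby low-level candidate simply find nothing to boost, which is how unsupported symbolic disparities are rejected automatically. All names, shapes, and parameter values here are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def fuse_candidates(cands, confs, external, tol=1.0, bias=2.0):
    """cands, confs: (H, W, K) candidate disparities and confidences;
    external: (H, W) symbolic-level disparity map.
    Candidates within `tol` of the external value get their confidence
    multiplied by `bias`; the highest-confidence candidate wins.
    Illustrative sketch, not the paper's exact scheme."""
    agree = np.abs(cands - external[..., None]) <= tol     # (H, W, K)
    boosted = confs * np.where(agree, bias, 1.0)
    best = np.argmax(boosted, axis=-1)                     # (H, W)
    return np.take_along_axis(cands, best[..., None], axis=-1)[..., 0]

# toy example: 1x2 image, 3 candidates per pixel
cands = np.array([[[1.0, 5.0, 9.0], [2.0, 4.0, 6.0]]])
confs = np.array([[[0.5, 0.4, 0.45], [0.3, 0.6, 0.5]]])
external = np.array([[5.0, 100.0]])   # second value has no support
fused = fuse_candidates(cands, confs, external)
```

At the first pixel the external cue flips the winner from the slightly stronger candidate 1.0 to the supported candidate 5.0; at the second pixel the unsupported external value 100.0 is ignored and the raw confidences decide.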
In summary, the proposed scheme enhances the accuracy of the disparity estimations, significantly reducing the mean error in the evaluated images compared to the non-fusion scheme. The fusion mechanism particularly enhances the accuracy of low-level estimations generated by hardware-based engines for real-time on-chip implementations with restricted numerical precision.