In this paper we describe a novel use for a well-known temporal constraint term within the framework of variational correspondence methods. The new use, which we call spatial constraining, bounds the solution based on what is known about it beforehand. This knowledge can be something that (a) is known because the geometrical properties of the scene are known, or (b) is deduced by a higher-level algorithm capable of inferring such information. In the latter case, the spatial constraint term enables the fusion of information between high- and low-level vision systems: the high level forms a hypothesis about a possible scene setup, which the low level then tests, recurrently. Since high-level vision systems incorporate knowledge of the world that surrounds us, this kind of hypothesis-testing loop between the high- and low-level vision systems should converge to a more coherent solution.
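To make the idea concrete, a spatially constrained variational energy can be sketched as follows; the notation is illustrative (our own), not taken verbatim from the paper:

```latex
E(u) = \int_{\Omega}
  \underbrace{\Psi\!\big(I_1(\mathbf{x}) - I_2(\mathbf{x} + u(\mathbf{x}))\big)}_{\text{data term}}
  + \alpha \underbrace{\lvert \nabla u(\mathbf{x}) \rvert^{2}}_{\text{smoothness}}
  + \beta \underbrace{c(\mathbf{x})\,\big(u(\mathbf{x}) - u_{s}(\mathbf{x})\big)^{2}}_{\text{spatial constraint}}
  \, d\mathbf{x}
```

Here $u$ is the sought correspondence map (disparity or optical flow), $I_1$ and $I_2$ are the input images, $u_s$ is the a-priori (spatial) solution, $c(\mathbf{x}) \in [0,1]$ weights the constraint where the prior knowledge is considered reliable, and $\alpha$, $\beta$ balance the terms.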
The following two videos, from the DRIVSCO project, were processed with the proposed method.
Left and Right are the images received from the cameras; WO apriori shows results without using a spatial constraint, while Spatial displays results using a spatial constraint. As can be observed from the video, the results with a spatial constraint are far better than those without. More results can be found in the paper.
Left image shows the images received from the left camera; WO apriori shows the results without any constraining; Temporal shows results using a temporal constraint, while Spatio-temporal shows results using both spatial and temporal constraints. As can be seen from the video, constraining improves the quality of the resulting optical-flow field. More results can be found in the paper.
In this paper, we have proposed using spatial and temporal constraints, extracted from the scene-analysis context or from higher-level vision stages, to enhance the quality of low-level representation maps. Purely data-driven methods assume very little about the solution and thus about the surrounding world. Typically, they only incorporate some kind of regularization concept that makes the solution piecewise smooth (e.g. a regularization cost term). High-level vision makes stronger assumptions about the surrounding world and thus about the solution. We believe the right way forward is to combine the strengths of both the bottom-up (data-driven) and top-down (model-driven) approaches if machine vision is to be used successfully in real applications. Since the variational framework can naturally integrate information as constraint terms, we have extended the basic variational correspondence methods to include external constraints. The major difference between the smoothness term and the external constraints is that the smoothness term acts locally, based only on the data, whereas the extended constraint terms (spatial and temporal) constrain the solution in a more global sense. Therefore, by binding low- and high-level vision through the proposed mechanism, significant improvements are obtained. We illustrate this with several real-world examples and perform a quantitative evaluation on standard benchmark sequences: in disparity calculation, for the chosen benchmark images, we show that considerable improvements are achievable using spatial information. Cases containing surfaces with very little spatial information benefit most from the constraints; on the other hand, if enough spatial information is available, the improvements are less significant. The obtained results indicate that with the evaluated constraints (spatial and temporal terms), the results improve to a point where tasks such as object recognition and grasping, or foreground/background segmentation based on disparity and/or optical flow, become easier for higher-level vision stages.

Furthermore, we propose a more general mechanism in which the constraints are provided by higher-level vision stages within a hypothesis forming-and-validation loop. This strategy generalizes earlier studies in that the constraint integrates not only contextual low-level primitives but also higher-level, model-based knowledge. The example of such a hypothesis forming and validation loop given in this work can fairly easily be extended to several planes/surfaces simultaneously, further improving the results.

Machine vision is a very active field in which many diverse models have been proposed for a wide variety of problems. One of the current challenges is to develop integration schemes and scalable methods that allow different models to help and complement each other towards the final vision goal, the "interpretation of the scene". The method proposed in this paper is a step in this direction.
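To illustrate the local-versus-global distinction numerically, here is a minimal 1-D sketch (our own, not the paper's implementation) of a discrete quadratic energy with a data term, a local smoothness term, and a spatial constraint term that pulls the solution toward a hypothesized planar disparity. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def refine_disparity(observed, confidence, prior, alpha=1.0, beta=0.5,
                     step=0.05, iters=2000):
    """Minimize, by gradient descent, the discrete quadratic energy

        E(d) = sum_i c_i (d_i - o_i)^2            # data term
             + alpha * sum_i (d_{i+1} - d_i)^2    # smoothness (local)
             + beta  * sum_i (d_i - p_i)^2        # spatial constraint (global prior)
    """
    d = observed.astype(float).copy()
    for _ in range(iters):
        grad = 2.0 * confidence * (d - observed)       # data-term gradient
        grad += 2.0 * beta * (d - prior)               # spatial-constraint gradient
        neg_lap = np.zeros_like(d)                     # smoothness-term gradient
        neg_lap[1:-1] = 2.0 * d[1:-1] - d[:-2] - d[2:]
        neg_lap[0] = d[0] - d[1]
        neg_lap[-1] = d[-1] - d[-2]
        grad += 2.0 * alpha * neg_lap
        d -= step * grad
    return d

# Hypothesized ground plane (e.g. from a high-level road-surface hypothesis).
x = np.linspace(0.0, 1.0, 50)
plane = 10.0 + 5.0 * x

# Observed disparities: correct where texture exists, garbage (with low
# confidence) over a textureless hole.
observed = plane.copy()
observed[20:30] = 0.0
confidence = np.ones_like(x)
confidence[20:30] = 0.1

with_prior = refine_disparity(observed, confidence, plane, beta=0.5)
without_prior = refine_disparity(observed, confidence, plane, beta=0.0)
```

In the textureless region the smoothness term alone lets the weak, wrong measurements drag the solution away from the true surface, whereas the spatial constraint anchors it to the hypothesized plane, mirroring the benchmark observation above that low-texture surfaces benefit most.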