Stereo Disparity

3D reconstruction can be done based on depth information obtained from a pair of cameras. A stereo disparity map records, for each point in the image, how far that point shifts between the two views; from this shift the distance between the point and the camera can be computed.
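For a rectified stereo pair, depth is inversely proportional to disparity: Z = f * B / d, where f is the focal length in pixels and B is the baseline between the cameras. The snippet below is a minimal sketch of that conversion. The function name and the example camera values are assumptions made for illustration, not values from any system described on this site.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity (pixels) to depth (metres) for a rectified stereo pair."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; treat
        # non-positive values as "infinitely far" in this sketch.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 12 cm baseline.
for d in (70.0, 35.0, 7.0):
    print(d, "px ->", disparity_to_depth(d, 700.0, 0.12), "m")
```

Note that halving the disparity doubles the depth, which is why depth resolution degrades quickly for distant objects.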


Optical Flow

Optical flow describes how pixels move between images taken at two different points in time. Color coding can be used to display the direction of movement.
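One common way to color-code a flow field is to map the direction of each flow vector to hue and its magnitude to saturation. The sketch below does this for a single vector using only the Python standard library. The function name and the choice of HSV mapping are assumptions for illustration; the visualizations on this site may use a different color wheel.

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector (u, v) to an RGB triple with components in [0, 1].

    Hue encodes direction; saturation encodes speed relative to max_magnitude.
    """
    angle = math.atan2(v, u)                    # direction in [-pi, pi]
    hue = (angle + math.pi) / (2.0 * math.pi)   # normalised to [0, 1]
    magnitude = math.hypot(u, v)
    if max_magnitude > 0:
        saturation = min(magnitude / max_magnitude, 1.0)
    else:
        saturation = 0.0
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


# Opposite directions get different hues; zero motion comes out white.
print(flow_to_rgb(1.0, 0.0, 1.0))
print(flow_to_rgb(-1.0, 0.0, 1.0))
print(flow_to_rgb(0.0, 0.0, 1.0))
```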



Pixels of the image can be segmented into meaningful groups. Segmented images can be used for object detection, grasping, and similar tasks.


Tutorials and Code

Getting started can be difficult. Check out my tutorials and code examples to get a jump start!


On this page you will find extra information related to the articles that I am associated with. Click on the name of a publication to see the extra information related to that paper.


Low-cost Sensor to Detect Overtaking Based on Optical-Flow

A Method for Sparse Disparity Densification using Voting Mask Propagation

Disparity Disambiguation by Fusion of Signal- and Symbolic-level Information

Spatial and Temporal Constraints in Variational Correspondence Methods

External Constraints in Variational Disparity Calculation: Hypothesis-Forming-Validation-Loops and Segmentation

Complementary Image Representation Spaces in Variational Disparity Calculation


Low-cost Sensor to Detect Overtaking Based on Optical-Flow

Abstract: The automotive industry invests substantial amounts of money in driver-security and driver-assistance systems. We propose an overtaking detection system based on visual motion cues that combines feature extraction, optical flow, solid-objects segmentation and geometry filtering, working with a low-cost compact architecture based on one focal plane and an on-chip embedded processor. The processing is divided into two stages: firstly analog processing on the focal plane processor dedicated to image conditioning and relevant image-structure selection, and secondly, vehicle tracking and warning-signal generation by optical flow, using a simple digital microcontroller. Our model can detect an approaching vehicle (multiple-lane overtaking scenarios) and warn the driver about the risk of changing lanes. Thanks to the use of tightly coupled analog and digital processors, the system is able to perform this complex task in real time with very constrained computing resources. The proposed method has been validated with a sequence of more than 15,000 frames (90 overtaking maneuvers) and is effective under different traffic situations, as well as weather and illumination conditions.

Journal: Machine Vision and Applications

As the name implies, the system detects overtaking vehicles based on optical flow. The robustness of the system has been tested using more than 15,000 frames under realistic illumination conditions. The figure below shows the different sub-tasks implemented in the system.

Overtaking system description

As can be observed from the figure above, optical flow due to egomotion (movement of the camera itself) needs to be filtered out in order to eliminate false detections. The video below demonstrates the performance of the system using real traffic footage:

A Method for Sparse Disparity Densification using Voting Mask Propagation

Abstract: We describe a novel method for propagating disparity values using directional masks and a voting scheme. The driving force of the propagation direction is image gradient, making the process anisotropic, whilst ambiguities between propagated values are resolved using a voting scheme. This kind of anisotropic densification process achieves significant density enhancement at a very low error cost: in some cases erroneous disparities are voted out, resulting not only in a denser but also a more accurate final disparity map. Due to the simplicity of the method it is suitable for embedded implementation and can also be included as part of a system-on-chip (SOC). Therefore it can be of great interest to the sector of the machine vision community that deals with embedded and/or real-time applications.

Journal: Journal of Visual Communication and Image Representation, 2009

By using a voting scheme based on masks that are tuned to the image gradient, we obtain considerable densification of sparse disparity maps without increasing the error. The figure below shows a set of 8 voting masks. Intensity represents the number of votes each position gets. High intensity means more votes, while low intensity means fewer votes. Black represents 0 votes.

Voting masks

By placing the mask that best matches the image gradient direction on top of each disparity estimate, the neighbouring pixels obtain votes for that particular disparity value. The disparity value that obtains the most votes wins. The following figure shows results for a well-known image from the Middlebury database. 'C' stands for the percentage of correct disparities (within ±1 disparity level), while 'D' is the density, also given as a percentage.

Results for voting mask

As we can observe from the above figure, we obtain considerable densification with VMP (Voting Mask Propagation), while the error remains the same.
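The voting idea can be sketched in a few lines of Python. This is a simplified illustration and not the published implementation. The mask shapes and weights (4, 3, 2, 1 votes along one of eight directions), the choice to propagate along the quantised gradient direction, and the names MASKS, pick_mask, densify and density are all assumptions.

```python
import math
from collections import defaultdict

# Eight hypothetical directional masks: each casts 4 votes on the seed pixel
# and 3, 2, 1 votes at 1, 2, 3 steps along one of eight compass directions.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
MASKS = [[(k * dx, k * dy, 4 - k) for k in range(4)] for dx, dy in STEPS]


def pick_mask(gx, gy):
    """Quantise the gradient direction (gx, gy) to one of the eight masks."""
    angle = math.atan2(gy, gx) % (2.0 * math.pi)
    return MASKS[int(round(angle / (math.pi / 4.0))) % 8]


def densify(sparse, grad_x, grad_y):
    """Propagate disparities by voting; None marks a missing estimate."""
    height, width = len(sparse), len(sparse[0])
    votes = defaultdict(lambda: defaultdict(int))  # (x, y) -> disparity -> votes
    for y in range(height):
        for x in range(width):
            d = sparse[y][x]
            if d is None:
                continue
            for dx, dy, weight in pick_mask(grad_x[y][x], grad_y[y][x]):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    votes[(nx, ny)][d] += weight
    dense = [row[:] for row in sparse]
    for (x, y), tally in votes.items():
        # The disparity with the most votes wins, so an existing estimate
        # can be voted out by its neighbours.
        dense[y][x] = max(tally, key=tally.get)
    return dense


def density(disparity):
    """Percentage of pixels that carry a disparity value."""
    values = [d for row in disparity for d in row]
    return 100.0 * sum(d is not None for d in values) / len(values)


# One seed in a 1x5 row, with the gradient pointing to the right.
sparse = [[5, None, None, None, None]]
grad_x, grad_y = [[1.0] * 5], [[0.0] * 5]
dense = densify(sparse, grad_x, grad_y)
print(density(sparse), "->", density(dense))
```

In this toy row the single seed fills the three pixels covered by its mask, so the density rises from 20% to 80%. With several seeds, overlapping masks compete, and a lone outlier can be outvoted by consistent neighbours.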

Disparity Disambiguation by Fusion of Signal- and Symbolic-level Information

Abstract: We describe a method for resolving ambiguities in low-level disparity calculations in a stereo-vision scheme by using a recurrent mechanism that we call signal-symbol loop. Due to the local nature of low-level processing it is not always possible to estimate the correct disparity values produced at this level. Symbolic abstraction of the signal produces robust, high confidence, multimodal image features which can be used to interpret the scene more accurately and therefore disambiguate low-level interpretations by biasing the correct disparity. The fusion process is capable of producing more accurate dense disparity maps than the low- and symbolic-level algorithms can produce independently. Therefore we describe an efficient fusion scheme that allows symbolic- and low-level cues to complement each other, resulting in a more accurate and dense disparity representation of the scene.

Journal: Machine Vision and Applications, 2010

Here is a video related to the DRIVSCO project, processed both with and without an external guide. Disparity is calculated with a so-called phase-based method (suitable for real-time FPGA implementation).

High quality version in MP4

Spatial and Temporal Constraints in Variational Correspondence Methods

Abstract: In this paper we describe a novel use for a well known temporal constraint term in the framework of variational correspondence methods. The new use, that we call spatial constraining, allows bounding of the solution based on what is known of the solution beforehand. This knowledge can be something that (a) is known since the geometrical properties of the scene are known or (b) is deduced by a higher-level algorithm capable of inferring this information. In the latter case the spatial constraint term enables fusion of information between high- and low-level vision systems: high-level makes a hypothesis of a possible scene setup which then is tested by the low-level, recurrently. Since high-level vision systems incorporate knowledge of the world that surrounds us, this kind of hypothesis testing loop between the high- and low-level vision systems should converge to a more coherent solution.

Journal: Machine Vision and Applications, 2011
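As a rough sketch of where such a constraint term sits, a variational disparity energy with an added spatial constraint could be written as below. The exact functional from the paper is not reproduced on this page, so every symbol here is an assumption chosen for illustration: the data term D, the robust function \Psi, the weights \alpha and \gamma, the mask c, and the prior d_c.

```latex
E(d) = \int_{\Omega} \Big[ \Psi\big(D(\mathbf{x}, d)\big)
       + \alpha\, \Psi\big(\lvert \nabla d \rvert^{2}\big)
       + \gamma\, c(\mathbf{x})\, \Psi\big( (d - d_{c})^{2} \big) \Big]\, \mathrm{d}\mathbf{x}
```

In this form d_c holds what is known of the solution beforehand, and c(\mathbf{x}) is 1 where that knowledge is available and 0 elsewhere, so the last term pulls d towards d_c only where a hypothesis exists.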

Below are two videos, related to the DRIVSCO project, processed with the proposed method.

Stereo Disparity

High quality version in MP4



High quality version in MP4

External Constraints in Variational Disparity Calculation: Hypothesis-Forming-Validation-Loops and Segmentation

Abstract: In this paper we show how to enhance disparity estimations using spatial constraints based on segmentation, therefore leading to improved scene interpretation. By using a Hypothesis-Forming-Validation-Loop (HFVL) our method effectively fuses low- and middle-level vision cues, thus increasing coherency and quality of the estimations. The paper briefly describes a segmentation scheme based on physical model abstractions (polynomials as generalized surfaces of interest) that can be efficiently used as middle-level module to produce feedback cues towards enhancing low-level disparity calculation methods. Improvements are considerable, especially in difficult cases without sufficient spatial features (e.g. weakly textured scenes), where dense disparity methods typically tend to fail, possibly leading to an incorrect scene interpretation.

Journal: Sent for publication

Below are some of the results related to the Middlebury test images and the GRASP project. As can be observed, the GRASP case is more difficult than the Middlebury case, due to the lack of structure in the observed objects. Also, in the Middlebury case most of the objects are either fronto-parallel or quasi fronto-parallel, whereas this is not the case with the GRASP images. In the GRASP case we have used the proposed Hypothesis-Forming-Validation-Loop to further enhance the resulting disparity map: from the initial disparity map we automatically obtain a constraint for the principal plane, which happens to be the table. Even though the objects of interest lying on the table are of the same color, our method is still able to segment them correctly. Segmentation is based only on the disparity.
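For a rectified stereo pair, a planar surface such as a table top appears as a plane in disparity space, d = a*x + b*y + c. The sketch below estimates such a plane from an initial disparity map with a plain least-squares fit. It only illustrates the idea of deriving a principal-plane constraint and is not the procedure used in the paper. The function names are assumptions, and a practical version would need a robust estimator, since objects on the table act as outliers.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(disparity):
    """Least-squares fit of d = a*x + b*y + c to the valid (non-None) pixels."""
    sxx = sxy = sx = syy = sy = n = sxd = syd = sd = 0.0
    for y, row in enumerate(disparity):
        for x, d in enumerate(row):
            if d is None:
                continue
            sxx += x * x
            sxy += x * y
            sx += x
            syy += y * y
            sy += y
            n += 1
            sxd += x * d
            syd += y * d
            sd += d
    # Normal equations A * [a, b, c]^T = rhs, solved with Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxd, syd, sd]
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("need at least three non-collinear valid pixels")
    coeffs = []
    for col in range(3):
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][col] = rhs[r]
        coeffs.append(det3(Ak) / det)
    return tuple(coeffs)  # (a, b, c)


# Synthetic 4x4 disparity map lying on d = 0.5*x + 2*y + 10, with one hole.
initial = [[0.5 * x + 2.0 * y + 10.0 for x in range(4)] for y in range(4)]
initial[1][2] = None
a, b, c = fit_plane(initial)
print(a, b, c)
```

The fitted plane could then be evaluated at every pixel to produce the constraint that is fed back into the disparity calculation.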

Following are disparity results for two different image sets from the GRASP project. 'Not const.' stands for 'not constrained', meaning that the disparity is calculated without a constraint. In the constrained case, we first calculate an initial disparity map and automatically obtain a constraint (principal surface) that is then used to constrain the solution.

Stereo apriori, easy case

Have you ever wondered why, in most test scenes with large planar surfaces that lack discernible features, there is either a colourful tablecloth or other stuff on the table? The 'extra features' are there so that the method used for calculating disparity can make better approximations.

Grasp stereo


As we can observe, using a constraint significantly enhances the results. In the following we show segmentation results based on the segmentation method introduced in the paper. As can be seen, it does a pretty decent job of segmenting the disparity maps. 'Seg. GT.' means segmentation based on ground truth, while 'Seg. disp.' means segmentation based on the calculated disparity map.

Middlebury segmentation

Grasp segmentation


Following are some results not included in the publication. I tested a Graph-Cut (GC) method based on the MRF library to see how it would do with the GRASP images.

Graph-cuts versus variational disparity

Please note that I am not claiming that the variational method is better than GC methods, since I am not an expert when it comes to GC. I would be more than happy if an expert in the GC field could process the GRASP images so that I could compare the results.


Complementary Image Representation Spaces in Variational Disparity Calculation

Abstract: Correspondence techniques start from the assumption, based on the Lambertian reflection model, that apparent brightness of a surface is independent of the observer's angle of view. From this, a gray value constancy assumption is derived, which states that a change in brightness of a particular image pixel is proportional to a change in its position. This constancy assumption can be extended directly for vector valued images. It is clear that the gray value constancy assumption does not hold for surfaces with non-Lambertian behavior and therefore, the underlying image representation is crucial when using real image sequences under varying lighting conditions and noise from the imaging device. In order for the correspondence methods to produce good, temporally coherent results, properties such as robustness to noise, illumination invariance, and stability with respect to small geometrical deformations are all desired properties of the representation. In this paper, we study how different image representation spaces complement each other and how the chosen representations benefit from the combination in terms of both robustness and accuracy. We also show how optimum, or near optimum, parameters for the variational based algorithm, related to each representation space, can be efficiently found.

Journal: Sent for publication

Following is a video showing results for the DRIVSCO scene with four different representations. The video has been generated with exactly the same parameters as obtained from the 1st run of the 5-fold cross-validation experiments. Slightly increasing the weight of the regularization parameter (alpha) would considerably decrease the spurious errors seen on the right-hand side of the disparity maps. However, I want to post the results here 'as they are' so that other scientists, including me, are aware of the problems with the algorithm. Many times the results are not what is expected when processing complete videos...

High quality version in MP4

Author: Jarno Ralli
Jarno Ralli is a computer vision scientist and a programmer.
