published in journal of Visual Communication and Image Representation

A Method for Sparse Disparity Densiﬁcation

using Voting Mask Propagation

Jarno Ralli

1

, Javier D

´

ıaz

1

, and Eduardo Ros

1

jarno@ralli.ﬁ, jdiaz@atc.ugr.es, eros@atc.ugr.es

1

Departamento de Arquitectura y Tecnolog

´

ıa de Computadores

∗

Abstract

We describe a novel method for propagating disparity values using directional

masks and a voting scheme. The driving force of the propagation direction is image

gradient, making the process anisotropic, whilst ambiguities between propagated

values are resolved using a voting scheme. This kind of anisotropic densiﬁcation

process achieves signiﬁcant density enhancement at a very low error cost: in some

cases erroneous disparities are voted out, resulting not only in a denser but also a

more accurate ﬁnal disparity map. Due to the simplicity of the method it is suitable

for embedded implementation and can also be included as part of a system-on-chip

(SOC). Therefore it can be of great interest to the sector of the machine vision

community that deals with embedded and/or real-time applications.

1 Introduction

Disparity is originally deﬁned as being the horizontal difference of a 3D point being

projected on two adjacent imaging devices (e.g. stereo-rig) and if both intrinsic- and

extrinsic parameters of the imaging system are known, complete 3D-reconstruction of

the scene is possible. However, even if we do not know all the necessary parameters to

do a complete 3D-reconstruction, disparity still conveys relative information of the 3D

structures of the scene which can also be useful.

Disparity extraction models are based on local or global optimization methods that

minimize (or maximize) matching cost of image features between two or more images.

Practically all the sparse models have some kind of threshold or other parameter, ei-

ther implicit or explicit, which affects density and at the same time error in the derived

disparity map [18][3]. On the other hand the global methods that minimize energy

functions within the whole scene, through local operations, usually derive a dispar-

ity map that is typically 100% dense (sometimes detecting occlusions as well). Such

global minimization can be done using different approaches such as variational meth-

ods [7][1][2][4][6] or graph cuts [9][10]. Typically, depending upon the sparse method

∗

Escuela T

´

ecnica Superior de Ingenier

´

ıa Informatica y de Telecomunicac

´

ıon, Universidad de Granada,

Calle Periodista Daniel Saucedo Aranda s/n, E-18071 Granada, Spain

www.jarnoralli.ﬁ 1 Author: Jarno Ralli

used, as density increases, after a certain limit error also starts to increase concomi-

tantly. Therefore, it is worth calculating a less dense, high-conﬁdence disparity map

and afterwards increasing the density by propagating the correct disparity values. In

this way we achieve better accuracy vs density trade-off than by directly reducing the

reliability threshold and thus increasing density at the expense of higher error. Many

global, dense, disparity calculation methods have built-in mechanisms for propagat-

ing/diffusing disparity [1][17] but sparse methods usually lack this capacity. To the

best of our knowledge there are very few independent propagation methods, apart from

interpolation [14] and diffusion [22], that can be applied as a post-processing step. By

independent we mean that the propagation method does not depend upon the algorithm

used to derive the initial disparity map. This work proposes a new densiﬁcation method

that is able to arrive at denser and more accurate results than the standard one-stage dis-

parity algorithms such as dynamic programming, block- matching and so on that only

slightly affects the error rate. Since the scheme is based on very simple operations,

it can be considered suitable for efﬁcient implementation. Our method resembles im-

age driven anisotropic diffusion, used for instance by Alvarez et al. [1] in variational

disparity calculation, in the sense that the propagation direction is based on the image

gradient. Instead of using a set of equations for deﬁning the diffusion model as it is

done in [1] our approach (VMP) uses a bank of predeﬁned masks and a voting process

to deﬁne the local interactions driving the diffusion process.

2 Method

The ﬁrst step is to calculate a sparse, high-conﬁdence, stereo map. Many feature-based

disparity calculation methods match edges present in stereo-images, since these can be

considered relatively robust features [8][11]. For the reasons set out in the Introduc-

tion we use here a simpliﬁed dynamic programming technique based on image edges

[14]. The rational for using dynamic programming is that it has been shown to be both

computationally efﬁcient [12] and capable of producing highly accurate results [3].

Nevertheless, as mentioned earlier, our densiﬁcation approach does not depend upon

the method used for producing the initial sparse disparity map. The second step is to

apply voting mask propagation (VMP) for propagating disparity in the direction where

estimations are expected to be similar and to use voting for resolving ambiguities. Lo-

cal support of the voting process is based on directional masks: for each image position

for which disparity is known a mask from a pre-determined bank is chosen, depend-

ing upon the underlying image structure (gradient). The properties of the chosen ﬁlter

deﬁne how many votes each of the neighborhood positions will receive.

2.1 Propagation direction

Since without further image analysis we cannot be sure of which object an image pixel

for which the disparity is known belongs to, and since we assume that inside objects

disparity changes gradually, image gradient is used as a driving force of the propagation

direction. We assume that two different objects will almost certainly have two different

disparity levels. By propagating in a direction perpendicular to the image gradient we

reduce errors since different objects have varying disparities divided by an edge. This

assumption of local maximum gradient separating different objects is also the basis of

anisotropic diffusion, where diffusion direction is driven by the gradient [22][13]. In

this work we concentrate on the case where the disparities for the edges are known and

2

the disparities are propagated in an edge-wise direction. There is, however, no reason

why the disparities not residing at the edges could not be propagated as well. The

tangent-to-edge direction is approximated by calculating image gradient.

2.2 Bank of masks

A bank of masks is designed using a 2D multivariate Gaussian distribution which is

rotated in order to generate masks corresponding to different propagation directions.

The basic mask, corresponding to orientation 0

◦

(i.e. the horizontal axis), is calculated

as per (1).

z = G(i, j, µ

i

= µ

j

= 0, Σ), i = −N .. . N, j = −P . . . P (1)

where G(·) denotes multivariate Gaussian, (i, j) are the coordinates of the mask, (µ

i

=

µ

j

= 0) are the mean, Σ is the covariance, z is the number of votes that each position

receives and N and P deﬁne the mask size. Fig. 1 shows a voting mask corresponding

to several different orientations (rotations).

Figure 1: A bank of 7x7 voting masks corresponding to different orientations. Intensity

codiﬁes the number of votes each position receives.

The use of Gaussian distribution is motivated by the fact that it reﬂects the proba-

bilistic nature of our approach: the underlying image structure drives the propagation

direction and thus reﬂects our belief on how the disparity is distributed. Other au-

thors have used similar approaches for image denoising [5]. Furthermore Gaussian

multivariate distribution allows a smooth transition from isotropic to anisotropic cases,

depending on the certainty of the image structure, which can be used in more elabo-

rated schemes by futher analyzing the image. Besides a Gaussian distribution can be

implemented as a separable convolution thus making it efﬁcient computationally.

2.3 Choosing the mask

Once the orientations of the edge tangents have been approximated, propagation is

carried out for each disparity value using the mask whose orientation best matches the

tangent of the edge. The most closely corresponding mask’s centre is placed on top of

the disparity value of interest and each pixel within mask size receives as many votes

for the disparity value as deﬁned by the mask. This is shown in the (2).

V

x+i, y+ j

(D

x, y

) = G

x, y, ∆

(x + i, y + j), i = −N .. . N, j = −P . . . P (2)

3

where V

x, y

indicates votes received by position (x, y) for disparity, D, G

x, y, ∆

(x + i, y +

j) denotes how the Gaussian voting mask, chosen as per gradient ∆, with a size of

(2N + 1, 2P + 1) , placed at (x, y) votes for each mask position. As a ﬁnal step, after

the disparities have been propagated for each of the original disparity values, each pixel

position assumes the disparity that receives most votes, as deﬁned in (3).

D

x, y

= MAX

V

(V

x, y

,D

x, y

)

(3)

where D

x, y

indicates the ﬁnal chosen disparity value for a position (x, y) and MAX

V

(V, D)

returns the disparity value that has received the most votes for a set of vote-disparity

tuples (V, D). Due to the spatial support of the voting mechanism erroneous values are

in certain cases effectively voted out: if within a certain neighborhood there are more

correct values than erroneous values, the erroneous values receive fewer votes and are

discarded. Interestingly enough this effect allows the densiﬁcation process to arrive at

a denser but at the same time more accurate disparity map than the original.

2.4 Pseudocode

For the sake of clarity, below we have included a pseudocode of the propagation pro-

cess.

Algorithm 1 Pseudocode Describing the Voting Process.

INPUT: I (image driving the voting process)

INPUT: D (disparity map)

INPUT: M (bank of masks with mask size (N, P))

STORAGE: V (a matrix for storing the votes)

//OBTAIN COORDINATES FOR DISPARITIES

(x, y) = coords( notEmpty(D) )

for i = 1 → numel(x) do

//CHOOSE THE CLOSEST MASK TO GRADIENT NORMAL

∆I = calculateImageGradient( I( x(i), y(i) ) )

mask = chooseMask( M, ∆I )

//VOTE USING THE CHOSEN MASK

V = vote( x(i), y(i), mask, D( x(i), y(i)), N, P )

end for

(x, y) = coords( notEmpty(V) )

//CHOOSE WINNERS

for i = 1 → numel(x) do

Out( x(i), y(i) ) = MAX( V( x(i), y(i) ) )

end for

The actual voting process can be seen as superimposition of the chosen mask on

top of the disparity value to be propagated where each neighboring pixel (deﬁned by

the mask) receives as many votes for the disparity as deﬁned by the weight of the mask

as each position. For each pixel position the number of votes for each disparity needs

to be stored so that a winner can be chosen accordingly.

4

3 Experiments and Results

We benchmarked the method using well known stereo-images from the Middlebury

database

1

. In order to study how the VMP densiﬁcation process behaves when dealing

with different initial densities and/or errors, we have used two different methods for

generating the initial disparity maps provided to the VMP model. The different meth-

ods used were dynamic programming (DP) [14] and a phase-based method [18][16].

Furthermore we have used different thresholds and interleave factors for DP in order

to generate initial disparity maps with different densities and errors. Computational

complexity of the our method was approximated by comparing it with execution times

of the DP method. We also introduce a sample application that clearly beneﬁts from a

more dense disparity map as input. In the experiments, size of the propagation masks

was 7x7 pixels. Density is given in terms of a ratio between the number of pixels for

which disparity has been deﬁned and the total number of pixels in the image. Overall

accuracy is measured as the percentage of correct pixels (±1 disparity level) calculated

against the ground-truth values.

3.1 Results

Fig. 2 shows the original stereo-pair images and results calculated directly using DP

and densiﬁed by VMP.

1

http://vision.middlebury.edu/stereo/

5

tsukuba C:74.6% D:15.9% C:74.1% D:55.9%

venus C:90.1% D:12.6% C:90.4% D:47.2%

cones C:83.8% D:17.8% C:82.7% D:67.9%

teddy C:77.2% D:12% C:77.5% D:50.3%

Figure 2: Results for the test images: the left-hand column contains left-hand images

of the original stereo-pairs, the middle column shows disparity maps calculated by

dynamic programming and the right-hand column disparity maps densiﬁed using VMP.

C denotes the percentage of correct disparities (±1 disparity level) and D, density.

Fig. 3 demonstrates the results for four different initial maps with different densi-

ties. The initial maps are calculated using different thresholds for occlusion detection

with the effect of increasing density at the expense of accuracy. Thus it can be observed

that after certain reasonable limit, in order to obtain even more dense map, the error

starts to increase. In such a case it is better to calculate more reliable initial map and

then densify.

6

(a) Tsukuba&Venus results

(b) Cones&Teddy results

Figure 3: Densiﬁcation results for several initial densities obtained using different

thresholds for dynamic programming. DP refers to results calculated by directly us-

ing dynamic programming and VMP refers to results densiﬁed from corresponding DP

results using voting mask propagation. (a) TS refers to Tsukuba and V refers to Venus

images (b) C refers to Cones and TE refers to Teddy images

Fig. 4 shows the results for the Venus case only. It can be clearly seen that as the

cost for occlusions gets higher (threshold from one to four) density of the resulting

disparity map increases slightly at the expense of accuracy. On the other hand as the

error increases the voting mechanism starts to vote out some of the erroneous values

thus increasing the accuracy.

7