Published in the Journal of Machine Vision and Communications
Spatial and Temporal Constraints in Variational Correspondence Methods

Jarno Ralli¹, Javier Díaz¹, and Eduardo Ros¹
jarno@ralli.fi, jdiaz@atc.ugr.es, eros@atc.ugr.es

¹ Departamento de Arquitectura y Tecnología de Computadores,
Escuela Técnica Superior de Ingeniería Informática y de Telecomunicación,
Universidad de Granada, Calle Periodista Daniel Saucedo Aranda s/n,
E-18071 Granada, Spain
Abstract
This paper proposes a new method for optical-flow and stereo estimation based on the inclusion of both spatial and temporal constraints in a variational framework. These constraints bound the solution based on a priori information, or in other words, based on what is known of a possible solution or how it is expected to change temporally. This knowledge can be something that (a) is known because the geometrical properties of the scene are known or (b) is deduced by a higher-level algorithm capable of inferring this information. In the latter case, the constraint terms enable the exchange of information between high- and low-level vision systems: the high-level system makes a hypothesis of a possible scene setup, which is then tested by the low-level one, recurrently. Since high-level vision systems incorporate knowledge of the world that surrounds us, this kind of hypothesis-testing loop between the high- and low-level vision systems should converge to a more coherent solution.
1 Introduction
1.1 Motivation
Even though the results of the latest correspondence-forming methods are impressive (see e.g. the Middlebury Computer Vision Pages, http://vision.middlebury.edu/, for stereo and optical-flow [17][4]), in some cases the lack of meaningful structure in the data deteriorates performance, which at worst can render a method useless for the task at hand (e.g. object detection/recognition and/or manipulation). However, in real image sequences, something can typically be assumed of the solution: the sky is, relatively speaking, far away from the cameras (i.e. disparity zero or near zero); a road is a relatively flat surface; in the case of automatic video surveillance, something is known about the background; floors, ceilings, walls, and other man-made structures tend to be relatively flat surfaces; and if the capture rate of the camera(s) is high enough, movements in the real world translate into small movements between frames in the camera plane, and so on.
In the above-mentioned cases, it would be beneficial if this knowledge of the geometrical and/or temporal setup could be plugged into the variational framework in order to constrain the solution based on what is known. Also, as evidence is accumulated temporally to support a certain solution, the lack of evidence in the immediate future (the next frame or so) should not change the solution, unless the data (evidence) points otherwise. This kind of accumulation and propagation of evidence is called temporal coherency and is an important aspect of real applications. On the other hand, outputs produced by higher-level vision (interpretation of cues contrasted with world models) are seldom used as feedback to refine low-level estimations. This is paradoxical, since it is at this high level where reasoning based on the extracted low-level cues takes place. For example, in the case of object recognition, low-level cues are compared with previously generated object models. Therefore, once the object in question has been identified, the high-level model of the object could be used for improving the noisy and sometimes ambiguous low-level cues. This leads to the idea of a signal-symbol loop [22][19][18] where high-, middle-, and low-level vision systems could interchange and fuse data: e.g. a hypothesis formed by the higher-level method would either be accepted by the low-level algorithm, if it fits the data, or rejected. Such a hypothesis forming and testing cycle would then eventually lead to improved estimates and coherency on all levels.
1.2 Contributions
In our earlier paper [22], we introduced the idea of using a priori geometrical information to increase the coherency of the solution for a phase-based method. In this paper, we extend the same idea to the field of variational correspondence methods by describing a novel use for a well-known temporal constraint term [7]. We show (a) that incorporating temporal information of the solution improves robustness, (b) that a similar kind of constraint can also be used for incorporating geometric knowledge of the solution, thus enhancing the accuracy of the results, and (c) that spatial and temporal constraints can be used together. Although in this paper we concentrate on expanding the basic variational framework used for disparity and optical-flow calculation in order to embed both the spatial and temporal constraints, we also give an example of a hypothesis forming-validation loop: a geometrical hypothesis of the scene is formed based on a low-level interpretation of what is observed, and then the hypothesis is used to further improve the scene interpretation.
1.3 System Scheme
First of all, we describe the system schematically, making it easier to follow the rest of the text. By $d$ we denote disparity and by $(u, v)$ optical-flow (the apparent movement of pixels in the camera plane), while subscripts $sc$ and $tc$ denote spatial and temporal constraints, respectively. The actual methods used for calculating the disparity and optical-flow are explained in Section 2. Fig. 1 shows the use of the spatial constraint $d_{sc}$ in disparity calculation, while Fig. 2 shows the same for optical-flow calculation, with the generation of the predicted temporal constraints $u_p$ and $v_p$.

We proceed by reviewing related work in Section 1.4 and then describe the variational methods for disparity and optical-flow calculation, with the new terms, in Section 2. Both quantitative and qualitative results of the conducted experiments are given in Section 3. Finally, the conclusions are discussed in the last section.
[Figure 1 diagram: the left and right input images feed the disparity computation, which takes the spatial constraint Dsc and produces the disparity D.]

Figure 1: Use of the spatial constraint in disparity calculation. Dsc ($d_{sc}$ in the text) is the spatial constraint, while D is the solution (disparity). This particular case shows how knowledge of the geometrical setup of the scene, here related to the form of the road, can be used to constrain the solution.
[Figure 2 diagram: the frames I(t) and I(t-1) feed the optical-flow computation, which takes the spatial constraints Usc, Vsc and the temporal constraints Utc, Vtc, and produces the flow (U, V) together with the predictions Up, Vp.]

Figure 2: Use of spatial and temporal constraints in optical-flow calculation. Usc and Vsc ($u_{sc}$ and $v_{sc}$ in the text) are the spatial constraints, while Utc and Vtc ($u_{tc}$ and $v_{tc}$ in the text) are the temporal constraints. Up and Vp ($u_p$ and $v_p$ in the text) are the predicted optical-flow at time $t + 1$.
1.4 Related Work
The idea of using temporal information in optical-flow calculation is certainly not new. Amongst the first works on the use of both spatial and temporal information as energy terms are those by Black and Anandan [7][6], who enforce causality of the solution in the form of a temporal term. Roughly speaking, energy-based methods incorporating temporal information can be divided into causal and batch processing. In the case of causal processing, a solution calculated at $t$ is propagated forward in time, for example to $t + 1$, and is then used to improve the temporal coherence of the solution starting from $t + 1$. In the case of batch processing, on the other hand, the complete sequence of interest is processed at once, which requires a 3D regularization (smoothness) term. Batch-type methods are less suitable for real-time implementations than causal ones due to their increased demand for processing power. More recent methods incorporating temporal terms are those of Werlberger et al. [31], Weickert and Schnörr [30], and Salgado and Sánchez [25]. The first, by Werlberger, can be regarded as causal, whereas the latter two belong to the batch type. As is mentioned in both [25] and [31], incorporating temporal information raises the additional challenge of modeling the movement between several frames. If the movement is modeled as being symmetrical both forward and backward in time between several frames, this imposes a restriction on the movement: it is expected to be of constant velocity (zero acceleration). As can be expected, and as is shown in [31], models accepting only constant velocities do fine as long as this assumption holds, and actually perform worse when it does not. In our causal model, movement is modeled with both velocity and acceleration components, as in [7], and therefore the model does not suffer from this shortcoming.
Our work differs from the above-mentioned ones in that we use geometrical knowledge of the scene for constraining the disparity solution and incorporate both the spatial and temporal constraints simultaneously for the optical-flow. We also give an example of a hypothesis forming-validation loop: a hypothesis of a plane is made based on the data, after which, iteratively, the hypothesis is verified against the data and used to constrain the solution.
2 Variational Stereo and Optical-flow
Since variational methods incorporate a regularization term, they are, in general, robust and perform relatively well. An additional benefit is that they have a solid mathematical basis with existing efficient solvers [12]; they are well understood, with clearly definable terms, and therefore extending these methods is straightforward [3][29][10][11]. We start by introducing the notation used and then continue to the energy functionals with the added spatial and temporal constraint terms. Finally, each of the terms is described in more detail.

Table 1 has been included for convenience, since three different types of error functions are used in the model. The use of each error function is justified later in the text. In the data terms, $I_{\{L,R\},k,t}$ refers to the $k$:th channel of the left or right image, identified by subindex $L$ or $R$, at time $t$. By channel we mean a channel of a vector-valued image, such as RGB. Without $k$ written explicitly, all channels are referred to. $I^w_{\{L,R\},k,t}$ refers to a warped version of the image [3][10]. In the error functions, the subindices refer to functionality, or in other words, to how the error function in question is used in the model. Thus, $D$, $R$, $CS$, and $CT$ refer to data, regularization, constraint-spatial, and constraint-temporal, respectively. The functionals for stereo and optical-flow are given in (1) and (2), respectively.
Table 1: Notation used for images and error functions

DATA TERMS
$I_{\{L,R\},k,t} = I_{\{L,R\}}(x, y, k, t)$
$I^w_{\{L,R\},k,t} = I_{\{L,R\}}(x + u, y + v, k, t)$   (optical-flow)
$I^w_{\{L,R\},k,t} = I_{\{L,R\}}(x + d, y, k, t)$   (disparity)

ERROR AND CORRESPONDING INFLUENCE FUNCTIONS
$\Psi_D(s^2) = \sqrt{s^2 + \varepsilon^2}$   |   $\Psi'_D(s^2) = 1/\sqrt{s^2 + \varepsilon^2}$
$\Psi_R(s^2) = \sqrt{s^2 + \varepsilon^2}$   |   $\Psi'_R(s^2) = 1/\sqrt{s^2 + \varepsilon^2}$
$\Psi_{CS}(s^2) = \lambda^2 \ln(1 + s^2/\lambda^2)$   |   $\Psi'_{CS}(s^2) = 1/(1 + s^2/\lambda^2)$
$\Psi_{CT}(s^2) = -\lambda^2 \exp(-s^2/\lambda^2)$   |   $\Psi'_{CT}(s^2) = \exp(-s^2/\lambda^2)$
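To make the notation concrete, the following Python sketch (an illustration, not the authors' implementation) defines the four penalty functions of Table 1 together with their influence functions, i.e. their derivatives with respect to $s^2$. The default values of `eps` and `lam` are placeholders, not the settings used in the experiments.

```python
import numpy as np

# Illustrative definitions of the penalty functions of Table 1 and their
# influence functions (derivatives with respect to s^2). The defaults for
# eps and lam are placeholders, not the paper's settings.

def psi_D(s2, eps=1e-3):
    """Robust data penalty: sqrt(s^2 + eps^2)."""
    return np.sqrt(s2 + eps**2)

def dpsi_D(s2, eps=1e-3):
    """Influence function of the data penalty."""
    return 1.0 / np.sqrt(s2 + eps**2)

# The regularization penalty Psi_R has the same form as Psi_D.
psi_R, dpsi_R = psi_D, dpsi_D

def psi_CS(s2, lam=0.2):
    """Spatial-constraint penalty: lam^2 * ln(1 + s^2/lam^2)."""
    return lam**2 * np.log(1.0 + s2 / lam**2)

def dpsi_CS(s2, lam=0.2):
    """Influence function with a slowly decaying (rational) tail."""
    return 1.0 / (1.0 + s2 / lam**2)

def psi_CT(s2, lam=0.2):
    """Temporal-constraint penalty: -lam^2 * exp(-s^2/lam^2)."""
    return -lam**2 * np.exp(-s2 / lam**2)

def dpsi_CT(s2, lam=0.2):
    """Influence function that decays exponentially (fast rejection)."""
    return np.exp(-s2 / lam**2)
```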
$$E(d) = \int_{\Omega} \Big[ D(I_{L,1}, I_{R,1}, d) + \alpha\, S(I_{L,1}, d) \Big]\, dx + \int_{\Omega} \gamma_s\, C_s(d_{sc}, d)\, dx \quad (1)$$

$$E(u, v) = \int_{\Omega} \Big[ D(I_{L,1}, I_{L,0}, u, v) + \alpha\, S(I_{L,1}, u, v) \Big]\, dx + \int_{\Omega} \gamma_s\, C_s(u_{sc}, v_{sc}, u, v)\, dx + \int_{\Omega} \gamma_t\, C_t(u_{tc}, v_{tc}, u, v)\, dx \quad (2)$$
where the data terms for stereo and optical-flow are $D(I_{L,1}, I_{R,1}, d)$ and $D(I_{L,1}, I_{L,0}, u, v)$, while $S(I_{L,1}, d)$ and $S(I_{L,1}, u, v)$ are the respective regularization terms, and $\alpha > 0$ is the weight of the smoothness term. The spatial constraint for stereo is $C_s(d_{sc}, d)$, where $d_{sc}$ is the constraining value. $C_s(u_{sc}, v_{sc}, u, v)$ is the spatial constraint for optical-flow, where $u_{sc}$ and $v_{sc}$ are the constraints arising from geometry. $C_t(u_{tc}, v_{tc}, u, v)$ is the temporal constraint for optical-flow, where $u_{tc}$ and $v_{tc}$ are the predicted values. $\gamma_s > 0$ and $\gamma_t > 0$ are the spatial and temporal constraint weights, defining the influence of the new terms.
2.1 Data Terms
As is shown by Ralli et al. in [23], image representation plays a vital role in the data constancy term. Therefore, we have chosen to use a combination of the gradient and the gradient magnitude, since these are capable of producing reliable results under both (a) illumination errors and (b) image noise [23]. As can be observed from (3) and (4), late linearization of the data terms is used [3][20].
$$\begin{aligned} D(I_{L,1}, I_{R,1}, d) = {} & b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_x I_{L,1,k} - \partial_x I^w_{R,1,k})^2 \big) + b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_y I_{L,1,k} - \partial_y I^w_{R,1,k})^2 \big) \\ & + b_2 \sum_{k=1}^{K} \Psi_D\big( |\nabla I_{L,1,k} - \nabla I^w_{R,1,k}|^2 \big) \\ = {} & b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_x I_{L,1,k} - \partial_x I^w_{R,1,k})^2 \big) + b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_y I_{L,1,k} - \partial_y I^w_{R,1,k})^2 \big) \\ & + b_2 \sum_{k=1}^{K} \Psi_D\big( (\partial_x I_{L,1,k} - \partial_x I^w_{R,1,k})^2 + (\partial_y I_{L,1,k} - \partial_y I^w_{R,1,k})^2 \big) \end{aligned} \quad (3)$$

$$\begin{aligned} D(I_{L,1}, I_{L,0}, u, v) = {} & b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_x I_{L,1,k} - \partial_x I^w_{L,0,k})^2 \big) + b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_y I_{L,1,k} - \partial_y I^w_{L,0,k})^2 \big) \\ & + b_2 \sum_{k=1}^{K} \Psi_D\big( |\nabla I_{L,1,k} - \nabla I^w_{L,0,k}|^2 \big) \\ = {} & b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_x I_{L,1,k} - \partial_x I^w_{L,0,k})^2 \big) + b_1 \sum_{k=1}^{K} \Psi_D\big( (\partial_y I_{L,1,k} - \partial_y I^w_{L,0,k})^2 \big) \\ & + b_2 \sum_{k=1}^{K} \Psi_D\big( (\partial_x I_{L,1,k} - \partial_x I^w_{L,0,k})^2 + (\partial_y I_{L,1,k} - \partial_y I^w_{L,0,k})^2 \big) \end{aligned} \quad (4)$$
where the spatial gradient operator is given by $\nabla := (\partial_x, \partial_y)^T$, and $b_1 > 0$ and $b_2 > 0$ are the weights of the data terms. The benefit of non-linearized constancy terms (or late linearization) is that the model copes better with large displacements, especially when used together with a multi-resolution strategy, where the solution is propagated from coarse to finer levels together with warping. Another benefit is that the full range of information available in the images is used. Instead of using a quadratic error function, we use $\Psi_D(s^2) = \sqrt{s^2 + \varepsilon^2}$ [10][9], which is applied individually to each channel in each term [32]. $\varepsilon$ is used for stabilization [1] where $s^2$ is near zero. This kind of robust error function gives less importance to outliers in the data, such as occlusions and image noise.
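As an illustration of how such a robust data term can be evaluated, the sketch below computes the single-channel ($K = 1$) optical-flow data cost of (4), using bilinear interpolation for the backward warping $I^w_{L,0}$. The function names (`warp`, `data_term`) and default weights are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Illustrative sketch of the optical-flow data term (4) for one channel:
# robust gradient-constancy residuals between I(t) and the warped I(t-1).

def warp(img, u, v):
    """Backward-warp img by the flow (u, v) with bilinear interpolation."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    return map_coordinates(img, [yy + v, xx + u], order=1, mode='nearest')

def data_term(I1, I0, u, v, b1=1.0, b2=1.0, eps=1e-3):
    """Pointwise robust data cost of (4) for a single channel (K = 1)."""
    psi = lambda s2: np.sqrt(s2 + eps**2)      # robust penalty Psi_D
    I0w = warp(I0, u, v)                       # I^w_{L,0}
    I1y, I1x = np.gradient(I1)                 # spatial derivatives
    I0wy, I0wx = np.gradient(I0w)
    rx, ry = I1x - I0wx, I1y - I0wy            # gradient residuals
    return (b1 * psi(rx**2) + b1 * psi(ry**2)  # per-component terms
            + b2 * psi(rx**2 + ry**2))         # joint (vector) term
```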
2.2 Regularization Terms
For regularization, we have used both image-driven and flow-driven isotropic regularization, as well as a combination (mixed regularization) of the two. In the case of mixed regularization, we propose swapping between image-driven and flow-driven regularization: for example, on every fourth iteration, we use image-driven regularization instead of the flow-driven one. Regularization based solely on image information can prevent smoothing within objects, since not all image borders necessarily coincide with object borders. On the other hand, where image borders do coincide with object borders, such information can prevent over-smoothing across objects. The mixed regularization approach is computationally attractive, since image-driven regularization is linear and thus the diffusion weights need to be calculated only once per image, and it also combines information from both the image and the flow field. There are more complex and elaborate ways of combining image and flow information in order to achieve excellent results [32], and we expect that using such regularization terms would further improve the results obtained in this paper. The regularization terms for stereo and optical-flow are given in (5) and (6), respectively.
$$S(I_{L,1}, d) = \begin{cases} g(|\nabla I_{L,1}|^2)\, |\nabla d|^2 & \text{if i-d,} \\ \Psi_R(|\nabla d|^2) & \text{if f-d} \end{cases} \quad (5)$$

$$S(I_{L,1}, u, v) = \begin{cases} g(|\nabla I_{L,1}|^2)\, \big( |\nabla u|^2 + |\nabla v|^2 \big) & \text{if i-d,} \\ \Psi_R\big( |\nabla u|^2 + |\nabla v|^2 \big) & \text{if f-d} \end{cases} \quad (6)$$
where the error functions are $\Psi_R(s^2) = \sqrt{s^2 + \varepsilon^2}$, as in [9], and $g(s^2) = 1/(1 + s^2/\lambda^2)$ [2][21]. Here, i-d and f-d denote image-driven and flow-driven regularization, respectively. The purpose of the error functions is to prevent the regularization term from smoothing across object boundaries and thus to make the solution piecewise smooth. In the $g(s^2)$ case, $\lambda$ is a parameter indicating which 'strength' of image edges is regarded as important, and thus it controls the diffusion strength. Again, $\varepsilon$ is used for stabilization where $s^2$ is near zero.
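The following sketch illustrates the mixed regularization strategy described above: the image-driven diffusion weight $g(|\nabla I|^2)$ can be computed once per image, while the flow-driven weight is recomputed per iteration, with a swap to image-driven weights on every fourth iteration. The helper names and the default parameter values are illustrative.

```python
import numpy as np

# Sketch of the two diffusivity fields used by the mixed regularization.
# The swap period (image-driven every 4th iteration) follows the text.

def image_driven_weight(I, lam=0.1):
    """g(|grad I|^2) = 1 / (1 + |grad I|^2 / lam^2); computed once per image."""
    Iy, Ix = np.gradient(I)
    return 1.0 / (1.0 + (Ix**2 + Iy**2) / lam**2)

def flow_driven_weight(u, v, eps=1e-3):
    """Psi'_R(s^2) with s^2 = |grad u|^2 + |grad v|^2; recomputed per iteration."""
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    s2 = ux**2 + uy**2 + vx**2 + vy**2
    return 1.0 / np.sqrt(s2 + eps**2)

def diffusion_weight(I, u, v, iteration, g_cached=None, period=4):
    """Mixed strategy: image-driven weights on every 'period'-th iteration."""
    if iteration % period == 0:
        return g_cached if g_cached is not None else image_driven_weight(I)
    return flow_driven_weight(u, v)
```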
2.3 Spatial and Temporal Constraints
The last terms that need to be defined are the spatial and temporal constraint terms, for both stereo and optical-flow, which are given in (7), (8), and (9).
$$C_s(d_{sc}, d) = \Psi_{CS}\big( (d_{sc} - d)^2 \big) \quad (7)$$

$$C_s(u_{sc}, v_{sc}, u, v) = \Psi_{CS}\big( (u_{sc} - u)^2 \big) + \Psi_{CS}\big( (v_{sc} - v)^2 \big) \quad (8)$$

$$C_t(u_{tc}, v_{tc}, u, v) = \Psi_{CT}\big( (u_{tc} - u)^2 \big) + \Psi_{CT}\big( (v_{tc} - v)^2 \big) \quad (9)$$
where $\Psi_{CS}(s^2) = \lambda^2 \ln(1 + s^2/\lambda^2)$ and $\Psi_{CT}(s^2) = -\lambda^2 \exp(-s^2/\lambda^2)$ are robust non-quadratic error functions. $u_{tc}$ and $v_{tc}$ are the temporal constraints, whereas $d_{sc}$, $u_{sc}$, and $v_{sc}$ are the spatial constraints. The constraints function as priors, in a sense guiding the solution towards the constraint. $\lambda$ is a parameter, dependent on the image scale, used for determining the shape of the influence function: where the constraint does not fit the data, its influence upon the solution is rejected. The influence functions of the corresponding error functions are displayed graphically in Fig. 3.
[Figure 3: plot of the influence functions $\Psi'$ over $s^2 \in [0, 1]$.]

Figure 3: Influence functions $\Psi'_{CS}(s^2) = 1/(1 + s^2/\lambda^2)$ and $\Psi'_{CT}(s^2) = \exp(-s^2/\lambda^2)$ for $\lambda$ of 0.2.
As can be observed from Fig. 3, the influence function based on the exponential error function approaches zero faster than the one based on the logarithmic error function. For the temporal constraints, we have chosen the exponential error function, since we expect the temporal displacements to be small and, therefore, steeper influence functions are preferred. Thus, if there is no proper temporal continuum (e.g. a new object enters the image), the temporal constraint should be rejected and the solution should be based solely on the data and smoothness terms. On the other hand, the expected dynamic range of the disparities is higher and, therefore, in this case it is beneficial that the influence function has a longer non-zero tail. The shape of the function is controlled by $\lambda$. This is clearly beneficial, since the displacement depends on the scale (multi-resolution processing) and thus the shape of the function can be adapted per scale.
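A quick numeric check of this behaviour, assuming the $\lambda = 0.2$ of Fig. 3 (the values of $s$ are arbitrary sample points):

```python
import numpy as np

# With lam = 0.2, the exponential influence function cuts off deviating
# constraints much faster than the rational tail of the logarithmic one.
lam = 0.2
s = np.array([0.0, 0.1, 0.2, 0.4, 0.8])
dpsi_CS = 1.0 / (1.0 + s**2 / lam**2)   # spatial-constraint influence
dpsi_CT = np.exp(-s**2 / lam**2)        # temporal-constraint influence
for si, cs, ct in zip(s, dpsi_CS, dpsi_CT):
    print(f"s = {si:.1f}:  CS influence = {cs:.3f}  CT influence = {ct:.3f}")
```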
Now that the new terms have been introduced, the question remains of where we obtain the actual constraints. We partially answered this question in Sections 1 and 1.3. Spatial constraints reflect our knowledge of the spatial properties of the scene setup. In [24], Ralli et al. show how this kind of information related to the scene setup can be deduced from the initial disparity map. In the case of optical-flow, the spatial constraint can be embedded, for example, in the fundamental matrix F [14][16]; this has been studied in [28][27]. The temporal constraint, on the other hand, is used for imposing temporal coherency upon the solution and thus reflects knowledge of how the observed scene is expected to change.
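As a simplified illustration of how a spatial constraint might be derived from an initial disparity map (a stand-in for the deduction of [24], not the method used there), a plane $d = ax + by + c$ can be fitted by least squares over a region believed to be planar, such as a road:

```python
import numpy as np

# Hypothetical illustration: derive a spatial constraint d_sc by fitting a
# plane d = a*x + b*y + c to an initial (possibly noisy) disparity map over
# a region believed to be planar. Function name and interface are ours.

def plane_constraint(d_init, mask):
    """Least-squares plane fit over 'mask'; returns the constraint field d_sc."""
    h, w = d_init.shape
    yy, xx = np.mgrid[0:h, 0:w]
    A = np.column_stack([xx[mask], yy[mask], np.ones(mask.sum())])
    coeff, *_ = np.linalg.lstsq(A, d_init[mask], rcond=None)
    a, b, c = coeff
    return a * xx + b * yy + c   # evaluate the plane hypothesis everywhere
```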
2.4 Predicted Temporal Constraints
For the temporal constraints, we need approximations of $u(x, y, t)$ and $v(x, y, t)$ at $t + 1$. At first sight this seems simple enough, but the problem is actually twofold. We already know the apparent movement in the camera plane at time $t$, since this is exactly what we have calculated, expressed by $(u, v)$. Therefore, we know how each pixel moves between times $t$ and $t + 1$, but what we do not know is the actual value of $u$ and $v$ at $t + 1$. In other words, we have to know how the optical-flow changes temporally. This can be expressed, for example, using a Taylor series, as given in (10).
$$u_p = u(x, y, t+1) = u(x, y, t) + \frac{\partial u(x, y, t)}{\partial t} + \sum_{n=2}^{\infty} \frac{u^{(n)}(x, y, t)}{n!}$$
$$v_p = v(x, y, t+1) = v(x, y, t) + \frac{\partial v(x, y, t)}{\partial t} + \sum_{n=2}^{\infty} \frac{v^{(n)}(x, y, t)}{n!} \quad (10)$$
where the superscript $(n)$ stands for the $n$:th temporal derivative of the function, and $u_p$ and $v_p$ are the predicted values. We have used an approximation up to the first order (discarding higher-order terms), where the first-order term is approximated from the current and previous solutions: for example, $\partial u(x, y, t)/\partial t \approx u(x, y, t) - u(x, y, t-1)$. The physical interpretation of these terms ($\partial u(x, y, t)/\partial t$ and $\partial v(x, y, t)/\partial t$) is the acceleration of the optical-flow. In order to account for the current movement, as explained earlier, each of the predicted terms needs to be warped, as expressed in (11).
$$u_{tc} = u_p(x - u, y - v)$$
$$v_{tc} = v_p(x - u, y - v) \quad (11)$$

where $u_{tc}$ and $v_{tc}$ are the actual temporal constraints.
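A minimal sketch of the prediction and warping steps (10)-(11), assuming the first-order extrapolation described above and the sign convention of (11) as reconstructed here; the function names are illustrative:

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Sketch of the predicted temporal constraints: first-order extrapolation
# of the flow (higher-order terms discarded), then warping the prediction
# along the current flow, per eqs. (10)-(11).

def warp_field(f, u, v):
    """Sample f at (x - u, y - v) with bilinear interpolation."""
    h, w = f.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    return map_coordinates(f, [yy - v, xx - u], order=1, mode='nearest')

def temporal_constraints(u_t, v_t, u_prev, v_prev):
    """Predict (u_p, v_p) at t+1 and warp them into (u_tc, v_tc)."""
    u_p = u_t + (u_t - u_prev)       # u(t) + du/dt, eq. (10) to first order
    v_p = v_t + (v_t - v_prev)
    u_tc = warp_field(u_p, u_t, v_t) # eq. (11)
    v_tc = warp_field(v_p, u_t, v_t)
    return u_tc, v_tc
```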
2.5 Solving the Equations
The energy functionals (1) and (2) are minimized using their corresponding Euler-Lagrange equations (see Section A.0.2): a necessary, but not sufficient, condition for a minimum (or a maximum) is that the Euler-Lagrange equation(s) equal zero. Because of the late linearization of the data terms, the model has the benefit of coping with large displacements, which, however, comes at a price: the energy functionals are non-convex. Due to the non-convexity, many local minima may exist and, therefore, finding a relevant minimizer becomes more difficult. Another difficulty is due to the non-linearity of the robust error functions. One way of finding a relevant minimizer is the so-called continuation methods (e.g. Graduated Non-Convexity [8]): the search for a suitable solution is started from a simpler, smoothened version of the problem, whose solution is used to initialize the search at a finer scale. This is also known as a coarse-to-fine multi-resolution (or multigrid) strategy [3][10][13] and is also used to speed up convergence, for example, in solving PDEs (partial differential equations) [26]. In order to deal with the non-linearities, a lagged diffusivity fixed-point method is used [11][9]. The solver used for the linearized versions of the equations is ALR (Alternating Line Relaxation), a Gauss-Seidel type block solver [11][26].
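The following skeleton illustrates the coarse-to-fine strategy under our own simplifying assumptions; `solve_level` stands in for the lagged-diffusivity/ALR solver, which is not reproduced here:

```python
import numpy as np
from scipy.ndimage import zoom

# Skeleton of the coarse-to-fine strategy: solve on a downsampled pyramid
# and use each solution to initialize the next finer level. 'solve_level'
# is a placeholder for the per-level fixed-point solver.

def coarse_to_fine(I1, I0, solve_level, levels=4, factor=0.5):
    """Run solve_level from the coarsest pyramid level to the finest."""
    pyramid = [(I1, I0)]
    for _ in range(levels - 1):
        I1, I0 = zoom(I1, factor), zoom(I0, factor)
        pyramid.append((I1, I0))
    u = np.zeros(pyramid[-1][0].shape)
    v = np.zeros_like(u)
    for J1, J0 in reversed(pyramid):
        if u.shape != J1.shape:
            fy = J1.shape[0] / u.shape[0]
            fx = J1.shape[1] / u.shape[1]
            u = zoom(u, (fy, fx)) * fx   # horizontal flow scales with width
            v = zoom(v, (fy, fx)) * fy   # vertical flow scales with height
        u, v = solve_level(J1, J0, u, v) # warps internally, refines (u, v)
    return u, v
```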
3 Experiments
We have evaluated the effects of the spatial and temporal constraints, both quantitatively and qualitatively, using known test images from the Middlebury database (http://vision.middlebury.edu/) [17][4] and images from the DRIVSCO (http://www.pspc.dibe.unige.it/~drivsco/) and GRASP (http://www.csc.kth.se/grasp/) projects. Quantitative results justify the model, but in fact, we are more interested in knowing how the model behaves with real images.
3.1 Error Metrics
The formulae of the error metrics used are given in (12), (13), and (14).

$$\text{MAE} := \frac{1}{n} \sum_{i=1}^{n} \big| d_i - (d_{gt})_i \big| \quad (12)$$

$$C := \frac{1}{n}\, \#\big\{ i \,\big|\, |d_i - (d_{gt})_i| \le 1 \big\} \quad (13)$$

$$\text{AAE} := \frac{1}{n} \sum_{i=1}^{n} \arccos\!\left( \frac{(u_{gt})_i\, u_i + (v_{gt})_i\, v_i + 1}{\sqrt{(u_{gt})_i^2 + (v_{gt})_i^2 + 1}\, \sqrt{u_i^2 + v_i^2 + 1}} \right) \quad (14)$$
where $n$ is the number of pixels, and $d_{gt}$, $u_{gt}$, and $v_{gt}$ are the ground truths of the disparity and optical-flow maps. MAE is the mean absolute error, C is the percentage of disparities with an absolute error of 1 or smaller, and AAE is the average angular error [5].
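For reference, a direct implementation of (12)-(14); the arccos form of AAE follows the standard angular-error definition of [5], and the function names are ours:

```python
import numpy as np

# Direct implementations of the error metrics (12)-(14).

def mae(d, d_gt):
    """Mean absolute disparity error, eq. (12)."""
    return np.mean(np.abs(d - d_gt))

def correct_fraction(d, d_gt, thresh=1.0):
    """Fraction of disparities within 'thresh' of the ground truth, eq. (13)."""
    return np.mean(np.abs(d - d_gt) <= thresh)

def aae(u, v, u_gt, v_gt):
    """Average angular error between (u, v, 1) and (u_gt, v_gt, 1), eq. (14)."""
    num = u_gt * u + v_gt * v + 1.0
    den = np.sqrt(u_gt**2 + v_gt**2 + 1.0) * np.sqrt(u**2 + v**2 + 1.0)
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))
```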
3.2 Quantitative Results for Spatial Constraint in Disparity Calculation
The aim of this experiment is to demonstrate how spatial knowledge of the scenery, especially in difficult cases where disparity calculation normally tends to fail, can be used to enhance the results. The reason for not testing with the standard set of Middlebury images (Tsukuba, Venus, Teddy, and Cones) is that we consider those to be fairly 'simple', with more or less planar surfaces containing texture; therefore, they do not reflect the challenges related to the real applications we are dealing with. Fig. 4 shows the stereo images used. The parameters are the same throughout all the experiments.

[Figure 4 images omitted.]

Figure 4: (a) Monopoly, (b) Midd1, and (c) Midd2.
It can be seen that in the case of the chosen test images (Fig. 4), the backgrounds do not contain clearly visible structure that could be used for establishing correspondences. Therefore, it is expected that any method generating correspondences based on image structures (such as edges, corners, or differentiable pixels) will fail for most of the background. The idea of using a spatial constraint, therefore, is to use a priori knowledge of the solution in order to improve the results in such cases. Fig. 5 shows the results with and without the use of a spatial constraint. The spatial constraint used in this case is taken from the ground-truth and covers the background. The results corresponding to Fig. 5 are given numerically in Table 2.
[Figure 5 images omitted.]

Figure 5: (a) Monopoly ground-truth, (b) Midd1 ground-truth, (c) Midd2 ground-truth, (d) Monopoly, mixed-diffusion without SC, (e) Midd1, mixed-diffusion without SC, (f) Midd2, mixed-diffusion without SC, (g) Monopoly SC, (h) Midd1 SC, (i) Midd2 SC, (j) Monopoly, mixed-diffusion with SC, (k) Midd1, mixed-diffusion with SC, (l) Midd2, mixed-diffusion with SC. SC stands for spatial constraint.
In Fig. 5, the first row displays the ground-truths and the second row the results without a spatial constraint, while the third row shows the spatial constraints used for generating the results seen on the fourth row (with mixed regularization and a spatial constraint). As can be observed, the results improve significantly, both visually and numerically, when using the spatial constraint. It can also be seen from Table 2 that by mixing both image and flow-field information (mixed regularization), the results improve further.
Table 2: Results in MAE (mean absolute error) and percentage of correct disparities using different regularizations without (WO) and with (W) a spatial constraint.

MAE
              WO spatial constraint       W spatial constraint
              flow    image   mixed       flow    image   mixed
Monopoly      8.3     7.5     8.0         4.9     2.4     2.1
Midd1         6.2     5.1     5.6         1.6     1.8     1.6
Midd2         6.3     5.5     5.9         1.9     2.3     1.9

Percentage of correct disparities
              WO spatial constraint       W spatial constraint
              flow    image   mixed       flow    image   mixed
Monopoly      66.3%   64.2%   66.3%       61.1%   76.9%   83.4%
Midd1         47.0%   46.9%   47.2%       81.9%   79.3%   82.3%
Midd2         48.4%   46.5%   48.3%       73.1%   72.5%   79.3%
It can be argued that the given example (and therefore the results) is artificial, since the spatial constraints were obtained from the known ground-truths. However, from the point of view of object detection, scene interpretation, or segmentation based on disparity maps, it is not necessary for the spatial constraint to be correct (or even near correct). From the object detection and segmentation (based on disparity) point of view, the main problem in this case is that the objects and the background tend to 'fuse' together. Even if we did not know the correct value for the backgrounds, we could still make a guess in order to improve object separability from the background. The guess we have made here is that the background is far away from the cameras (zero disparity). As can be observed from Fig. 6, such a guess has also improved object separability notably.
[Figure 6 images omitted.]

Figure 6: (a) Monopoly, (b) Midd1, and (c) Midd2. In each case, the spatial constraint is 0 and mixed regularization is used.

By using a spatial constraint of zero (Fig. 6), we can still separate the background clearly from the foreground, even though the constraint itself is not correct.
3.3 Quantitative Results for Temporal Constraint in Optical-flow Calculation

Here, we study the effects of the temporal constraint upon optical-flow calculation. Fig. 7 shows the image sequences used, while Table 3 displays the results. The parameters are the same throughout all the experiments.

In the case of (a) Rubberwhale, (b) Grove2, (c) Grove3, (d) Hydrangea, and (e) Urban3, frames 8 to 10 are used for calculating the optical-flow, whereas for (g) Yosemite, frames 5 to 8 are used. P.ERR (prediction error) is the angular error of the temporal constraint (approximated from previous solutions).