Approach

To track a template between consecutive frames, The classical Lukas-Kanade and Matthew-Baker (Inverse Compositional) algorithms are used. In essence, these algorithms find the Jacobian of the error between the frames and calculate the required transformation matrix to reduce the error.

Methodology

Some Terminology

\(x\) is pixel coordinate. \(I(x)\) is the image intensity at \(x\). \(W(x;p)\) is a warp operation being applied on \(x\) based on parameter \(p\). \(T\) is a set of coordinate points that denote the template that we want to track, i.e. a crop of the image of the object we are tracking. \(\nabla I\) is the image gradient. \(J\) is Jacobian matrix, \(H\) is Hessian matrix

Lukas-Kanade Tracker

The goal of the tracking can be summarized with the following objective function

\(p^\ast = \underset{p}{\mathrm{argmin}} \sum_{x} \| T(x) - I(W(x;p)) \|^2\)

Hence we are trying to find a warping parameter \(p\) that minimizes the L2 pixel difference between the 2nd frame and the warped 1st frame, i.e. what transformation will align the two images together well. To simplify this problem, we will breakdown \(p\) so that we only need to find the deltas for its forward-additive alignment.

\(p^\ast = p_0 + \delta p\)

The loss function is now therefore

\(p^\ast = \underset{p}{\mathrm{argmin}} \sum_{x} \| T(x) - I(W(x;p_0 + \delta p)) \|^2\)

Which can be approximated with Taylor first order series to:

\(p^\ast = \underset{\delta p}{\mathrm{argmin}} \sum_{x} \| T(x) - I(W(x;p_0) - \nabla I(x)\frac{\partial W}{\partial p_0}\delta p) \|^2\)

We then rearrange the equation to be written as a least squares formulation of \(Ax-b\)

\(p^\ast = \underset{\delta p}{\mathrm{argmin}} \sum_{x} \| \nabla I(x)\frac{\partial W}{\partial p_0}\delta p - (T(x) - I(W(x;p_0)) \|^2\)

However, we can't directly solve this with your favourite least squares solver as our initial estimate of \(p_0\) might be quite off, so we will break the Taylor Series approximation. Instead, we will iteratively solve this by computing the Jacobian \(J\), then the Hessian \(H = J^T J\) and computing \(\delta p\) with the following equation

\(\delta p = H^{-1}{\sum_{x} [\nabla I(x)\frac{\partial W}{\partial p_0}]^T{T(x) - I(W(x;p_0))}}\)

And iteratively compute \(p \leftarrow p + \delta p\) until \(\| \delta p\| < \epsilon \)

Matthew-Baker Tracker

Key advantage of the Matthew-Baker algorithm compared to Lukas-Kanade is in its efficiency, this is because of instead of warping a new image \(I\) to the template \(T\), Matthew-Baker warps the template \(T\) to image \(I\). This is useful because the template does not change from frame-to-frame. Hence, we can pre-compute the Jacobian and Hessian matrix, and re-use it throughout the tracking. The loss function is now:

\(p^\ast = \underset{\delta p}{\mathrm{argmin}} \sum_{x} \| \nabla T(x)\frac{\partial W}{\partial p_0}\delta p - (I(W(x;p_0) - T(x)) \|^2\)

And the closed-form solution to the \(\delta p\) is

\(\delta p = H^{-1}{\sum_{x} [\nabla T(x)\frac{\partial W}{\partial p_0}]^T{I(W(x;p_0)) - T(x)}}\)

Another trick used by Matthew-Baker algorithm is in the warp parameter update. Instead of doing \(p^\ast = p_0 + \delta p\), Matthew-Baker uses

\(W(x;p^\ast) = W(W(x;\delta p);p_0\)
\(W(x;p^\ast) = W(p_0)W(\delta p)^{-1}x\)

We can now mimic the Lukas-Kanade algorithm that solves for \(p^\ast\) iteratively, but now we pre-compute the Jacobian and Hessian. And updates the warp with an inverse composition.

Results

Below are the final result of Matthew-Baker algorithm tracking some objects. Algorithm performs pretty well when object does not move rapidly between frames, but it fails when there are rapid movement (ballet and horse), or when there is a dramatic change in pixel intensity values, such as when the car goes under the bridge.