
Panning gesture

The panning gesture is an important kind of camera interaction in map applications. While we are all accustomed to using it every day, it might not be so easy to implement from a programmer's perspective...

Problem formulation

A panning touch gesture starts at a certain screen position \(s_0\) with initial camera state \(c_0\). Each time \(t\) the touch position changes to \(s_t=s_0+\Delta_t\), the camera state \(c_t\) needs to be updated accordingly.
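To make the discussion concrete, I'll accompany each method with a small TypeScript sketch. The types below are this article's own assumptions, not the API of any particular engine; their field names mirror the math notation.

```ts
// Minimal vector/camera types mirroring the notation: c[pos], c[target], etc.
type Vec3 = [number, number, number];

interface Camera {
  pos: Vec3;     // c[pos]: camera position in world space
  target: Vec3;  // c[target]: the point the camera looks at
  right: Vec3;   // c[right]: camera's right direction (unit vector)
  up: Vec3;      // c[up]: camera's up direction (unit vector)
  fovy: number;  // c[fovy]: vertical field of view, in radians
  tilt: number;  // c[tilt]: tilt angle away from a straight-down view, in radians
}

// Small vector helpers used by the sketches below.
const add = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const scale = (a: Vec3, k: number): Vec3 => [k * a[0], k * a[1], k * a[2]];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const length = (a: Vec3): number => Math.hypot(a[0], a[1], a[2]);
```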

I. A naive screen-space approach

We can simply offset the camera's position along the x- and y-directions of camera space, that is, along the camera's right and up directions in world space:

$$c_t[pos]=c_0[pos] - \lambda(\Delta_t[x] c_0[right] + \Delta_t[y] c_0[up])$$

while maintaining the camera's look direction.
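Here is a minimal sketch of this update, using the types above; \(\lambda\) is just a hand-tuned constant for now:

```ts
// Naive screen-space pan: offset pos (and target, so the look direction
// is preserved) along the camera's right/up directions.
function panNaive(c0: Camera, delta: [number, number], lambda: number): Camera {
  const offset = add(
    scale(c0.right, lambda * delta[0]),
    scale(c0.up, lambda * delta[1]),
  );
  return { ...c0, pos: sub(c0.pos, offset), target: sub(c0.target, offset) };
}
```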

Try to adjust \(\lambda\) and see what happens. Apparently, a value that is too small or too large results in an unnatural experience, and a fixed value would never suit all zoom levels.

We can make it better by calculating \(\lambda\) precisely.

II. Calibrated screen-space panning

We want the world to move consistently with the touch movement on the screen, no more, no less. This can be achieved by calibrating \(\lambda\) adaptively:

$$\lambda=\frac{2\,\mathrm{dist}(c_0[pos],\, c_0[target])\,\tan\!\left(\frac{c_0[fovy]}{2}\right)}{viewport[height]}$$

Despite its complex form, the idea behind it is quite simple: relate world-space length to screen-space length. At the target's depth, the viewport's height spans exactly \(2\,\mathrm{dist}\,\tan(fovy/2)\) world units, so dividing by the viewport height in pixels yields world units per pixel. Note that \(target\) could be either the world position of the screen center on the ground or the touch position, depending on your need. Here I'll go with the former since it's simpler.
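In code, the calibration is a one-liner (again using the assumed types; viewportHeight is in pixels):

```ts
// World units per pixel at the target's depth: the viewport's height
// corresponds to 2 * dist * tan(fovy / 2) world units there.
function calibratedLambda(c0: Camera, viewportHeight: number): number {
  const dist = length(sub(c0.target, c0.pos));
  return (2 * dist * Math.tan(c0.fovy / 2)) / viewportHeight;
}
```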

III. World-space panning

In my experience, screen-space panning is primarily used in CAD applications or debug tools rather than in games or maps, since it's not friendly to end-users who just want to view the scene. Most map apps adopt another control fashion, world-space panning, where the camera moves on a hypothetical plane parallel to the ground during panning:

As for the implementation, well, it's not something brand new. Building on the previous method, all we need to do is project the y-movement onto that plane, by replacing the camera's up direction with the cross product of the ground's up vector and the camera's right vector.

Also note that a compensation term \(u\) for the camera tilt is usually needed; otherwise the panning would feel too slow at large tilt angles due to perspective foreshortening.

$$u=\frac{1}{\cos(c_0[tilt])}$$

And the solution turns out to be:

$$c_t[pos]=c_0[pos] + \lambda(\Delta_t[x] c_0[right] + u\Delta_t[y] ((0, 0, 1) \times c_0[right]))$$
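A sketch of this update, assuming a z-up world so the ground's up vector is \((0, 0, 1)\):

```ts
// World-space pan: project the y-movement onto the ground plane via
// groundUp x right, and compensate the tilt with u = 1 / cos(tilt).
function panWorldSpace(c0: Camera, delta: [number, number], lambda: number): Camera {
  const groundUp: Vec3 = [0, 0, 1];
  const u = 1 / Math.cos(c0.tilt);
  const offset = add(
    scale(c0.right, lambda * delta[0]),
    scale(cross(groundUp, c0.right), lambda * u * delta[1]),
  );
  return { ...c0, pos: add(c0.pos, offset), target: add(c0.target, offset) };
}
```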

IV. Tracing world-space panning

If our goal is to make the starting world position always stay under the touch, then we need some extra work, for the above algorithm assumes a fixed \(\lambda\) during panning where it should actually vary: the further away the touched position is, the larger \(\lambda\) should be.

We can take a different approach that solves this problem elegantly. Initially, the world coordinate at touch position \(s_0\) is \(w_0\). At moment \(t\) the touch moves to a new position while the camera remains in its last state. As a result, the world position under the touch becomes \(w_t\) rather than \(w_0\). To correct this, we translate the camera in the opposite direction:

$$c_t[pos, target]=c_t[pos, target]+(w_0-w_t)$$

There remains only one problem to solve: how do we calculate the world position corresponding to a screen coordinate? Since the world position is always on the ground, the problem reduces to intersecting a ray with the plane \(z=0\), which can be solved efficiently in real time.
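A sketch of both steps; screenToRay, which unprojects a screen point into a world-space ray, is a hypothetical engine-specific helper:

```ts
// Intersect a ray (origin o, direction d) with the plane z = planeZ.
// Returns null when the ray is parallel to the plane or points away from it.
function intersectPlaneZ(o: Vec3, d: Vec3, planeZ: number): Vec3 | null {
  if (Math.abs(d[2]) < 1e-9) return null;
  const t = (planeZ - o[2]) / d[2];
  return t < 0 ? null : add(o, scale(d, t));
}

// Tracing pan: translate the camera so the world point w0, touched at s_0,
// stays under the finger at the new touch position st.
function panTracing(
  ct: Camera,
  st: [number, number],
  w0: Vec3,
  screenToRay: (c: Camera, s: [number, number]) => { o: Vec3; d: Vec3 },
): Camera {
  const ray = screenToRay(ct, st);
  const wt = intersectPlaneZ(ray.o, ray.d, 0); // ground point now under the touch
  if (!wt) return ct;                          // e.g. touch is above the horizon
  const correction = sub(w0, wt);
  return { ...ct, pos: add(ct.pos, correction), target: add(ct.target, correction) };
}
```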

This method has its own downsides, though. For one, the world position must be computable, so map applications that render the world as a virtual globe would have problems when the touch lies outside the globe. Another problem is that while the world position follows the touch faithfully, it can move too fast and annoy users when the touch is near the horizon. So map apps generally prefer the non-tracing approach in practice.

V. 3D tracing world-space panning

The best place to apply the tracing approach is a scene with 3D terrain. If the touch points at the side of a mountain, it is generally expected that that position stays under the touch.

The major challenge, compared to the 2D version, is that the touched position is not on the ground. Yet there's a simple workaround: after the initial position \(w_0\) is determined (through ray-object intersection), we can treat the plane \(z=w_0[z]\) as the ground and perform subsequent operations as before.
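In the sketch from the previous section, the change is a single argument: once \(w_0\) is found (e.g. by raycasting against the terrain mesh, which is engine-specific), intersect subsequent touch rays with the elevated plane instead of \(z=0\):

```ts
// 3D tracing pan: identical correction, but against the plane z = w0[z].
function panTracing3D(
  ct: Camera,
  st: [number, number],
  w0: Vec3,
  screenToRay: (c: Camera, s: [number, number]) => { o: Vec3; d: Vec3 },
): Camera {
  const ray = screenToRay(ct, st);
  const wt = intersectPlaneZ(ray.o, ray.d, w0[2]); // elevated "ground" plane
  if (!wt) return ct;
  const correction = sub(w0, wt);
  return { ...ct, pos: add(ct.pos, correction), target: add(ct.target, correction) };
}
```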

3D Model courtesy of Google from Poly

All demos have been tested on Microsoft Edge/macOS.