The multi-touch interface challenge

By Tim Semen, Senior Consultant

tims@hiser.com.au

Gesture-based interfaces are the way of the future – but it could be a path strewn with barriers for users if we don’t identify a set of intuitive and consistent gestures for standard commands.

Gesture-based interfaces have been around almost as long as computers have had displays. Light pens first appeared in the late 1950s[a] (predating even the trackball and mouse) and were briefly popular in the 1980s. The Palm device line was extremely popular from the late 1990s (even though each device came with a stylus you only managed to keep for a week before it was never seen again!). And in recent years, my wife has been wowing her students with the Tablet PC she uses daily for teaching class.

However, those devices simply allow you to poke, tap, or scribble gestures on a fairly standard interface of windows, icons, menus and pointers. As useful as these attempts were to provide a more natural interaction, they did little to bridge the “computer world” with the “real world”.

In the computer world we’ve largely been reduced to poking at things with one finger at a time through the on-screen pointer, whereas in the real world we’ve learned to use all ten digits and both hands together. Consequently, we’ve never been able to signal our intent to computers using the full range of our capabilities.

Perhaps that’s why they’ve never felt very natural to us.

The “Minority Report” interface

In 2002, the movie-going world saw the future of computer interfaces in “Minority Report” – but this was not sci-fi; it was cutting-edge reality. A quiet research topic for decades, the multi-touch interface finally caught the public eye.

What we saw was a preview of the seamless interaction between a human and a computer. Realistic objects were directly manipulated in a realistic manner - a digital photo could be rotated as if it was right in front of you. Documents could be brought up or discarded with the wave of a hand. The desktop worked like, well, your desktop.

Since the release of the movie, quite a number of similar (and working) interfaces have popped up in various places. Although this isn’t a review of those designs, they’re well worth a look on YouTube – if you’re only going to watch one, see Jeff Han’s presentation at TED2006.[b]

Physical metaphors for new interactions

These various products and demos have one thing in common - the lack of what we have come to know as “an interface”. Where are the standard windows, icons, and menus that we have come to expect? Items are directly manipulated in a similar manner to their “real world” peers – by pushing, pulling, spinning, stretching and flicking with multiple fingers. For example:

  • Zooming is accomplished by “stretching” the item between two or more fingers (instead of selecting a value from a dropdown menu).
  • Rotating a photo is accomplished by turning it between two or more fingers, like we would if it was on a desk (instead of clicking on a button).
  • Rewinding a video clip is as easy as spinning your fingers in a counter-clockwise motion.

But, wait, did I just say rewind by spinning? These interfaces are supposed to be “intuitive”, as in “immediately understood” by the person using them – but would anyone who hasn’t used a cassette or reel-to-reel player think to spin a digital video clip in the direction that tape reels used to spin?

If we’re looking to use real-world metaphors for our gesture-based interfaces, which ones will we use? Which ones won’t require historical or society-specific knowledge?

And will we need to learn different gestures as we move around and between devices?
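The stretch and turn gestures listed above at least have an unambiguous physical meaning, because they come down to simple geometry. As a minimal sketch (the function name and the point format are my own assumptions, not taken from any of the products mentioned), the zoom factor and rotation implied by two fingers can be worked out from where the fingers started and where they ended up:

```python
import math


def two_finger_transform(start_a, start_b, end_a, end_b):
    """Return (scale, rotation_degrees) implied by two fingers moving.

    Each argument is an (x, y) tuple. Hypothetical helper for illustration;
    real touch frameworks report finger positions in their own formats.
    """
    # Vector from finger A to finger B, before and after the movement.
    sx, sy = start_b[0] - start_a[0], start_b[1] - start_a[1]
    ex, ey = end_b[0] - end_a[0], end_b[1] - end_a[1]

    start_dist = math.hypot(sx, sy)
    if start_dist == 0:
        raise ValueError("fingers must start at two distinct points")

    # "Stretching": ratio of finger separation after versus before.
    scale = math.hypot(ex, ey) / start_dist

    # "Turning": change in the angle of the line joining the two fingers.
    # With y increasing upwards a positive value is counter-clockwise;
    # on screens where y increases downwards the sense is reversed.
    angle = math.degrees(math.atan2(ey, ex) - math.atan2(sy, sx))
    angle = (angle + 180) % 360 - 180  # normalise to the range [-180, 180)

    return scale, angle
```

Fingers that start one unit apart and end two units apart along the same line give a scale of 2 and a rotation of 0. Nothing this direct exists for "rewind": the spinning gesture only makes sense if you already know how tape reels behaved.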

Consistency within a device?

Have you seen Microsoft’s Surface?[c] Several impressive applications have been shown for this multi-touch device – a photo light table, snowboard customizer, wine selection helper, and even a “finger painting” application. It’s all very cool stuff – and it’s all very real stuff.

Something that stands out in the Surface demos is how items seem to take on the characteristics of a physical 3D object. They have fronts and backs, and they can be flipped over and rotated as desired, allowing all surfaces to be annotated or manipulated. For example, photos can have notes added to their reverse sides and snowboards can be flipped over for customisation.

This is all very neat until you look at the details of the interaction. In the demo, each photo has a small curled corner, like a turning page, to show that you can flip it - swipe that with your finger and the photo turns over. We call this visual cue an “affordance” and it’s a good way to let users know what can or cannot be done within the interface.

But the snowboard in the demo had nothing to indicate you could flip it over, so the user had to know they needed to swipe their finger across the top edge of the board to do so. This coincidentally happens to be the same gesture that was used to slide around or flip a photo to the reverse side and (before you think I’m beating on Microsoft) move sequentially through a stack of albums in Apple’s Cover Flow on the iPhone.

Consistent 2D and “3D” interactions?

It gets more complicated when 2D and “3D” gestures are combined for the same object. Say you have a stack of albums and you need to both:

  • Flip through the whole stack, and
  • Flip over an individual album.

Can you have the same swipe to do both? Probably not. How then do you accomplish both through “intuitive” direct manipulations?

And even if we can design this consistently across the rest of the interface, will we be resorting to ever more obscure multi-touch gestures to work with “3D” objects in a 2D interface? 

(Pardon me for a moment, but I’m having a flashback of the old “cheat sheets” across the tops of WordPerfect keyboards…!)

Consistency between devices?

The Dock in Apple’s OS X is a series of icons along the bottom of the screen for launching and indicating running applications. To launch, you click on an icon. To remove the icon from the Dock, you drag it off the Dock onto the desktop where it disappears in a little puff of smoke. Neat.

Within OS X, this is a very consistent experience. However, things are not quite so straightforward if you happen to have a Samsung F480 mobile phone[d]. A gesture-based touchscreen phone, it too has a “dock”: a strip of icons for the available applications (looking remarkably similar to the OS X Dock). You launch an application by dragging its icon off the “dock” and close it by dragging the icon back onto the “dock”.

While it’s true that, to date, we have dealt with and adapted to variations between different interfaces, these various interfaces have maintained an overall comfortable level of consistency between them. Icons are still clicked, menus are visible for selections, and displayed options tell us what we can or can’t do to something – whether you’re on Windows, OS X, Amiga, BeOS, or something more obscure.

We do take it for granted today that certain things, like left-clicking on a document, will not delete it - no matter what device or system we are using. We need a similar degree of confidence that an action will not have unexpected or unfortunate consequences as we move between devices in the future.
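The Dock mismatch described above is the kind of thing that could, in principle, be checked for mechanically. The following is only a rough sketch: the gesture and effect labels are plain strings I have made up to summarise the two examples, not part of any real device API.

```python
from collections import defaultdict

# Gesture-to-effect mappings summarising the two "dock" examples in the text.
# Device names, gesture labels and effect labels are illustrative strings only.
DEVICE_GESTURES = {
    "OS X Dock": {
        "drag icon off dock": "remove icon",
    },
    "Samsung F480": {
        "drag icon off dock": "launch application",
        "drag icon onto dock": "close application",
    },
}


def find_conflicts(device_gestures):
    """Return the gestures that produce different effects on different devices."""
    effects_by_gesture = defaultdict(dict)
    for device, gestures in device_gestures.items():
        for gesture, effect in gestures.items():
            effects_by_gesture[gesture][device] = effect

    # A gesture conflicts when devices that support it disagree on its effect.
    return {
        gesture: by_device
        for gesture, by_device in effects_by_gesture.items()
        if len(set(by_device.values())) > 1
    }


conflicts = find_conflicts(DEVICE_GESTURES)
for gesture, by_device in conflicts.items():
    print(gesture, by_device)
```

With the two devices above, the only gesture flagged is dragging an icon off the dock, which removes the icon on one device and launches the application on the other. A real inventory would need a shared vocabulary for gestures and effects, which is exactly what does not exist yet.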

Gesture-based devices will not automatically improve the situation

So why is all this a fly in the ointment for gesture-based interfaces? In a single word – ubiquity.

In a matter of years, multi-touch interfaces are going to become staggeringly common. Even today:

  • New mobile devices are rolled out almost daily – each attempting to distinguish itself from the competition.
  • Game systems such as the Nintendo Wii are gesture-based.
  • Windows 7 will have multi-touch built in, based on Microsoft’s Surface technology.[e]
  • You can be sure that the technology in the iPhone will make its way into other Apple products.

People are going to learn gestures on one device and expect the same gesture (or at least a very similar one) to have a similar effect on another device – not the complete opposite effect (like deleting their data!).

Critical questions

This leaves us with a critical question regarding the “no interface” interface – how are we going to design for it?

  • Which physical world metaphors are leveraged for our future gesture-based inputs?
  • How do we consistently allow the user to perform “3D” gesture-based actions in a 2D environment?
  • How will users know if something can’t be done, or if they’re simply using the wrong gesture?
  • How are gesture capabilities indicated?
  • How are gestures learned?
  • How do we ensure that there is a relatively consistent experience on the same device, and across other devices?

The answer to creating a well-designed multi-touch, gesture-based interface lies in consistency - consistency across all gesture-based devices for basic movements and interactions, so users can maintain a degree of confidence that similar gestures will not have dramatically different consequences.

However, I’m not sure this consistency extends to creating a standard set of dropdowns, icons, menus and other interface objects such as we have today. A standard way to invoke a menu, yes. But trying to create a standard menu or dropdown will, at best, stifle the development of this fantastic new medium, and at worst simply recreate the single-finger-poke interface of today on a fancy screen.

What does this mean to us as interaction designers?

It means that we need to have a broader understanding of all gesture-based interfaces – past, present, and future, not just the ones we are working on in our own projects. We need to understand how our work fits into the grand scheme of things to ensure that these hurdles, frustrations and conflicting gestures do not arise within or between different devices.

Our work in the future will be to help weave together the gesture-based interface – not a collection of independent interfaces.

Daunting? Absolutely.

Possible? Definitely.

One last thing…

There is one last question we also need to address, which is by no means the least important: accessibility. Not everyone is Tom Cruise – many people who do not have full use of their hands and fingers may be forever excluded from this new multi-touch world.

There should be no excuses for this new technology making the world a less accessible place.

We are on the cusp of a new era for interfaces. Many years ago, the WIMP interface arose based on our understanding, capabilities and limitations of the time - and defined decades of interaction practices. Now we’re looking at the exact same thing for multi-touch gesture-based interfaces – and what we do now may determine how we interact with computers for the coming decades.

If we begin to work on these design problems now - before these interfaces begin to “set” and become permanent fixtures in our lives in their current forms - we can design our way towards a more usable, accessible and friendly future.

----------

[a] Brad A. Myers. "A Brief History of Human Computer Interaction Technology." ACM interactions. Vol. 5, no. 2, March, 1998. pp. 44-54.

[b] http://www.ted.com/index.php/talks/jeff_han_demos_his_breakthrough_touchscreen.html
TED Talks - Jeff Han: Unveiling the genius of multi-touch interface design.

[c] http://on10.net/blogs/nic/Microsoft-Surface--CES-2008/Default.aspx
This demo shows some of the discrepancies between applications for gestures.

[d] http://www.gsmarena.com/samsung_mwc_08-review-215p3.php
Video of Samsung F480 widgets

[e] http://windowsteamblog.com/blogs/windowsvista/archive/2008/05/27/microsoft-demonstrates-multi-touch.aspx
Windows 7 touch demonstration


