That keynote in 2007 introduced us to the importance of how humans interact with machines, the UI: the touch screen made machines ever so personal. It also taught us the word association of "Yachhh!" and a stylus — who wants a stylus? — which created the very basic notion that there IS bad UI, and if there is bad, there must be good UI.
GUI is not UI
GUI is just one way (albeit the dominant one today) of interfacing with machines: a graphical user interface. Machine<>human UI has made tremendous progress in the past 50 years.
Today, with advancements in processing power, microchips, machine learning, and more, engineers are working on the next step in the evolution of UI: not merely the next iteration but the next step, a major one.
Think about how you felt when you saw Steve Jobs typing on that tiny on-screen keyboard, thinking to yourself, "no way it's that precise…" Then, when you got hold of one, you immediately saw that you are willing to accept some room for error or false positives in a UI feature (especially one as important as the main input/output) as long as the overall experience gives you much more, error margin included.
What might touch-less interaction, the next evolution of UI, entail for how we interact with machines?
Based on a patent published just over a month ago, Apple is planning to take the touch interface to a whole new level, but one that is still on the same path. The patent describes how to make touch keyboards feel more real (by actually morphing the screen's texture) and much less prone to typos. While that's great news, and there's no reason not to improve our main I/O with the device, it's not the next step in the evolution of UI.
This paper, for example, is much more on point, as it tries to combine what is already an accepted machine<>human UI (more on that later) with new ways of interpreting human communication behavior to create the right UIs for the next evolution of the entire tech industry. The approach is called multimodal.
30% of our communication is verbal
If we're interfacing with machines only via the keyboard I/O, we're not using the other 70% of our natural communication skills. How can we "use" them as an interface? That's where the multimodal approach comes in. As an example, the paper uses the pupil of the eye as a mouse cursor: because it can be tracked individually, the sensors are less distracted, more accurate, and most important of all, the interaction feels natural to the user. Another example is how to make hand gestures less of a burden to the user.
A product I used in 2013 and wanted to be the next step of UI was the Leap Motion: a tiny USB device combining a depth-of-field camera, IR, and some other tech that you place in front of your keyboard, letting you draw or control the mouse with "natural" hand gestures. It was cool for about 15 minutes and fun to demo, but my hands hurt after about 3 minutes, because holding them up in the air above the sensor is not natural, comfortable, or just right. A better approach to hand gestures would be to infer from context what the user might be doing and narrow the gesture recognition accordingly, filtering out the "noise" to hit the right action. For example, if you're in the music app, the sensor only looks for Next, Previous, Play/Pause, and Search gestures — that way it only needs to find 1 out of 4 gestures instead of everything in its database.
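The context-narrowing idea above can be sketched in a few lines. This is a hypothetical illustration, not any real gesture SDK: the gesture names, contexts, and scores are all made up, and a real recognizer would produce per-gesture confidence scores from sensor data.

```python
# Hypothetical sketch of context-filtered gesture recognition: instead of
# matching a sensed gesture against the whole database, the active app
# context narrows the search space first. All names are illustrative.

# Full gesture "database" the sensor could, in principle, recognize
GESTURE_DB = {
    "next", "previous", "play_pause", "search",
    "zoom_in", "zoom_out", "rotate", "draw", "scroll",
}

# Per-context whitelist: the music app only cares about 4 gestures
CONTEXT_GESTURES = {
    "music": {"next", "previous", "play_pause", "search"},
    "photos": {"zoom_in", "zoom_out", "rotate", "scroll"},
}

def recognize(scores: dict, context: str, threshold: float = 0.6):
    """Pick the best-scoring gesture among those valid in the current
    context; return None if nothing is confident enough."""
    candidates = CONTEXT_GESTURES.get(context, GESTURE_DB)
    best = max(candidates, key=lambda g: scores.get(g, 0.0))
    return best if scores.get(best, 0.0) >= threshold else None

# A noisy sensor reading: "rotate" scores highest overall, but it is not
# a music gesture, so the music context filters it out.
scores = {"rotate": 0.9, "next": 0.7, "draw": 0.65}
print(recognize(scores, "music"))   # -> next
print(recognize(scores, "photos"))  # -> rotate
```

Filtering by context both shrinks the search (1 of 4 instead of 1 of many) and rejects ambiguous readings that would otherwise be false positives.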
Who’s to say touching or clicking is the best way to interact with machines?
Touch UI came as a better way to control, or replace, the mouse, and it lives mainly on relatively small devices rather than powerful workstations, meaning the mouse and keyboard are still very much in control. But as we know, 70% of our communication is non-verbal, and a mouse gesture isn't verbal either; it still requires you to touch something. So in what other ways could we interface with machines that don't involve touch?
Many companies today are focused not only on developing new sensors but also on training and fine-tuning those sensors to "look" for very specific behaviors or gestures that indicate the user wishes to perform a certain function.
Their premise is that every interaction in the machine<>human UI is a determined one: for example, if the user taps an on-screen button in a touch UI, it determines, almost for sure, what the user wishes to do. It's a natural way of looking at UIs, because while most of us grew up with mouse/keyboard/touch UIs and knew there was one way of using them, tapping into other human communication habits requires starting from what feels natural and focusing on improving that.
Tesla does a great job of combining multimodal sensors to enable Autopilot, which, may I remind you, is technology permitted on US roads to take over something as precious as our lives. But that's not a classic case of UI, because the interface is between the car and the environment, and the user only needs to be notified. Still, the combination of various technologies is built on intent: what is the intent of the car in the lane next to me trying to cut me off?
A natural user interface is the result of analyzing the intent behind the interaction and the context around it. Since we can now use the eyes, hands, head, mouth, fingers, and dare I say it, our thoughts to control the interface, we can enjoy a wider selection of UI approaches. Yet with options come challenges. For example, I can use hand gestures and eye tracking to point and click in an AR-based experience, but I risk more false positives, because I am sensing two input modalities (hands and eyes) and must figure out how to prioritize them to determine what the user wants to do. Luckily this is mostly solved at the software level, by assigning certain controls to the eyes and others to hand gestures, but the goal is to make it feel natural for the user. If we can achieve a natural interface, it might change how we see interaction with technology altogether, especially in the eyes of users who will grow up with it (probably those born after 2008), but as of now it's not there yet. Why is that?
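The "assign certain controls to the eyes and others to hand gestures" idea can be sketched as follows. This is a minimal, assumed design, not a real AR framework: the event types, confidence thresholds, and the "pinch means click" convention are all illustrative.

```python
# Illustrative sketch of resolving two simultaneous input modalities by
# giving each a fixed role: eye tracking owns the "where" (cursor), hand
# gestures own the "what" (actions). Because the modalities never compete
# for the same decision, prioritization conflicts largely disappear.
from dataclasses import dataclass

@dataclass
class EyeSample:
    x: float            # gaze position on screen
    y: float
    confidence: float   # tracker confidence, 0..1

@dataclass
class HandGesture:
    name: str           # e.g. "pinch" (assumed here to mean "click")
    confidence: float

class MultimodalPointer:
    def __init__(self):
        self.cursor = (0.0, 0.0)

    def on_eye(self, s: EyeSample):
        # Low-confidence gaze samples (blinks, tracker noise) are ignored,
        # which is one simple software-level false-positive filter.
        if s.confidence >= 0.5:
            self.cursor = (s.x, s.y)

    def on_hand(self, g: HandGesture):
        # Hands only trigger actions; the action lands wherever the gaze
        # cursor currently points.
        if g.name == "pinch" and g.confidence >= 0.7:
            return ("click", self.cursor)
        return None

ui = MultimodalPointer()
ui.on_eye(EyeSample(120.0, 300.0, 0.9))
print(ui.on_hand(HandGesture("pinch", 0.8)))  # -> ('click', (120.0, 300.0))
```

Splitting roles this way is one of several possible arbitration strategies; another is scoring both modalities and letting the higher-confidence one win per event.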
Lowest common denominators
If you build it — they will come, right? Wrong. Getting the mainstream audience to see any UI as natural means most users won't even notice they're "interfacing" at all; they'll simply feel comfortable and, you guessed it, natural from the first use. You see, there's nothing natural about the keyboard and the mouse; they're just there all the time, and now with AR/VR/MR (and the big companies getting behind it), we need to make all the old-new UIs feel natural. By old, I mean waving your hand or winking; by new, I mean that you will wave your hand to unlock your car and wink twice to start the engine. This is why AR/MR will clearly go mainstream first and VR only after: VR expects too much from the everyday user (full immersion and a completely new UI), while AR eases the transition by not ignoring the actual environment the user is in. Unfortunately, we can't benchmark new UI disruption, so we can only look back and realize that the mouse/keyboard is not here to stay, and we might as well get used to the idea that it will not come back, at least not in its former glory.
Shipped devices and UI iterations
Jobs mentioned in that famous keynote that another problem with physical keyboards was that you can't change the UI once the device has shipped. The same can be said about the quality and grade of multimodal sensors: once they ship in a device, you can only upgrade the hardware, and with it the UI, in the next iteration of the device. But the engineering behind these sensors today is more about software than hardware, and big R&D teams can do miracles with the software behind the hardware.
Who cares about UI anyway?
This whole research into the next evolution of UI started with cars. I am working with a company in the automotive industry, where we had to spend some serious thought on what the UX is while you're riding, but not driving, in a vehicle, and then derive from that the best UI to implement.
Our considerations were: Avoid nausea — we've all known since childhood that reading in a moving car causes nausea. We didn't want to create a heavy, time-consuming UI that would make the user feel nauseous by the time the ride ends.
Should be captivating — it should serve a purpose, and by purpose I mean the user needs to understand that this process, flow, or journey will end very quickly; in fact, we made it so the user technically needs only 3 of the 10 steps to complete the process.
Should be 'gamified' — well, why wouldn't it be? Many UXs can be gamified, and we certainly did: we made it more of a game, where the user feels instant gratification on the way to a bigger goal and a bigger reward.
Should be personal — because that's a sure way to get users to stick with a process, or to retain them. We made it a learning process, but one where the UI makes it clear that we learn from every action the user takes, so that he/she looks forward to the next experience with us.
Should be super-easy to get — this is the part that is truly dear to my heart when it comes to guiding a user or a person through some sort of journey, whether it's a student or a grocery shopper: using existing knowledge to make the gap smaller. We used a well-known UI hack and fitted it perfectly to our needs, so the user "gets" it from the get-go.
And of course, by combining everything above and making each consideration's solution feed the next one down the user's path, it all made sense. That's the thing about UIs: they're either derived from the desired UX, or they're bold and try something new, like the mouse or touch.