Zoe: A Virtual Talking Head That Can Express Human Emotions

Posted: March 21, 2013 by phaedrap1 in News, Science
Tags: Virtual Reality

Scientists are making a lot of progress in the area of artificial intelligence.

We have previously seen examples of robots like Nico that can learn how to become self-aware.

Researchers are also working on the first ever Super-Turing computer based on Analog Recurrent Neural Networks. A Super-Turing machine should be as adaptable and intelligent as the human brain.

Now, a group of researchers just announced they have successfully developed Zoe, a digital talking head which can express human emotions on demand with “unprecedented realism” and could herald a new era of human-computer interaction.

According to the developers, this virtual “talking head” can express a full range of human emotions and could be used as a digital personal assistant, or to replace texting with “face messaging”.

The lifelike face can display emotions such as happiness, anger, and fear, and changes its voice to suit any feeling the user wants it to simulate. Users can type in any message, specifying the requisite emotion as well, and the face recites the text. According to its designers, it is the most expressive controllable avatar ever created, replicating human emotions with unprecedented realism.

Meet Zoe, digital talking head and interface of the future. The virtual talking head, “Zoe”, uses a basic set of six simulated emotions which can then be adjusted and combined. (Credit: Toshiba Cambridge Research Lab / Department of Engineering, University of Cambridge)

The system, called “Zoe,” is the result of a collaboration between researchers at Toshiba’s Cambridge Research Lab and the University of Cambridge’s Department of Engineering. Students have already spotted a striking resemblance between the disembodied head and Holly, the ship’s computer in the British sci-fi comedy, Red Dwarf.

Appropriately enough, the face is actually that of Zoe Lister, an actress perhaps best known as Zoe Carpenter in the Channel 4 series Hollyoaks. To recreate her face and voice, researchers spent several days recording Zoe’s speech and facial expressions. The result is a system that is light enough to work in mobile technology, and could be used as a personal assistant in smartphones, or to “face message” friends.

The framework behind “Zoe” is also a template that, before long, could enable people to upload their own faces and voices in a matter of seconds, rather than days. That means that in the future, users will be able to customise and personalise their own emotionally realistic digital assistants.

If this can be developed, then a user could, for example, text the message “I’m going to be late” and ask it to set the emotion to “frustrated.”

Their friend would then receive a “face message” that looked like the sender, repeating the message in a frustrated way.

“This technology could be the start of a whole new generation of interfaces which make interacting with a computer much more like talking to another human being,” Professor Roberto Cipolla, from the Department of Engineering, University of Cambridge, said.

“It took us days to create Zoe, because we had to start from scratch and teach the system to understand language and expression. Now that it already understands those things, it shouldn’t be too hard to transfer the same blueprint to a different voice and face.”

Holly, in the British sci-fi comedy Red Dwarf, is an “intelligent” computer. Holly’s user interface appears on ship screens as a disembodied human head on a black background, and can also be downloaded into a watch.

As well as being more expressive than any previous system, Zoe is also remarkably data-light. The program used to run her is just tens of megabytes in size, which means that it can be easily incorporated into even the smallest computer devices, including tablets and smartphones.

It works by using a set of fundamental, “primary colour” emotions. Zoe’s voice, for example, has six basic settings — Happy, Sad, Tender, Angry, Afraid and Neutral. The user can adjust these settings to different levels, as well as altering the pitch, speed and depth of the voice itself.

By combining these levels, it becomes possible to pre-set or create almost infinite emotional combinations. For instance, combining happiness with tenderness and slightly increasing the speed and depth of the voice makes it sound friendly and welcoming. A combination of speed, anger and fear makes Zoe sound as if she is panicking. This allows for a level of emotional subtlety which, the designers say, has not been possible in other avatars like Zoe until now.
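The mixing described above can be pictured with a small sketch. This is purely illustrative and is not Zoe's actual software: the names `VoiceSettings`, `make_settings` and `emotion_mix`, the 0 to 1 scale for emotion levels, and the specific numbers are all assumptions made for the example. Only the six emotion names and the pitch, speed and depth controls come from the description above.

```python
from dataclasses import dataclass

# The six basic voice settings named in the article.
EMOTIONS = ("happy", "sad", "tender", "angry", "afraid", "neutral")


@dataclass
class VoiceSettings:
    """Hypothetical container for one emotional preset (not Zoe's real API)."""
    emotions: dict       # emotion name -> level; the 0..1 scale is an assumption
    pitch: float = 1.0   # 1.0 means "unchanged" (assumed convention)
    speed: float = 1.0
    depth: float = 1.0


def make_settings(pitch=1.0, speed=1.0, depth=1.0, **levels):
    """Build a preset from named emotion levels, e.g. happy=0.6."""
    unknown = set(levels) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    emotions = {name: 0.0 for name in EMOTIONS}
    for name, level in levels.items():
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"{name} level must be between 0 and 1")
        emotions[name] = level
    # With no emotion requested, fall back to a fully neutral voice.
    if not any(emotions.values()):
        emotions["neutral"] = 1.0
    return VoiceSettings(emotions, pitch, speed, depth)


def emotion_mix(settings):
    """Return the emotion levels rescaled so that they sum to 1."""
    total = sum(settings.emotions.values())
    return {name: level / total for name, level in settings.emotions.items()}


# "Friendly and welcoming": happiness plus tenderness, slightly faster and deeper.
friendly = make_settings(happy=0.6, tender=0.4, speed=1.1, depth=1.1)

# "Panicking": anger plus fear, spoken quickly.
panicking = make_settings(angry=0.5, afraid=0.5, speed=1.4)

for label, preset in (("friendly", friendly), ("panicking", panicking)):
    active = {k: round(v, 2) for k, v in emotion_mix(preset).items() if v > 0}
    print(label, active, "speed:", preset.speed, "depth:", preset.depth)
```

Because each level can take any value in its range, even this toy version yields an effectively unlimited number of blends from six basic settings, which is the idea behind the "primary colour" comparison.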

“Present-day human-computer interaction still revolves around typing at a keyboard or moving and pointing with a mouse,” Cipolla added.

“For a lot of people, that makes computers difficult and frustrating to use.

“In the future, we will be able to open up computing to far more people if they can speak and gesture to machines in a more natural way. That is why we created Zoe: a more expressive, emotionally responsive face that human beings can actually have a conversation with.”