Thursday, July 7, 2016

Student creates the first EB GUIDE human-robot interface

Dr. Dominique Massonié

For his thesis, bachelor’s degree candidate Thomas Ranzenberger of Technische Hochschule Nürnberg Georg Simon Ohm built an extension for EB GUIDE—a tool for building HMIs—and used it to create a human-robot interface. To see the human-robot interaction powered by EB GUIDE, take a look at this video:


Thomas extended the EB GUIDE modeling framework to develop an interface that allows simple speech interaction between a user and the robot. A user can ask the robot a question that requires the robot to request information from the cloud (from a weather service, for example).

While the user speaks, the robot moves its head slightly up and down and its LED eye lights up randomly, giving the user feedback that the robot is recognizing what's being said. The interface performs grammar-based recognition for the search command and then activates cloud-based recognition.
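This two-stage flow can be sketched in a few lines. The class names and the `handle` function below are illustrative stand-ins, not EB GUIDE APIs: a fixed grammar catches the search command, and only then is a cloud recognizer activated for the free-form query.

```python
class GrammarRecognizer:
    """Matches an utterance against a fixed command grammar."""
    def __init__(self, commands):
        self.commands = set(commands)

    def recognize(self, utterance):
        # Return the command only if it is in the grammar, else None.
        return utterance if utterance in self.commands else None


class CloudRecognizer:
    """Stand-in for a cloud-based free-form recognizer."""
    def recognize(self, query):
        # A real service would transcribe audio; here we just echo the query.
        return query


grammar = GrammarRecognizer({"search", "stop"})
cloud = CloudRecognizer()


def handle(command_utterance, query):
    # Grammar-based recognition handles the fixed search command;
    # only then is the cloud-based recognizer activated for the query.
    command = grammar.recognize(command_utterance)
    if command == "search":
        return cloud.recognize(query)
    return command
```

Keeping the grammar stage local means simple commands never need a network round trip; the cloud is contacted only for the open-ended part of the request.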

Robot begins thinking motions

At the same time, the robot runs its thinking animation, which moves the right arm near the head and moves the head up a bit. Once the cloud result is available, the robot stops the thinking animation and answers the user. For example, "Tomorrow, it will be between 33 and 48 degrees Fahrenheit and rainy in Berlin."

Robot returns to rest motion


How did he do it?

For his work, Thomas used a humanoid robot named NAO, which includes four microphones for audio input and two speakers for output, in addition to two cameras. The interface he developed uses face tracking and speech input. To simulate a human-like interaction, when it responds, the robot simultaneously speaks and gestures.

A realistic and effective robot interaction needs to be as human-like as possible. That's a challenge because it requires subtle gestures on the robot's part during listening, speaking, and thinking modes, and it requires combining these gestures with speech when the robot responds to the user. The interaction must seamlessly integrate multiple modalities. Behind the scenes, this means defining, running, and testing many user profiles, configurations, and scenarios. Human-machine interface (HMI) tools like EB GUIDE make it faster and more efficient to specify, prototype, and usability-test these complex interactions.

EB GUIDE already handles multiple modalities, like speech and haptics, through one configuration model. In addition, EB GUIDE supports plugins, allowing creative users like Thomas to extend it with new modalities, like gestures, and new applications, like robot interaction. Thomas extended EB GUIDE to control a robot for speech synthesis, speech recognition, and gestures. The extension he developed includes an inventory of robot gestures, driven through the NAOqi framework, making it easier to integrate them into the speech dialog model created with EB GUIDE.

Using parallel state machines

With EB GUIDE and his extension, Thomas was able to create the multimodal user interface using similar models for the different modalities, such as speech and gestures. EB GUIDE uses the model concept of state machines to describe the flow of interaction with the user and modality-specific interaction elements within states that control user input or system output.

When parallel state machines run, each is responsible for executing its modality-specific business logic. Incoming events can be processed by all currently active state machines, enabling synchronization. And that allows for complex patterns, such as providing instructions to a robot via speech, UI controls, or both.
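The idea can be sketched as two tiny state machines driven by a shared event broadcast. The states, events, and classes below are hypothetical illustrations, not the EB GUIDE model itself: the speech machine and the gesture machine each follow their own transitions, but because every event reaches both, they stay in step.

```python
class StateMachine:
    """A minimal table-driven state machine."""
    def __init__(self, name, transitions, start):
        self.name = name
        self.transitions = transitions  # {(state, event): next_state}
        self.state = start

    def handle(self, event):
        # Unknown events for the current state are simply ignored.
        self.state = self.transitions.get((self.state, event), self.state)


def broadcast(machines, event):
    # Every active state machine sees every event, which keeps
    # the modalities (e.g., speech and gestures) synchronized.
    for m in machines:
        m.handle(event)


speech = StateMachine(
    "speech",
    {("idle", "ASK"): "recognizing",
     ("recognizing", "CLOUD_RESULT"): "answering",
     ("answering", "DONE"): "idle"},
    start="idle",
)

gesture = StateMachine(
    "gesture",
    {("rest", "ASK"): "listening",
     ("listening", "CLOUD_REQUEST"): "thinking",
     ("thinking", "CLOUD_RESULT"): "speaking_gesture",
     ("speaking_gesture", "DONE"): "rest"},
    start="rest",
)

machines = [speech, gesture]

# One full interaction: the user asks, the cloud is queried,
# the answer arrives, and both machines return to their start states.
for event in ["ASK", "CLOUD_REQUEST", "CLOUD_RESULT", "DONE"]:
    broadcast(machines, event)
```

Note how `CLOUD_REQUEST` only changes the gesture machine's state (it starts the thinking animation) while the speech machine ignores it, yet `CLOUD_RESULT` advances both at once.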

As Thomas describes in the summary of his thesis, Interacting with Robots—Tooling and Framework for Advanced Speech User Interfaces, “Our modeling tool EB GUIDE Studio provides separate state machines to specify speech dialog and robot gestures and generates a model that can be read by the target framework…[The] Target Framework is the platform dependent runtime environment to run the exported model e.g. on Windows, Android, or Linux. It includes the speech dialog manager (SDM) which provides different possibilities for speech recognition. SDM can use a grammar based recognizer which is contained in the Speech Target Framework or connect to a cloud based recognizer via the SdmNetProxyPlugin. The SdmNaoPlugin extends the SDM: it forwards all voice prompts to the robot and provides additional script functions to allow the specification of the robot behavior within EB GUIDE Studio. The plugin uses the NaoGestures library to perform the published script functions. The NaoGestures library itself implements gestures like ‘moving hands’ and connects to the NAOqi framework on the robot to perform the behaviors and to provide feedback about the speech output for the current dialog state.”
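As a rough illustration of that plugin idea, the sketch below shows a plugin forwarding voice prompts to a robot and publishing gesture script functions the dialog model can call by name. All names here are invented stand-ins; they are not the actual SdmNaoPlugin, NaoGestures, or NAOqi APIs.

```python
class FakeRobot:
    """Stand-in for the connection to the robot (real code would talk to NAOqi)."""
    def __init__(self):
        self.log = []

    def say(self, text):
        self.log.append(("say", text))

    def run_behavior(self, name):
        self.log.append(("gesture", name))


class NaoLikePlugin:
    """Hypothetical plugin: forwards prompts and publishes gesture script functions."""
    def __init__(self, robot):
        self.robot = robot
        # Script functions that a dialog model could invoke by name,
        # analogous to the gestures the NaoGestures library implements.
        self.script_functions = {
            "moving_hands": lambda: robot.run_behavior("moving_hands"),
        }

    def prompt(self, text):
        # Every voice prompt is forwarded to the robot's text-to-speech.
        self.robot.say(text)

    def call(self, name):
        self.script_functions[name]()


robot = FakeRobot()
plugin = NaoLikePlugin(robot)
plugin.prompt("Tomorrow it will be rainy in Berlin.")
plugin.call("moving_hands")
```

The point of this separation is that the dialog model only ever refers to named script functions; which concrete robot behavior they trigger is decided inside the plugin.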

This architecture is summarized in the following diagram:

EB GUIDE Studio architecture diagram

And that’s how Thomas leveraged the strengths of the EB GUIDE model-based development environment and the flexibility it offers for extensions to create a unique, human-robot interface—and, in the process, to earn his bachelor’s degree. Congratulations, Thomas Ranzenberger!