Robot audition

“Blindness separates us from things, but deafness separates us from people,” said Helen Keller, the famous American author who, in 1904, became the first deafblind person to earn a Bachelor of Arts degree. And indeed, the ability to interpret an auditory scene is now a required capability for a robot operating among humans, who interact with each other mainly through speech. Auditory scene analysis is a well-established research topic in Acoustics and Signal Processing, but porting existing developments to a robot is not as easy as one might expect: the robotic context imposes specific constraints such as embeddability, real-time operation, reverberation, ego-noise, etc.

Two main paradigms are currently exploited in Robotics:

  • on the one hand, array processing approaches rely on microphone arrays, exploiting redundant audio information to perform sound source localization, separation, and recognition in a very efficient way. Recent developments clearly show that this is the way to go when building an efficient audio system in Robotics;
  • on the other hand, binaural approaches try to somewhat mimic the human auditory system, at least from an external point of view (two microphones, generally fitted with two external ears). Using only two ears in a Robotics context is still a very challenging task. But while one could question the choice to restrict ourselves to only two ears, this is also a unique opportunity to test auditory models of human audition, and to stress the importance of action in the hearing process.

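To give a concrete flavor of the array-processing paradigm, here is a minimal delay-and-sum (steered-response power) localization sketch. The array geometry, sampling rate, and angle grid below are invented for the illustration and do not come from any particular system discussed here:

```python
import numpy as np

def simulate_linear_array(signal, fs, mic_positions, angle_deg, c=343.0):
    """Delay a far-field source onto each microphone (frequency-domain fractional delays)."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    S = np.fft.rfft(signal)
    u = np.array([np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))])
    out = []
    for p in mic_positions:
        tau = -np.dot(p, u) / c  # propagation delay relative to the array origin
        out.append(np.fft.irfft(S * np.exp(-2j * np.pi * freqs * tau), n))
    return np.array(out)

def das_localize(mics, fs, mic_positions, candidate_angles, c=343.0):
    """Delay-and-sum: steer over candidate angles, return the one with max output power."""
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    M = np.fft.rfft(mics, axis=1)
    best_angle, best_power = None, -np.inf
    for ang in candidate_angles:
        u = np.array([np.cos(np.radians(ang)), np.sin(np.radians(ang))])
        beam = np.zeros_like(M[0])
        for m, p in zip(M, mic_positions):
            tau = -np.dot(p, u) / c
            beam += m * np.exp(2j * np.pi * freqs * tau)  # undo the steering delay
        power = np.sum(np.abs(beam) ** 2)
        if power > best_power:
            best_power, best_angle = power, ang
    return best_angle

# Toy usage: four mics 5 cm apart on the x-axis, white-noise source at 60 degrees.
rng = np.random.default_rng(1)
fs = 16000
sig = rng.standard_normal(4096)
mic_positions = np.array([[0.00, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
mics = simulate_linear_array(sig, fs, mic_positions, angle_deg=60.0)
est = das_localize(mics, fs, mic_positions, candidate_angles=range(0, 181, 5))
```

When the steering angle matches the true direction, the per-channel delays cancel and the channels sum coherently, which is exactly the redundancy that array processing exploits.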
Indeed, hearing is rarely a purely static task. For instance, one often makes small head movements to disambiguate sound location or to better recognize a sound. This is specifically what I have been dealing with, i.e. active binaural audition. On this topic, I have mainly proposed contributions on:

  • the characterization of binaural cues used for sound localization and source recognition in realistic conditions,
  • the specific sound localization problem,
  • the use of movement to improve binaural sound localization,
  • and the building of multimodal representation of unknown environments through head movements.
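
One classical binaural cue behind these contributions is the interaural time difference (ITD). A common way to estimate it (a textbook technique, not specific to the work above) is GCC-PHAT cross-correlation between the two ear signals; the 0.18 m ear spacing and the 5-sample toy delay below are assumptions made for the sketch:

```python
import numpy as np

def gcc_phat_itd(left, right, fs, max_tau):
    """ITD via GCC-PHAT: positive value means the left channel lags the right."""
    n = len(left) + len(right)  # zero-pad to avoid circular wrap-around
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    return lag / fs

# Toy check: delay the left channel by 5 samples (sound reaching the right ear first).
rng = np.random.default_rng(0)
fs = 16000
sig = rng.standard_normal(4096)
left = np.concatenate((np.zeros(5), sig[:-5]))
right = sig
itd = gcc_phat_itd(left, right, fs, max_tau=0.001)
# Map ITD to azimuth with a simple spherical-head model (0.18 m spacing assumed).
azimuth = np.degrees(np.arcsin(np.clip(itd * 343.0 / 0.18, -1.0, 1.0)))
```

Note the ambiguity such a cue leaves (front-back confusion, elevation): this is precisely where the small head movements mentioned above come into play.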

This article was updated on February 20, 2022