Multimodal sound localization for humanoid robots based on visio-auditive learning

Abstract

This paper deals with sound source localization in a humanoid robotics context. Classical binaural localization algorithms often rely on the following process: first, binaural cues are extracted from the left and right microphone (ear) signals; next, a model is used to infer the possible location of the sound source. Such methods therefore require accurate modeling of the head's acoustic shadowing, or precise Head-Related Transfer Function (HRTF) measurements. To avoid these complicated steps, we propose an original multimodal sound source localization method: the relationship between binaural auditory cues and the position of the sound source in an image is learned by a partially-connected neural network. This approach offers higher resolution and lower complexity than state-of-the-art techniques. Simulation and experimental results demonstrate the effectiveness of the proposed method: azimuth is estimated very accurately, while elevation requires additional cues to be approximated efficiently.
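As an illustrative sketch only (not the authors' implementation), the Python/NumPy snippet below shows one way such a learned cue-to-pixel mapping could be structured: interaural time and level differences (ITD/ILD, a standard choice of binaural cues, assumed here since the abstract does not name the cues used) are extracted from toy two-channel frames, and a small network with a fixed sparse input mask, standing in for the paper's partially-connected network, is trained to regress the source's horizontal pixel position. All parameter values (FS, MIC_DIST, IMG_W, the toy head-shadow model) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
FS = 16000        # sample rate (Hz); illustrative value, not from the paper
C = 343.0         # speed of sound (m/s)
MIC_DIST = 0.15   # inter-ear spacing (m); illustrative value
IMG_W = 320       # image width (px); illustrative camera model


def simulate_frame(azimuth_deg, n=2048):
    """Toy binaural frame: white noise delayed and attenuated between ears."""
    d = int(round(MIC_DIST * np.sin(np.deg2rad(azimuth_deg)) / C * FS))
    src = rng.standard_normal(n + 2 * abs(d) + 1)
    mid = abs(d)
    left = src[mid:mid + n]
    right = src[mid + d:mid + d + n].copy()
    right *= 10 ** (-0.02 * azimuth_deg)   # crude head-shadow level difference
    return left, right


def binaural_cues(left, right, max_lag=32):
    """Broadband ITD (s) from the cross-correlation peak, plus ILD (dB)."""
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.dot(left, np.roll(right, l)) for l in lags]  # circular, for brevity
    itd = lags[int(np.argmax(xcorr))] / FS
    ild = 10 * np.log10((np.sum(left ** 2) + 1e-12) / (np.sum(right ** 2) + 1e-12))
    return np.array([itd, ild])


def train_mapping(X, y, hidden=16, epochs=3000, lr=0.05, density=0.5):
    """Tiny MLP whose input layer is only partially connected (fixed sparse
    mask), trained by batch gradient descent to regress the pixel column."""
    W1 = rng.standard_normal((X.shape[1], hidden)) * 0.5
    M1 = (rng.random(W1.shape) < density).astype(float)  # partial connectivity
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, 1)) * 0.5
    b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ (W1 * M1) + b1)
        err = H @ W2 + b2 - y[:, None]                  # MSE gradient
        gW2, gb2 = H.T @ err / len(X), err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)
        gW1, gb1 = (X.T @ dH / len(X)) * M1, dH.mean(0)  # mask the gradient too
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda x: (np.tanh(x @ (W1 * M1) + b1) @ W2 + b2).ravel()


# Build a training set over azimuths and fit the cue-to-pixel mapping.
azimuths = np.linspace(-60, 60, 121)
X = np.array([binaural_cues(*simulate_frame(a)) for a in azimuths])
mu, sigma = X.mean(0), X.std(0)
y = 0.5 + 0.5 * np.sin(np.deg2rad(azimuths))   # normalized pixel column in [0, 1]
predict = train_mapping((X - mu) / sigma, y)

for a in (-45, 0, 30):
    x = (binaural_cues(*simulate_frame(a)) - mu) / sigma
    print(f"azimuth {a:+4d} deg -> predicted column {predict(x)[0] * IMG_W:6.1f} px")
```

In the paper, of course, the target pixel position comes from the robot's camera during a visuo-auditory learning phase rather than from an analytic projection; the sketch only mirrors the overall structure of the approach.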

Publication
2011 IEEE International Conference on Robotics and Biomimetics (ROBIO)