Research

Interactive perception

... this is where we deal with what’s hidden in the sensorimotor flow!

Selected publication

A Formal Account of Structuring Motor Actions With Sensory Prediction for a Naive Agent.
Jean-Merwan Godon, Sylvain Argentieri, Bruno Gas (2020). In Frontiers in Robotics and AI.
📝PDF 🌐DOI 📚Bibtex

For naive robots to become truly autonomous, they need a means of developing their perceptive capabilities instead of relying on hand-crafted models. The sensorimotor contingency theory asserts that such a means lies in learning invariants of the sensorimotor flow. We propose a formal framework inspired by this theory for the description of sensorimotor experiences of a naive agent, extending previous related works. We then use this formalism to conduct a theoretical study where we isolate sufficient conditions for the determination of a sensory prediction function. Furthermore, we show that the algebraic structure found in this prediction can be taken as a proxy for structure on the motor displacements, allowing for the discovery of the combinatorial structure of these displacements. Both claims are further illustrated in simulations where a toy naive agent determines the sensory predictions of its spatial displacements from its uninterpreted sensory flow, which it then uses to infer the combinatorics of these displacements.
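To give a feel for the two claims, here is a minimal Python sketch. It is not the simulation from the paper: the ring-shaped world, the sensor layout, the action names `a`, `b`, `c` and the helper functions are all assumptions made for illustration. The agent sees only uninterpreted sensor arrays and action labels. It first learns, for each action, which sensor cell predicts which other cell after the move, then composes these learned predictions to find out which actions combine into which.

```python
import numpy as np

N = 6  # size of the ring world and of the sensor array (illustrative assumption)
# Hidden displacement of each action; the agent never reads these values directly.
HIDDEN_SHIFT = {"a": 1, "b": 2, "c": 3}

def sense(env, pos):
    """Sensor cell i reads the environment at position (pos + i) mod N."""
    return np.roll(env, -pos)

def learn_prediction(action, rng, trials=20):
    """Learn an index map p such that after[i] == before[p[i]] in every trial."""
    candidates = np.ones((N, N), dtype=bool)
    for _ in range(trials):
        env = rng.random(N)           # a fresh random environment each trial
        pos = int(rng.integers(N))    # unknown starting position
        before = sense(env, pos)
        after = sense(env, (pos + HIDDEN_SHIFT[action]) % N)
        # Keep only the (i, j) pairs that are still consistent with the data.
        candidates &= after[:, None] == before[None, :]
    return candidates.argmax(axis=1)

def compose(first, second):
    """Prediction for 'first, then second': final[i] == before[first[second[i]]]."""
    return first[second]

rng = np.random.default_rng(0)
pred = {name: learn_prediction(name, rng) for name in HIDDEN_SHIFT}

def equivalent_action(first, second):
    """Names of single actions whose prediction matches the composed one."""
    target = compose(pred[first], pred[second])
    return [name for name, p in pred.items() if np.array_equal(p, target)]

if __name__ == "__main__":
    print(equivalent_action("a", "b"))  # ['c']
    print(equivalent_action("a", "a"))  # ['b']
```

In this toy setting, the composition table of the learned predictions mirrors the composition of the hidden displacements, which is the sense in which prediction structure serves as a proxy for motor structure.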

Robot Audition

... this is where we deal with how to understand an audio scene!

Selected publication

Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization
Elie Laurent Benaroya, Nicolas Obin, Marco Liuni, Axel Roebel, Wilson Raumel, Sylvain Argentieri (2018). In IEEE/ACM Transactions on Audio, Speech, and Language Processing.
📝PDF 🌐DOI 📚Bibtex

This paper presents non-negative factorization of audio signals for the binaural localization of multiple sound sources within realistic and unknown sound environments. Non-negative tensor factorization (NTF) provides a sparse representation of multi-channel audio signals in time, frequency, and space that can be exploited in computational audio scene analysis and robot audition for the separation and localization of sound sources. In the proposed formulation, each sound source is represented by means of spectral dictionaries, temporal activation, and its distribution within each channel (here, left and right ears). This distribution, being dependent on the frequency, can be interpreted as an explicit estimation of the Head-Related Transfer Function (HRTF) of a binaural head, which can then be converted into the estimated sound source position. Moreover, the semi-supervised formulation of the non-negative factorization allows the integration of prior knowledge about some sound sources of interest whose dictionaries can be learned in advance, whereas the remaining sources are considered as background sound, which remains unknown and is estimated on the fly. The proposed NTF-based sound source localization is here applied to binaural sound source localization of multiple speakers within realistic sound environments.
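The factorization described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the algorithm from the paper: it assumes one dictionary atom per source, a squared-error cost with standard multiplicative updates, and no semi-supervised (pre-learned) dictionaries. The function name `ntf` and the array names `W`, `H`, `Q` are chosen here for the example. The tensor `V[f, t, c]` stands for a non-negative time-frequency representation of each channel, approximated as a sum over sources of `W[f, k] * H[k, t] * Q[f, c, k]`.

```python
import numpy as np

def ntf(V, n_sources, n_iter=200, seed=0, eps=1e-12):
    """Approximate V[f, t, c] by sum_k W[f, k] * H[k, t] * Q[f, c, k].

    W: spectral dictionaries, H: temporal activations,
    Q: frequency-dependent distribution of each source over the channels.
    Uses multiplicative updates for the squared-error cost (an assumption).
    """
    rng = np.random.default_rng(seed)
    F, T, C = V.shape
    W = rng.random((F, n_sources)) + eps
    H = rng.random((n_sources, T)) + eps
    Q = rng.random((F, C, n_sources)) + eps

    def model():
        return np.einsum("fk,kt,fck->ftc", W, H, Q)

    errors = []
    for _ in range(n_iter):
        Vh = model()
        W *= np.einsum("ftc,kt,fck->fk", V, H, Q) / (
            np.einsum("ftc,kt,fck->fk", Vh, H, Q) + eps)
        Vh = model()
        H *= np.einsum("ftc,fk,fck->kt", V, W, Q) / (
            np.einsum("ftc,fk,fck->kt", Vh, W, Q) + eps)
        Vh = model()
        Q *= np.einsum("ftc,fk,kt->fck", V, W, H) / (
            np.einsum("ftc,fk,kt->fck", Vh, W, H) + eps)
        errors.append(float(np.sum((V - model()) ** 2)))
    return W, H, Q, errors

if __name__ == "__main__":
    # Synthetic two-channel example built from random non-negative factors.
    rng = np.random.default_rng(1)
    V = np.einsum("fk,kt,fck->ftc",
                  rng.random((16, 2)), rng.random((2, 40)), rng.random((16, 2, 2)))
    W, H, Q, errors = ntf(V, n_sources=2)
    print("squared error, first vs last iteration:", errors[0], errors[-1])
    # Per-frequency left/right ratio of source 0, a level-difference-like cue.
    print(Q[:, 0, 0] / Q[:, 1, 0])
```

The ratio `Q[f, 0, k] / Q[f, 1, k]` is unaffected by the scale ambiguity between `W` and `Q`, which makes it a natural per-source, per-frequency interaural cue in this sketch. Mapping such a cue to a source position would require HRTF data for the head in question, which is outside the scope of this example.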