(This dissertation is written in French.) The ability to perceive and analyse an auditory scene was identified some 20 years ago as one of the seven challenges facing artificial intelligence in robotics. It is now a scientific topic in its own right, addressed by several scientific communities (signal processing, acoustics, robotics). While the robot audition community has mainly focused on microphone-array-based approaches, exploiting only two microphones in a binaural setup remains a challenging task. The work presented in the first part of this dissertation deals with binaural audition and is dedicated to sound source localization and active multimodal scene analysis under realistic robotic conditions involving noise and reverberation. The movement of the robotic platform plays a fundamental role in this work: while it causes changes in the acoustic conditions, the robot's action can also be exploited, by closing the traditional perception/action loop, to improve the multimodal scene analysis. The second part of this dissertation is dedicated to a more formal approach to perception, in which action can no longer be separated from perception: perception is possible only by acting on and interacting with the environment. This interactive perception paradigm makes it possible to study how a naive system can build, by itself, a representation of its interaction with its environment by discovering invariant structures in its own sensorimotor flow. Such a formal approach could allow the robot to incrementally experience the notion of space, shared by the system and the objects in the environment. This fundamental problem has not been specifically addressed within the audio modality; this work aims at proposing a new sensorimotor framework to better understand the perception process, which could in turn allow robotic systems to gain autonomy.