In realistic multimodal environments, audition and vision are the two prominent sensory modalities that work together to provide humans with the best possible perceptual understanding of the environment. Yet, when designing artificial binaural systems, this collaboration is often not honored. Instead, substantial effort is made to construct best-performing, purely auditory scene-analysis systems, sometimes with goals and ambitions that reach beyond human capabilities. It is often not considered that what enables us to perform so well in complex environments is the ability (i) to use more than one source of information, for instance, visual in addition to auditory information, and (ii) to make assumptions about the objects to be perceived on the basis of a-priori knowledge. In fact, the human capability of inferring information from one modality to another helps substantially to efficiently analyze the complex environments that humans face every day. Along this line of thinking, this chapter addresses the effects of attention reorientation triggered by audition. Accordingly, it discusses mechanisms that lead to appropriate motor reactions, such as head movements that direct our visual sensors toward an audiovisual object of interest. After presenting some of the neuronal foundations of multimodal integration and of the motor reactions linked to auditory-visual perception, some ideas and issues from the field of robotics are tackled by means of computational modeling. Thereby, some of the biological bases that underlie active multimodal perception are discussed, and it is demonstrated how these can be taken into account when designing artificial agents endowed with human-like perception.