Task-dependent control of eye movements and mobile visual sensors

Acronym: 
TACES
Term: 
2008-10 till 2012-02
Research Areas: 
B
Abstract: 

To decide ‘‘Where to look next?’’ is a central function of attention. We developed a novel computational model that combines task-dependent priority control, spatially inhomogeneous processing of static and dynamic features, and their fusion at the level of visual proto-objects. For each proto-object, an attentional priority is computed; the proto-object with the highest priority becomes the target of the next saccade. The model has been successfully applied to real images and to the real-time control of the fast-shifting cameras of the “Karlsruhe humanoid robot head”.

 

Methods and Research Questions: 

To decide ‘‘Where to look next?’’ is a central function of the attention system of humans, animals and robots. We developed a novel computational model for the control of attention that integrates three factors: low-level static and dynamic visual features of the environment (bottom-up), medium-level visual features of proto-objects, and the task (top-down).

The model includes all these factors in a coherent architecture based on findings and constraints from the primate visual system. The model combines spatially inhomogeneous processing of static features, spatio-temporal motion features and task-dependent priority control in the form of the first computational implementation of attentional priority computation as specified by the ‘‘Theory of Visual Attention’’ (TVA; Bundesen, C., 1990. A Theory of Visual Attention. Psychological Review, 97(4), 523-547). TVA assumes two separate attentional mechanisms, namely one giving priority to the processing of objects (“filtering”) and the other giving priority to the processing of visual features (categories) for action (“pigeonholing”). Both attentional mechanisms influence the processing speed of visual features in their “race” towards visual short-term memory (VSTM). VSTM is limited in terms of slots for visual objects; the first four winners of the race are normally encoded in VSTM. TVA has been successfully used to explain a large range of behavioral and neural data from experiments on visual attention. Importantly, TVA does not make any claims about overt visual selection by eye movements – the focus of our project.
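The two TVA mechanisms correspond to two equations in Bundesen (1990): the weight equation (filtering), which sums sensory evidence weighted by task-driven pertinence values, and the rate equation (pigeonholing), which scales categorization speed by perceptual bias and relative attentional weight. A minimal sketch, with all numerical values hypothetical toy inputs:

```python
# Sketch of TVA's two core equations (Bundesen, 1990); values are toy examples.
# eta[x][j]: strength of sensory evidence that object x has feature j
# pi[j]:     pertinence of feature j (task-driven "filtering")
# beta[i]:   bias toward perceptual category i ("pigeonholing")

def attentional_weight(eta_x, pi):
    """Weight equation: w_x = sum_j eta(x, j) * pi_j."""
    return sum(e * p for e, p in zip(eta_x, pi))

def processing_rate(eta_x_i, beta_i, w_x, w_total):
    """Rate equation: v(x, i) = eta(x, i) * beta_i * w_x / sum_z w_z."""
    return eta_x_i * beta_i * w_x / w_total

# Toy scene: two objects, two feature categories
eta = [[0.9, 0.1],   # object 0: strong evidence for feature 0
       [0.2, 0.8]]   # object 1: strong evidence for feature 1
pi = [1.0, 0.2]      # task makes feature 0 highly pertinent

weights = [attentional_weight(e, pi) for e in eta]
w_total = sum(weights)
# Object 0 wins the priority race under this task setting.
target = max(range(len(weights)), key=weights.__getitem__)
```

In this toy setting the task-relevant object accrues the larger weight and hence the faster race toward VSTM, which is the sense in which filtering and pigeonholing jointly control processing speed.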

In our model, static and dynamic processing streams are fused at the level of visual proto-objects, that is, ellipsoidal visual units that carry the additional medium-level features of position, size, shape and orientation of the principal axis. Proto-objects serve as input to the TVA process (attentional weight equation), which combines top-down and bottom-up information for computing attentional priorities, so that relatively complex search tasks can be implemented. To this end, separately computed static and dynamic proto-objects are formed and subsequently merged into one combined map of proto-objects. For each proto-object, attentional priorities in the form of attentional weights are computed according to TVA. The target of the next saccade is always the center of gravity of the proto-object with the highest weight under the current task. The approach combines for the first time inhomogeneous processing, standard bottom-up feature maps and TVA. Further decisive theoretical ingredients come from the Visual Attention Model (VAM; Schneider, W.X., 1995. VAM: A neuro-cognitive model for visual attention control of segmentation, object recognition, and space-based motor action. Visual Cognition, 2, 331-376) and its experimental support (e.g., Deubel, H. & Schneider, W.X., 1996. Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36, 1827-1837).
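The selection step described above can be sketched as follows. This is an illustrative reduction, not the project's implementation: proto-objects are reduced to a centroid plus per-feature evidence values, the merged map is a simple list concatenation, and all numbers are hypothetical.

```python
# Sketch of the saccade-target step: proto-objects from the static and
# dynamic streams are merged into one map, each receives a TVA attentional
# weight, and the next saccade targets the center of gravity of the
# highest-weighted proto-object. All values are hypothetical toy inputs.

def next_saccade_target(proto_objects, pi):
    """proto_objects: dicts with 'centroid' (x, y) and 'features'
    (sensory evidence eta per feature category); pi: task pertinences."""
    def weight(po):
        # TVA weight equation: w = sum_j eta_j * pi_j
        return sum(e * p for e, p in zip(po["features"], pi))
    best = max(proto_objects, key=weight)
    return best["centroid"]

# Merged map: one static and one dynamic proto-object (toy values)
static_pos  = [{"centroid": (120, 80), "features": [0.7, 0.1]}]
dynamic_pos = [{"centroid": (40, 200), "features": [0.2, 0.9]}]
merged = static_pos + dynamic_pos

# Task prioritizes the second (motion-related) feature category,
# so the dynamic proto-object at (40, 200) wins the next saccade.
saccade_goal = next_saccade_target(merged, pi=[0.1, 1.0])
```

Changing the pertinence vector `pi` re-ranks the same proto-object map, which is how a task redirects gaze without recomputing the bottom-up feature maps.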

 

Outcomes: 

First, ours is the first computational implementation of a novel cognitive neuroscience model of attention. It combines spatially inhomogeneous processing of static features, spatio-temporal motion features and task-dependent priority control with further key theoretical assumptions of TVA and VAM.

Second, our computational model has been applied to several real-world image sequences. Given a visually specified task, the target was always found within a few simulated fixations. Moreover, we have shown that these results are robust to parameter variations.

Third, the computational model has been used for the real-time control of a rapidly shifting camera of the “Karlsruhe Humanoid Robot Head”. We could demonstrate that the model was able to efficiently search for toy objects presented on a uniform background: it directs the head and cameras by a few combined head-eye saccades to a feature-specified search target within an array of toy objects. Key predictions of the model (e.g., proto-object-based visual search) will be tested by comparison with eye-tracking experiments with human subjects (lab of Werner Schneider).

Publications: