Robust tracking and remapping of eye appearance with passive computer vision

Carlo Colombo, Dario Comanducci, Alberto Del Bimbo
2007 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
A single-camera iris tracking and remapping approach based on passive computer vision is presented. Tracking aims at obtaining accurate and robust measurements of the iris/pupil position. To this end, a robust ellipse-fitting method is used, employing search constraints so as to achieve better performance than the standard RANSAC algorithm. Tracking also embeds an iris localization algorithm (working as a bootstrap, multiple-hypotheses generation step) and a blink
detector that can detect voluntary eye blinks in human-computer interaction applications. On-screen remapping incorporates a head-tracking method capable of compensating for small user-head movements. The approach operates in real time under different lighting conditions and in the presence of distractors. An extensive set of experiments is presented and discussed. In particular, an evaluation method for the choice of layout of both hardware components and calibration points is described. Experiments also investigate the importance of providing visual feedback to the user, and the benefits gained from performing head compensation, especially during image-to-screen map calibration.

Intrusive techniques require some equipment to be put in physical contact with the user, such as electrodes, contact lenses, or head-mounted devices [Duchowski 2003]. Nonintrusive techniques are mostly vision based, that is, they use cameras to capture images of the eye. Most commercial devices adopting nonintrusive techniques rely on the analysis of infrared light generated by an emitter and reflected by the eye: the effect of such a reflection is to enhance the contrast between the pupil and the iris [Morimoto and Mimica 2005]. Such systems are fairly accurate, but require special and expensive hardware and, being based on active light emission, retain a certain degree of intrusiveness. Moreover, sunlight and glasses can seriously disturb the reflective properties of IR light. A common requirement of commercial systems is that the head remain perfectly still during use; this is typically achieved by means of special supports for the chin. This is another factor limiting the usability of gaze estimation systems. IR-based eye trackers generally use the center of the eye and the glint (the reflection of IR light on the eye surface): assuming a static head, the glint acts as a reference point, and the vector from glint to center describes the gaze direction.
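The glint-to-center scheme just described can be sketched in a few lines. The function name and the unit-length normalization below are illustrative choices, not taken from any of the cited systems:

```python
import numpy as np

def gaze_vector(pupil_center, glint):
    """Head-stable gaze cue used by IR trackers: the vector from the
    corneal glint (reference point) to the pupil/eye center.
    Returned as a unit vector; names are illustrative."""
    v = np.asarray(pupil_center, dtype=float) - np.asarray(glint, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Under the static-head assumption, a calibrated function then maps this vector to an on-screen point of regard.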
Exploiting a neural-network-based approach, Ji and Zhu [2004] present an IR-based technique that reaches a modest accuracy of about 5°, yet allows head movements. Other vision-based approaches avoid the use of active illumination, relying exclusively on natural light. Such passive approaches typically use off-the-shelf hardware, and monitor eye gaze shifts by performing iris localization and tracking. Indeed, the human iris is a good part of the eye to track under passive vision, due to its circular shape (giving rise to ellipses under image projection) and its chromatic contrast against the white region surrounding it, the sclera. Incidentally, the requirement of robustness under uncontrolled conditions of illumination, image quality, and iris appearance rules out, for the purpose of iris detection/localization, the standard techniques employed in biometric identification [Ma et al. 2003; Daugman 2004], which work under controlled conditions (i.e., an unoccluded iris). In Trucco and Razeto [2005] a robust iris localization approach is presented as a bootstrapping and failure-recovery module for an eye tracker. The ellipse describing the iris is fitted by simulated annealing, maximizing a criterion that compares the intensity variation across the ellipse perimeter with a model derived from observations. Pure eye localization approaches such as this (i.e., without temporal tracking of the estimated gaze shifts), even when 99% accurate, are not suitable for some applications, for example those where gaze estimates are fed back to the user for human-computer interaction purposes. As a matter of fact, a 1% failure rate means, at a video rate of 25 frames per second, one wrong estimate every four seconds. In Hansen and Pece [2005] an active contour tracker is presented, combining particle filtering with the expectation-maximization algorithm.
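As a point of reference for the robust ellipse fitting discussed above, the plain RANSAC baseline (against which constrained variants are compared) can be sketched as follows. The algebraic residual, the threshold, and the iteration count are illustrative choices, not the paper's:

```python
import numpy as np

def fit_conic(pts):
    """Least-squares general conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0:
    the singular vector of the design matrix with smallest singular value."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    return np.linalg.svd(D)[2][-1]

def conic_residuals(theta, pts):
    """Algebraic residual |conic(x, y)| for a unit-norm parameter vector."""
    x, y = pts[:, 0], pts[:, 1]
    a, b, c, d, e, f = theta
    return np.abs(a * x * x + b * x * y + c * y * y + d * x + e * y + f)

def ransac_ellipse(pts, iters=200, thresh=1e-6, rng=None):
    """Plain RANSAC: repeat 5-point minimal conic fits, keep the largest
    consensus set, refit on it, and return (center, conic parameters)."""
    rng = np.random.default_rng(rng)
    best_mask, best_count = None, -1
    for _ in range(iters):
        sample = pts[rng.choice(len(pts), size=5, replace=False)]
        mask = conic_residuals(fit_conic(sample), pts) < thresh
        if mask.sum() > best_count:
            best_count, best_mask = mask.sum(), mask
    theta = fit_conic(pts[best_mask])           # refit on the consensus set
    a, b, c, d, e, _ = theta
    center = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return center, theta
```

The paper's contribution is precisely to add search constraints on top of such a baseline, so that minimal samples inconsistent with the expected iris geometry are discarded early.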
The tracker is complemented with a gaze-estimation system based on a projective model of the image-to-gaze-direction map. The tracker works at multiple scales and reaches good accuracy in the image plane, especially with close-up views of the eye. However, the proposed gaze-estimation method requires that the head remain fixed, thus limiting the usefulness of the approach. In Wang and Sung [2001] the iris contours are modeled as two planar circles and their projections on the retinal plane are estimated. Given some anthropometric knowledge and the user's distance, gaze direction is determined from the elliptical shape of the projected iris with a 0.5° error. In Beymer and Flickner [2003] and Shih and Liu [2004] the problem of avoiding the calibration of the image-to-gaze-direction map is addressed, and solved in both cases by the use of a stereo-camera pair. Besides methods such as those cited earlier, which exploit features extracted from the image such as contours and eye corners, other approaches are appearance based and use all the raw image data as input. For example, in Xu et al. [1998] a neural network is fed with 2000 training example images, and a gaze estimation accuracy of about 1.5° is obtained. Among the application domains of gaze estimation systems mentioned before, advanced human-computer interaction is one of the most interesting for both its social and commercial impact. The
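The image-to-screen (or image-to-gaze-direction) calibration maps recurring throughout this discussion are commonly fitted by least squares from a small grid of calibration targets. A minimal sketch, assuming a second-order polynomial map (a common choice, not necessarily the exact form used by any cited system):

```python
import numpy as np

def poly_features(p):
    """Second-order monomial features [1, x, y, xy, x^2, y^2] of an image point."""
    p = np.asarray(p, dtype=float)
    x, y = p[..., 0], p[..., 1]
    return np.stack([np.ones_like(x), x, y, x * y, x * x, y * y], axis=-1)

def calibrate(eye_pts, screen_pts):
    """Least-squares fit of the polynomial image-to-screen map from
    calibration pairs (tracked eye position -> known screen target)."""
    A = poly_features(eye_pts)
    coef, *_ = np.linalg.lstsq(A, np.asarray(screen_pts, dtype=float), rcond=None)
    return coef

def remap(coef, eye_pt):
    """Predict the on-screen point of regard for a tracked eye position."""
    return poly_features(eye_pt) @ coef
```

With a 3x3 grid of calibration points the six coefficients per screen axis are overdetermined, which is why the layout of calibration points, evaluated experimentally in the paper, matters for the quality of the fitted map.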
doi:10.1145/1314303.1314305