Chapter 6. Skin locus in face tracking

Table of Contents
6.1. Face Video Database
6.2. Ratio histogram and histogram backprojection
6.3. Adaptive ratio histogram
6.4. Tracking with skin locus: settings and results
6.5. Comparison with other tracking methods
6.6. Robustness to localization errors
6.7. Mean shift with skin locus

The knowledge obtained about the range of skin chromaticities (the skin locus) will be shown to be useful in colour-based face tracking and segmentation. To demonstrate this, a Face Video Database recorded under drastically varying illumination conditions has been created. The skin locus is combined with different tracking algorithms and applied to these videos. No cue other than colour is used; adding other cues would obviously improve the overall performance. However, because colour is the only cue, it is much easier to evaluate its performance and usefulness.

6.1. Face Video Database

The Face Video Database (Paper VII) was designed for development, testing, comparison, and verification of algorithms related to face-based applications. Because three cameras (Alaris, Nogatech and Sony) are currently employed, it is possible to study and compare the performance of algorithms across different cameras.

The database consists of the following data for each camera: face images and information related to their acquisition, face videos, and manual localizations of the faces in the videos. The videos and images were taken with two 1CCD cameras (Alaris and Nogatech) and one 3CCD camera (Sony). Alaris and Nogatech are both low-cost web cameras with automatic intensity (gain) level control. When the videos and images were taken, the automatic colour correction options of these cameras were turned off after the initial white balancing, because colour correction can lead to unstable and unpredictable results and the main interest was in the effect of illumination changes. The Sony DXC-755P does not have automatic gain or colour correction.

The face images were taken with each camera using the same procedure and the same conditions as in the creation of the Physics-based Face Database (see Section 4.2). Table 15 shows a face image series taken by the Nogatech camera, with the corresponding prevailing and white balancing illumination conditions. The SPDs of the illuminants used and the spectral reflectance of skin for some persons are available. During image acquisition, it was noticed that the calibration results were not always very good for the 1CCD cameras due to their limited capabilities. This can be observed by comparing the uppermost and second rows of Table 15: the white calibration failed to completely remove the reddish cast of illumination H and to make white appear white. Because white calibration has its own limitations, there is an obvious need for illumination-insensitive techniques.

Table 15. Sixteen faces for Nogatech.
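The idea behind white calibration can be illustrated with a minimal sketch. It assumes a simple per-channel gain model in which the gains are chosen so that a measured white reference becomes neutral; this is an illustrative assumption, not the calibration procedure implemented inside the Alaris, Nogatech or Sony cameras. The function names `white_balance_gains` and `apply_gains` are hypothetical.

```python
def white_balance_gains(white_rgb):
    """Return per-channel gains that map a measured white patch to neutral grey.

    Assumption: a diagonal (per-channel) gain model; real cameras may differ.
    """
    r, g, b = white_rgb
    target = (r + g + b) / 3.0  # common level all three channels should reach
    return (target / r, target / g, target / b)


def apply_gains(pixel, gains, max_value=255):
    """Scale one RGB pixel by the gains, clipping to the sensor range."""
    return tuple(min(max_value, round(c * k)) for c, k in zip(pixel, gains))


# A white patch recorded with a reddish cast (hypothetical values).
reddish_white = (240, 200, 160)
gains = white_balance_gains(reddish_white)
balanced = apply_gains(reddish_white, gains)  # equal R, G and B after balancing
```

Under such a model the correction is exact only for the reference patch itself; clipping and any sensor behaviour that a per-channel gain cannot describe leave a residual colour cast.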

The illumination in the face videos is challenging but of a kind commonly encountered in practice; the videos were made under both indoor and outdoor illumination conditions. The illumination field over the objects varies in time and in space. The videos also have different initial white balancing settings, and they show persons with dark, pale and yellowish skin tones. Tables 16 and 17 display a few selected frames from an outdoor and an indoor video, respectively. In the outdoor video, the person is on the roof of a building, and the illumination field over the face varies from direct sunlight to the light cast by the plain sky. The indoor video shows a person moving in a corridor with an illumination field created by fluorescent lamps, daylight from the window, or both.

Ground truths for face localization were selected manually; they make numerical quantification of the results possible. The localization is made by a box (the ground truth bounding box) surrounding the face region in the image. For some selected videos, the face area is defined more accurately by a polygon (the ground truth bounding polygon). Both ground truths are visualized in Figure 27.

Table 16. Selected frames from an outdoor Nogatech video.

Table 17. Selected frames from an indoor Alaris video.

Figure 27. Face localization by (a) a box and (b) a polygon.
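As an illustration of how a ground truth bounding box allows numerical quantification, the following minimal sketch compares a tracker's box with the ground truth box using the intersection-over-union ratio. This measure is an assumption chosen for illustration; the text above does not state which measure is used with the database. The function name `box_iou` and the box layout `(x_min, y_min, x_max, y_max)` are likewise hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) in pixel coordinates.
    Returns a value in [0, 1]; 1 means identical boxes, 0 means no overlap.
    """
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground truth box and tracker output for one frame.
ground_truth = (0, 0, 10, 10)
tracked = (5, 5, 15, 15)
overlap = box_iou(ground_truth, tracked)  # 25 / 175
```

Computing such a score for every frame gives a per-video curve from which tracking failures can be located; a polygon ground truth would allow the same comparison with a tighter face region.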