| Face colour under varying illumination - analysis and applications | ||
|---|---|---|
| | Chapter 2. An overview of colour-based face image and skin analysis | |
Skin colour is often used as a cue for detecting, localizing and tracking targets that contain skin, such as faces and hands in an image. Colour alone is often not enough to separate skin from non-skin objects such as wood, which can appear skin coloured. Skin colour is therefore often combined with other cues such as motion, texture and edge features, but this section reviews only the handling of colour.
The goal is to divide the pixels of the image into skin coloured and non-skin coloured ones. The simplest methods define skin colour as a certain range of values in some coordinates of a colour space. This can easily be implemented as a look-up table or as threshold values, as in Chai and Ngan (1998). Dai and Nakano (1996) enhanced orange-coloured parts in YIQ space by selecting only a certain range of the I component. Hidai et al. (2000) defined an “ideal skin colour” as the average of precaptured face images and labelled image pixels as skin or non-skin according to their closeness to this point. In addition, histogram equalization was applied to increase robustness against brightness fluctuations. The second approach is to assume that skin colours occur with different probabilities and that these probabilities follow a distribution which can be learned. Common to these approaches are thresholds and tunable parameters; the use of chromaticity coordinates is also typical. The number of skin pixels used for these off-line probability calculations varies greatly in the literature. Hsu et al. (2002) suggested colour correction before skin detection in YCbCr space. The colour correction was a version of the white patch method in which the transformation coefficients are calculated from the mean of the pixels with the highest 5 % of luminance values, provided that their number exceeds a fixed threshold and the mean is not a skin tone value. However, their correction algorithm does not take into account saturated channels or the possibility that high valued pixels belong to a chromatic colour. After the correction, a nonlinear transformation was applied to the chromatic data to obtain a better fit for the elliptical skin colour model. The detection algorithm was tested with quite moderate illumination changes, and the most demanding cases have a simple, white background.
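The simplest, range-based labelling described above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the conversion uses the standard full-range BT.601 (JFIF) formulas, while the chrominance ranges `CB_RANGE` and `CR_RANGE` and the function names are placeholder assumptions that would have to be tuned on labelled data.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to YCbCr with the full-range BT.601 (JFIF) formulas."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed, illustrative chrominance box; not taken from the cited papers.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(r, g, b, cb_range=CB_RANGE, cr_range=CR_RANGE):
    """Label one pixel as skin if its chrominance falls inside the box.

    Luminance (Y) is ignored, so the decision depends on chrominance only.
    """
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return (cb_range[0] <= cb <= cb_range[1]
            and cr_range[0] <= cr <= cr_range[1])


def skin_mask(image):
    """image: rows of (r, g, b) tuples. Returns rows of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

Because the decision is a fixed box in chrominance, the same test could be precomputed into a look-up table indexed by (Cb, Cr); the box itself does not adapt to a change in illumination colour.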
The threshold or thresholds have also been selected in various ways to exclude skin colours which occur too rarely. Comaniciu and Ramesh (2000) used a 1D skin colour distribution with mean shift to track faces (see Appendix 3). The object probability distribution was obtained off-line from an image or images taken in an office room. Although they mentioned that the images were taken at different times (morning, afternoon and night), it was not clear how large the skin colour changes were. In general, their test of mean shift tracking seems to have been made under quite stable illumination conditions. Schiele and Waibel (1995) built a face tracker based on skin colour alone. They used a probability distribution to intensify the skin coloured region. Although they mention a colour map covering most of the possible face colours, they do not show or specify the chromaticity changes. Not all distributions are calculated off-line; for example, Saxe and Foulds (1996) suggested an on-line iterative method in which, after user initialization, the histogram of the selected area is compared to the histograms of other patches.
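The idea behind mean shift tracking on a skin probability image can be sketched as below. This is a simplified flat-kernel version written for illustration, assuming a precomputed per-pixel probability map; it is not the formulation of Comaniciu and Ramesh (2000), and the function name and parameters are hypothetical.

```python
def mean_shift(prob, cx, cy, half_w, half_h, max_iter=20, eps=0.5):
    """Move a fixed-size window to the local centroid of a probability map.

    prob: rows of non-negative weights, indexed as prob[y][x].
    (cx, cy): initial window centre; half_w, half_h: window half-sizes.
    Returns the centre after convergence or after max_iter iterations.
    """
    height, width = len(prob), len(prob[0])
    for _ in range(max_iter):
        ix, iy = int(round(cx)), int(round(cy))
        # Clip the window to the image borders.
        x0, x1 = max(0, ix - half_w), min(width - 1, ix + half_w)
        y0, y1 = max(0, iy - half_h), min(height - 1, iy + half_h)
        total = sum_x = sum_y = 0.0
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                w = prob[y][x]
                total += w
                sum_x += w * x
                sum_y += w * y
        if total == 0.0:
            break  # no skin evidence inside the window; keep the centre
        new_cx, new_cy = sum_x / total, sum_y / total
        shift = abs(new_cx - cx) + abs(new_cy - cy)
        cx, cy = new_cx, new_cy
        if shift < eps:
            break  # converged
    return cx, cy
```

The window only climbs towards evidence that already lies inside it, so a target which moves further than the window size between frames, or whose probabilities collapse after an illumination change, is lost.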
The common parametric methods are based on Gaussians: a unimodal Gaussian density function (Cai & Goshtasby 1999, Kim et al. 1998, Yang & Ahuja 1998) or multimodal Gaussian mixtures (Jebara & Pentland 1997, Jebara et al. 1998, Yang & Ahuja 1998). The parameters of the former can be estimated using maximum likelihood (Cai & Goshtasby 1999, Kim et al. 1998, Yang & Ahuja 1998), whereas estimation for the latter requires an Expectation-Maximization (EM) algorithm (Jebara & Pentland 1997, Jebara et al. 1998, Yang & Ahuja 1998). An output image containing skin probabilities has also been used for face detection: Menser and Müller (1999) applied PCA to skin tone probability images obtained from a 2D Gaussian colour model. However, an interesting study has shown that histogram models provide better accuracy and lower computational cost than mixture models for skin detection (Jones & Rehg 2002). In addition, according to Yang & Ahuja (2001), a single Gaussian distribution may detect skin regions less well than a mixture of Gaussians. Additional assumptions, such as a homogeneous intensity field over the object, have been made to separate skin and non-skin objects with similar chromaticities more effectively (Abdel-Mottaleb & Elgammal 1999). Skin colour distributions have also been learned by neural network based approaches. Karlekar and Desai (2000) used a multilayer perceptron to learn the skin colour distribution and classify pixels into skin tones and non-skin tones. A Self Organizing Map (SOM) for labelling skin tones was used by Piirainen et al. (2000). All these approaches seem to work only in well-behaved illumination conditions; at least, their static models suggest that they were designed for stable illumination.
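The unimodal Gaussian case can be sketched as follows, assuming the skin samples are given as 2D chromaticity pairs. The function names are illustrative and the code is a generic maximum likelihood fit, not the implementation of any cited work; a degenerate covariance (zero determinant) is not handled.

```python
import math


def fit_gaussian_2d(samples):
    """Maximum likelihood mean and covariance of 2D chromaticity samples.

    Returns ((m0, m1), (c00, c01, c11)); the covariance is symmetric,
    so only three entries are stored.
    """
    n = len(samples)
    m0 = sum(s[0] for s in samples) / n
    m1 = sum(s[1] for s in samples) / n
    c00 = sum((s[0] - m0) ** 2 for s in samples) / n
    c11 = sum((s[1] - m1) ** 2 for s in samples) / n
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in samples) / n
    return (m0, m1), (c00, c01, c11)


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    c00, c01, c11 = cov
    det = c00 * c11 - c01 * c01  # assumed non-zero
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    return (c11 * d0 * d0 - 2.0 * c01 * d0 * d1 + c00 * d1 * d1) / det


def skin_likelihood(x, mean, cov):
    """Gaussian density at chromaticity x under the fitted skin model."""
    c00, c01, c11 = cov
    det = c00 * c11 - c01 * c01
    norm = 2.0 * math.pi * math.sqrt(det)
    return math.exp(-0.5 * mahalanobis_sq(x, mean, cov)) / norm
```

Thresholding `skin_likelihood` (or, equivalently, `mahalanobis_sq`) gives an elliptical decision region in the chromaticity plane. A mixture model would replace the single density with a weighted sum of such terms, with parameters estimated by EM.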
Adaptive approaches have also been suggested in order to cope with changing conditions. One way is to define a range of possible skin colours within which a finer model is found. Sahbi and Boujemaa (2000) used neural networks to learn a coarse skin colour model from “a very large population ethnicity”, which is used for coarse level skin detection. The areas found are then subjected to Gaussian colour modelling for relevant and noisy skin points, and the parameters of the models are evaluated using a fuzzy clustering approach. They also assume that skin objects have a homogeneous local colour distribution. Sigal et al. (2000) adapted the skin colour histogram using a second order Markov model and feedback from the current segmentation results. They initialized tracking using the model suggested by Jones and Rehg (1999) for Internet images. Bergasa et al. (2000) presented a Gaussian skin colour model which is both unsupervised (prototype) and adaptive. They use a prototype cluster to represent human skin, and the colour cluster which is closest to the prototype is considered to be skin. However, this limits the usability of their approach to quite static illumination conditions. The model is adapted using a linear combination of previous model parameters. Cho et al. (2001) also used a predefined area for HSV skin colours, within which a finer area is selected by adjusting several threshold values. They did not consider skin tone shifts, because the thresholds for the hue component were fixed. Background areas were eliminated by assuming that their area is small compared to the skin regions. A cluster analysis was also performed to separate dominant background colour vectors from skin coloured ones. The skin coloured vectors were defined to be those nearest to predefined values. Approaches with user initialization have also been proposed.
Rasmussen and Hager (1997) developed a tracking method in which the user gives an initialization region, which is subjected to PCA to parametrize an ellipsoidal model. The ellipsoidal model assumes that the object colours can be confined within a simple, point-symmetric cluster. Their tracking method uses a fixed tracking window and, based on the target found, the model is updated again with PCA. However, their targets do not seem to contain any chroma shifts. Tsapatsoulis et al. (2001) combined skin colour and shape in template matching. They used an adaptive 2D Gaussian model whose parameters are re-estimated from the current image. The pixels classified as skin were used to re-estimate the Gaussian mean value. Schuster (1994) used two colour models: an ellipsoid model and a mixture density model using RGB values. The mixture density model is obtained as a weighted sum of colour density functions which describe the distribution of colour values. Based on the localized target, colour model parameters are calculated and used to predict the parameters in the next frame. He also used a global colour model which contains a priori knowledge about the parameters. Shape information was used to make sure that the pixels used for adapting both colour models were part of the object. Yang et al. (1998) suggested adapting a Gaussian model using a maximum likelihood criterion by modelling it as a combination of the previous Gaussian distributions. In this case, too, no large changes in skin colour were shown.
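The kind of adaptation discussed above, in which the pixels currently classified as skin re-estimate the model mean, can be sketched as a simple recursive update. This is only an illustration under stated assumptions: the blending weight `alpha`, the circular acceptance test `near` and all names are hypothetical and are not taken from the cited papers.

```python
def near(pixel, mean, radius=0.05):
    """Assumed stand-in classifier: accept chromaticities close to the mean."""
    d0, d1 = pixel[0] - mean[0], pixel[1] - mean[1]
    return d0 * d0 + d1 * d1 <= radius * radius


def update_mean(prev_mean, pixels, classify=near, alpha=0.3):
    """Blend the previous mean with the mean of the pixels labelled as skin.

    prev_mean: (m0, m1) chromaticity mean from the previous frame.
    pixels: iterable of (c0, c1) chromaticities from the current frame.
    alpha: weight of the current frame; 0 freezes the model, 1 forgets it.
    """
    skin = [p for p in pixels if classify(p, prev_mean)]
    if not skin:
        return prev_mean  # nothing accepted; do not adapt
    frame_mean = (sum(p[0] for p in skin) / len(skin),
                  sum(p[1] for p in skin) / len(skin))
    return tuple((1.0 - alpha) * m + alpha * f
                 for m, f in zip(prev_mean, frame_mean))
```

Because only pixels accepted by the previous model feed the update, the model can follow gradual drift but not an abrupt chromaticity shift that moves skin outside the acceptance region, which is consistent with the limited colour changes shown in these works.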
For adaptive tracking, two different spatial constraints have been introduced for selecting the pixels used to refresh the skin colour model. Raja et al. (1998) (later also McKenna et al. (1999)) suggested adapting a Gaussian mixture model using a small area inside the localization. The Gaussian mixture model approximates the multi-modal distribution of the object’s colours with a number of suitably weighted Gaussians. They also use a normalized log-likelihood measure to prevent adaptation under tracker failure, which seems to be caused by a shift in hue. Another spatial constraint was presented by Yoo and Oh (1999), who used histogram backprojection for face tracking. The purpose of histogram backprojection is to form a greyscale image in which the grey value shows the probability that a colour shade belongs to the object. It is assumed that a blob of high values in the image indicates the presence of the object. The face was assumed to be an ellipse, and the pixels inside the located face ellipse were used to update the skin histogram. Transductive learning has also been suggested for skin tracking (Wu & Huang 2000), using a linear subspace of a combination of the HSV and RGB spaces. The goal is to transduce the colour classifier so that it works well in the changed conditions. Once again, the main illumination variability seems to be caused by intensity changes.
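Histogram backprojection as described above can be sketched as follows, assuming pixels are given as 2D chromaticity pairs in [0, 1]. This simplified version backprojects the peak-normalized model histogram directly rather than a ratio histogram; the bin count and function names are illustrative assumptions.

```python
def quantize(c0, c1, bins):
    """Map chromaticities in [0, 1] to a 2D histogram bin index."""
    i = min(int(c0 * bins), bins - 1)
    j = min(int(c1 * bins), bins - 1)
    return i, j


def build_histogram(samples, bins=32):
    """Build a skin histogram from samples, scaled so the peak bin is 1.0."""
    hist = [[0.0] * bins for _ in range(bins)]
    for c0, c1 in samples:
        i, j = quantize(c0, c1, bins)
        hist[i][j] += 1.0
    peak = max(max(row) for row in hist)
    if peak > 0.0:
        hist = [[value / peak for value in row] for row in hist]
    return hist


def backproject(image, hist):
    """Replace each pixel by the grey value (0..255) of its histogram bin.

    image: rows of (c0, c1) chromaticity pairs.
    """
    bins = len(hist)
    output = []
    for row in image:
        out_row = []
        for c0, c1 in row:
            i, j = quantize(c0, c1, bins)
            out_row.append(int(round(255 * hist[i][j])))
        output.append(out_row)
    return output
```

A tracker would then look for a blob of high values in the output and, in an adaptive scheme, rebuild the histogram from the pixels inside the located region.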
However, the images and videos used so far to evaluate these algorithms contain few chromaticity shifts and rarely a nonuniform illumination colour field. Judging by the restrictions built into the algorithms, the basic assumption of many methods seems to be that the illumination colour does not vary significantly. More often the change is in the intensity (due to shadowing, for example) or in the image geometry. It might be that a different choice of colour space would improve results, as was demonstrated by Terrillon et al. (2001). An exception to this is the work of Störring et al. (1999) and Störring et al. (2001). They consider skin colour under an illumination colour temperature range of 1500 K to 25000 K with one camera calibration condition. Störring et al. (1999) named the area of all possible skin chromaticities under this illumination range the skin locus, because the chromaticities followed a Planckian locus. Störring et al. (2001) extended the work to mixed illumination (for example, cases in which two light sources cause a nonuniform illumination field over the skin). They concluded that the results for the body reflection chromaticities are the same as in the case of a single light source. In both papers, they compared the average measured chromaticities to the modelled chromaticity area and found a good match with actual spectral power distributions. Earlier, Matas et al. (1994) had also suggested the use of chromaticity constraints. Unfortunately, their publications lack details, so further evaluation of their results and constraints is difficult. Another interesting piece of research related to changing illumination conditions is that of Debevec et al. (2000), who present a method to acquire the reflectance field of a human face. They use their measurements to render the face under arbitrary illumination conditions.
Table 2 summarises some colour spaces used for pixel labelling in face-based approaches. The most popular choice seems to be NCC rgb.
Table 2. Colour spaces for pixel labelling.
| Colour space | Authors listed in Yang & Ahuja 2001 | Other works |
|---|---|---|
| RGB | Jebara & Pentland 1997, Jebara et al. 1998, Satoh et al. 1999 | Rasmussen & Hager 1997, Yang et al. 1998 |
| normalized RGB or NCC rgb | Crowley & Bedrune 1994, Crowley & Berard 1997, Kim et al. 1998, Miyake et al. 1990, Oliver et al. 1997, Qian et al. 1998, Starner & Pentland 1996, Sun et al. 1998, Yang et al. 1998, Yang & Waibel 1996 | Bergasa et al. 2000, Sahbi & Boujemaa 2000, Schiele & Waibel 1995 |
| HS-based | Kjeldsen & Kender 1996, Saxe & Foulds 1996, Sobottka & Pitas 1996a, Sobottka & Pitas 1996b | Cho et al. 2001, Yang et al. 1998 |
| YCrCb | Chai & Ngan 1998, Wang & Chang 1997 | Hsu et al. 2002, Karlekar & Desai 2000, Luo & Eleftheriadis 2000, Menser & Müller 1999 |
| YIQ | Dai & Nakano 1995, Dai & Nakano 1996 | |
| YES | Saber & Tekalp 1998 | |
| CIE XYZ | Chen et al. 1995 | |
| CIE LUV | Yang & Ahuja 1998 | |
| ab | Kawato & Ohya 2000a, Kawato & Ohya 2000b | |
| YUV | Abdel-Mottaleb & Elgammal 1999 | |
| Farnsworth’s UCS | Wu et al. 1999 | |
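The normalized colour coordinates (NCC rgb) that dominate Table 2 can be computed as in the sketch below. The function name is hypothetical, and the handling of black pixels is an assumption, since their chromaticity is undefined.

```python
def ncc_rg(r, g, b):
    """Return the normalized (r, g) chromaticities of an RGB pixel.

    Each channel is divided by R + G + B; the third coordinate
    b = 1 - r - g is redundant and is therefore omitted.
    """
    total = r + g + b
    if total == 0:
        # Black pixel: chromaticity is undefined; neutral grey is assumed.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (r / total, g / total)
```

Scaling all three channels by the same factor leaves the result unchanged, so the coordinates discount uniform intensity changes. A change in the colour of the illumination, however, still moves the chromaticities.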