I. INTRODUCTION
Face detection is one of the most important problems in image processing. It is used not only for face recognition but also for applications such as face auto-focusing and posture recognition in digital cameras. However, the accuracy of existing face detection methods for color images drops drastically when the captured scene is dark.
With advances in depth sensor technology, depth images can now be acquired accurately [1], and as a result depth cameras have become widespread. Compared with color images, face recognition based on depth images has the advantage of being independent of illumination and pose. Various depth-image-based tasks have been studied, including face detection, facial element detection, gender recognition, and identity recognition.
Studies on face detection and face recognition require face alignment and face normalization [2-4]. Because the nose is located at the center of the face, it serves as the reference point for face alignment. The nose is also commonly used to normalize the depth of the face [5, 6, 7]. Gordon [8] used curvature information to detect the nose; however, this method is suited to clean 3D data and does not work on images that contain noise. Werghi [9] studied nose detection and frontal face extraction by estimating mesh quality: the mesh surface quality is measured and assessed by extracting groups of triangular facets, and the nose tip is detected by identifying a single triangular facet with a cascaded filtering framework based mostly on simple topological rules. Bagchi [10] used HK classification, which is based on curvature analysis of the entire facial surface, to detect the nose tip and the eye corners.
In this paper, we propose a method for detecting the nose and the face in a depth image. We detect the nose point using the structural features of the face and the depth features of the pixels neighboring the nose point. Because the nose protrudes from the face, the nose point has the locally smallest depth. In addition, the width of a face lies within a certain range, so a candidate can be verified as the nose by checking whether the face width measured at that point falls within this range. Once the nose is detected, we set the face region and binarize it using the depth of the nose in order to detect the face area.
II. NOSE AND FACE DETECTION IN DEPTH IMAGES
The proposed method detects the nose of a body captured in a depth image and then detects the face area. Figure 1 shows the flow chart of the proposed method.
After the area containing the body is captured by the depth camera, the nose is detected using the relative distance features of the nose within the face. We then use the position and the depth of the nose to set a rectangle centered on the nose as the region of interest (ROI).
Finally, we normalize the depth in the region of interest and binarize it in order to detect the face.
First, we capture a depth image that includes the human body and binarize it to separate the body from the background. Within the body region, the nose is usually the point closest to the depth camera, so the nose point has the minimum depth. Figure 2 shows the depth characteristics of a face including the nose: the nose has the lowest depth in the depth image of the face.
However, other body parts, such as the lips, eyes, or chin, can have the lowest depth depending on the pose of the body. Figure 3 and Figure 4 show such cases.
To avoid such erroneous detections, the depth of the nose point must be compared with the depths of the surrounding pixels. Table 1 lists the depth features that distinguish the nose from the surrounding pixels. These features make it possible to detect the nose point correctly and to avoid false detections.
The depth decreases continuously toward the nose. Accordingly, among the 2N+1 horizontally consecutive pixels centered on the nose, each of the N pixels to the left and each of the N pixels to the right has a depth larger than that of the pixel immediately preceding it (the one nearer the nose), as expressed by Equation (1). The same holds for the 2N+1 vertically consecutive pixels. We therefore search for pixels that satisfy Equation (1) in both the horizontal and the vertical direction.
Equation (1) finds locally protruding points. The nose point, however, also has a markedly lower depth than the rest of the face, so we additionally compare each candidate with more distant pixels. For each candidate pixel pi, we examine eight points located at a distance M (M>N) from it toward the up, down, left, and right sides, and check whether every comparison point has a depth larger than that of pi. This process is shown in Figure 5.
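The two checks described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `is_nose_candidate` and the default values of `n` and `m` are hypothetical, a depth of 0 is assumed to mark background pixels, and the eight comparison points are assumed to lie on the axes and diagonals of a square at offset M.

```python
import numpy as np

def is_nose_candidate(depth, y, x, n=3, m=6):
    """Return True if pixel (y, x) passes both local-protrusion checks.

    depth: 2D array of depth values; 0 is assumed to mark background.
    n: pixels per direction that must show strictly increasing depth.
    m: offset (m > n) of the eight comparison points (assumed layout).
    """
    h, w = depth.shape
    if depth[y, x] == 0:
        return False
    # Reject pixels too close to the border for the m-offset comparison.
    if y - m < 0 or y + m >= h or x - m < 0 or x + m >= w:
        return False

    # Check 1: moving away from the candidate along each axis direction,
    # depth must strictly increase for n consecutive pixels.
    for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        prev = depth[y, x]
        for k in range(1, n + 1):
            cur = depth[y + k * dy, x + k * dx]
            if cur <= prev:
                return False
            prev = cur

    # Check 2: all eight points at offset m must be deeper than the candidate.
    # A background value of 0 at any of them also rejects the candidate.
    for dy in (-m, 0, m):
        for dx in (-m, 0, m):
            if dy == 0 and dx == 0:
                continue
            if depth[y + dy, x + dx] <= depth[y, x]:
                return False
    return True
```

Scanning every non-background pixel with this predicate yields the set of candidate pixels; the face-width test described next then filters them.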
Next, we find the two boundary points pleft≡(xl,yl) and pright≡(xr,yr) by drawing a horizontal line through the candidate nose pixel. The actual width of the face is obtained from Equation (2), which relates image coordinates with depth to 3D real-world coordinates [11]. In Equation (2), f is the focal length of the depth camera, and xc, yc, zc are the coordinates in the real-world coordinate system; zc is equal to the depth value.
The actual width of the face wface is obtained by Equation (3), where dleft is the depth of pleft and dright is the depth of pright.
Because actual human face widths range from 13 cm to 22 cm, a candidate pixel is accepted as the nose point only if wface lies within this range.
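The width test can be illustrated with a short Python sketch based on the standard pinhole back-projection, in which a pixel at horizontal offset x from the principal point and depth z maps to the real-world coordinate x·z/f. The exact form of Equations (2) and (3) is not reproduced here; the function names, the principal point parameter `cx`, and the use of millimetres for depth are assumptions made for illustration.

```python
def face_width_mm(x_left, d_left, x_right, d_right, f_px, cx):
    """Approximate real-world horizontal distance between two boundary pixels.

    x_left, x_right: pixel columns of the left and right face boundaries.
    d_left, d_right: depth values at those pixels (assumed to be in mm).
    f_px: focal length in pixels (hypothetical, from camera calibration).
    cx: principal point column (hypothetical, from camera calibration).
    """
    # Back-project each boundary pixel to a real-world x coordinate.
    xc_left = (x_left - cx) * d_left / f_px
    xc_right = (x_right - cx) * d_right / f_px
    return abs(xc_right - xc_left)


def is_plausible_face(width_mm, min_mm=130.0, max_mm=220.0):
    """Accept widths inside the 13 cm to 22 cm range stated in the text."""
    return min_mm <= width_mm <= max_mm
```

A candidate pixel would be kept as the nose point only when `is_plausible_face(face_width_mm(...))` is true for the boundary points on its row.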
This method can detect the nose accurately even when the face is not frontal, because the surface of the face is approximately spherical in the area around the nose.
To detect the face region from the detected nose, a rectangular region of interest (ROI) is set first. The face is generally widest at the height of the nose, or at least its width there is close to the maximum face width. The left and right boundaries of the ROI are therefore defined by drawing a horizontal line through the nose and adding a margin of L pixels beyond the points where this line intersects the boundary of the face. In the binary image, however, the neck just below the face is also included. To exclude it, we use the fact that the depth increases abruptly from the jaw to the neck, as shown in Figure 6. A point where the variation of the depth value is equal to or greater than Tc is regarded as the jaw. The lower boundary of the ROI is then set at the jaw with a margin of L pixels, as shown in Figure 7.
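The jaw search can be sketched in Python as a scan down the image column that passes through the nose, looking for the first depth jump of at least Tc between vertically adjacent pixels. This is only one possible reading of the step: the function name `find_jaw_row` is hypothetical, and scanning a single column is an assumption.

```python
import numpy as np

def find_jaw_row(depth, nose_y, nose_x, t_c):
    """Return the last face row above the first depth jump >= t_c, or None.

    The scan starts at the nose and moves downward along column nose_x.
    Only increases in depth are considered, so a drop to a background
    value of 0 below the chin is not detected by this sketch.
    """
    column = np.asarray(depth)[nose_y:, nose_x].astype(np.int64)
    jumps = np.diff(column)              # depth change between adjacent rows
    hits = np.nonzero(jumps >= t_c)[0]   # rows followed by a large increase
    if hits.size == 0:
        return None
    return nose_y + int(hits[0])
```

The lower ROI boundary would then be placed L pixels below the returned row.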
The region of interest contains not only the face but also part of the neck. To remove the neck, we normalize the range of depth values in the ROI. The maximum depth value and the non-zero minimum depth value in the ROI are obtained, and the range between them is normalized to [1, Pmax]. Pixels whose original depth value is 0 are left at 0. The histogram of the normalized depth values in the ROI is then computed. Figure 8 shows the histogram of an ROI containing a face and a neck.
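The normalization step can be written compactly with NumPy. The sketch below assumes a linear mapping of the non-zero depth range onto [1, Pmax] with rounding to the nearest integer; the function name `normalize_roi` and the default `p_max=255` are hypothetical choices.

```python
import numpy as np

def normalize_roi(roi, p_max=255):
    """Map non-zero depths linearly onto [1, p_max]; zeros stay zero."""
    roi = np.asarray(roi, dtype=np.float64)
    out = np.zeros(roi.shape, dtype=np.int64)
    valid = roi > 0
    if not valid.any():
        return out
    d_min = roi[valid].min()   # non-zero minimum (nearest point, e.g. the nose)
    d_max = roi[valid].max()
    if d_max == d_min:
        out[valid] = 1         # flat region: every valid pixel maps to 1
        return out
    scaled = 1 + (roi[valid] - d_min) * (p_max - 1) / (d_max - d_min)
    out[valid] = np.rint(scaled).astype(np.int64)
    return out
```

With this mapping the nearest pixel in the ROI receives the value 1 and the farthest receives `p_max`, which matches the range stated in the text.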
The histogram of the ROI shows that the neck and the face are separated. Using this feature, we obtain the threshold Tf for binarization. The depth value of the nose region is 1, the minimum value in the ROI, and the depth values of the rest of the face are distributed continuously above it. We therefore set the threshold Tf to the first point at which the depth distribution falls below ε, a sufficiently small value. Binarizing the ROI with the threshold Tf then yields the face region.
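A minimal Python sketch of the threshold search and the binarization is given below. It assumes an integer ROI already normalized to [1, Pmax] with 0 for invalid pixels, treats ε as a pixel-count threshold, and takes the first level whose count falls below ε as Tf; the function names and these choices are illustrative assumptions. On real data the histogram may need smoothing before the search so that isolated empty bins do not end it early.

```python
import numpy as np

def find_face_threshold(norm_roi, p_max=255, eps=1):
    """Return the first level >= 1 whose histogram count is below eps."""
    hist = np.bincount(np.asarray(norm_roi).ravel(), minlength=p_max + 1)
    for level in range(1, p_max + 1):
        if hist[level] < eps:
            return level
    return p_max  # no gap found: keep the whole range


def binarize_face(norm_roi, t_f):
    """Mark pixels between the nose (value 1) and the threshold as face."""
    norm_roi = np.asarray(norm_roi)
    return (norm_roi >= 1) & (norm_roi < t_f)
```

Pixels at or beyond Tf, such as the neck, and pixels with value 0 are excluded from the resulting mask.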
III. SIMULATION RESULTS
To examine the performance of the proposed method, we used a Kinect v2 as the depth camera. The camera has the following properties: the depth image resolution is 512×424, and depth is obtained by the time-of-flight (ToF) method.
We performed the simulation by capturing a face from various angles, as in Figure 10 (a), and by capturing various faces, as in Figure 10 (b).
For comparison, we also performed face detection with an existing color-image method, namely the face detection algorithm included in the OpenCV library.
The results for different face angles are shown in Table 2 and Table 3.
Horizontal angle of face (degrees) | Nose detection accuracy (%) |
---|---|
−30 | 94 |
−15 | 100 |
0 | 100 |
15 | 96 |
30 | 94 |
Vertical angle of face (degrees) | Nose detection accuracy (%) |
---|---|
−30 | 86 |
−15 | 94 |
0 | 100 |
15 | 96 |
30 | 92 |
The results show that the nose is detected correctly when the viewing angle is close to frontal. However, when the face is tilted far from the frontal view, some misdetections occur.
Table 4 shows the accuracy measured while varying the distance between the depth camera and the body. Owing to the characteristics of the depth camera, the simulation was performed over a range of about 1.5 m to 4 m.
Distance between body and camera (m) | Nose detection accuracy (%) |
---|---|
1.5 | 100 |
2 | 100 |
2.5 | 100 |
3 | 96 |
3.5 | 88 |
4 | 80 |
The results show that detection is fully accurate up to 2.5 m and remains high at 3 m (96%). However, the accuracy decreases at distances of 3.5 m or more, falling to 88% at 3.5 m and 80% at 4 m.
Table 5 compares the face detection accuracy of the proposed method with that of the existing color-image method. The simulation was carried out under both bright and dark illumination.
Face detection accuracy (%) | ||
---|---|---|
Illumination | Conventional method (color image) | Proposed method (depth image) |
Bright | 98 | 96 |
Dark | 21 | 94 |
The results show that, under bright illumination, the conventional color-image method is slightly more accurate (98 versus 96). This is because the measurement accuracy of the depth camera is still lower than that of the color camera. Under dark illumination, however, the color-image method can hardly detect the face (21), whereas the proposed depth-based method still detects it accurately (94) because it does not depend on illumination.
IV. CONCLUSION
In this paper, we proposed a face detection method that uses only a depth camera. After separating the body from the background, we detected the nose tip using the depth features that distinguish the nose pixel from the rest of the body. We then set the region of interest from the nose tip and normalized its depth values. The normalized depth was binarized with a threshold that separates the neck from the face by histogram analysis. In this way, the face area was found accurately in the depth image.
The proposed method separates the object from the background using depth differences. Because the nose tip is located at the center of the human face, the face area can be detected by identifying the nose tip. Unlike existing color-based face detection algorithms, the proposed method can detect faces even in dark environments, and it detects the nose and the face stably with little influence from body or face posture.
The proposed nose detection method is useful not only for face detection but also for face alignment in subsequent face recognition. With color images, methods for correcting a face captured from the side into a frontal view are limited. With depth images, however, face alignment can be performed by rotating the face using its depth characteristics. The proposed face detection method is therefore expected to reduce the amount of computation and to make face recognition simpler and more accurate.