I. INTRODUCTION
Although vehicles began as simple devices for moving goods, their shape (or style) and function have since developed rapidly and in many directions. Vehicle research has addressed not only the improvement of vehicle performance but also the protection of occupants after an accident. Today the focus of research is shifting toward preventing accidents and protecting people both inside and outside the vehicle. Pedestrian detection systems are divided into three categories: infrastructure enhancement, passive safety systems, and active safety systems [1].
Until a few years ago, the design of vehicle safety devices, especially airbags, bumpers, and electronic equipment, focused on protecting drivers in road accidents and on improving comfort. Most research on human safety has aimed to improve the detection rate for pedestrians and vehicles on the road [1, 2, 3]. More recently, the detection of vulnerable road users (VRUs), which include pedestrians, bicyclists, two wheelers, and other small vehicles, has been studied so that intelligent vehicles can protect them before an accident occurs [4].
The motivation of this paper is as follows. First, comparatively little effort has so far been invested in finding good algorithms for two wheeler detection. Second, the task is closely related to pedestrian detection, for which accurate and efficient methods exist for still images. It is nevertheless one of the more difficult detection tasks because of the wide range of poses, the environmental conditions, cluttered backgrounds, and the composite nature of the object, which presents more distinct shapes than a pedestrian depending on the viewpoint. In this paper, we therefore propose a new algorithm for detecting two wheelers, which are among the vulnerable users of the road.
Many improvements to vision-based systems and to other kinds of cameras have been proposed, steadily pushing performance [1, 5, 6]. Pedestrian detection systems can be divided into several categories according to the sensor used: advanced and expensive sensors such as far infrared (FIR) and near infrared (NIR) cameras for night vision [7, 8, 9]; LIDAR, or RADAR operating at extremely high frequency [4]; laser scanners, which obtain rich information and enable robust real-time detection [10]; and radio-based mobile communication such as the Global System for Mobile Communications (GSM) or the Universal Mobile Telecommunications System (UMTS) [1]. Despite the attractive aspects of these sensors, vision-based systems, particularly monocular ones, have many advantages: easy extension, lower price, and less computation [5, 11]. Vision-based detection nevertheless remains a challenging problem, because people and two wheelers can appear in quite different shapes owing to differences in clothing and hairstyle, body pose, and two wheeler model type [12].
The literature on pedestrian detection is abundant. Features can be classified as global or local, and as single or multiple, depending on how they are measured and used [13, 14, 15]. Global features, such as those obtained by principal component analysis (PCA), operate on the entire image [2, 16]. Local features, in contrast, are extracted by dividing a sliding window into subregions of the image, with one or more kinds of features extracted in each subregion [11, 14]. Similarly, Mikolajczyk et al. [17] distinguished whole-body detection from body-part detection as the local features. The advantage of part-based approaches is that they can deal with the variation in human appearance caused by body articulation. Their disadvantage is that they add complexity to the pedestrian detection problem and are therefore more expensive to compute [2]. Single features widely used for pedestrian and object detection in the literature include edges [18], shapelets [19], local binary patterns (LBP) [20, 14], histograms of oriented gradients (HOG) [21, 22], local receptive fields [5], wavelet coefficients [7], and Haar-like features and their variants [23]. Approaches based on multiple features, in contrast, combine several of the above single features. For example, Wang et al. [24] proposed a feature extraction method that uses different kinds of histogram features. Alternatively, different features can be used to train classifiers individually, with the final decision reached by majority voting or by a classifier cascade [7, 25], using support vector machines (SVM) [26], neural networks, or k-NN classifiers [27].
Many algorithms for pedestrian detection are also applicable to two wheeler detection. For this reason, this paper first reviews pedestrian detection. Pedestrian detection has attracted extensive interest from the computer vision research community over the past few years. Approaches have been proposed in terms of features, data sets, classification, and general architectures. Early pedestrian detection research and data sets were introduced by pioneering workers [28]. Feature extraction from vision-based images has primarily been studied using Haar wavelet-based methods, HOG, which encodes gradient direction, and local receptive fields (LRF). Support vector machines, neural networks, and the Adaboost algorithm are widely applied as classification methods [29]. Papageorgiou [30] detected pedestrians with a polynomial SVM using modified Haar wavelets, and Depoortere et al. [31] reported optimized results. Gavrila and Philomin [32] performed a comparison using the chamfer distance computed from edge images, where the chamfer distance is the mean distance to the closest features. Viola et al. [33] detected moving pedestrians in more complicated scenes using Haar-like wavelets and space-time differences with the Adaboost algorithm. Ronfard et al. [34] achieved accurate body detection with a joint SVM based on limb classification over first- and second-order Gaussian filters. Classification-based methods such as SVM, neural networks, and Adaboost have formed the mainstream of research and have achieved successful results in many areas of object detection, including pedestrian detection. State-of-the-art statistical pattern recognition techniques have become the primary methods for classifier training in pedestrian detection systems.
As mentioned previously, two wheelers resemble pedestrians not only in shape but also in the feature-based techniques that can be used to detect them. A two wheeler consists of a human and a machine; the human usually forms the upper part of the shape and the machine the lower part. In this paper, we define a bicyclist (BL) as a person riding a bicycle and a motorcycle driver (MD) as a person riding a motorcycle. A two wheeler detection system can therefore adopt pedestrian detection algorithms for feature extraction, classification, and non-maxima suppression. A HOG-based detector [21] is slow because of its dense encoding scheme and multi-scale images. Porikli [35] addressed this problem with the concept of the “Integral Histogram”, which speeds up the feature extraction process. Another solution is to use a boosting algorithm [4] to speed up the classification process. For these reasons, we use a modified HOG algorithm to select the best features and Adaboost to improve the detection rate. In this study, we developed a new algorithm based on an adapted HOG value that is normalized by the correlation coefficient between the two wheeler area and each cell. The general and modified HOG are described in more detail in Section II. This paper proposes a system that detects people riding two wheelers both efficiently and accurately.
This paper is organized as follows. Section II explains the original feature set based on the general HOG algorithm and the associated evolved algorithm, which improves the detection rate. Section III describes the framework and training procedure of the proposed two wheeler detection system. The evaluation results and a detailed performance analysis are presented in Section IV. Section V concludes the paper.
II. FEATURE EXTRACTION
Histograms of Oriented Gradients (HOG) are feature descriptors used in computer vision and image processing for object detection. The descriptor converts the distribution of brightness gradient directions in a local region into a histogram and expresses it as a feature vector, which captures the shape characteristics of an object. Because the histogram is built from neighboring pixels within a local region, the descriptor is only slightly affected by illumination and is robust to geometric changes of local regions. The HOG descriptor is calculated as follows.
The gradient at every image pixel I(x, y) is computed from the derivatives fx and fy in the x and y directions, obtained by convolving the image with the filter masks [-1 0 1] and [-1 0 1]T (equations (1) and (2)), where I is a grayscale image and ⊗ is the convolution operation.
The gradient magnitude m(x, y) and orientation θ(x, y) of each pixel are then calculated from fx and fy.
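For reference, the standard HOG formulation that matches the definitions above can be written as follows. This is a sketch that assumes the gradient, magnitude, and orientation take their usual form; the numbering of (3) and (4) is likewise an assumption.

```latex
\begin{align}
f_x(x, y) &= I(x, y) \otimes \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \tag{1} \\
f_y(x, y) &= I(x, y) \otimes \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}^{T} \tag{2} \\
m(x, y) &= \sqrt{f_x(x, y)^2 + f_y(x, y)^2} \tag{3} \\
\theta(x, y) &= \arctan\!\left(\frac{f_y(x, y)}{f_x(x, y)}\right) \tag{4}
\end{align}
```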
This stage produces an encoding that is sensitive to local image content. Each image window is divided into small rectangular spatial regions of 8x8 pixels called cells, as shown in Figure 1 (c). Similar to [20], we use unsigned gradients with nine bins for every cell, so that each bin corresponds to 20°. The gradient magnitudes of the 8x8 pixels in a cell are accumulated into one of the nine bins according to their orientation. Figure 1 (c) illustrates how the gradient angle range is binned in each cell.
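The cell encoding described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `gradients` and `cell_histograms` are hypothetical, image borders are simply left at zero gradient, and each pixel votes for a single bin without interpolation between neighboring bins.

```python
import math

def gradients(img):
    """Differences with the [-1 0 1] mask in x and y (border pixels stay 0).

    The correlation form I(x+1) - I(x-1) is used; flipping the sign of both
    derivatives would not change an unsigned orientation.
    """
    h, w = len(img), len(img[0])
    fx = [[0.0] * w for _ in range(h)]
    fy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1:
                fx[y][x] = img[y][x + 1] - img[y][x - 1]
            if 0 < y < h - 1:
                fy[y][x] = img[y + 1][x] - img[y - 1][x]
    return fx, fy

def cell_histograms(img, cell=8, bins=9):
    """Return hists[row][col] = magnitude-weighted unsigned orientation histogram."""
    fx, fy = gradients(img)
    rows, cols = len(img) // cell, len(img[0]) // cell
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / bins  # 20 degrees for nine bins
    for y in range(rows * cell):
        for x in range(cols * cell):
            mag = math.hypot(fx[y][x], fy[y][x])
            # Unsigned orientation: fold the angle into [0, 180).
            ang = math.degrees(math.atan2(fy[y][x], fx[y][x])) % 180.0
            b = min(int(ang // bin_width), bins - 1)
            hists[y // cell][x // cell][b] += mag
    return hists
```

For a 128x64 window this yields a 16x8 grid of nine-bin histograms, which is the input to the block normalization step.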
The directional brightness histograms computed for each cell are normalized in blocks of 3x3 cells; that is, cells are grouped into larger spatial regions called blocks. The characteristic quantities (9 dimensions) of the cell in row i and column j, Cell (i, j), are expressed as Fi,j = [F1, F2, … , F9]. The characteristic quantities of the k-th block (81 dimensions) may be expressed as:
The normalization process is summarized in Figure 1 (d), where the block is moved to the right and downward by one cell at a time. The resulting overlap ensures that the important features of each cell are retained. The normalized characteristic vectors for each block are given by
The feature vectors are then stored by concatenation. For example, if the input image is 128x64 pixels (height x width), the histogram has 9 bins, the cell size is 8 pixels, and the block size is 3 cells, then the resulting HOG feature vector has 6804 dimensions, since 14 x 6 block positions each contribute 81 values.
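The dimension count and the block grouping can be checked with a short Python sketch. The names `hog_dimension` and `block_descriptor` are hypothetical, and both the L2 norm and the row-by-row ordering of cells inside a block are assumptions made for illustration; the text does not fix either choice.

```python
import math

def hog_dimension(height=128, width=64, cell=8, block=3, bins=9):
    """Length of the HOG vector when blocks slide by one cell at a time."""
    rows, cols = height // cell, width // cell
    positions = (rows - block + 1) * (cols - block + 1)
    return positions * block * block * bins

def block_descriptor(hists, block=3, eps=1e-6):
    """Concatenate normalized block vectors built from hists[row][col]."""
    rows, cols = len(hists), len(hists[0])
    out = []
    for i in range(rows - block + 1):
        for j in range(cols - block + 1):
            # Gather the 3x3 cells of this block into one 81-value vector.
            v = [val for di in range(block) for dj in range(block)
                 for val in hists[i + di][j + dj]]
            # L2 normalization (assumed); eps avoids division by zero.
            norm = math.sqrt(sum(val * val for val in v) + eps * eps)
            out.extend(val / norm for val in v)
    return out
```

With the default arguments, `hog_dimension()` evaluates (16 - 3 + 1) x (8 - 3 + 1) x 81, which agrees with the 6804 dimensions stated above.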
Haar-like features, first proposed by Oren et al. [37], represent horizontal, vertical, and diagonal edges in an image. They were originally used as overcomplete Haar wavelets for pedestrian detection. Later, Papageorgiou and Poggio [30] studied overcomplete Haar wavelets for the detection of faces, cars, and pedestrians. Following this idea, Viola and Jones [21] proposed Haar-like rectangle features for face detection, and Lienhart and Maydt [38] added rotated rectangle features to the feature set, which is called the extended Haar-like feature set. A feature value is calculated from two components: the sum of the pixel gray-level values over the black rectangle and the sum over the whole feature area, as shown in Figure 2.
Because a real, robust classifier uses hundreds of features, computing the pixel sums directly over multiple rectangles would make detection very slow and unsuitable for real-time applications. Viola et al. [33] introduced a very effective algorithm for computing these sums quickly, the so-called integral image. The integral image, or summed area table (SAT), is computed over the whole image I [33]. The SAT K(x, y) is the sum of all pixels above and to the left of (x, y), inclusive: K(x, y) = Σ I(x', y') over all x' ≤ x and y' ≤ y.
The sum over any rectangle is then obtained from the table values at the rectangle's corner coordinates, as described in equation (7) and Figure 3.
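The integral image and the corner-based rectangle sum can be sketched as follows. This is an illustrative version only: the function names are hypothetical, and the two-rectangle edge feature is computed as the whole-area sum minus twice the black-rectangle sum, which is one common weighting and an assumption here.

```python
def integral_image(img):
    """K[y][x] = sum of img over all rows <= y and all columns <= x."""
    h, w = len(img), len(img[0])
    K = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            K[y][x] = row_sum + (K[y - 1][x] if y > 0 else 0)
    return K

def rect_sum(K, top, left, bottom, right):
    """Pixel sum over the inclusive rectangle, using at most four lookups."""
    total = K[bottom][right]
    if top > 0:
        total -= K[top - 1][right]
    if left > 0:
        total -= K[bottom][left - 1]
    if top > 0 and left > 0:
        total += K[top - 1][left - 1]
    return total

def haar_edge_feature(K, top, left, height, width):
    """Two-rectangle edge feature: white left half minus black right half."""
    half = width // 2
    bottom, right = top + height - 1, left + width - 1
    whole = rect_sum(K, top, left, bottom, right)
    black = rect_sum(K, top, left + half, bottom, right)
    return whole - 2 * black
```

Once the table is built, every rectangle sum costs a constant number of lookups regardless of the rectangle size, which is what makes evaluating hundreds of features per window feasible.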
The correlation coefficient, or Pearson product-moment correlation coefficient (PMCC), is a numerical measure of how strongly one variable can be expected to change with another. It takes values between -1 and 1 and measures the strength of the linear relationship between two variables. A correlation coefficient of zero means that the two variables are not linearly related. A non-zero coefficient means that they are related, but unless the coefficient is exactly 1 or -1, other influences are present and the relationship between the two variables is not fixed. Although the correlation coefficient can be negative, a negative value only means that the two variables are inversely correlated, so we treat negative values as positive. The coefficient ρ is therefore computed as
ρ = |C(cx, cy)| / (σcx σcy)    (8)
where σcx and σcy are the standard deviations of the two cells cx and cy, and C(cx, cy) is the covariance of the two cells. In general, the correlation coefficient describes how much information about the magnitudes in one cell can be obtained by observing the magnitudes in another cell. As shown in Figure 2, the cells in the two wheeler area show different characteristics from those in other areas, such as the background or the road (bottom). The method proposed in this paper is based on this relational information between two cells.
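The absolute correlation between two cells can be sketched as below. The function name `abs_correlation` is hypothetical, and the sketch assumes that each cell is passed as a flat list of gradient magnitudes of equal length; a cell with zero variance is treated as uncorrelated, which is also an assumption.

```python
import math

def abs_correlation(cell_a, cell_b):
    """Absolute Pearson correlation between two equally sized cells."""
    n = len(cell_a)
    mean_a = sum(cell_a) / n
    mean_b = sum(cell_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(cell_a, cell_b)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in cell_a) / n)
    std_b = math.sqrt(sum((b - mean_b) ** 2 for b in cell_b) / n)
    if std_a == 0.0 or std_b == 0.0:
        return 0.0  # flat cell: no linear relationship can be measured
    # Negative correlations are folded onto the positive range.
    return abs(cov / (std_a * std_b))
```

Two cells whose magnitudes rise together and two cells whose magnitudes move in opposite directions both score close to 1, reflecting the choice to regard negative values as positive.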
III. CLASSIFIER
Adaboost is a simple learning algorithm that selects a small set of weak classifiers from a large number of potential features and combines them by weighted majority vote. Its training procedure is greedy: it constructs an additive combination of weak classifiers. Our boosting algorithm is basically the same as that of Viola [33]. The pseudocode of the Adaboost algorithm is given in Figure 3. The algorithm takes as input a training set (x1,y1),…,(xn,yn), where each xi belongs to some domain X and each label yi belongs to some label set Y.
- Given training set: (x1,y1),…,(xn,yn), where xi ∈ X, yi ∈ Y = {+1,−1}
- Initialize the weights for yi = +1, −1, where
  m: the number of positive images (pedestrian, +1)
  n: the number of negative images (non-pedestrian, −1)
- For t = 1, …, T:
  - Normalize the weights, so that wt,i is a probability distribution over the training images i for the t-th weak classifier
  - For each feature j, train a classifier hj that is restricted to using a single feature. The error is evaluated with respect to wt,i
  - Choose the classifier ht with the lowest error εt
  - Update the weights, where εi = −1 if example xi is classified correctly and εi = +1 otherwise, and βt is derived from the error εt
- Output the final hypothesis, where αt = log(1/βt)
The final hypothesis H is a weighted majority vote of the T weak hypotheses, where αt is the weight assigned to ht. Using two such strong classifiers, this paper proposes a two-stage cascade method. The cascade improves the recognition rate because the two feature vectors are of quite different types and play complementary roles.
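The boosting loop can be illustrated with a compact Python sketch. It follows the standard Viola-Jones form, which is assumed rather than taken from the lost equations: weights start at 1/(2m) and 1/(2n), βt = εt/(1 − εt), and correctly classified examples are multiplied by βt. The single-feature threshold stumps and all function names are hypothetical choices for illustration.

```python
import math

def stump_predict(x, j, thr, pol):
    """Weak classifier on feature j: +1 if pol * x[j] >= pol * thr, else -1."""
    return 1 if pol * x[j] >= pol * thr else -1

def train_stump(X, y, w, j):
    """Search thresholds and polarities for the lowest weighted error."""
    best = (float("inf"), 0.0, 1)
    for thr in sorted({x[j] for x in X}):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, j, thr, pol) != yi)
            if err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(X, y, rounds):
    """Return a list of (alpha, feature index, threshold, polarity)."""
    m = sum(1 for yi in y if yi == 1)   # number of positive examples
    n = len(y) - m                      # number of negative examples
    w = [1.0 / (2 * m) if yi == 1 else 1.0 / (2 * n) for yi in y]
    strong = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]    # normalize to a distribution
        err, thr, pol, j = min(
            (train_stump(X, y, w, j) + (j,) for j in range(len(X[0]))),
            key=lambda c: c[0])
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log(1/beta) finite
        beta = err / (1.0 - err)
        # Correctly classified examples are down-weighted by beta.
        w = [wi * (beta if stump_predict(xi, j, thr, pol) == yi else 1.0)
             for xi, yi, wi in zip(X, y, w)]
        strong.append((math.log(1.0 / beta), j, thr, pol))
    return strong

def score(strong, x):
    """Weighted vote of the weak classifiers."""
    return sum(a * stump_predict(x, j, thr, pol) for a, j, thr, pol in strong)

def classify(strong, x, theta=0.0):
    """Final hypothesis: +1 if the weighted vote reaches the threshold theta."""
    return 1 if score(strong, x) >= theta else -1
```

The threshold `theta` in `classify` is exposed so that the operating point of the strong classifier can be shifted, for example when tracing an ROC curve.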
IV. EXPERIMENTAL RESULTS
To assess the effectiveness of the proposed method, the algorithm was applied in practice. Our system runs on a platform with an Intel Core i7 (2.5 GHz) and 8 GB of DDR memory. We describe the performance of the proposed correlation coefficient technique for efficient two wheeler detection, evaluated on our positive samples (two wheeler data sets of bicyclists and motorcycle drivers), and then compare the detection performance with that of other typical features. The data set consists of two types of images according to the viewing angle relative to the horizontal line: 90° and 60°. We also experiment with a mixture of the 90° and 60° images. The data set contains only front and back views with a relatively limited range of poses (60° and 90°), scaled to 16x128 pixels. The negative (non-two wheeler) samples used in our experiments were extracted randomly from general street images. Examples from the data set are shown in Figure 4, where “P” means positive and “N” means negative.
Figure 5 confirms that mixing the two angles (90° and 60°) yields a larger area under the receiver operating characteristic (ROC) curve than using a single angle. For the HOG features, we used a cell size of 3x3 pixels, a block size of 3x3 cells, and 9 orientation bins of signed gradients to train the Adaboost classifiers.
Our preliminary experiments were run on our two wheeler data set, and Figure 6 shows the ROC curves for the traditional HOG method. The ROC curves were generated as described in [33] by adjusting the threshold (θ) of the last two nodes of the object detector from -20 to 20. Note that in Figures 5 and 6, the true positive rate is plotted against the number of false positive detections.
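The threshold sweep used to trace such a curve can be sketched as follows. The function name `roc_points` is hypothetical, and the sketch assumes that each test window has a real-valued detector score and a label of +1 or −1; integer steps for θ are an illustrative choice.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, return (false positive count, true positive rate)."""
    positives = sum(1 for label in labels if label == 1)
    points = []
    for theta in thresholds:
        # A window is accepted when its score reaches the threshold.
        tp = sum(1 for s, label in zip(scores, labels)
                 if s >= theta and label == 1)
        fp = sum(1 for s, label in zip(scores, labels)
                 if s >= theta and label != 1)
        points.append((fp, tp / positives))
    return points

# Sweep theta from -20 to 20, as in the experiments described above.
thresholds = list(range(-20, 21))
```

Low thresholds accept nearly every window and give points toward the upper right of the curve, while high thresholds reject nearly everything and give points near the origin.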
This paper tests two correlation coefficient (CC) methods, CC1 and CC2. CC1 computes the CC value between each cell and a special area that contains the human and the two wheeler, as shown in Figure 7. In this paper, the special area is the red region in Figure 7 (c). The sum of the CC values calculated by equation (8) is applied to the cell as a multiplicative weight. This weight is normalized as follows:
The second method, CC2, is calculated similarly to CC1. In this case, however, we expected that computing the CC with the human and the bicycle as separate target regions would improve performance and yield more distinctive features. Accordingly, each upper cell is correlated with the lower cells and each lower cell with the upper cells, as shown in Figure 8.
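As a rough illustration of how the two schemes differ, the sketch below computes one weight per cell from summed absolute correlations. It is only one possible reading of the description: all function names are hypothetical, each cell is assumed to be a flat list of gradient magnitudes, and dividing by the largest sum is an assumed normalization rather than the one used in the paper.

```python
import math

def abs_corr(a, b):
    """Absolute Pearson correlation; flat cells are treated as uncorrelated."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    sa = math.sqrt(sum((p - ma) ** 2 for p in a) / n)
    sb = math.sqrt(sum((q - mb) ** 2 for q in b) / n)
    return 0.0 if sa == 0.0 or sb == 0.0 else abs(cov / (sa * sb))

def cc_weights(cells, targets_for):
    """Weight of cell j = sum of |rho| against its target cells, scaled to [0, 1]."""
    sums = [sum(abs_corr(cells[j], cells[i]) for i in targets_for(j))
            for j in range(len(cells))]
    top = max(sums) or 1.0  # avoid division by zero when every sum is 0
    return [s / top for s in sums]

def cc1_targets(special):
    """CC1: every cell is compared with the same special area."""
    return lambda j: special

def cc2_targets(upper, lower):
    """CC2: upper cells are compared with lower cells, and vice versa."""
    upper_set = set(upper)
    return lambda j: lower if j in upper_set else upper
```

Under this reading, each cell's nine-bin histogram would be multiplied by its weight before block normalization, so cells that correlate strongly with the target region contribute more to the descriptor.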
With the CC2 method, the true positive rate approaches 1 while the false positive rate stays under 0.04, as shown by the ROC curves in Figure 6. This indicates that the proposed algorithm outperforms the previous methods. Table 1 shows further strong results, comparing the recognition accuracy obtained with different feature types. Here, B means bicyclist, M means motorcycle driver, and MB means mixed bicyclist and motorcycle driver.
In Table 1, HOG_CC1 performs similarly to the traditional HOG method at each training ratio. HOG_CC2, however, achieves higher accuracy than both the traditional HOG and HOG_CC1 for every training ratio and angle, and it also requires less calculation time than HOG_CC1 because it uses half of the image. The adapted equation is as follows:
where i indexes the CC calculations and j is the cell number; K is the sum of the CC values and Cj is the j-th cell. For the same data set, Figure 9 compares the performance of further feature types. Haar-like features perform worse than HOG and the proposed algorithms, as shown by their concentration in the lower left area of Figure 9.
The ratio of positive to negative samples in each training set is 1:1 or 1:2. In more detail, the bicyclist training images comprise 340 examples at 60 degrees and 845 examples at 90 degrees, and the motorcycle driver training images comprise 96 examples at 60 degrees and 234 examples at 90 degrees. We also train on mixtures of the two angles and of the two types (bicyclist and motorcycle driver). In the first experiments, the performance of the traditional HOG feature, to which the spatial region periodicity method was applied, is shown in Figure 5 according to data type and angle.
V. CONCLUSION
This work offers not only a solution for detecting vulnerable objects (two wheelers) on the road but also a computationally efficient algorithm based on the correlation coefficient. A particular difficulty is that two wheelers move faster than pedestrians on the road, which makes this research valuable for protecting human life and avoiding accidents. To address this problem, we proposed HOG_CC2 with a sliding window approach, which achieves better detection results than previous algorithms. Adaboost classification-based methods form the mainstream of research on two wheeler detection and have been shown to achieve successful results. The ROC curves demonstrate experimentally that the CC2 method, which uses the correlation coefficient between local features (human and bicycle), leads to better classification results than traditional methods. Furthermore, we confirmed that HOG_CC2 achieves higher recognition accuracy than HOG and HOG_CC1.
The experimental results show that two wheeler detection can use smaller local features, lower dimensionality, and less computation than previously suggested. Many future experiments could extend this study, since this paper does not deal with occluded areas, changes in object appearance due to weather, or other viewing angles.