I. INTRODUCTION
Brain metastases most frequently originate from lung cancer, breast cancer, and malignant melanoma [1]. Owing to late detection and incorrect diagnosis, most cases have extremely limited treatment options [2]. Every year, 14,000 patients die because of malignant tumors [3]. According to the World Health Organization, tumors are divided into different grades, and the criterion for this division is tumor size [4], [5]. Surgical removal is a common treatment approach; radiation and chemotherapy can slow tumor growth but, unfortunately, cannot physically remove the cancer [6]. Physicians typically use computed tomography or magnetic resonance imaging (MRI) to detect cancer. MRI is a standard scanning technology that provides detailed images of the brain, and it is the most common modality for diagnosing brain diseases [6], [7]. In addition, determining the number, size, and location of lesions in the brain is important for selecting the most appropriate treatment methods for patients [8].
To find extremely small lesions, many researchers have attempted to develop high-performance deep learning algorithms. Wu et al. [9] used an axial MR brain image in their study, wherein gray-level MR brain images were transformed into RGB color images that emphasized a brain tumor. Color segmentation images of highlighted lesions, obtained from the transformed color MRI image data, were utilized in CIELab to identify features accurately.
Akram et al. [10] proposed a brain tumor segmentation and detection method. In this work, input MRI brain images were pre-processed to remove noise using a median filter, resulting in sharper images. This method masked images and applied a windowing technique to detect tumors. These processes reduced false segmented pixels and helped algorithms to detect tumors more effectively.
Dong et al. [11] used a U-Net-based deep convolutional network to develop a fully automatic brain tumor detection and segmentation method. The performance of their developed model was on par with that of the recent challenger model of multimodal brain tumor image segmentation. In this study, brain tumor detection and segmentation algorithms were able to detect lesions; however, U-Net was found to be slow in clinical tasks.
Dahab et al. [12] segmented brain tumors to detect lesions. They used probabilistic neural networks (PNNs) for model learning. The proposed system exhibited a 100% accuracy in the PNN learning performance; however, this accuracy had a limited qualification. In addition, they reduced the processing time of the learning vector quantization-based PNN system compared to the conventional PNN.
In recent years, most researchers have moved toward segmentation. In studies using segmentation, different pre-processed anatomical slices of brain images in the sagittal plane are considered for detection. Segmentation performs better than detection, but it is slower. In clinical tasks, AI helps doctors to detect and locate tumors, and processing speed is an essential factor when analyzing images from many patients. Object detection has the advantage of finding object locations faster, and its pre-processing is more efficient than that of segmentation. The most crucial aspect of this technique is locating the tumor. Finding an extremely small tumor is difficult, and doctors can sometimes miss it. AI detection can help in avoiding such an oversight.
In this study, we compared MR images under three pre-processing conditions to measure the AI performance: the original images and images processed with general histogram equalization (GHE) or contrast-limited adaptive histogram equalization (CLAHE). These techniques enhanced the contrast of the MR images such that lesions could be more easily located. Pre-processing was performed before training the AI to increase the model performance; this was a significant step in the process.
II. METHOD
Brain tumor MR images were collected from Seoul National University Hospital, Seoul, Republic of Korea, in 2016. A total of 11,200 MR images were collected from 64 patients. All MR images were sagittal-plane images; 7,875 MR images from 45 patients were used as training data, and 3,325 MR images from 19 patients were used as test data to validate performance. For this data set, experts annotated the regions of interest (ROIs) for the lesions using ImageJ software (NIH, Bethesda, MD, USA). The ROIs were indicated through free-hand drawing, and the ROI information was considered the ground truth for the training and testing of the model.
In this study, we utilized Python 3.6 to pre-process the data. For the training of deep learning algorithms, we used a single NVIDIA GeForce RTX 2080Ti GPU (NVIDIA, Santa Clara, CA, USA), TensorFlow 2.0.0, Keras 2.3.1 with the TensorFlow backend, and OpenCV (Intel, Santa Clara, CA, USA).
Fig. 1 presents the overall pre-processing and deep learning workflow. For deep learning training, the convolutional neural network requires input images of identical vertical and horizontal sizes. We therefore rescaled the collected images to 512 px vertically and horizontally. The MR images were stored as 12-bit digital imaging and communications in medicine (DICOM) files, which we converted into 8-bit JPG files. The data had varying window levels and widths; therefore, we had to harmonize them when converting the DICOM images into JPG images. Our radiologist adjusted the window level and width to make the MR images clearer; after conversion, the window level was 1100 and the width was 1500. To enhance and compare the training performances, we applied GHE and CLAHE. GHE enhances the contrast of an image by flattening the distribution of its pixel intensities [13], whereas CLAHE divides the image into block tiles and applies histogram equalization to each tile [14]. Thus, CLAHE can emphasize local contrast more accurately. These two histogram equalization techniques were used to emphasize the lesions in this study. The resulting contrast-enhanced images improved the model performance. We set the CLAHE clip limit to 2.0 and the tile grid size to 8 × 8. In addition, we measured the maximum and minimum values of the x and y coordinates in each ROI and used them to convert the free-hand drawings into box-shaped ROIs. Fig. 2 shows the converted MR images. Fig. 3 shows the histograms of the MR images.
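The windowing, global equalization, and ROI conversion steps described above can be sketched as follows. This is a minimal illustration and not the code used in the study: it works on flat lists of pixel values in pure Python, and the function names (apply_window, equalize_histogram, freehand_to_box) are hypothetical. Only the window level of 1100 and width of 1500 come from the text. In OpenCV, the corresponding equalization calls would presumably be cv2.equalizeHist for GHE and cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)) for CLAHE.

```python
# Illustrative sketch only; names and data layout are assumptions, not the study's code.

def apply_window(pixels, level=1100, width=1500):
    """Map raw (e.g. 12-bit) intensities to 8-bit values using a window level and width."""
    low = level - width / 2.0
    high = level + width / 2.0
    out = []
    for value in pixels:
        clipped = min(max(value, low), high)  # clip to the window
        out.append(int(round((clipped - low) / (high - low) * 255)))
    return out


def equalize_histogram(pixels_8bit):
    """Global histogram equalization (GHE) of a flat list of 8-bit values."""
    hist = [0] * 256
    for value in pixels_8bit:
        hist[value] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels_8bit)
    if total == cdf_min:  # constant image: nothing to equalize
        return list(pixels_8bit)
    return [int(round((cdf[v] - cdf_min) / (total - cdf_min) * 255))
            for v in pixels_8bit]


def freehand_to_box(points):
    """Convert a free-hand ROI outline [(x, y), ...] into (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical example values
windowed = apply_window([0, 350, 1100, 1850, 4095])
equalized = equalize_histogram([10, 10, 20, 30])
box = freehand_to_box([(120, 200), (135, 190), (150, 215), (128, 230)])
```

CLAHE is not re-implemented here because its tile-wise clipping and interpolation would obscure the sketch; the bounding box is simply the extreme x and y coordinates of the outline, as described in the text.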
In this study, we used RetinaNet for learning and testing the detection of brain tumors. RetinaNet is a well-known deep learning detection algorithm with a good training speed. We used ResNet 152 as the RetinaNet backbone for more specific learning. Lin et al. [15] describe the architecture as follows:
RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone is responsible for computing a convolutional feature map over an entire input image and is an off-the-shelf convolutional network. The first subnet performs convolutional object classification on the backbone’s output; the second subnet performs convolutional bounding box regression (Fig. 4).
Moreover, RetinaNet is a one-stage detector and retains the speed of one-stage networks. One-stage detectors suffer from a class imbalance problem; however, a new loss function, the focal loss, has been proposed to resolve this problem. In our study environment, we employed a batch size of eight, 100 epochs, a learning rate of 0.00001, and an image size of 512 × 512. Additionally, transfer learning from ImageNet was applied to the model.
In this study, we verified the model performance using a confusion matrix for the three types of pre-processed images. The confusion matrix is a method for comparing the predicted values with the actual values. This technique is used to calculate the sensitivity, precision, and false positives per image. We verified the model detection performance using the free-response receiver operating characteristic (FROC) curve. To illustrate the FROC curves, we calculated the sensitivity and false positives per image; for the sensitivity, we used a 95% confidence interval.
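The construction of FROC points can be sketched as follows. This is a minimal illustration under stated assumptions rather than the evaluation code of the study: each detection is assumed to have already been labeled as a true or false positive against the ground-truth ROIs (the matching criterion is not given in the text), each lesion is assumed to be matched by at most one detection, and the function name froc_points and all input values are hypothetical.

```python
# Illustrative sketch only; the matching of detections to ground truth is assumed done.

def froc_points(detections, num_lesions, num_images, thresholds):
    """Return (false positives per image, sensitivity) for each score threshold.

    detections: list of (score, is_true_positive) pairs pooled over all test images.
    """
    points = []
    for threshold in thresholds:
        kept = [hit for score, hit in detections if score >= threshold]
        true_pos = sum(1 for hit in kept if hit)
        false_pos = len(kept) - true_pos
        points.append((false_pos / num_images, true_pos / num_lesions))
    return points


# Hypothetical detections: (confidence score, matched a ground-truth lesion?)
example = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.3, False)]
curve = froc_points(example, num_lesions=4, num_images=5, thresholds=[0.5, 0.75])
```

Sweeping the threshold from high to low traces the curve from few false positives and low sensitivity toward more false positives and higher sensitivity.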
III. RESULT
In the proposed method, we verified the model performance using 3,325 MR images (the test set) from 19 patients. We compared the original images with those pre-processed through GHE and CLAHE. Fig. 5 illustrates the detection results for the three kinds of MR images. On the original image in Fig. 5, the model failed to find the location of the tumor. However, the model found the tumor on the GHE and CLAHE MR images. GHE and CLAHE increased the contrast of the images, emphasizing the area of the lesions.
The measurement of model performance involved evaluation of the sensitivity, false positives per image, and precision. To calculate these values, we obtained a confusion matrix. In the confusion matrix, a true positive suggested that the model detected a tumor correctly, a false positive indicated that the model incorrectly detected normal tissue as a tumor, and a false negative indicated that the model failed to find a tumor. The sensitivity, false positives per image, and precision were calculated using equations (1) to (3). The calculated values are presented in Table 1.
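The equations themselves are not reproduced in this text. The standard definitions consistent with the description above are given below as a reconstruction; the numbering follows the order in which the quantities are named and is an assumption, as is the symbol N_images for the number of test images. TP, FP, and FN denote the numbers of true positives, false positives, and false negatives.

```latex
\begin{align}
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{1} \\
\text{False positives per image} &= \frac{FP}{N_{\text{images}}} \tag{2} \\
\text{Precision} &= \frac{TP}{TP + FP} \tag{3}
\end{align}
```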
For the original (non-pre-processed) test images, the sensitivity was 80.06%, the precision was 95.85%, and the number of false positives per image was 0.038. For the GHE images, the sensitivity was 80.63%, the precision was 94.58%, and the number of false positives per image was 0.050. For the CLAHE images, the sensitivity was 81.79%, the precision was 94.02%, and the number of false positives per image was 0.057.
From these results, CLAHE pre-processing yielded the highest sensitivity among the three approaches, although the original images gave the highest precision and the fewest false positives per image. Furthermore, Fig. 6 presents FROC curves for analyzing the model performance. These curves compare, for the three data sets, the variation in sensitivity with false positives per image.
IV. DISCUSSION
In this study, we compared three types of pre-processed MR images to evaluate and improve model performance. In addition, we utilized transfer learning from ImageNet to enhance the learning effect. For the detection model, we chose RetinaNet, a one-stage network whose accuracy is comparable to that of two-stage networks. Brain tumor MR images present an inevitable class imbalance problem: the number of normal MR image slices exceeds that of the abnormal slices. To solve this problem, we used the focal loss function contained in RetinaNet. In the definition of focal loss, the importance of easy examples is reduced so that training focuses on hard negatives [14]. Therefore, this model is a suitable alternative to segmentation for detection of brain tumors.
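The down-weighting of easy examples can be illustrated with a minimal sketch of the binary focal loss for a single prediction. The values alpha = 0.25 and gamma = 2 are the defaults reported by Lin et al.; the settings used in this study are not stated, so they are assumptions here, and the function name focal_loss is hypothetical.

```python
import math

# Illustrative sketch only; alpha and gamma are assumed defaults, not the study's settings.

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p: predicted probability of the foreground (tumor) class.
    y: ground-truth label, 1 for foreground and 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-7), 1.0 - 1e-7)       # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example (confidently correct) versus a hard one (confidently wrong)
easy_negative = focal_loss(0.05, 0)
hard_negative = focal_loss(0.90, 0)
```

The factor (1 - p_t) ** gamma shrinks the loss of well-classified examples, so the many easy background regions contribute little and the hard examples dominate training. With gamma = 0, the expression reduces to weighted cross-entropy.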
Detecting brain tumors is challenging because of the varying shapes and sizes of tumors. Image pre-processing was therefore a significant step in helping the tumor detection model learn. Pre-processing emphasized the lesions, which the model could then distinguish from the normal area. In gray-level images, histogram equalization techniques provided contrast enhancement by changing the intensity of similar pixels [15]. CLAHE yielded the highest sensitivity for detecting brain tumors. CLAHE is a type of histogram equalization that divides an image into block tiles and limits the contrast within each tile, thus limiting noise amplification [16], [17]. In this respect, it improves on global histogram equalization by reducing noise. As a result, the RetinaNet model learned more easily and demonstrated a high accuracy.
In this study, we compared three differently pre-processed images. The results revealed a small gap in the model performance. However, the AI found more lesions in the CLAHE pre-processed images than in the other pre-processed images. A limitation of this study was that only three types of pre-processed images were considered. Compared to other types of tumors, brain tumors have different patterns, sizes, and shapes. To increase the detection performance, certain pre-processing tasks must be implemented before detection, such as normalization and data augmentation. For example, the data augmentation process reduces the weight of the MR images, thus improving the model learning speed and accuracy. Furthermore, there are some evolved models that exhibit good performance and speed. Utilizing recent models and well-processed training images can improve the accuracy of tumor detection. To find tumors accurately, further research must be conducted, and irregular patterns of extremely small tumors must be considered.
Moreover, we plan to use mask regions with convolutional neural network features (Mask R-CNN) in future work. Mask R-CNN builds on Faster R-CNN, which combines Fast R-CNN with a region proposal network, and adds a mask branch to the classification and localization branches [18]. Faster R-CNN is designed for object detection, whereas Mask R-CNN is designed for instance segmentation. Mask R-CNN improves the processing speed of segmentation. It thus makes up for a shortcoming of segmentation and would be helpful for finding tumors in future studies.
V. CONCLUSION
In conclusion, this study was based on deep learning detection algorithms, and we proposed a method to increase the detection performance of such algorithms. In future studies, we will develop and improve detection algorithms for AI used in hospitals. We plan to research the detection of brain tumors using recently developed models and well-processed data. Finally, we plan to gather more variable tumor MR images and develop a sub-decision system that supports doctors in finding extremely small tumors and irregular patterns of brain tumors.