I. INTRODUCTION
High Efficiency Video Coding (HEVC) is the latest video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). Increasing demand for HD and beyond-HD video formats created the need for HEVC, which achieves around 50% bit rate reduction over the previous standard (H.264/AVC) at equal perceptual video quality [1]-[2].
In the HEVC encoder, a frame is first divided into slices, and the slices can be processed in parallel. To achieve a flexible and efficient block partitioning structure, each slice is partitioned into coding tree units (CTUs) [3]. A CTU is analogous to the macroblock (MB) of H.264/AVC, but with a flexible size. Moreover, to maintain a high degree of flexibility, a CTU can be split into multiple coding units (CUs) with sizes of 64×64, 32×32, 16×16, and 8×8. For this purpose, each CTU contains a quad-tree syntax (coding tree) that specifies its sub-division into CUs.
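The quad-tree sub-division described above can be illustrated with a short Python sketch. The function name `partition_ctu` and the split rule `split_top_left` are hypothetical and chosen only for illustration; a real encoder decides each split by comparing RD costs rather than by a fixed rule.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) of a coding unit inside the CTU


def partition_ctu(x: int, y: int, size: int,
                  should_split: Callable[[int, int, int], bool],
                  min_size: int = 8) -> List[CU]:
    """Recursively split a square block into four quadrants (quad-tree)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus: List[CU] = []
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(partition_ctu(x + dx, y + dy, half,
                                         should_split, min_size))
        return cus
    # Leaf of the coding tree: this block is coded as one CU.
    return [(x, y, size)]


def split_top_left(x: int, y: int, size: int) -> bool:
    """Hypothetical split rule: refine only the top-left corner down to 16x16."""
    return x == 0 and y == 0 and size > 16


if __name__ == "__main__":
    for cu in partition_ctu(0, 0, 64, split_top_left):
        print(cu)
```

With this rule, a 64×64 CTU is split into three 32×32 CUs and four 16×16 CUs, which together cover the full CTU area.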
For prediction of each CU, HEVC generally allows 11 prediction modes [3] using appropriate prediction units (PUs). To achieve the best prediction, the HEVC encoder calculates the rate-distortion cost (RD cost) for all possible combinations of PUs and CUs for a CTU. The combination that gives the minimum RD cost is taken as the best mode, and the CTU is encoded with that CU size and PU mode. For this reason, RD optimization (RDO) requires many RD cost calculations, which significantly increases the encoding time of the HEVC encoder. RDO is also one of the most crucial decision-making stages with respect to the perceptual quality of the encoded video. The traditional Lagrangian multiplier based cost function is used in RDO. The RD cost J is calculated using (1),
J = D + λR (1)
where D is the distortion (calculated using the sum of squared errors, SSE), R is the rate and λ is the corresponding Lagrangian multiplier.
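As an illustration of the cost function described above, the following Python sketch computes the SSE distortion of a small block and combines it with a rate using the usual Lagrangian form J = D + λR. The function names, the sample pixel values, the rate of 20 bits and λ = 0.5 are assumptions made for this example only.

```python
def sse(original, reconstructed):
    """Sum of squared errors between two equally sized 2-D pixel blocks."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for o, r in zip(row_o, row_r))


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian RD cost: J = D + lambda * R."""
    return distortion + lam * rate_bits


if __name__ == "__main__":
    original = [[10, 12], [14, 16]]
    reconstructed = [[11, 12], [13, 18]]
    d = sse(original, reconstructed)   # 1 + 0 + 1 + 4 = 6
    print(rd_cost(d, rate_bits=20, lam=0.5))  # 6 + 0.5 * 20 = 16.0
```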
A simplified flow chart of the RDO technique in the HM reference software is shown in Fig. 1. CU-level optimization is not considered in this diagram. In HM, for a given CU structure, the RD cost of every possible PU mode is calculated, starting with 2N×2N. The PU mode with the lowest RD cost so far is kept as the best mode, and its RD cost is denoted Jbest. In each iteration of the loop, the cost of the current candidate, Jtemp, is compared with Jbest, and Jbest is updated if Jtemp is lower. Because the RD cost is a linear combination of rate and distortion, a block with considerably higher distortion is often selected for only a small improvement in RD cost. In this paper, we examine the RD cost calculation of the HEVC encoder and propose a novel two-step RD cost calculation technique. In the proposed method, the structural similarity index (SSIM) is used alongside the conventional RD cost calculation. Moreover, we propose an approximated version of SSIM, FSSIM, which has lower computational complexity than SSIM.
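The mode-selection loop of Fig. 1, as described above, can be sketched as follows. This is a simplified illustration and not the HM implementation; the mode names, distortion and rate values, and λ = 10 are made up to show how a mode with higher distortion can win on RD cost by a small margin.

```python
def select_best_mode(candidates, lam):
    """Return (best_mode, j_best) over (mode, distortion, rate) candidates."""
    best_mode, j_best = None, float("inf")
    for mode, distortion, rate in candidates:
        j_temp = distortion + lam * rate      # RD cost of the current candidate
        if j_temp < j_best:                   # keep the lowest RD cost seen so far
            best_mode, j_best = mode, j_temp
    return best_mode, j_best


if __name__ == "__main__":
    # Hypothetical candidates: (PU mode, distortion D, rate R in bits)
    candidates = [
        ("2Nx2N", 1000, 40),   # J = 1400
        ("2NxN", 1200, 19),    # J = 1390, lowest J but highest D
        ("Nx2N", 1100, 35),    # J = 1450
    ]
    print(select_best_mode(candidates, lam=10))
```

Here 2NxN is selected because its cost is lower by 10, although its distortion is 20% higher than that of 2Nx2N.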
In the next section, an analysis of the RDO in the HEVC encoder is presented. The proposed two-step algorithm is described in Section III. Experimental results are shown in Section IV. Finally, conclusions are drawn in Section V.
II. ANALYSIS
The RD cost of the HEVC encoder is a linear combination of rate and distortion, and the RDO process was described in the previous section. In this section, we count the CUs that are selected by RDO in the HEVC encoder but have a higher distortion than a non-selected candidate. A high-level flow of this analysis is shown in Fig. 2. In this experiment, we examine Jbest and Jtemp whenever the condition Jtemp > Jbest is satisfied (Fig. 2). The distortion values of the best and temp modes (Dbest and Dtemp, respectively) are then compared. The CU count is increased only if condition (2) is satisfied. In this expression, ΔRD represents (Jtemp - Jbest) and δ is a threshold value.
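A minimal Python sketch of this counting experiment is given below. The exact form of condition (2) is not reproduced in this text, so the sketch assumes it is a relative bound on the cost increase, ΔRD ≤ δ·Jbest; this form, the function names and the sample values are all assumptions made for illustration.

```python
def is_questionable_choice(j_best, d_best, j_temp, d_temp, delta):
    """True if the rejected candidate has lower distortion and nearly equal cost."""
    if j_temp <= j_best:
        return False                      # candidate would have been selected anyway
    delta_rd = j_temp - j_best            # RD cost increment of the rejected candidate
    within_bound = delta_rd <= delta * j_best   # ASSUMED form of condition (2)
    return d_temp < d_best and within_bound


def count_questionable(comparisons, delta):
    """Count comparisons (j_best, d_best, j_temp, d_temp) that satisfy the test."""
    return sum(is_questionable_choice(*c, delta) for c in comparisons)


if __name__ == "__main__":
    # Hypothetical (Jbest, Dbest, Jtemp, Dtemp) tuples logged during RDO
    log = [
        (1390, 1200, 1400, 1000),
        (1390, 1200, 1450, 1100),
        (1390, 1200, 1600, 900),
        (1390, 1200, 1395, 1300),
    ]
    for delta in (0.01, 0.10, 0.20):
        print(delta, count_questionable(log, delta))
```

In this toy log, the count grows from 1 to 2 to 3 as δ is raised from 1% to 10% to 20%, which mirrors the kind of trend the analysis is designed to expose.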
We analyzed this phenomenon for seven sequences (BQTerrace, Cactus, Nebuta, ParkScene, PeopleOnStreet, SteamLocomotiveTrain, and Traffic) using the first 100 frames at a target bitrate of 4 Mbps. We plotted the number of CUs selected by the RDO of the HEVC encoder that have higher distortion than a non-selected candidate (Fig. 3). The RD cost increment is restricted by the parameter δ, as shown in the flow chart in Fig. 2.
From this analysis, it is clear that the RDO in the HEVC encoder does not always select the least-distorted block. Moreover, if the RDO condition is relaxed by only 10%, on the order of thousands of CUs with lower distortion, and thus potentially better subjective quality, can be selected.
III. PROPOSED WORK
From the analysis in the previous section, it is clear that a considerable number of CUs are selected as the best mode in HEVC with an RD cost only slightly better than that of the second-best mode, while their distortion is considerably higher than that of other modes. One way to solve this problem is to introduce an extra check on the distortion.
There are different ways to assess the distortion or subjective quality of a CU. In HEVC, only the mean square error (MSE) or the peak signal-to-noise ratio (PSNR) is used as the objective quality metric. However, PSNR generally correlates weakly with subjective quality. A considerable body of literature provides evidence that SSIM is a more stable and reliable perceptual quality metric than MSE. In [4], it is shown that PSNR is an unreliable video quality metric when different contents are considered in the test conditions. A relationship between SSIM and PSNR is shown in [5]. Moreover, in [6] an SSIM-based RDO technique is proposed for H.264/AVC, and it is shown that the scheme achieves significantly better rate-SSIM performance and provides better visual quality than conventional RDO coding schemes.
The above-mentioned references indicate that using SSIM in RDO for HEVC should enhance the overall visual quality. As a first attempt, we have developed a two-step RDO algorithm. The block diagram of the proposed algorithm is shown in Fig. 4.
As in the HEVC rate-distortion optimization, the RD cost is first calculated for the 2N×2N block, and the SSIM value is calculated along with it. As shown in Fig. 4, the normal RDO is performed in step 1. In the second step, the criterion stated in (2) is checked. Only if this criterion is satisfied is an SSIM-based distortion condition checked; this is the extra check in the algorithm. The best RD cost and best SSIM values are then updated for the CTU.
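The two steps described above can be sketched in Python as follows. This is only one possible reading of Fig. 4: it assumes that criterion (2) is the relative bound ΔRD ≤ δ·Jbest and that the SSIM-based condition is simply "the candidate has a higher SSIM than the current best". The function name, candidate values, λ and δ are hypothetical.

```python
def two_step_rdo(candidates, lam, delta):
    """Select a mode from (mode, distortion, rate, ssim) candidates.

    Step 1: conventional RDO (lowest RD cost wins).
    Step 2: a candidate that loses by at most delta * j_best (ASSUMED form
            of criterion (2)) still wins if its SSIM is higher.
    """
    best_mode, j_best, ssim_best = None, None, None
    for mode, distortion, rate, ssim in candidates:
        j_temp = distortion + lam * rate
        if j_best is None or j_temp < j_best:
            # Step 1: normal RDO update.
            best_mode, j_best, ssim_best = mode, j_temp, ssim
        elif (j_temp - j_best) <= delta * j_best and ssim > ssim_best:
            # Step 2: extra SSIM-based check for near-optimal candidates.
            best_mode, j_best, ssim_best = mode, j_temp, ssim
    return best_mode, j_best, ssim_best


if __name__ == "__main__":
    # Hypothetical candidates: (PU mode, distortion, rate in bits, SSIM)
    candidates = [
        ("2Nx2N", 1500, 40, 0.88),   # J = 1900
        ("2NxN", 1200, 19, 0.90),    # J = 1390
        ("Nx2N", 1000, 40, 0.95),    # J = 1400, slightly costlier, better SSIM
    ]
    print(two_step_rdo(candidates, lam=10, delta=0.05))  # selects Nx2N
    print(two_step_rdo(candidates, lam=10, delta=0.0))   # plain RDO: 2NxN
```

Setting δ to zero disables the second step, so the sketch falls back to conventional RDO, which makes the effect of the extra check easy to compare.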
In the above-mentioned algorithm, one of the most important blocks is the SSIM calculation. According to the literature, SSIM can be calculated using (3).
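For reference, the widely used single-block form of the SSIM index can be sketched in Python as below. It computes SSIM from the block means, variances and covariance. The constants default to the common 8-bit choices c1 = (0.01·255)² and c2 = (0.03·255)², which is an assumption here since the constants used in this work are not stated; the sample pixel values are also made up.

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """SSIM between two equally sized blocks given as flat lists of pixels."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [52, 55, 61, 66, 70, 61, 64, 73]
    current = [54, 55, 60, 68, 69, 60, 66, 70]
    print(ssim(reference, reference))  # identical blocks give 1 (up to rounding)
    print(ssim(reference, current))    # slightly below 1 for similar blocks
```

Each call needs two means, two variances and one covariance per block, which is the source of the computational cost that motivates a faster approximation.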
All the symbols in (3) are described in [8]. In this equation, μx and μy are the mean values of the reference and current blocks, respectively. The standard deviations of the two blocks and their covariance are denoted σx, σy and σxy, respectively, and c1 and c2 are constants. However, the computational complexity of (3) is quite high, so a fast but accurate SSIM calculation algorithm is required. In [7], a fast SSIM calculation technique (called dSSIM) using MSE is proposed, which is given below:
where σx² is the variance of the current block and c2 is a constant. In (4), MSE denotes the mean square error. This approach makes too many assumptions, which makes it inaccurate. To overcome this problem, we have developed a novel fast SSIM calculation algorithm with fewer approximations. Our proposed Fast SSIM (FSSIM) calculation technique is given below:
The proof of the proposed FSSIM calculation is given in the appendix. In (5), R is the ratio of the mean values of the current and reference blocks, and E[x] is the expectation (mean) of the reference block. As in (3), c1 and c2 are constants. Δ is the square of the difference between the mean values of the current and reference blocks, and MSE denotes the mean square error.
FSSIM is an approximation of SSIM with lower computational complexity. The constants c1 and c2 in (5) take the same values as in the SSIM expression (3). The most important property of (5) is that FSSIM is expressed as a function of the mean square error (MSE). The MSE is already calculated in any modern hybrid video encoder, so reusing it reduces the complexity of the FSSIM calculation. Apart from the MSE, only the mean values (μx and μy) of the current and reference blocks are used.
The proposed two-step RDO algorithm uses this FSSIM calculation in place of the full SSIM computation, which further reduces its complexity.
IV. PERFORMANCE ANALYSIS
The proposed algorithm has been implemented in the HM reference software version 15.0 [9]. The main purpose of this work is to enhance the perceptual quality of the encoded video stream. One of the most challenging problems in this experiment is evaluating the subjective quality of an encoded bitstream. The best way to evaluate subjective quality is through human intervention: viewers check the quality of the bitstream and score it manually. This technique is generally referred to as the mean opinion score (MOS), in which a five-level rating scale is used.
This method of subjective quality evaluation is highly time-consuming and expensive. Moreover, the score varies from person to person, so at least 20 scores are collected from different viewers and their mean is used as a single entry. Alternatively, the subjective score can be estimated using suitable tools. In this experiment, we used a Tektronix tool to evaluate the subjective quality. This tool provides a Differential Mean Opinion Score (DMOS), for which a lower value is better.
In this experiment, we used three videos, BQTerrace, ParkScene and Cactus, each with a resolution of 1920×1080. The target bitrate is 4 Mbps. We compared our results with the HM reference software version 15.0. The results are shown in Table 1. Four δ values are used in this experiment.
From Table 1, it is clear that the DMOS value increases as δ increases; in other words, subjective quality decreases as δ increases. For the BQTerrace and ParkScene sequences, the subjective quality of the proposed algorithm is always better than that of HM 15.0. For the Cactus sequence, there is little improvement in quality. However, in terms of encoding time, the proposed algorithm always outperforms HM 15.0.
V. CONCLUSIONS
In this work, we have proposed a novel approach to enhance the rate-distortion optimization process of the HM reference software. The proposed approach can improve the overall subjective quality of the HM output. The proposed algorithm is performed in two steps. In the first step, as in HEVC, it performs conventional rate-distortion optimization. The second step is an extra check in which an SSIM-based cost is evaluated. Moreover, a fast SSIM calculation technique is also proposed in this paper.