I. INTRODUCTION
For people with disabilities, achieving self-sustaining economic stability is the most important factor in improving their quality of life 0. The rate of participation in economic activities among persons with disabilities is highest for those with at least a university degree, and the employment rate exhibits a similar trend; hence, the education and employment rates of the disabled are closely related 0. University attendance is therefore an important step toward increasing their chances of participating in future economic activities, which in turn requires an environment in which people with disabilities can learn independently at school and at home.
Although learning methods using various assistive and smart devices exist to enable independent learning by the disabled, they are inconvenient for the disabled to use on their own 0. In the existing smart-device environment, general users record, retrieve, and reuse information through a graphical user interface (GUI) based on the windows, icons, menus, pointer metaphor. To control devices and content in this GUI environment, it is critical to control a pointer using a mouse; however, this can be difficult depending on the disability type 0. Hence, technologies are being developed to enable the disabled to use a GUI.
Most applications offer touch-screen manipulation as the only means of control, which is practically impossible for blind people who cannot see the screen. Voice recognition functions have been added to compensate for this, but applications that use voice recognition provide only text input, so their ability to control the device is limited [5, 6].
People with low vision can browse the web by enlarging the screen with the help of special-purpose assistive tools such as web screen readers [7, 8]. However, the biggest problem is the reduced accuracy when selecting or manipulating small menus and clickable objects in web browsers and web content. To compensate for this, current smartphones provide voice guidance services that combine touch and voice, such as VoiceOver [9] and TalkBack [10], but these have limitations in magnifying and selecting content, including GUI and text-to-speech (TTS) control problems. Because unnecessary content is also read aloud, these services cause fatigue for people with low vision.
Gaze-tracking technology is receiving the most attention as a web-browsing technology for people with upper-limb disabilities. In existing research, when an object is controlled through gaze tracking, an execution command such as a mouse click is issued on the pointed object through dwell time [11] or eye blinking [12]. However, both the dwell-time and eye-blink methods cause repeated pointer-execution malfunctions when the user merely wants to look at the screen without issuing a gaze command.
However, consistent technology for the various types of disabled people remains limited. For example, a blind person cannot select and execute content and menus because the screen is not visible to him/her. People with low vision have difficulty selecting and executing content and menus in a GUI, which are typically small, whereas people with upper-limb disabilities experience malfunctions when using eye-tracking technology to manipulate a pointer.
Therefore, it is essential to solve the problems that occur when people with various types of disabilities control web interactions through a GUI in web and smart-device environments. Here, web interaction refers to the interaction that occurs when a screen is recognized, manipulated, and navigated in a web environment.
In this study, a multimodal interface pilot solution is presented to enable people with various disability types to control web interactions more easily [13]. A multimodal interface combines two or more natural user interface (NUI) technologies.
First, we classify web interaction types using digital devices and derive essential web interactions among them. Second, to solve the problems that occur when performing web interactions considering the disability type, the necessary technology according to the characteristics of each disability type is presented. Finally, a pilot solution to the multimodal interface for each disability type is proposed.
II. MODELING OF WEB INTERACTIONS
To support the interface design for each disability type, essential interactions were derived by classifying the web interaction types used for content and menu manipulation. The analysis was conducted in three stages to systematically classify the interaction types and operation details for device access control and web content. First, we analyzed the Windows GUI interaction types and the web browser interaction functions; second, the interaction types of the two environments were matched and categorized; finally, the frequency of the web browser interaction functions and the results of the previous step were used to extract the final interaction types and functions.
Windows GUI interaction types can be classified into six categories 0, 0: the menu, move–grow, text, trace, new-point, and angle interactors. These cover all interaction methods used in mouse-based GUIs, including text input.
To extract the web browser interaction functions, we used the service tasks from the analysis of the digital information level of the disabled 0 in Korea. The surveyed service tasks were information and news search, e-mail communication, use of multimedia content services, social networking services (SNS), and cloud services. We investigated the interaction functions that occurred while these Internet services were used and recorded the number of occurrences per interaction.
Based on the results of matching the interaction types of the two environments, seven interaction types for accessing web content were identified: the menu, move–grow, text, trace, object select, display control, and mixed interactors. These seven types exclude the angle and new-point interactors of the Windows GUI. The angle interactor performs angle calculations, whereas the new-point interactor creates graphic objects such as rectangles; none of the analyzed web browser interaction functions behave like these two interaction types. The object select, display control, and mixed interactors were newly added as extensions observed in the web browser environment.
As the final step of the interaction modeling, we extracted the interaction types and functions that are frequently used among the newly classified web interactions. The frequency of use is the number of occurrences recorded for each web browser service task in the previous step. Only the interaction functions that scored at least three points out of a total of five were extracted. The resulting interaction types and functions are presented in Table 1.
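As an illustration of this filtering step, the short sketch below keeps only the interaction functions that reach the three-point threshold; the function names and scores are illustrative placeholders, not the actual Table 1 data.

```kotlin
// Frequency-threshold filtering sketch; the enum lists the seven interactor types above,
// but the example functions and scores are illustrative only.
enum class InteractorType { MENU, MOVE_GROW, TEXT, TRACE, OBJECT_SELECT, DISPLAY_CONTROL, MIXED }

data class InteractionFunction(val type: InteractorType, val name: String, val score: Int) // score: 0..5

// Keep only the functions that scored at least `threshold` points out of five.
fun extractEssential(functions: List<InteractionFunction>, threshold: Int = 3): List<InteractionFunction> =
    functions.filter { it.score >= threshold }

fun main() {
    val surveyed = listOf(
        InteractionFunction(InteractorType.OBJECT_SELECT, "pointer movement", 5),
        InteractionFunction(InteractorType.OBJECT_SELECT, "pointer execution (click)", 5),
        InteractionFunction(InteractorType.TRACE, "freehand drawing", 1)
    )
    extractEssential(surveyed).forEach { println("${it.type}: ${it.name}") }
}
```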
Table 1 shows that the GUI is controlled mainly by moving the pointer and executing (clicking) it. This means that pointer movement and execution are the most important considerations for manipulating a GUI; therefore, they were chosen as the reference points when designing the interface for each disability type. Finally, we analyzed the problems in performing web interactions according to the categorized disability types and suggested solutions, whose validity was confirmed through interviews with experts and user groups 0.
III. CLASSIFICATION OF DISABILITY TYPES
This section classifies the disability types based on the smart-device usage environment and derives the characteristic technology for each type to compensate for the limitations of web interaction control. Traditionally, disabilities are classified based on physical aspects; however, this study reclassifies the disability types based on Korea’s Disability Welfare Act and device accessibility requirements. Device accessibility requirements were applied to the smart-device usage environment to perform the necessary web interactions derived in Section 2.
The classification targets were derived based on two criteria. First, the disability types specified in the Disability Welfare Act of Korea that impose no restrictions on device operation were excluded. Consequently, visual impairment and upper-limb impairments arising from physical disabilities and brain lesion disorders were derived.
Second, the functions required to perform the essential web interactions (pointer movement and execution) were classified into visual perception and hand operation ability, as shown in Table 2, and the disability types selected in the previous stage were grouped according to their degree of visual perception and hand operation ability.
Based on the device accessibility criteria, assistive devices, and NUI technologies, we investigated whether the essential web interactions can be performed for each reclassified disability type. The device accessibility analysis checked whether the software screen can be recognized and whether the viewer menu can be operated. The assistive devices and NUI technologies were analyzed in terms of whether an assistive device can be operated and whether voice and gaze interfaces can be used. Finally, based on the analysis results, solutions were presented for each disability type. Table 3 presents the results of analyzing whether the essential web interactions can be performed and which alternative technologies can be used for each disability type.
In addition, speech-based characteristic technology was derived because speech can be used for all disability types. The results are as follows. For the blind group, whose members can use their hands, a simple assistive controller combined with voice input and output was proposed. For the low-vision group, magnification and voice output technologies were proposed so that content can be identified. For the group with upper-limb disabilities, whose members cannot use their hands but have normal vision, eye-tracking and voice-command technology was proposed as an alternative for controlling smart devices without hand use.
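The resulting mapping from user group to characteristic technology can be summarized programmatically, as in the sketch below; the enum names and technology labels paraphrase the text above and are not a published API.

```kotlin
// Group-to-technology mapping sketch; names and labels are paraphrased from the text.
enum class DisabilityGroup { BLIND_HANDS_AVAILABLE, LOW_VISION, UPPER_LIMB_NORMAL_VISION }

fun characteristicTechnology(group: DisabilityGroup): List<String> = when (group) {
    DisabilityGroup.BLIND_HANDS_AVAILABLE -> listOf("simple assistive controller", "voice input/output")
    DisabilityGroup.LOW_VISION -> listOf("content magnification", "voice output")
    DisabilityGroup.UPPER_LIMB_NORMAL_VISION -> listOf("eye tracking", "voice commands")
}
```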
Finally, the disability types were grouped as described above. The validity of the disability-type grouping and the characteristic technology of each group was examined through interviews with experts and real user groups. The results indicated that the user-group classification based on device usage functions was appropriate, as were the derived groups and their technologies.
IV. DEVELOPMENT OF PILOT SOLUTION FOR EACH DISABILITY TYPE
Based on the results of the study presented in Sections 2 and 3, we developed a multimodal interface with solutions based on the types of disabilities.
An Android-based mobile memo application was developed to enable blind people to record and control voice memos freely [18]. The application immediately announces menus through voice output and applies Bluetooth remote control and voice functions so that users can freely navigate multilevel menus and folders.
We designed the menu-control interface for multilevel menu navigation and execution by mapping it to the buttons on the remote control, which provides up, down, left, and right direction buttons as well as execution and pause buttons. The up and down buttons navigate the menu list on the same level. The left and right buttons move between upper and lower menus: the left button moves to the upper menu, and the right button moves to the lower menu without executing it. The selected menu is executed only by the execution button, and the pause button stops the execution of a sub-menu during voice recording.
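A minimal sketch of this button mapping is shown below, assuming the Bluetooth remote is delivered to the application as standard Android key events; the MenuNode and MenuController types and the speak() callback are hypothetical helpers, not the paper's actual implementation.

```kotlin
import android.view.KeyEvent

// Remote-control mapping sketch: D-pad keys navigate the menu tree, enter executes,
// and every focus change is announced through the speak() callback (e.g., TTS).
data class MenuNode(val title: String, val children: MutableList<MenuNode> = mutableListOf()) {
    var parent: MenuNode? = null
    fun add(child: MenuNode): MenuNode { child.parent = this; children.add(child); return this }
}

class MenuController(root: MenuNode, private val speak: (String) -> Unit) {
    private var current: MenuNode = root   // menu whose children are currently being browsed
    private var index = 0

    private fun announce() = speak(current.children[index].title)   // read the focused item aloud

    // Typically called from Activity.onKeyDown(); returns true if the key was handled.
    fun onKey(keyCode: Int): Boolean {
        if (current.children.isEmpty()) return false
        when (keyCode) {
            KeyEvent.KEYCODE_DPAD_UP -> {      // previous item on the same level
                index = (index - 1 + current.children.size) % current.children.size; announce()
            }
            KeyEvent.KEYCODE_DPAD_DOWN -> {    // next item on the same level
                index = (index + 1) % current.children.size; announce()
            }
            KeyEvent.KEYCODE_DPAD_LEFT -> {    // move to the upper menu
                current.parent?.let { current = it; index = 0; announce() }
            }
            KeyEvent.KEYCODE_DPAD_RIGHT -> {   // move into the lower menu without executing it
                val selected = current.children[index]
                if (selected.children.isNotEmpty()) { current = selected; index = 0; announce() }
            }
            KeyEvent.KEYCODE_ENTER -> speak("Executing ${current.children[index].title}") // execution button
            KeyEvent.KEYCODE_MEDIA_PAUSE -> speak("Recording paused")                     // pause button
            else -> return false
        }
        return true
    }
}
```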
For people with low vision, we developed an Android-based voice web browser for mobile environments. It applies a selective focusing technique that selects only the desired areas within the web content and either magnifies them in the selected order or outputs them by voice.
Whereas previous approaches enlarge content within a radius around the moving pointer, this study expands the selection range so that users can access the content of a web document and select only the desired part by item or sentence unit. Here, the content is the menu and body areas of the web page that the user can browse. Two content-selection methods are provided: individual element selection and range selection. Individual element selection selects on-screen content individually in units of sentences or paragraphs. Range selection selects, at once, up to an ancestor node containing the selected sentence through a double-tap function. In addition, the selected area is given a yellow background color so that the user can intuitively see that it is selected.
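The sketch below illustrates one way such selective focusing could be wired up, assuming the voice browser is built on an Android WebView; the injected JavaScript and the data-selected marker are illustrative, not the paper's implementation.

```kotlin
import android.webkit.WebView

// Selective focusing sketch: highlight the selected element in yellow and, on double tap,
// expand the selection to an ancestor node so a whole range is selected at once.
object SelectiveFocus {

    // Individual element selection: highlight the element at the tapped CSS coordinates and
    // return its text so it can be magnified or read aloud via TTS.
    fun selectAt(webView: WebView, xCss: Int, yCss: Int, onText: (String) -> Unit) {
        val js = """
            (function() {
                var el = document.elementFromPoint($xCss, $yCss);
                if (!el) return '';
                el.style.backgroundColor = 'yellow';      // intuitive cue that the area is selected
                el.setAttribute('data-selected', 'true');
                return el.innerText;
            })();
        """.trimIndent()
        webView.evaluateJavascript(js) { text -> onText(text) }
    }

    // Range selection: on double tap, move the highlight up to the parent (ancestor) node
    // that contains the currently selected sentence.
    fun expandToAncestor(webView: WebView) {
        val js = """
            (function() {
                var el = document.querySelector('[data-selected="true"]');
                if (!el || !el.parentElement) return;
                var ancestor = el.parentElement;
                ancestor.style.backgroundColor = 'yellow';
                ancestor.setAttribute('data-selected', 'true');
                el.removeAttribute('data-selected');
            })();
        """.trimIndent()
        webView.evaluateJavascript(js, null)
    }
}
```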
We developed a PC-based eye-tracking and voice-command web browser called Eye-Voice for persons with upper-limb disabilities [19]. The pointer execution methods of existing gaze-tracking technology cause malfunctions: objects are executed unintentionally, and pointing is difficult because the execution objects are small. In Eye-Voice, eye tracking is used to move the pointer, whereas voice commands are used to perform pointer clicks. In addition, we developed a function that automatically magnifies only the clickable objects along the path of the user's gaze.
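The sketch below outlines this division of roles, with hypothetical GazeTracker and Browser abstractions, since the paper does not name the underlying eye-tracking or speech SDKs: gaze samples only move the pointer and trigger magnification, while execution requires an explicit voice command.

```kotlin
// Eye-Voice control-flow sketch; all types here are hypothetical abstractions.
data class GazePoint(val x: Int, val y: Int)

interface GazeTracker { fun currentGaze(): GazePoint }

interface Browser {
    fun movePointer(p: GazePoint)
    fun clickableAt(p: GazePoint): Boolean
    fun magnifyAround(p: GazePoint)   // enlarge clickable objects near the gaze path
    fun click(p: GazePoint)
}

class EyeVoiceController(private val gaze: GazeTracker, private val browser: Browser) {
    private var lastGaze = GazePoint(0, 0)

    // Called for every gaze sample: the gaze only moves the pointer and magnifies nearby
    // clickable objects; it never executes them, so looking around cannot cause a misclick.
    fun onGazeSample() {
        lastGaze = gaze.currentGaze()
        browser.movePointer(lastGaze)
        if (browser.clickableAt(lastGaze)) browser.magnifyAround(lastGaze)
    }

    // Called when the speech recognizer reports a command: execution happens only on an
    // explicit voice command, which replaces dwell-time or blink-based clicking.
    fun onVoiceCommand(command: String) {
        if (command.equals("click", ignoreCase = true)) browser.click(lastGaze)
    }
}
```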
The usability of the pilot solution for each disability type was verified. The evaluations were conducted with the approval of the Institutional Review Board, which is established and operated to ensure bioethics and safety in human-subject research conducted at Sookmyung Women's University in Korea, as prescribed by the Act on Bioethics and Safety. To ensure voluntary participation, subjects were recruited through a notice posted on the researchers' university bulletin board and on the website of a relevant research center.
There were 10 participants in each experiment. In the blind-condition test, participants wore an eye patch; in the low-vision experiment, participants with myopia removed their glasses; and in the upper-limb disability experiment, participants were restricted from using their arms. The experimental results are presented subsequently.
To verify the usability of the Bluetooth remote-control voice output application for blind people, we conducted an experiment comparing its efficiency and accuracy with those of the "Voice Note" application. The results indicated that Voice Note required 20.85 s and 0.98 retries on average, whereas Voice Memo (our application) required 36.24 s and 1.00 retries. The number of retries did not differ significantly between the two applications in a t-test (t = 0.31). This means that Voice Memo requires a slightly longer time when a blind person uses the remote control than when a sighted person uses the screen; however, the accuracies were the same. Table 4 lists the average of the results measured over two trials of each task.
| Task (#) | Voice Note Duration (sec.) | Voice Note Retries (count) | Voice Memo Duration (sec.) | Voice Memo Retries (count) |
| --- | --- | --- | --- | --- |
| 1 | 27.55 | 1.5 | 36.41 | 1.36 |
| 2 | 15.77 | 0.64 | 31.82 | 0.86 |
| 3 | 19.23 | 0.82 | 40.50 | 0.77 |
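As a quick check of the averages quoted above, the per-task durations in Table 4 can be averaged directly; the short sketch below reproduces the 20.85 s and 36.24 s figures.

```kotlin
// Averages the per-task durations from Table 4 (three tasks, two trials each).
fun main() {
    val voiceNoteDurations = listOf(27.55, 15.77, 19.23)   // Voice Note, tasks 1-3
    val voiceMemoDurations = listOf(36.41, 31.82, 40.50)   // Voice Memo, tasks 1-3

    println("Voice Note mean duration: %.2f s".format(voiceNoteDurations.average()))  // 20.85 s
    println("Voice Memo mean duration: %.2f s".format(voiceMemoDurations.average()))  // 36.24 s
}
```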
To verify the effectiveness of the selective-focusing mobile voice web browser for the visually impaired, a comparative experiment with the Android "TalkBack" service was conducted. The experiment compared the accuracy of element selection, the reduction in reading time, and user satisfaction. The experimental results are shown in Table 5.
The results indicated that the TalkBack service required 1.02 retries and 56.48 s on average, whereas the voice web browser required 0.29 retries and 40.16 s. The t-test confirmed a significant difference (error rate: p = 0.042; execution time: p = 0.025). Compared with the TalkBack service, the voice browser reduced both the error rate and the execution time, so that only the desired area can be read quickly and accurately. The satisfaction assessment confirmed the effectiveness of the selective focusing technique and the reduction in user fatigue.
Comparative experiments on the Eye-Voice web browser for people with upper-limb disabilities were conducted to verify the reduction in pointer-execution malfunctions relative to existing gaze interfaces (blinking and dwell time). Table 6 shows the results for each interface.
The results indicated that the dwell-time interface required 1.86 retries and 104.70 s, the blinking interface required 1.99 retries and 105.01 s, and the Eye-Voice interface required 1.73 retries and 83.65 s. According to the t-test on the number of retries, no significant difference was observed between Eye-Voice and the dwell-time interface, whereas a significant difference was observed in the comparison with the blinking interface (p = 0.08; p = 0.002). For the measured execution time, Eye-Voice differed significantly from both interfaces (p = 0.0006; p = 0.02). This means that Eye-Voice reduced the malfunction rate of pointer execution, thereby verifying its effectiveness.
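The paper does not state which statistical tool produced these p-values; the sketch below shows how a comparable two-sample t-test could be run with Apache Commons Math, using hypothetical per-participant execution times in place of the study's raw data.

```kotlin
import org.apache.commons.math3.stat.inference.TTest

fun main() {
    // Hypothetical per-participant execution times (s); placeholders, not the study's raw data.
    val eyeVoice  = doubleArrayOf(80.1, 85.3, 82.0, 88.4, 79.9, 84.6, 86.2, 81.7, 83.5, 84.8)
    val dwellTime = doubleArrayOf(101.2, 108.5, 99.8, 110.3, 103.7, 106.1, 104.9, 102.4, 107.0, 102.8)

    // Two-sample, two-tailed t-test (unequal variances) p-value.
    val p = TTest().tTest(eyeVoice, dwellTime)
    println("p-value (Eye-Voice vs. dwell time, execution time): %.4f".format(p))
}
```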
V. CONCLUSION
In this study, problems of interaction control in the web environment were analyzed, and solutions for people with various disability types were suggested. Essential interactions in the web environment were derived, and the disability types were reclassified in terms of device accessibility. In addition, we analyzed the problems for each disability type when performing web interactions and suggested solutions. Furthermore, we developed a pilot multimodal interface to apply the solution.
We developed a remote-control voice interface for blind people and confirmed that their performance did not differ significantly from that of sighted people using the screen. By developing a voice output interface applying the selective focusing technique for people with low vision, we confirmed that the selective focusing function was fast and effective and reduced user fatigue. Finally, we developed a gaze-tracking and voice-command interface for GUI operations for people with upper-limb disabilities and confirmed that it reduced the pointer-execution malfunction rate compared with existing gaze-tracking systems.
This study confirmed that improved usability and accessibility enable people with different disability types to control digital devices more easily. These findings support independent learning for the disabled through digital devices and the web, which can ultimately lead to improved economic stability.