I. INTRODUCTION
Users are connecting and staying online far longer, driven by recent advances in computing and the COVID-19 pandemic. Paradigm shifts in computing such as virtual reality (VR), extended reality (XR), and the metaverse are significant in this regard, as they allow users to gradually integrate their real lives into the interconnected virtual worlds collectively known as the metaverse [1-2]. The metaverse roadmap identified four key technical components of the metaverse: virtual worlds, mirror worlds, augmented reality, and lifelogging [2]. In modern metaverse platforms, a user is represented by a graphically generated avatar with a wide range of appearance and interaction options [3-4]. Researchers and consumers may wonder which approach is viable and effective in different contexts. Indeed, it would be beneficial to present well-articulated distinctions in the form of a taxonomy to comprehend the various research topics and techniques [5]. In this research, we propose a taxonomy of avatar-based interaction (ABI) in the metaverse, intended to serve as a concrete reference for the many stakeholders in the metaverse, and we highlight the elements of the taxonomy and the diverse qualities they capture.
II. RELATED WORK
Since the beginning of metaverse research, avatars have been used to graphically portray real-world users, whether similar or dissimilar to them in appearance [6]. Avatar generation, avatar interaction, and the effects of employing avatars have all been the subject of recent studies.
Avatars can already be created from a single image by reconstructing a high-quality, textured 3D face with neutral expressions and normalized lighting [7]. Likewise, Bao et al. demonstrated a fully automatic system that uses a consumer RGB-D selfie camera to create high-fidelity, lifelike 3D digital human heads [8]. The avatar's appearance also has an impact on training [9]. Such lifelike avatars elicited higher copresence, greater interpersonal trust, and increased attentional focus through their facial expressions [10].
A VR communication system was created by Aseeri et al. [11] using an avatar that imitates user gestures, facial expressions, and speech. Additionally, a full-body moving avatar demonstrated the highest co-presence and behavioral dependency [12]. For an augmented reality system, Wang et al. investigated various full-body avatar design types (i.e., body only, hands and arms, and hands only) [13]. According to Ma and Pan, a cartoon-like avatar made it easier for participants to manage their facial expressions [14].
The impact of an avatar's body-part visibility on the gameplay and performance of VR games was investigated by Lugrin et al. [15]. A body "avatarization continuum" [16] was presented by Genay et al. to illustrate the various levels of body avatarization. The Proteus effect, which describes how people's views and behavior mimic those of their avatars, makes this type of avatar research crucial [17]. Electroencephalography (EEG) was used by Gonzalez-Franco et al. to track participants' brain activity as they interacted with look-alike avatars [18]. Kegel et al. reported that avatar facial expressions elicit diverse brain reactions [19], while others studied advanced topics such as avatar personalization and the effects of visual quality.
III. AVATAR-BASED INTERACTION IN METAVERSE
Our proposed taxonomy for ABI in the metaverse is shown in Fig. 1. The taxonomy comprises four first-level criteria: avatar types, flexibility, fidelity, and interaction. To keep the taxonomy concise and understandable, each first-level criterion typically contains four to five second-level criteria. These criteria are briefly discussed in this section.
Avatar types refer to the various ways an avatar is graphically generated and rendered [6, 8-9]. Users interpret different avatar types based on their visual characteristics.
- 2D Avatars: The avatar is rendered on a two-dimensional (2D) display using a 2D coordinate system. Examples include cartoonish and pixel-based characters.
- 3D Avatars: The avatar is either rendered using a three-dimensional (3D) coordinate system or drawn in a way that visually makes the avatar appear in 3D.
- VR Avatars: This type of avatar is associated with a specified 2D or 3D virtual world and is constrained by the rendering properties of that world.
- Digital Human: This type of avatar attempts to realistically represent real users, historical figures, or fictional human-like agents at the highest resolution and quality.
Current ABI is affected by the platform, software, and hardware requirements of the various metaverse applications. Additionally, users can create and customize avatars [20]. Flexibility describes both the dependencies and the customization of ABI.
- Hardware Dependency: ABI may require specific hardware (e.g., an HMD, smart glasses, a smartphone, or a PC) to perform as intended.
- Software Dependency: ABI may require specific software (e.g., Android, macOS, Windows, Unity, Unreal Engine) to perform as intended.
- Platform Dependency: ABI is categorized as either usable only on a proprietary platform or available across different metaverse platforms (i.e., cross-platform).
- Customization: ABI is customizable by users. For example, users can change the look and feel of avatars through different items (e.g., clothes, accessories, body forms).
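Continuing the illustrative Python sketch, flexibility can be recorded as a small record of dependencies plus a customization flag; the field names are hypothetical and merely mirror the criteria above.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Flexibility:
        """Dependencies and customization of an ABI implementation."""
        hardware: List[str] = field(default_factory=list)  # e.g., ["HMD", "smartphone"]
        software: List[str] = field(default_factory=list)  # e.g., ["Unity", "Android"]
        cross_platform: bool = False  # usable across metaverse platforms?
        customizable: bool = False    # can users change the avatar's look and feel?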
An avatar can be highly detailed or simply abstracted. Fidelity refers to this characteristic of its visual properties [21].
- Abstract: The simplest abstraction is used to represent the avatar. For example, an avatar may be represented as a dot or a person icon.
- Low-Fidelity: The avatar is low-fidelity in design if there are only a small number of features to distinguish one avatar from another. In other words, the avatar is limited in its expressivity.
- High-Fidelity: The avatar is high-fidelity in design if there is a greater number of features to distinguish one avatar from another. In other words, the avatar is highly expressive.
- Photo-Realistic: The avatar is photo-realistic if a realistic depiction is used to create an avatar that is self-identifiable.
- Life-like: The avatar is considered life-like if it can behaviorally act like a real person (e.g., walk, run, jump, dance, smile, cry).
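Similarly, the fidelity levels could be captured as an ordered set of labels; this is again an illustrative sketch rather than an implementation prescribed by the taxonomy.

    from enum import Enum

    class Fidelity(Enum):
        """Visual fidelity levels, from the simplest abstraction to life-like."""
        ABSTRACT = "abstract"                # e.g., a dot or a person icon
        LOW_FIDELITY = "low-fidelity"        # few distinguishing features
        HIGH_FIDELITY = "high-fidelity"      # many distinguishing features
        PHOTO_REALISTIC = "photo-realistic"  # realistic, self-identifiable depiction
        LIFE_LIKE = "life-like"              # behaves like a real person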
Interaction defines the ways an avatar, or the user behind it, is operated to communicate in the metaverse [25-26].
- GUI: A graphical user interface is the typical interface based on the WIMP (windows, icons, menus, and pointers) metaphor. For example, an avatar can be controlled using a mouse and a keyboard.
- Chat: An avatar can communicate with other avatars through text chat or voice/video chat. For example, a text bubble or an altered voice may be used.
- Facial Expression: An avatar can make various facial expressions such as smiling, frowning, and eyebrow and lip movements.
- Hands & Limbs: When interacting, users see avatars making gestures and postures with their hands and limbs.
- Full-Body: Avatars can make full-body movements in which many body parts are used to perform active physical motions.
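Finally, the interaction modalities, together with a profile type combining all four first-level criteria, complete the illustrative sketch; this fragment assumes the AvatarType, Flexibility, and Fidelity definitions sketched earlier in this section.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Set

    class Interaction(Enum):
        """Ways an avatar is operated to communicate in the metaverse."""
        GUI = "GUI"                # WIMP-style control, e.g., mouse and keyboard
        CHAT = "chat"              # text, voice, or video chat
        FACIAL_EXPRESSION = "facial expression"
        HANDS_LIMBS = "hands & limbs"  # gestures and postures
        FULL_BODY = "full-body"        # active whole-body motion

    @dataclass
    class ABIProfile:
        """One ABI instance described along the four first-level criteria."""
        avatar_type: AvatarType
        flexibility: Flexibility
        fidelity: Fidelity
        interactions: Set[Interaction]

Such a profile makes side-by-side comparisons of applications, as in Table 1, straightforward.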
IV. APPLYING THE TAXONOMY
As shown in Table 1, our proposed taxonomy can be used to describe various metaverse applications [1] and business cases [4]. In this section, we briefly discuss seven scenarios categorized into three groups.
A VR avatar that can be modified with purchased fashion items may be used in a commerce scenario. High fidelity and full-body interaction must be offered to support these functionalities. For navigation scenarios, we could implement an abstract or low-fidelity, GUI-controlled avatar. Similar to consumer use cases, industrial use cases do not require workers' avatars to be highly detailed because the application's main focus is the industrial apparatus depicted as a digital twin. Fig. 2 depicts scenarios for business, navigation, and industry.
Personal and social scenarios are shown in Fig. 3. High-fidelity, photorealistic, self-identifiable digital-human avatars with expressive faces should be offered in personal contexts. Crowds at a social event such as a concert can be depicted modestly, while the artists are given detailed renderings.
Formal settings such as workplaces and schools are shown in Fig. 4. Hardware, software, and platform dependencies should be carefully designed and managed so that multiple users with varying hardware, software, and platform requirements can engage in these applications. Additionally, approaches that require fewer computing resources (such as chat, a GUI, and low-fidelity avatars) can be adopted as an avatar group grows in size. A group of five employees would benefit more from a 3D avatar type, whereas a classroom of 100 avatars could be implemented with a 2D avatar type.
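As a usage illustration, two of the scenarios above could be annotated with the profile type sketched in Section III; the specific hardware and software entries are illustrative placeholders rather than requirements stated in this paper.

    # Commerce: a customizable VR avatar with purchased fashion items,
    # requiring high fidelity and full-body interaction.
    commerce = ABIProfile(
        avatar_type=AvatarType.VR,
        flexibility=Flexibility(hardware=["HMD"], software=["Unity"],  # illustrative entries
                                customizable=True),
        fidelity=Fidelity.HIGH_FIDELITY,
        interactions={Interaction.FULL_BODY},
    )

    # A classroom of 100 avatars: a lightweight 2D avatar type with
    # resource-friendly modalities such as chat and a GUI.
    classroom = ABIProfile(
        avatar_type=AvatarType.TWO_D,
        flexibility=Flexibility(hardware=["PC", "smartphone"], cross_platform=True),
        fidelity=Fidelity.LOW_FIDELITY,
        interactions={Interaction.GUI, Interaction.CHAT},
    )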
Our taxonomy can also be used to describe current metaverse platforms and services. Table 2 shows four metaverse platforms (Gather, ifland, Roblox, ZEPETO) characterized by our taxonomy criteria. Currently, all of the metaverse platforms we examined can run on mobile devices. Due to this hardware dependency, interaction methods are constrained to GUI and chat (text, voice, and video). Another interesting observation is that all of these platforms encourage avatar customization.
Recent technical developments will benefit the seven scenarios discussed above. Metaverse platforms will leverage low-latency broadband networks made possible by 5G and 6G to deliver avatars. When the virtual and physical worlds are connected using extended reality and blockchains with higher levels of security and protection, both abstract and highly detailed rendering will require far more data to be maintained and transmitted through the cloud. Distributed platforms will improve interoperability and reduce dependency problems across platforms, devices, and software. In addition, media and spatial objects that recreate the portrayal of the real world will be combined with user-representing avatars in the form of digital humans aided by artificial intelligence (AI). To achieve this, the Internet of Things (IoT), collaborative robots, and digital twins will all contribute to a fully human-in-the-loop metaverse. Avatars will interact with such augmented spatial objects and spaces in the metaverse through real-time user interfaces (UI) and user experiences (UX).
V. CONCLUSION
In this paper, we proposed a taxonomy for avatar-based interaction in the metaverse. We defined each criterion and applied the taxonomy to several use cases through the lenses of avatar types, flexibility, fidelity, and interaction. We anticipate that, by defining scenarios in concrete terms and laying out the required technologies, our taxonomy can aid researchers working in the field of avatar-based interaction. Researchers can, for instance, examine the avatar types and interaction modalities listed in our taxonomy to appropriately situate their services in various application settings.