Paintings in naked-eye virtual reality: a parallax view between the surface and volumetric depth

Y. Wang and H. Zhang

Dive into the fascinating world of naked-eye virtual reality (VR) video, where 2D paintings are transformed into a 3D experience. This engaging article by Yiwen Wang and Huiyu Zhang offers an interdisciplinary exploration of how this innovative approach challenges traditional perceptions of art and media, allowing viewers to step beyond the surface of images.

Introduction

A novel genre of videos, labeled as "naked-eye virtual reality" (VR), has gained viral status on Bilibili, one of China's most popular video streaming platforms. The creators of these videos contend that spectators can perceive a stereoscopic effect on a two-dimensional (2D) screen without requiring a VR headset. Three distinct types of naked-eye VR videos have attracted Bilibili audiences. The first is a roller coaster ride that transports spectators from ocean depths to the sky. The second simulates a spacecraft journey that propels spectators into outer space through wormholes, sometimes even taking them into the maw of a black hole. The third introduces spectators to canonical works of art history by allowing them to step into the canvas or scrolls. In contrast to the first two types, which transport spectators to spatial realms characterized by volumetric depth, the third type creates the illusion of a three-dimensional (3D) world on a 2D surface. Rather than enabling spectators to ascend to the sky or dive deep into the ocean, it generates the illusion of penetrating the painting's flat surface. Therefore, the aesthetics of naked-eye VR can be summarized as 2D moving images that fool the naked eye using 3D depth cues.

Our perception of the 3D world in which we live is the fusion of two images captured by the naked eye. The depth cue that the brain extracts is the difference between the two images (Urey et al. 2011, p.541), and this depth cue is referred to as "binocular disparity": the disparity between the images of an object perceived by the two eyes. Wheatstone (1962) conceptualized binocular disparity, that is, the differences in distance between the observed images due to the interocular distance, as the "horizontal parallax". These differences result in an apparent horizontal shift in the position of the same object (Hattler and Cheung 2023, p.18). Horizontal parallax and binocular disparity were first observed by Euclid, and Wheatstone took advantage of them to invent the first stereoscopic device in 1852 (Wade 2002, p.913). They also allow the generation of stereoscopic vision in 3D films (Hattler and Cheung 2023, p.18), as well as 3D layouts in VR games (Aizenman et al. 2023, p.2).
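The horizontal shift described above can be illustrated with simple viewing geometry. The following is a minimal sketch and not a method taken from the article: it assumes two eyes separated horizontally, a flat screen at a fixed distance, and a point straight ahead of the viewer. The function name, the 63 mm eye separation, and the 0.6 m viewing distance are illustrative assumptions.

```python
def screen_parallax(point_distance_m, screen_distance_m=0.6, eye_separation_m=0.063):
    """Horizontal offset (in metres) between the left-eye and right-eye
    images of one point when both are drawn on a flat screen.

    Assumed geometry (hypothetical, for illustration only): the point lies
    straight ahead of the viewer at point_distance_m, and the screen is at
    screen_distance_m. By similar triangles the offset is
        eye_separation * (point_distance - screen_distance) / point_distance.
    """
    if point_distance_m <= 0 or screen_distance_m <= 0:
        raise ValueError("distances must be positive")
    return eye_separation_m * (point_distance_m - screen_distance_m) / point_distance_m


# A point on the screen plane needs no shift; a point twice as far away as
# the screen needs a shift of half the eye separation.
on_screen = screen_parallax(0.6)
behind_screen = screen_parallax(1.2)
in_front = screen_parallax(0.3)
```

Under these assumptions, the sign of the result indicates on which side of the screen plane the point is meant to appear, and its magnitude never exceeds the eye separation for points behind the screen.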

Situated in an intermediate stage between the stereoscope, a device that predates the birth of cinema, and VR with a headset, naked-eye VR occupies a parallax intersection in media history. Parallax history refers to the rediscovery of obsolescent technology's aesthetics in new media, and the aesthetics of new media in obsolescent technology. It was introduced into the field of film history by Thomas Elsaesser (2019), who suggested that the visual effects achieved by post-cinematic media could be identified retrospectively in pre-cinematic optical toys and that these obsolescent technologies anticipated the arrival of new media technologies (p.79). Naked-eye VR is best explored through a parallax history because it sutures the conjuncture between pre-cinematic stereoscope technology and the post-cinematic technology of VR with a head-mounted display (HMD). The binocular disparity that generates a 3D illusion in naked-eye VR can be traced back to the stereoscope. However, naked-eye VR video is a post-cinematic technology that can be described as a "deficient" form of standard VR with an HMD, as it fails to provide an all-immersive environment with the illusion of a frameless screen. Nevertheless, naked-eye VR has liberated spectators from the confinement of the HMD, enabling them to choose their spectatorial position or even produce a naked-eye VR video. Therefore, the position of naked-eye VR in media history falls between retrospection and anticipation, pre-cinematic technology and post-cinematic art, and pre-VR media and post-VR culture.

What connects the stereoscope, naked-eye VR, and VR with a headset is the technical basis of "horizontal parallax". Whissel (2016), however, proposes a concept of the "parallax effect" that is not only technological but also epistemic and affective (p.233-235).

Whissel accordingly recalibrates "positive parallax" and "negative parallax", two sub-categories of the parallax effect (p.233-235). In its original technological context, positive parallax is a situation wherein binocular focus converges behind the screen, creating the illusion that an object is situated at a volumetric depth behind the screen (Petkov 2012, p.50). Conversely, negative parallax results in an illusory protrusion effect, causing an object to appear to pop out from the screen as the binocular focus converges in front of the screen's plane (Petkov 2012, p.50). Whissel associates positive parallax with an epistemic desire characterized by "a curious look that sees in order to know, such that the mind feels its way into the very depth of the picture", whereas negative parallax provokes "somatic and emotional responses in spectators via the emergent 3D images" (Whissel 2016, p.235). Similarly, positive parallax in naked-eye VR promises all-seeing spectators who can look into the image without being looked at, and all-knowing subjects who can explore everything about the image without being included in it. However, in negative parallax, the image breaks the screen and the "fourth wall", reminding spectators of the unknown space behind the screen and returning the gaze to the spectators by including them in the virtual space of the image.
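The technical sense of positive and negative parallax can be made concrete with a short calculation. This is a hedged sketch under a simplified viewing model (a point straight ahead, eyes 63 mm apart, screen 0.6 m away); the function names and default values are assumptions introduced for illustration and do not come from Whissel or Petkov.

```python
import math


def classify_parallax(parallax_m):
    """Label an on-screen horizontal offset by its sign.

    Convention assumed here: the offset is the right-eye image position
    minus the left-eye image position, in metres.
    """
    if parallax_m > 0:
        return "positive"   # lines of sight converge behind the screen
    if parallax_m < 0:
        return "negative"   # lines of sight cross in front of the screen
    return "zero"           # the point appears on the screen plane


def perceived_distance(parallax_m, screen_distance_m=0.6, eye_separation_m=0.063):
    """Distance from the viewer at which the two lines of sight meet.

    Derived by similar triangles for the simplified model described above:
        distance = eye_separation * screen_distance / (eye_separation - parallax).
    An offset equal to or larger than the eye separation means the lines of
    sight never converge, so infinity is returned.
    """
    if parallax_m >= eye_separation_m:
        return math.inf
    return eye_separation_m * screen_distance_m / (eye_separation_m - parallax_m)


# Offsets of +31.5 mm, 0 mm and -63 mm on a screen 0.6 m away.
for offset in (0.0315, 0.0, -0.063):
    label = classify_parallax(offset)
    distance = perceived_distance(offset)
    print(f"{label} parallax: object appears about {distance:.2f} m from the viewer")
```

In this model a positive offset places the object farther away than the screen and a negative offset places it nearer, which corresponds to the receding and protruding effects discussed above.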

Naked-eye VR can be categorized as a parallax medium knotting together a parallax view between two incommensurable perspectives. The standard parallax view definition concerns "a change in observational position that provides a new line of sight", usually concerning the shifting position of an object against its background (Žižek 2009, p.17). Žižek (2009) specifies the parallax view as situated in an irreducible gap between an "objective scientific account" in the third-person perspective and a "subjective phenomenon experience" in the first-person perspective (p.10-17). In naked-eye VR, the parallax view knots together the "objective" scientific account of two flat images in slightly different positions and the "subjective" experience of spectators seeing a volumetric depth in these flat images. The spectator's position shifts from an "objective" point of view to a "subjective" one when viewing the image in parallax, transitioning from observing a two-dimensional space in third-person perspective to experiencing a three-dimensional world in first-person perspective. Naked-eye VR can, therefore, be defined as a parallax medium that mediates the parallax view between the subjective and objective perspectives, first-person and third-person perspectives, and flat surface and volumetric depth.

As a parallax medium, naked-eye VR can be categorized as a door that oscillates between its function as a window into the 3D depth of a fictional world and a wall that shuts at the 2D surface of the screen. Before computers became ubiquitous, media theory was dominated by three metaphors: frames, windows, and mirrors. Eisenstein likened the moving image on the screen to a framed painting on an opaque canvas (Andrew 1984, p.12); Bazin theorized the screen as a transparent window open to the world, a perspective that informs realist film theory (Sobchack 1992, p.16); and psychoanalytic film theory conceptualized cinema as a reflective mirror that allows spectators to identify with their mirror image: the character on the screen (Metz 1981, p.51). As the world entered the age of information, Manovich observed that computer screens were fundamentally different from cinematic screens because they oscillated between depth and surface, functioning as both a window into an illusionistic space and a flat control panel (Manovich 2002, p.41).

Mitchell (2015) drew a similar spectrum of the screen as a wall or window and suggested that screens can operate as walls that project the image on themselves and also function as windows transmitting visual information through themselves (p.233-235). Mitchell (2015) specified that walls symbolize the screen's surface on which the images are projected, while windows signify an open door behind the screen to the concrete object in the image and the social context surrounding the image (p.233-235). Sandifer (2011), Hattler and Cheung (2023), and Gao and Jin (2021) proposed that “window frames”, which separate the extra-diegetic space in which spectators are situated from the diegetic space inside the screen, disappear in stereoscopic technology. Zhou (2023) further argued that the HMD of VR has shifted the screen’s operational logic from the “frame” that demarcates the real and the virtual space to the “case” that contains the spectators in a “container” (p. 139). Rogers (2019) also highlights the VR screen’s capacity to surround, envelop, or enclose spectators in a “frameless space” (p. 19). However, Rogers (2023) believes that, despite the disappearance of the rectangular screen, the frame remains as the limit separating on-screen space from off-screen space (p.269).

In naked-eye VR, however, the frame reappears at the eyes, and even without the frame separating the vision of the two eyes, the rectangular frame demarcating the screen’s boundary remains. Only when spectators adapt their vision to the horizontal parallax does the frame momentarily disappear, allowing spectators to see a frameless stereoscopic space. The frame’s appearance and disappearance can best be categorized as a “door”, theorized by Siegert (2012) as the symbolic threshold between inside and outside and an epistemic divider separating two worlds (p.10). Siegert (2012) also drew an analogy between a door, gate, and bridge, each suggestive of a pathway to a hypothetical space beyond (p.10). In the case of naked-eye VR, the opened door operates as a stereoscopic window onto a space beyond the 2D image demarcated by the frame, while the closed door is akin to a wall at which the screen and its frame halt spectators’ scopic and epistemic desires. Therefore, naked-eye VR is a threshold between the 2D image and the frame that limits spectators’ vision: a door that opens onto volumetric depth from a subjective perspective and shuts at the 2D surface in the objective account.

In what follows, we elaborate on the operational logic of naked-eye VR as a door between the parallax views of surface and volumetric depth, looking and being looked at, and the known and the unknown. While traditional VR promises a frameless space wherein spectators are all-seeing and all-knowing, naked-eye VR acknowledges the screen’s frame as the threshold of the parallax view, highlighting the stereoscopic illusion generated by flat images perceived by the two eyes. The paintings in naked-eye VR further exemplify this logic, highlighting both the illusion of volumetric depth and the surface appearance of a canvas or scroll. The volumetric depth of the scene promises all-seeing and all-knowing omniscience, whereas the surface marks the end of this scopic and epistemic exploration, limiting visibility and knowability. Therefore, the screen in naked-eye VR functions as a door: when closed, it stops visual and cognitive exploration at its surface; when opened, it reveals the image’s volumetric depth and unlocks speculative imagination beyond the screen.

Literature Review

The paper situates naked-eye VR within a broad humanities and media theory lineage. It draws on foundational work on binocular disparity and stereoscopy (Euclid; Wheatstone 1852/1962; Urey et al. 2011), and relates horizontal parallax to 3D cinema and VR design (Hattler and Cheung 2023; Aizenman et al. 2023). It adopts Elsaesser’s media archaeology and parallax historiography to connect pre-cinematic optical devices and post-cinematic media (Elsaesser 2019). Whissel’s framework of positive and negative parallax is extended from a technical to epistemic/affective register (Whissel 2016; Petkov 2012). Screen metaphors from film and media theory (Eisenstein via Andrew 1984; Bazin via Sobchack 1992; psychoanalytic apparatus via Metz 1981; Manovich 2002; Mitchell 2015) are mobilized to contrast frames, windows, walls, and the proposed ‘door’. Contemporary VR scholarship on framing and immersion (Sandifer 2011; Rogers 2019, 2023; Zhou 2023; Gao and Jin 2021) informs the discussion of framelessness and containment. Art historical references to Chinese scroll painting and viewing practices (Delbanco 2008; Wu 1996; Yan 2012; Wang 2022) ground the cross-media analysis, alongside case materials from Bilibili creators and uploads.

Methodology

The study employs an interdisciplinary humanities methodology combining media archaeology, film and screen theory, art history, and science/technology studies. It develops a parallax historiography that reads naked-eye VR as an intersection of pre-cinematic stereoscopy, pictorial traditions (Chinese scrolls), and post-cinematic VR culture. Methods include:

  • Theoretical synthesis of concepts of horizontal parallax, positive/negative parallax, and screen metaphors (frame/window/wall/door).
  • Close textual and visual analysis of exemplar naked-eye VR/3D videos on Bilibili, including adaptations of Van Gogh’s The Starry Night and the Chinese scroll Along the River During the Qingming Festival, as well as VR adaptations of Zhang Daqian’s Cloud Sea of Mount Hua and other genre exemplars (roller coaster, spacecraft, bullet-through-screen scenes).
  • Attention to platform-specific paratexts, notably real-time spectator comments (“bullet subtitles”) as evidence of viewing strategies (parallel-eye/cross-eye) and phenomenological reports (e.g., seeing stitch lines, ink dots, fabric textures; experiences of ‘dimensionality reduction’ and 3D illusion).
  • Media-technical reading of display formats (stereopairs, autostereoscopic cues, panoramic stereo, stitch lines) to foreground how binocular disparity is made visible or concealed in naked-eye VR.

The approach is qualitative, interpretive, and comparative across historical media forms, aiming to articulate “naked-eye VR” as a parallax medium and to propose the ‘door’ as an operational metaphor.

Key Findings

  • Naked-eye VR functions as a parallax medium that knots together two incommensurable perspectives: the objective technical basis of two flat, offset images (horizontal parallax/binocular disparity) and the subjective phenomenology of volumetric depth, producing oscillation between 2D surface and 3D depth.
  • The paper introduces the ‘parallax door’ as a screen logic distinct from frame/window/mirror: the screen operates as a threshold whose opening/closing alternately enables immersion into depth or halts perception at the surface. In naked-eye VR, frames reappear and disappear, marking this threshold.
  • Spectators’ bodies are enlisted as “parallax machines.” Through parallel-eye and cross-eye techniques, viewers fuse stereopairs without HMDs, rendering depth cues explicit and making the construction of 3D perception reflexively visible.
  • Case analyses (Van Gogh adaptations; Qingming scroll VR) show concrete traces of parallax and material surface co-present: stitch lines between spherical images; visible ink dots in the sky and fabric textures of the scroll; doors/windows within the image acting as literal and symbolic thresholds.
  • Positive and negative parallax map to epistemic and affective registers in naked-eye VR: open doors/windows guide knowing and exploration into depth; protrusive effects (e.g., bullets, swirling sky) break the fourth wall and return the gaze, producing somatic shock.
  • A parallax media history connects stereoscopic devices and pictorial scroll traditions to contemporary naked-eye VR: scrolls already integrate panoramic surface scanning (x–y axis) with implied depth (z-axis), accommodating omniscient and first-person spectatorial desires; VR adaptations literalize this by moving from panoramic hanging views to first-person street-level exploration.
  • Unlike headset VR’s frameless promise, naked-eye VR foregrounds limits: the persistent rectangular screen frame and surface remain the endpoint of visibility and knowability even as volumetric depth is suggested; viewers can rotate/zoom but cannot exceed what the video provides.
  • The work challenges mirror-based identification: naked-eye VR often lacks an on-screen avatar; first-person traversal is presented in continuous takes without POV-reverse-shot identification, emphasizing the viewer as subject of the look without a mirror image.

Discussion

The findings address the central question of how naked-eye VR reconciles the flatness of pictorial surfaces with volumetric depth by framing it as a parallax medium and proposing the ‘door’ as the operational metaphor. By making the depth cue (horizontal parallax) perceptible and by staging thresholds (frames, windows, doors) that alternately open onto depth or close at surfaces, naked-eye VR foregrounds both immersion and its limits. This positions naked-eye VR at a historical juncture: it rediscovers pre-cinematic aesthetics (stereoscope, scroll paintings) while anticipating post-cinematic VR culture, thus enriching media archaeology and screen theory. The analysis shows that spectator agency and embodiment are central; viewers become parallax machines through viewing techniques and interface interactions, negotiating epistemic desire (to see/know more) against constitutive limits of the frame and surface. This reframes evaluations of VR immersion by highlighting that framelessness is an effect contingent on thresholds rather than a given of devices, and it emphasizes cross-cultural pictorial lineages (e.g., Chinese handscrolls) that already encoded alternating omniscient and first-person perspectives. The significance lies in reorienting media studies from mirror identification and window transparency toward threshold operations and dimensional parallax as key to contemporary screen experiences.

Conclusion

Naked-eye VR marks a juncture in media history best understood through parallax historiography. It combines two- and three-dimensionality, simulates post-screen depth, and yet roots itself in pre-cinematic stereoscopy and panoramic traditions while circulating as post-cinematic video remakes. Unlike headset VR, it both simulates frame disappearance and acknowledges frames and depth cues typically hidden behind the screen. In examples such as the Qingming scroll and Van Gogh adaptations, the painting’s frame disappears to enable the illusion of entering the image, while the screen’s frame and surface persist as limits along the x–y and z axes. Naked-eye 3D highlights left/right frames and their boundary to provide depth cues, functioning as a door to dimensional parallax: closed, it establishes two-dimensional boundaries; opened, it anticipates and reveals volumetric space on and beyond the screen. The paper thereby clarifies naked-eye VR as a parallax medium and the screen as a ‘door’ mediating between surface and depth, known and unknown, seeing and being seen, and suggests future inquiry into broader case corpora, technical variations, and cross-cultural pictorial genealogies.

Limitations

  • The study is qualitative and theoretical, without empirical experiments or quantitative user studies; findings rely on interpretive analysis of selected examples and spectator comments.
  • The corpus centers on Bilibili uploads and specific artworks (Van Gogh, Qingming scroll, Zhang Daqian), which may limit generalizability across platforms, genres, and display technologies.
  • Technical performance variables (display hardware, viewing conditions, implementation details of autostereoscopy/panoramic stitching) are discussed descriptively rather than measured, which may affect the precision of claims about perceptual effects.
  • Cultural-historical comparisons (e.g., Chinese scrolls and contemporary VR) are illustrative and may not capture the full diversity of traditions or audience practices.