The Visual Summary — WWDC23 Part I: Unveiling Apple Vision Pro

A sketchnote summary of how Apple will revolutionize Spatial Computing

Jonny Daenen
The Visual Summary

--

After much speculation about what an Apple AR headset could look like, it has finally happened. At its Worldwide Developers Conference (WWDC) on June 12, 2023, Apple introduced the Apple Vision Pro as the keynote’s “One more thing…” The device will be available in early 2024 in the U.S. for $3,499 and is meant to kick-start the era of “Spatial Computing”.

In this post, I’ll give an overview of the newly announced product and look at how Apple is turning this so-called spatial computing into reality. We’ll cover the overall positioning of the device (entertainment and productivity) and then walk through its external and internal components. As a bonus, I’ll share my personal view on why I think an external battery is a valid choice for this product.

Update: If you like this visual summary, make sure to check out part II and part III as well:

To kick it off, here’s my summary sketchnote:

A sketchnote overview of Apple Vision Pro announced at WWDC23.

The main takeaways:

  • Apple Vision Pro is an augmented reality headset, meaning it overlays virtual elements on your real environment. It’s not transparent; instead, it uses cameras and sensors to capture and display your real surroundings. It also shows your eyes on an outward-facing display by capturing them with internal cameras.
  • visionOS: the new OS focuses on productivity and entertainment. It offers windowed apps as well as immersive 3D experiences for watching video content or playing games. Mac integration lets you show a virtual screen that acts as an external monitor.
  • Input: no special input devices are needed; you use your eyes and hands to interact with the environment. Just look and pinch, and use your voice or a keyboard (virtual or hardware) for text input.
  • Content: Apple TV+ and Disney+ will be available from the start. The Mindfulness app already offers a sneak peek at how fully immersive experiences can be created. You can even capture 3D photos and videos to create content yourself.
  • It uses an external battery (or a fixed power source), which many people find strange (I’ll elaborate on why I think that’s perfectly fine).

Apple Vision Pro: Augmented Reality with VisionOS

Apple Vision Pro is an augmented reality headset that, in its current form, resembles ski goggles when worn. It merges the real world with virtual overlays to augment your surroundings in an immersive manner.

“An immersive way to experience entertainment.” — Apple

Apple’s focus lies on two key areas: entertainment (movies and games) and productivity (apps). They introduce the concept of “Spatial Computing,” which offers immersive experiences that surround you, transcending the limitations of a traditional 2D screen.

Apple chooses an app-based approach, making several of their well-known apps available on the device.

Productivity: Control Apps with your Eyes, Hands & Voice

For the productivity part, Apple follows a similar “app strategy” as on its other devices, but this time in an augmented way, using their new OS called visionOS. Apple offers a productivity environment where users arrange apps around them by resizing and moving them. Apps integrate with the physical environment by casting shadows and responding to light. And, of course, apps need to be interactive.

Apple Vision Pro allows the user to look to focus on parts of the user interface and pinch with their hands to interact.

Instead of relying on additional input devices, you interact with the user interface by looking and pinching: look at the element you want to interact with, then simply pinch your fingers to trigger an action. More complex gestures like zooming, dragging, and double tapping are also supported.

Eyes, hands, and voice are the main input methods of the device. Users are able to just look and pinch to interact with apps, without the need to raise their arms. No controllers are needed.

The Vision Pro uses eye tracking to understand what you’re looking at and external cameras to detect what your hands are doing. You don’t need to reach out to the non-tactile interfaces that float in front of you; instead, you can keep your hands next to you and pinch.

You don’t need to reach out to the non-tactile interfaces, you can just keep your hands next to you and pinch.
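To make this concrete for developers, here is a minimal, hypothetical SwiftUI sketch (the view name and labels are my own, not Apple’s): on visionOS, looking at an element and pinching is delivered to the app as an ordinary tap, so standard gesture handlers apply.

```swift
import SwiftUI

// Hypothetical example: "look + pinch" arrives as a regular tap on visionOS,
// so no controller- or gaze-specific code is needed.
struct PinchCounterView: View {
    @State private var pinchCount = 0

    var body: some View {
        VStack(spacing: 24) {
            Text("Pinched \(pinchCount) times")

            Text("Look here and pinch")
                .padding()
                .hoverEffect()                     // the system highlights whatever the eyes focus on
                .onTapGesture { pinchCount += 1 }  // triggered by the look-and-pinch gesture
        }
        .padding()
    }
}
```

Note that the app itself never learns where the user is looking: the system draws the hover highlight and only delivers the tap once the pinch happens (more on privacy below).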

For text input, users can leverage the dictation feature or use the virtual keyboard that floats in front of them. Of course, Siri is also available to provide assistance (for example, to open and close apps, or play media). It’s also possible to use a hardware keyboard, as the Magic Keyboard is supported. And if you need macOS for ultimate productivity, you can bring your Mac into the AR world as a virtual screen. This gives you a 4K display to work on, just as you would with an external monitor.

Apps will be able to go beyond the standard 2D surfaces to provide fully immersive experiences; several of these were teased during WWDC. For conference call apps like FaceTime, Apple creates a digital persona of you to show to other participants.

When it comes to apps, notable ones like Microsoft Excel and Word, Teams, Zoom, WebEx, and FaceTime will be readily available. For conference call apps, the Vision Pro takes it up a notch. While it shows participants floating around you on your side of the call, they get to see a digital persona of you. The Vision Pro scans your face with its onboard LiDAR and cameras to bring a virtual avatar to life, animating your movements for others to see.

Finally, to support the creation of AR/VR content, developers can build on Apple’s RealityKit framework to create 3D content for the device and its applications. It’s also worth noting that visionOS builds on iOS, with added components for real-time interaction, spatial awareness, and specialized rendering.
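To give a feel for what that looks like in code, here’s a hedged sketch of a visionOS-style app entry point (the names SpatialDemoApp, ContentView, and “demo-space” are made up for illustration): a familiar SwiftUI window next to an immersive space whose 3D content is built with RealityKit.

```swift
import SwiftUI
import RealityKit

// Hypothetical visionOS app skeleton: a regular 2D window plus an
// immersive space whose content is rendered with RealityKit.
@main
struct SpatialDemoApp: App {
    var body: some Scene {
        // A windowed SwiftUI app, placed in the user's surroundings.
        WindowGroup {
            ContentView()
        }

        // A fully immersive scene for 3D content.
        ImmersiveSpace(id: "demo-space") {
            RealityView { content in
                // Place a simple sphere roughly a meter in front of the user.
                let sphere = ModelEntity(mesh: .generateSphere(radius: 0.2))
                sphere.position = [0, 1.5, -1]
                content.add(sphere)
            }
        }
    }
}

struct ContentView: View {
    var body: some View {
        Text("Hello, Spatial Computing")
            .padding()
    }
}
```

Opening the immersive space from the window would go through SwiftUI’s openImmersiveSpace environment action; I’ve left that out to keep the sketch short.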

Placing big screens in an environment brings content to life. In addition, there is special support for panoramic photos and Mac integration.

Entertainment: Movies and Games

While entertainment is one of the standard offerings of VR headsets, Apple goes beyond the usual expectations. For consuming content, like series or movies, the Vision Pro simulates a big cinema screen. 3D movies, like Avatar, will provide the full 3D experience. Panoramic photos can be wrapped around the viewer to create deeper immersion.

Consuming content can be done in a so-called environment, which places your content in a virtual location, such as a lake view. With the availability of Disney+ on day one, we can also expect some themed environments: imagine watching Star Wars from the seat of a sand cruiser surrounded by mountains on Tatooine.

Imagine watching Star Wars from the seat of a sand cruiser surrounded by mountains on Tatooine.

Apple went one step further and showed us a glimpse of how they see the future: court-side seats at a real-life sports game could redefine the way we watch sports. Disney, for their part, showed an interactive demo with characters from the Marvel Cinematic Universe, which brings us to gaming.

Apple Arcade, Apple’s (mainly mobile) gaming subscription service, will be available on the device with over 100 titles. These games can be played with a standard game controller. At this point, it is unclear whether more immersive games will be part of the offering.

Finally, the device will also aid in creating content. A dedicated button on the headset lets you capture 3D photos or videos, which can be replayed on the device itself. That said, people seem to find it somewhat creepy to be recorded by someone wearing a headset.

While all these features seem like basic VR functionality, Apple adds their special sauce to it. Overall, I expect Apple to provide a polished experience and offer specialized content. If the latest rumors of Apple TV+ content being shot with special cameras are correct, we might be in for some truly immersive experiences.

The Design: What’s outside?

For the device itself, Apple has adopted a non-transparent screen. They essentially follow a virtual reality approach, projecting the user’s surroundings onto two displays, one for each eye. This approach gives Apple full control over the visual experience. Note that creating a solid virtual reality experience is far from straightforward, as minor delays can easily induce virtual reality sickness.

Packed with 5000 patents, the Apple Vision Pro has a modular design and borrows well-known components like a digital crown.

Looking at the different components, the device comprises a front panel of glass and metal, which houses the two internal screens, an external (OLED) display, and two input elements: a button for capturing 3D photos and videos, and a Digital Crown for adjusting the level of immersion between virtual reality (VR) and augmented reality (AR). We have seen these types of input elements before on products such as the Apple Watch and AirPods Max.

To ensure optimal immersion, the device features a light seal that comes in many sizes to prevent any light leakage. Additionally, a detachable headband is adjustable to fit securely on the user’s head. In some shots during the announcement, we saw an overhead band, which seems to be still under development.

On the sides of the Vision Pro we find the “AudioPods”, which provide a spatial audio experience similar to that of the AirPods Pro/Max and HomePod. However, it remains uncertain whether the sound will be audible to those nearby. Next to the headband, the device offers the option to attach either a battery, providing approximately two hours of usage, or a connection to mains power. (Further details on the external battery can be found in the dedicated section below.)

Lastly, the glass panel at the front introduces “EyeSight”: a depiction of the user’s eyes that allows other people to see them. When the user is immersed in an environment, the eyes disappear, indicating to those around that the user’s vision is obstructed. When recording, the screen flashes, and if someone approaches while the user is immersed, they “break through” the immersion and become visible.

Personally, I don’t really see the value of having that screen on the outside yet (Marques Brownlee has similar thoughts on the matter). Plus, the eyes look a bit strange from some angles, giving me some uncanny valley vibes. I wonder which direction Apple wants to take this new feature and how/if it will appear in potential non-Pro variants of the device.

Performance: What’s inside?

Next to an M2 chip, the Apple Vision Pro also features an R1 chip to enable real-time processing and keep latency low. The device is packed with cameras to track your surroundings, your hands, and your eyes.

As the Apple Vision Pro needs to track you and your environment efficiently and accurately, the front of the device is equipped with twelve cameras, five sensors, and six microphones. To process all this data in real time, Apple included a new chip called the R1, next to the M2 chip. The new chip ensures that the time it takes for information to be processed and presented to the user is at most 12 milliseconds. This should greatly contribute to a more immersive feeling and reduce virtual reality sickness.

On the inside, the device houses two micro-OLED displays with wide color and HDR capabilities, together packing more than 23 million pixels, which works out to more than 4K worth of pixels per eye. Custom three-element lenses shape the image. The displays are also surrounded by several cameras and LED illuminators to allow for high-precision eye tracking. For users with less-than-perfect vision, Apple has partnered with Zeiss to offer optical inserts, ensuring optimal visual clarity.

Regarding user authentication, the Apple Vision Pro introduces a new method (after Touch ID and Face ID) called Optic ID, which relies on iris scanning. Apple assures users that this iris data remains solely on the device and never leaves its secure environment. It is also important to note that applications running on the device cannot track the user’s eye movements. Apple’s operating system only passes input to an app when the user actually interacts, such as through a pinch gesture.

An External Battery?

Quite a few people seem to dislike Apple’s decision to use an external battery. It’s regarded as a clunky or impractical solution, offering only two hours of battery life (some discussions here, here). I can understand that to some degree, for example, if you don’t have any pockets. However, from a functional perspective, I believe it can be a suitable solution.

First, as Apple is using heavier materials in their design (glass, metal), adding a battery to the device itself would make it too heavy; apparently, it’s already quite heavy as it is (avoiding plastic comes with these trade-offs). So the external battery really helps keep the weight down.

Having an external battery means you can easily swap it out and continue working.

Second, having an external battery means you can easily swap it out and continue working. In situations where freedom of movement is essential, a “battery low” warning would simply mean replacing the battery pack, enabling uninterrupted work. This seems far more convenient than being forced to switch to a wired connection or having to recharge your headset first.

However, there are still some practical considerations to address before battery swapping works well. If a swap requires shutting down and restarting the device, it would not be very practical. Fortunately, there are signs of a small internal battery for precisely this use case, as well as the option to tether a new battery to an existing one. That would mean you can watch Avatar: The Way of Water without interruptions.

There are signs of a small internal battery and/or the option to tether batteries.

Third, using an external battery leaves room for future battery advancements. By keeping the battery separate from the device itself, there’s more flexibility for Apple or third-party companies to develop batteries with longer lifespans. By avoiding integration within the device, design limitations related to shape, heat, and size can be overcome.

Of course, for this to happen, Apple needs to open up the MagSafe connection they are using on this device. But, given the modular design of the headset (headband, lenses, battery), it would not be strange to think in this direction. At the very least, I’m hoping for clarity on tethering/charging the batteries while in use. This means there is hope for a “battery belt” (credits to Lieven Scheire from the Nerdland podcast)!

Conclusion & Thoughts

As part of their “One more thing…” section of WWDC23, Apple revealed how they see the future of Spatial Computing with the introduction of the Apple Vision Pro. Packed with tons of sensors and cameras and an OS that contains spatial and real-time components, the Vision Pro is set up to offer both immersive virtual reality (VR) and augmented reality (AR) experiences.

The VR capabilities of the device focus primarily on entertainment: video and gaming. The AR aspect revolves around lifting apps from a 2D rectangular screen into your environment. Although these use cases offer “augmented” experiences compared to traditional 2D approaches, Apple also showcased more immersive demos, providing a glimpse into the potential of future applications.

Apple seems to envision a future where both the app and content realms come together.

Apple seems to envision a future where both the realms of apps and content converge, transitioning towards real AR experiences. One notable example is the Mindfulness app, which introduces a floating sphere into the user’s surroundings to induce relaxation. I expect developers to start creating more of these immersive experiences that go beyond mere 2D content and floating apps.

Having previously speculated about this device, I am eagerly looking forward to trying it out. Apple has a strong track record of delivering exceptional user experiences, even if it sometimes sacrifices flexibility. Therefore, I have high expectations for the OS itself, hoping it will allow us to break free from the constraints of 2D screens and facilitate more natural human-computer interactions. However, I’m also curious about how comfortable the device will be during extended use, especially in its current form.

If you want more, I highly recommend watching the video from Marques Brownlee on MKBHD and reading the post from John Gruber on Daring Fireball. And, if anyone has an opportunity for a demo, I would definitely appreciate it if you could let me know your findings in the comments below!

A full-size view of the WWDC sketchnote image is available here.
