The Rise of Spatial Audio
Personalized Spatial Audio Technologies Listen to You
By Jon Gabay for Mouser Electronics
Published November 27, 2023
The pursuit of better audio reproduction is seemingly never-ending. From Victrola's hand-cranked phonographs to the latest surround sound technology, listeners have looked to technology to improve the sounds reaching their ears. This quest to improve audio and listening experiences is taking a new turn with the latest implementations of spatial audio technology, which promises a better listening experience that is more immersive than ever before.
One of the most well-known 3D audio technologies is Apple Spatial Audio, with support for Dolby Atmos, which was announced in 2021. The next generation of this technology—Personalized Spatial Audio, supported by Apple iOS 16 and several hearables makers—is making waves…and not just sound waves. This personal audio implementation is making a splash with the audio community by tuning the audio features to a specific listener’s preferences and anatomy.
Apple Spatial Audio is not the only spatial audio technology; companies like Sony and Denon are also pioneering this technology and offering commercial products. However, this article will limit discussion to the concept of spatial audio technology in general and Apple's Personalized Spatial Audio.
Audio preferences have always been a personal experience. What sounds good to one person may not sound good to another. But with Apple iOS 16 support for Personalized Spatial Audio, there have been a lot of claims and misinformation propagating across many channels. This article discusses the present state and features of spatial audio technology.
Profiling Your Head
Personal audio comes down to physiology and the physics behind how the body functions; everyone is different. The spacing, locations on the head, shapes, and angles of our ears alter how we hear. For Personalized Spatial Audio, Apple uses the 3D TrueDepth® camera features of iPhones running iOS 16 to scan a listener's head in three dimensions.
The iPhone performs three scans: the left side of the head, the right side of the head, and the front of the face (not the inner ear canal, as some claim). The resulting profiles are unique to the individual and stored for use by the playback engine. There is concern that these profile data files can be obtained and used in advanced facial recognition systems. Apple says these files are safe and encrypted on your device and will not be used for surveillance and advanced facial recognition applications.
The TrueDepth scan parameters create an acoustic model that the audio-rendering engine uses to optimize the real-time audio stream presented to the listener’s ears.
But Wait, There’s More to It
In humans, when sounds are played into the ear, the inner ear resonates and responds with faint sounds of its own. These come from the cochlea and can be detected and measured. These sympathetic sounds, called otoacoustic emissions (OAEs), are discernibly louder at frequencies to which the listener is more sensitive. Many hearables makers include sensitive microphones within the earbuds to detect otoacoustic emissions. A frequency sweep lets the Spatial Audio system profile the listener's hearing frequency response for each ear.
The system uses each ear's frequency profile to tailor playback across the full spectrum, boosting the frequencies to which the listener is less sensitive. The resulting dynamically adjusted EQ also takes advantage of the earbud emitters' specific angles to optimize spectral power at different frequencies, so the listener perceives the full frequency range of the audio stream.
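Apple has not published how its EQ is derived, but the compensation idea described above can be sketched simply: for each sweep band, boost by the amount the measured sensitivity falls below a reference, never cut, and cap the boost to protect the driver and the listener. The function name, band values, and 12dB cap below are illustrative assumptions, not Apple's implementation.

```python
# Hypothetical sketch: derive per-ear EQ gains from a measured
# sensitivity profile (in dB) at a handful of sweep frequencies.
# Bands where the listener is less sensitive get a boost, capped
# to avoid excessive amplification.

def compensation_gains_db(sensitivity_db, reference_db=0.0, max_boost_db=12.0):
    """Return one EQ gain (dB) per band: boost bands that fall below
    the reference sensitivity, never cut, and cap the boost."""
    gains = []
    for s in sensitivity_db:
        deficit = reference_db - s          # how far below reference
        gains.append(min(max(deficit, 0.0), max_boost_db))
    return gains

# Example: left-ear sweep results, 250 Hz .. 8 kHz (illustrative numbers)
left_ear = [-2.0, 0.0, -6.0, -15.0, -4.0]
print(compensation_gains_db(left_ear))   # → [2.0, 0.0, 6.0, 12.0, 4.0]
```

Note that the 15dB notch is only boosted by the 12dB cap; a real system would also account for loudness limits and driver headroom.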
The Sonic Sphere
Spatial Audio feels somewhat like a bubble of sound surrounding your head. Audio tracks take on a new personality: sounds appear to come not just from the left, right, center front, and center rear, as they do in surround sound from directional speakers. Instead, sound-emitting sources seem to be all around your head, and as you turn your head into a "line of audio" (similar to a line of sight) with one of these sources, it gets louder and more pronounced. To achieve this, the audio track must be an encoded source track containing all the sonic-sphere sound sources, their relative levels, and their distances.
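The "line of audio" behavior can be illustrated with a toy gain model (not any vendor's actual renderer): a source is loudest when the listener faces it and tapers toward the sides, with a floor so sources behind the head remain audible. The cosine lobe and 0.2 floor below are arbitrary assumptions for the sketch.

```python
import math

def source_gain(head_yaw_deg, source_azimuth_deg):
    """Toy 'line of audio' model: gain is 1.0 when the listener faces
    the source, tapering via a cosine lobe toward the sides, with a
    floor so sources behind the head stay audible."""
    diff = math.radians(source_azimuth_deg - head_yaw_deg)
    facing = 0.5 * (1.0 + math.cos(diff))   # 1.0 facing, 0.0 directly behind
    floor = 0.2                              # keep rear sources audible
    return floor + (1.0 - floor) * facing

# Turning the head toward a source at 90° azimuth raises its gain:
print(source_gain(0.0, 90.0))   # off to the side
print(source_gain(90.0, 90.0))  # now facing it → 1.0
```

A real renderer would also apply distance attenuation and per-ear filtering from the encoded source metadata, but the head-relative gain idea is the same.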
Uses for Spatial Audio
Spatial audio processing can be used for theatrical audio, movie audio, gaming audio, and health and fitness applications. Arguably, the most popular application is currently gaming—especially virtual reality (VR) gaming. VR headsets incorporate advanced and effective head tracking to ensure audio and video sync. Without fast and accurate head tracking, VR can make people nauseated very quickly; if the scenery doesn't track in real time when you turn your head, the mismatch between what the eyes see and what the inner ear senses can quickly trigger motion sickness. Because of this, the VR headset anchors the spatial audio engine so that as the head turns, the dominant source of sound from that direction comes through loudest. Other sounds also shift in position to track the head's position and rate of rotation.
Home theater spatial audio systems cannot perform this type of anchoring. For example, if you are sitting on a couch watching a movie, spatial audio may offer a reasonable approximation of surround sound as long as you are watching the center screen. But as you turn your head, the system will have trouble making side sounds more dominant. Machine vision cameras and artificial intelligence may help the system discern when you turn your head, but the technology is not there yet.
Accelerometers and gyroscopes used in some hearables can perform head tracking, but this is not a perfect solution: gyroscope readings must be integrated over time, so small errors accumulate as drift. As a result, this relative head tracking will not react as quickly or accurately as an absolute head-tracking technology.
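A common way to tame gyroscope drift in such inertial head trackers is a complementary filter: integrate the gyro for fast response, then nudge the estimate toward the accelerometer's gravity-based angle to cancel slow drift. This is a generic sensor-fusion sketch, not any earbud maker's algorithm; the 0.98 blend factor is a typical illustrative value.

```python
def complementary_filter(pitch_deg, gyro_rate_dps, accel_pitch_deg, dt, alpha=0.98):
    """One update step of a complementary filter for head pitch.
    Integrates the gyro rate (fast but drifty), then blends toward the
    accelerometer's gravity-derived pitch (slow but drift-free).
    Note: yaw has no gravity reference, which is one reason purely
    inertial earbud head tracking drifts over time."""
    integrated = pitch_deg + gyro_rate_dps * dt   # gyro integration step
    return alpha * integrated + (1.0 - alpha) * accel_pitch_deg

# A stationary head (zero gyro rate) slowly converges to the
# accelerometer's reading instead of drifting away from it:
est = 0.0
for _ in range(200):
    est = complementary_filter(est, 0.0, 10.0, 0.01)
print(est)   # approaches 10 degrees
```

VR headsets avoid the residual yaw drift by fusing in an absolute reference (cameras or external beacons), which earbuds lack.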
In all cases, including gaming, fast response time and low latencies are required so that as a listener moves their head, their line of audio responds with high levels of audio sources in front of them and with muted or lower levels of sources to the sides.
One possible solution for home and theater use is to have everyone wear an immersive VR headset (Figure 1). This would work if the audio engine could simultaneously supply tailored audio streams to every individual, but it is a more expensive solution and makes going to the movies less of a social experience.

Figure 1: Using immersive VR headsets in a theater setting allows spatial audio processing engines to take advantage of the high-speed, accurate accelerometers and gyroscopes in the headset. (Source: Marija/stock.adobe.com)
Other Issues, Concerns, Technologies, and Uses
While spatial audio is purely a digital technology, earbuds present physical drawbacks. Their small audio emitters limit bass response, which is why bass amplification relies on larger speakers, woofers, and subwoofers. Bass reproduction depends on moving large volumes of air, and small emitters cannot move as much air as larger ones. Phased arrays can reproduce bass using smaller emitters spaced to reinforce power in the low end of the spectrum, but this is hard to do within earbuds.
Larger headphones usually employ larger audio emitters that can deliver better bass response. But headphones require different audio processing to reproduce the spectrum, especially for surround sound–like effects. Headphones use 360-degree head-related transfer function (HRTF) filters, which adjust how sounds are played so that reflections around the head and outer ear make each sound appear to come from a different location and level.
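A full HRTF is a measured set of filters, but two of the cues it encodes can be sketched with textbook formulas: the interaural time difference (ITD, the sound arriving at the far ear slightly later) via the Woodworth head model, and a crude interaural level difference (ILD, head shadowing attenuating the far ear). The head radius and 10dB maximum ILD below are illustrative assumptions, not values from any product.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, in air at room temperature
HEAD_RADIUS = 0.0875     # m, a commonly used average head radius

def itd_seconds(azimuth_deg):
    """Woodworth approximation of interaural time difference for a
    source at the given azimuth (0 = straight ahead, 90 = hard right),
    valid for azimuths between -90 and +90 degrees."""
    a = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (a + math.sin(a))

def ild_db(azimuth_deg, max_ild_db=10.0):
    """Crude interaural level difference: head shadowing is strongest
    for sources fully to one side, zero for sources straight ahead."""
    return max_ild_db * math.sin(math.radians(azimuth_deg))

# A source hard right arrives ~0.66 ms earlier and ~10 dB louder
# at the right ear than at the left:
print(itd_seconds(90.0), ild_db(90.0))
```

Real HRTF filters additionally capture the frequency-dependent notches produced by the pinna, which is what makes elevation and front/back discrimination possible and why personalized measurements matter.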
In any case, 360-degree audio technology has many potential applications and uses outside of gaming, theaters, and health and wellness. Already, white and pink noise machines help people sleep, relax, and reduce stress. A future application may be adding biometric sensors to spatial audio to record which frequencies and patterns help an individual relax, lower blood pressure, and fall asleep. Tracking alpha brainwaves could then close this feedback loop to enhance relaxed states.
This technology could also help deaf or hard-of-hearing individuals by acting as an assistive hearing device, though this is not yet a verified use. Musicians could use spatial audio to get an ideal in-ear mix of a performance: stage volume and the venue mix always differ, and while the sound engineer can adjust the venue mix, spatial audio can help performers hear themselves better on stage.
For now, reviews of spatial audio are mixed; some listeners love it, others are not impressed. It is a technology everyone will have to try and judge for themselves. After all, many users were unsatisfied with the expensive and complex setup of surround sound systems, complaining that background sounds overpowered dialog.
Conclusion
Spatial audio technology offers a more personalized and immersive listening experience by analyzing individual anatomy and physiology, leveraging unique audio profiles, and incorporating advanced audio rendering techniques. The applications for this technology span gaming, theaters, health, fitness, and more, and its future implications could revolutionize how we experience audio.