How to Get the World of Immersive Audio Just Right

Image Source: Transfuchsian/Shutterstock.com

By Jon Gabay for Mouser Electronics

Published February 2, 2021

Introduction

Of the five traditionally recognized senses, hearing is perhaps the one many people appreciate most. Whether from the richness of nature, the fullness of an orchestra, or the stirring of spiritual music, sound, as much as anything we perceive through any other sense, profoundly affects us emotionally and physiologically.

We exist today in large part because of our focused ability to listen. Ancestors who hunted would use sound to track and find food for a family or village. Avoiding predators was made possible by using sound to locate potentially lethal sources in a 3D, immersive way. Not only could we sense where something was, but also how far away it was, and how fast it was approaching.

We evolved in part because of immersive hearing, and the drive to perfect immersive audio continues today. Immersive audio is the most advanced audio processing and delivery approach yet. Getting it right requires audio processing and an array of well-characterized, calibrated woofers, midrange drivers, and tweeters strategically placed to provide a dynamic, full-spectrum listening experience.

Although implementations at this level were long beyond the reach of the average audiophile, movie theaters and performance venues have pioneered and taken advantage of much of this technology for years. Like most pioneering technology, it eventually finds its way to the rest of the world.

Modern entertainment systems take advantage of the multitude of specialized filters and dynamic processing to create affordable implementations that fit more budgets. More home theaters exist today than ever before, especially in a pandemic world, and immersive audio is sure to be in gamer and home theater locations everywhere.

Audio Advances to Where We Are

It’s mind-boggling to realize that prerecorded audio playback systems have been available for over 100 years. Thomas Edison’s early wax-cylinder phonographs provided low-resolution, low-bandwidth, low-volume, and crude reproduction of voice and music. The soft wax didn’t last more than a dozen playbacks.

Disc records spinning at 78, 45, and 33⅓ RPM offered more longevity and fidelity, and their reproducible bandwidth was roughly proportional to rotation speed. This was remarkable analog technology, and experimental versions could reproduce frequencies up to 150 kHz.

That exceeds the bandwidth of digital systems sampled at 44.1 kHz, 48 kHz, and even 192 kHz, whose Nyquist limits (half the sample rate) are 22.05 kHz, 24 kHz, and 96 kHz, respectively. However, vinyl’s dynamic range was only about 78 dB, which translates into roughly 12 to 13 bits of resolution. Digital systems with 24 or 32 bits per channel are common today, and dynamic range is one category where analog technology can’t keep up.
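The comparison above rests on two standard rules of thumb: an ideal N-bit quantizer has a dynamic range of about 6.02·N + 1.76 dB (for a full-scale sine wave), and a sampled system can only represent frequencies up to half its sample rate. A minimal Python sketch of both conversions, assuming ideal converters (real records and converters fall short of these figures):

```python
def bits_for_dynamic_range(dr_db: float) -> float:
    """Bits an ideal quantizer needs to reach dr_db of dynamic range.

    Uses the full-scale-sine rule DR = 6.02 * N + 1.76 dB, solved for N.
    """
    return (dr_db - 1.76) / 6.02


def dynamic_range_for_bits(n_bits: int) -> float:
    """Theoretical dynamic range (dB) of an ideal n_bits quantizer."""
    return 6.02 * n_bits + 1.76


def nyquist_khz(sample_rate_khz: float) -> float:
    """Highest representable frequency (kHz): half the sample rate."""
    return sample_rate_khz / 2


print(f"78 dB of vinyl dynamic range ~ {bits_for_dynamic_range(78):.1f} bits")
for bits in (16, 24):
    print(f"{bits}-bit ideal dynamic range ~ {dynamic_range_for_bits(bits):.1f} dB")
for fs in (44.1, 48.0, 192.0):
    print(f"{fs} kHz sampling -> bandwidth up to {nyquist_khz(fs)} kHz")
```

By this rule, 78 dB works out to about 12.7 bits, while ideal 16- and 24-bit systems reach roughly 98 dB and 146 dB.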

Once electronic technology was introduced, amplification, filtering, and electrically driven speakers ushered in the era of audio for everyone. Disc phonographs coexisted with radio, which truly brought personal audio to listeners. For a long time, mono was all that was available.

Only within a single lifetime did monaural audio give way to stereo. Stereo added a second source of sound, and for the first time, listening became location-dependent, with a certain level of depth.

The modest cost increase of stereo spawned explosive growth in record and playback systems, leading to multiplexed amplifiers and component systems. Stereo satisfied the needs of broadcast, recording, and listening markets. It is simple to implement in homes, cars, and theaters and provides a satisfying listening experience.

Something else happened, though. For the first time, clever recording and mixing engineers could create spatial movement in sound. A listener could hear an object on the left move to the right as the recording engineer panned the signal from one channel to the other. Although it was a two-dimensional experience, it was the first time audio became somewhat immersive.
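Panning like this is done with a pan law that sets how much of a mono source goes to each channel. The article doesn’t say which law engineers used, and consoles differ; as one common example, here is a minimal sketch of the constant-power (sine/cosine) law, assuming a pan position from -1 (hard left) to +1 (hard right):

```python
import math


def constant_power_pan(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1, 1].

    -1 is hard left, 0 is center, +1 is hard right. The sine/cosine law
    keeps left**2 + right**2 == 1, so perceived loudness stays roughly
    constant as the source moves across the stereo field.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] -> [0, pi/2]
    return math.cos(angle), math.sin(angle)


# Sweep a source from left to right, as an engineer would with a pan pot.
for p in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = constant_power_pan(p)
    print(f"pan {p:+.1f}: L={left:.3f} R={right:.3f}")
```

At center, each channel gets about 0.707 (-3 dB), which avoids the loudness dip a simple linear crossfade produces in the middle.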

Enter the Immersive World

Although stereo allows a basic surround-sound capability, the most popular surround-sound and 3D audio format in use today is 5.1, best known through Dolby Digital. Related formats and systems include Dolby Pro Logic II, DTS, SDDS, and THX. The 5.1 layout uses six speakers (five full-bandwidth, one subwoofer) surrounding the listener(s) (Figure 1). These surround-sound technologies were first used in movie theaters, which helped advance them and make them more cost-effective and available to the masses.

Figure 1: Surround-sound 5.1 uses five full-range speakers plus a subwoofer, placed at specific locations so that the audio engineer can mix down audio that seems to move spatially around the listener. Subwoofer placement is flexible because it can typically be placed almost anywhere. (Source: Zern Liew/Shutterstock.com)

Multiple speakers are driven with unique individual audio streams so that the perceived locations of virtual sound sources surround the listener. Here, the front left, right, and center channels provide lateral (left-to-right) placement, the rear left and right channels add front-to-back spatial depth, and a single subwoofer reproduces all the low-frequency bass for the entire room.
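One published way to place a virtual source between speakers is vector-base amplitude panning (VBAP), which splits a source between the two speakers that bracket its direction. The article doesn’t say which method receivers use, so treat this as a swapped-in illustration: a minimal 2D sketch, assuming the main-channel angles of the ITU-R BS.775 5.1 layout (center 0°, fronts ±30°, surrounds ±110°) and ignoring the subwoofer:

```python
import math

# Main-channel azimuths in degrees (0 = straight ahead, positive = toward the
# listener's left), following the ITU-R BS.775 5.1 layout. The LFE/subwoofer
# channel carries bass only and is not panned here.
SPEAKERS_5_1 = {"C": 0.0, "L": 30.0, "Ls": 110.0, "Rs": -110.0, "R": -30.0}


def _unit(deg):
    """Unit vector (x, y) pointing at azimuth deg."""
    rad = math.radians(deg)
    return math.cos(rad), math.sin(rad)


def vbap_2d(source_deg, speakers=SPEAKERS_5_1):
    """Pairwise 2D vector-base amplitude panning (VBAP).

    Walks the speakers in azimuth order, solves p = g1*l1 + g2*l2 for each
    adjacent pair, and uses the first pair whose gains are both non-negative
    (the pair that brackets the source). Gains are normalized so their
    squares sum to 1; all other speakers get 0.
    """
    names = sorted(speakers, key=lambda n: speakers[n] % 360)
    px, py = _unit(source_deg)
    gains = {n: 0.0 for n in names}
    for i, a in enumerate(names):
        b = names[(i + 1) % len(names)]
        (ax, ay), (bx, by) = _unit(speakers[a]), _unit(speakers[b])
        det = ax * by - ay * bx
        if abs(det) < 1e-12:
            continue  # collinear pair cannot span a direction
        g1 = (px * by - py * bx) / det
        g2 = (ax * py - ay * px) / det
        if g1 >= -1e-9 and g2 >= -1e-9:
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)
            gains[a], gains[b] = g1 / norm, g2 / norm
            return gains
    raise ValueError("no speaker pair brackets this direction")


# A source at 70 degrees sits midway between L (30) and Ls (110).
print({name: round(g, 3) for name, g in vbap_2d(70.0).items()})
```

A source midway between two speakers gets equal gains of about 0.707 on that pair; as it moves toward one speaker, that speaker’s gain rises toward 1.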

Although the setup is ideal for a single listener centrally located in the listening zone (or on the couch), listeners elsewhere will hear slight differences. Still, the sound field is consistent enough that everyone in the listening zone experiences audio in motion. What’s more, recording artists are advertising their latest albums as immersive by providing 5.1 surround-sound tracks.

Interestingly, the front center channel is optimized for speech-range signals, which helps listeners discern conversations while immersed in 3D sound. As fuller, wider-bandwidth sound became popular, discerning speech actually became harder, so center-channel filtering and amplification can make conversations easier to understand.
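As an illustration of that center-channel filtering and amplification, the sketch below isolates a speech band with a second-order band-pass filter (the band-pass biquad from the RBJ Audio EQ Cookbook) and mixes a boosted copy back in. The 300 Hz to 3.4 kHz band (the classic telephone voice band) and the 6 dB boost are assumptions for illustration, not values from any particular product:

```python
import math


def bandpass_coeffs(f0: float, q: float, fs: float):
    """Band-pass biquad (0 dB peak gain) from the RBJ Audio EQ Cookbook,
    normalized so a[0] == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Filter a list of samples with a Direct Form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def dialog_emphasis(center, fs, boost_db=6.0, f_lo=300.0, f_hi=3400.0):
    """Boost the speech band of a center-channel signal by about boost_db."""
    f0 = math.sqrt(f_lo * f_hi)        # geometric center of the band
    q = f0 / (f_hi - f_lo)             # Q from the band edges
    b, a = bandpass_coeffs(f0, q, fs)
    speech = biquad(center, b, a)
    extra = 10 ** (boost_db / 20) - 1  # added linear gain for the speech band
    return [c + extra * s for c, s in zip(center, speech)]
```

A tone near the band center comes out roughly twice as loud (+6 dB), while low bass passes through nearly unchanged, because the band-pass copy contributes little there.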

Adding one more channel, a rear center, raises the specification to 6.1 surround sound (Figure 2), while standard 7.1 systems eliminate the rear center channel but add left and right side channels (Figure 3).

Figure 2: Surround-sound 6.1 provides lateral speakers to enhance audio in motion as an audio object moves from front to side and back. Again, subwoofer placement is flexible; this layout is not about the bass. (Source: Zern Liew/Shutterstock.com)

The 7.1 surround-sound technology adds more speakers and unique channels. The 2½-D cube or polygon this creates can be extended with more speakers, tweeters, and woofers at strategic locations, immersing the listener in 2D and limited 3D audio (Figure 3). A sound directly above or below the listener can be approximated through signal processing, but it will never be perfect unless real speakers are placed above and below.

Figure 3: More speakers placed at smaller angular spacing help eliminate audio hotspots, which can occur especially if the tracks aren’t mixed or processed correctly, or if the audio converters don’t process the surround sound properly. (Source: Zern Liew/Shutterstock.com)

Note that converters can process a stereo recording to synthesize multi-speaker surround-sound signals. This shows how digital signal processing can largely separate source locations from a stereo source. The best solution would be to capture sound with a 3D microphone configuration and then play it back over the same 3D speaker configuration. However, this is cumbersome and difficult, and most will not go to these lengths when signal processing provides a good approximation.
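The simplest form of stereo-to-surround conversion is a passive sum/difference matrix: content common to both channels is steered to the center, and out-of-phase content to the surrounds. The sketch below shows only that basic idea; it is an assumption for illustration, far simpler than the steered (active) logic in decoders such as Dolby Pro Logic II, and real passive decoders typically also delay and low-pass filter the surround feed:

```python
import math


def passive_upmix(left, right):
    """Derive L, R, C and a shared surround feed from a stereo pair.

    Content panned equally to both channels (dialogue, lead vocals) shows
    up in the sum signal, so it feeds the center. Out-of-phase content
    (reverb, room ambience) shows up in the difference signal, so it feeds
    the surrounds. The 1/sqrt(2) factor (-3 dB) limits level build-up.
    """
    k = 1.0 / math.sqrt(2.0)
    center = [k * (lv + rv) for lv, rv in zip(left, right)]
    surround = [k * (lv - rv) for lv, rv in zip(left, right)]
    return {"L": list(left), "R": list(right), "C": center,
            "Ls": surround, "Rs": list(surround)}


stereo_l = [0.5, 0.2, -0.1]
stereo_r = [0.5, -0.2, -0.1]
for name, feed in passive_upmix(stereo_l, stereo_r).items():
    print(name, [round(s, 3) for s in feed])
```

Identical samples in both channels produce no surround output, while opposite-polarity samples produce no center output, which is why this approach only "mostly" separates sources.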

Is this always the best approach? Can signal processing fool our keenly developed sense of hearing using fewer speakers, or will we continue to create walls and ceilings of sound?

Surround Approaches

We have seen that strategically placed speakers combined with filtration, equalization, digital signal processing, and matched amplification can create a very realistic surround experience. Still, soundbar technology in various forms is gaining popularity. The obvious benefits of cost reduction, setup simplicity, lower power, fewer cables, and smaller size drive this technology forward, even as we drive forward.

Phased-array vertical soundbars have demonstrated their ability to reproduce a full spectrum of audio with good clarity and separation. Some musicians who use them report that soundbar columns with six-inch speakers deliver the sound clarity of an 18-inch speaker in subwoofer applications. That should turn a few heads. Horizontal soundbars and soundbar-based hybrid systems (which also include remote speakers) are a popular choice for many home theaters and studios.
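A phased array steers sound by delaying each driver slightly so the wavefronts add up in a chosen direction (delay-and-sum steering). A minimal sketch, assuming a straight line of equally spaced drivers in free air and a speed of sound of 343 m/s; commercial soundbars also apply frequency-dependent shading and filtering, which is not modeled here:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def steering_delays(num_drivers, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Per-driver delays (seconds) that tilt a uniform line array's beam.

    angle_deg is measured from broadside (straight out from the array);
    positive angles steer toward the highest-index driver. Delays are
    shifted so the smallest is zero, since a driver can't fire early.
    """
    step = spacing_m * math.sin(math.radians(angle_deg)) / c
    raw = [n * step for n in range(num_drivers)]
    earliest = min(raw)
    return [t - earliest for t in raw]


def delays_in_samples(delays_s, fs):
    """Round each delay to the nearest whole sample at sample rate fs."""
    return [round(t * fs) for t in delays_s]


# Eight drivers 10 cm apart, beam tilted 20 degrees, 48 kHz processing.
d = steering_delays(8, 0.10, 20.0)
print(delays_in_samples(d, 48_000))
```

Each driver fires a fixed increment later than its neighbor, which tilts the combined wavefront; reversing the sign of the angle reverses the order.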

Upward- and sideways-firing drivers enhance this effect by reflecting sound off wall and ceiling surfaces so that it appears to come from above or behind the listener. The modern-day Tesla Model 3, for example, uses front soundbar technology as part of its 15-speaker audio system to deliver surround and immersive audio. To demonstrate its capabilities, turn off a Model 3’s rear speakers and engage immersive audio mode with signal processing and reverb; those who’ve tried this swear the sound is coming from behind.

Feedback is mixed, however, and many don’t like the effect. Reviewers both praise and criticize the technology, and many have noted that some types of music work with soundbar-style immersive implementations while others don’t. This makes sense, because the reproduced quality depends greatly on the recording engineers’ mix-down techniques. Advances here mean that true above-and-below immersive sound is almost achievable without floor and ceiling speakers.
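The geometry behind a ceiling bounce can be worked out with the image-source method: mirror the listener’s ear through the ceiling, and the reflected path becomes a straight line. A minimal sketch, assuming a flat, hard ceiling, a single specular reflection, and a 2D side view; the room dimensions in the example are hypothetical:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def ceiling_bounce(distance_m, ceiling_m, driver_m, ear_m, c=SPEED_OF_SOUND):
    """Geometry of a single specular ceiling reflection (image-source method).

    Returns (tilt_deg, reflection_point_m, extra_delay_ms):
      tilt_deg           - driver aim above horizontal needed to hit the ear
      reflection_point_m - horizontal distance from driver to the bounce point
      extra_delay_ms     - how much later the bounce arrives than direct sound
    """
    # Height difference between the driver and the ear mirrored through the ceiling.
    rise = (ceiling_m - driver_m) + (ceiling_m - ear_m)
    tilt_deg = math.degrees(math.atan2(rise, distance_m))
    reflection_point_m = distance_m * (ceiling_m - driver_m) / rise
    reflected = math.hypot(distance_m, rise)
    direct = math.hypot(distance_m, ear_m - driver_m)
    extra_delay_ms = (reflected - direct) / c * 1000.0
    return tilt_deg, reflection_point_m, extra_delay_ms


# Hypothetical room: listener 3 m away, 2.4 m ceiling, bar at 0.8 m, ears at 1.0 m.
tilt, x, dt = ceiling_bounce(3.0, 2.4, 0.8, 1.0)
print(f"aim {tilt:.1f} deg up, bounce {x:.2f} m out, arrives {dt:.2f} ms late")
```

In this example the driver must aim 45° upward, and the reflection arrives about 3.6 ms after the direct sound, one reason the result depends so much on the room and on processing.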

Today, our mindset is that more is better, and speaker walls are just too impressive to ignore (Figure 4). This approach is alive and well with audio buffs and musicians with a lot of time and wire on their hands.

Figure 4: More is better. Performers are used to exorbitant numbers of speakers and amplifiers. In large outdoor settings, it might be necessary. But do you really want walls of sound? Or at some point, do you realize that better sound is better than louder sound? (Source: tommistock/Shutterstock.com)

The most advanced implementation of immersive audio today is Dolby Atmos, designed for theater applications. So far, almost 5,000 theaters have been retrofitted with up to 64 speakers to take advantage of this latest listening experience. The format supports up to 128 channels and can drive full-bandwidth speakers, low-frequency woofers and subwoofers, and high-frequency tweeters.

Unlike conventional channel-based audio, Atmos (and the competing Sony 360 Reality Audio format) uses the concept of audio objects. An Audio/Video Receiver (AVR) knows the number of speakers, their type, and their location, and it processes each audio object’s spectral makeup, amplitude, location, speed, and direction. An object is not just audio: it carries metadata that helps an Object Audio Renderer (OAR) put the object in motion. Of the 128 channels, 10 are used for ambient stems, and the other 118 are available for audio objects.

Not every channel is a speaker. Channels carry objects, and each object’s audio can be processed and combined with other objects’ audio, then directed to each speaker at the appropriate level. It is up to the AVR to use the metadata to mix and distribute the sound in real time.
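The idea of rendering objects to speakers can be sketched in a few lines. This is not Dolby’s renderer, whose internals the article doesn’t describe; it uses distance-based amplitude panning (DBAP), a published technique that gives each speaker a level based on its distance from the object. The `AudioObject` type, the static position, and the speaker coordinates are hypothetical simplifications (real object metadata changes over time):

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio object: mono samples plus position metadata (meters)."""
    samples: list[float]
    x: float
    y: float


def dbap_gains(x, y, speakers, rolloff=1.0, blur=0.1):
    """Distance-based amplitude panning (DBAP) gains for one position.

    Each gain falls off as 1 / distance**rolloff (rolloff=1.0 is about
    6 dB per doubling of distance); 'blur' keeps the gain finite when the
    object sits exactly on a speaker. Gains are scaled to unit total power.
    """
    dists = [math.sqrt((x - sx) ** 2 + (y - sy) ** 2 + blur ** 2)
             for sx, sy in speakers]
    raw = [1.0 / d ** rolloff for d in dists]
    k = 1.0 / math.sqrt(sum(g * g for g in raw))
    return [k * g for g in raw]


def render(objects, speakers, num_samples):
    """Mix every object into one output feed per speaker."""
    feeds = [[0.0] * num_samples for _ in speakers]
    for obj in objects:
        gains = dbap_gains(obj.x, obj.y, speakers)
        for feed, g in zip(feeds, gains):
            for n, s in enumerate(obj.samples[:num_samples]):
                feed[n] += g * s
    return feeds


# Four speakers at the corners of a 2 m x 2 m square around the listener.
square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
helicopter = AudioObject(samples=[1.0, 0.5, 0.25], x=0.8, y=0.9)
for feed in render([helicopter], square, 3):
    print([round(s, 3) for s in feed])
```

Because each object is mixed into every feed at its own level, the number of objects is independent of the number of speakers, which mirrors the point that not every channel is a speaker.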

As you can imagine, this is not like stereo, where you just place a couple of speakers and are ready to listen. With Atmos and many other surround-sound and 3D sound systems, speakers must be placed and calibrated to be an accurate part of the soundscape. The average home will not use all 128 channels; home theater implementations top out at a 34-speaker arrangement.
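Part of that calibration is making every speaker’s sound arrive at the listening position at the same time and level. A minimal sketch, assuming measured speaker-to-seat distances, identical speaker sensitivity, and free-field inverse-square falloff (about 6 dB per doubling of distance); real receivers usually measure levels with a microphone instead, and the distances below are hypothetical:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def align(distances_m, c=SPEED_OF_SOUND):
    """Per-speaker (delay_ms, trim_db) so every speaker lines up at the seat.

    Nearer speakers are delayed so their sound arrives together with the
    farthest speaker's, and attenuated per the inverse-square law so they
    are not louder just because they are closer.
    """
    farthest = max(distances_m.values())
    settings = {}
    for name, d in distances_m.items():
        delay_ms = (farthest - d) / c * 1000.0
        trim_db = 20.0 * math.log10(d / farthest)  # <= 0 dB for nearer speakers
        settings[name] = (delay_ms, trim_db)
    return settings


room = {"L": 3.0, "R": 3.0, "C": 2.8, "Ls": 1.5, "Rs": 1.8}
for name, (delay_ms, trim_db) in align(room).items():
    print(f"{name}: delay {delay_ms:.2f} ms, trim {trim_db:+.1f} dB")
```

A surround speaker at half the distance of the fronts needs about 4.4 ms of delay and roughly 6 dB less level to blend in.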

Atmos is not brand-new. It was first used in 2012 in a theater in Los Angeles for a Disney movie premiere. Since then, large theaters, IMAX, planetariums, musicals, plays, and other sound applications have propelled it into the de facto standard for the audio of new movies and events.

Atmos also uses ceiling speakers to create a full hemisphere of audio, making it easier to process in real time while providing sound from above.

Once much too elaborate and expensive for the average audiophile, Atmos is now moving into the realm of the got-to-have-it enthusiasts who have the space and budget to wow their friends. It is also rather high on the gee-whiz index.

If you have already bitten the bullet on other surround-sound technologies, you can get a Dolby Atmos converter and still use your existing speakers and amplifiers. However, you will want more speakers, including ceiling speakers. Converters can take Dolby 5.1 content and upmix it to larger layouts with overhead channels, such as 7.1.4 (seven ear-level speakers, one subwoofer, and four ceiling speakers).
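Layout names such as 5.1 and 7.1.4 follow a simple convention: the first number counts ear-level full-range speakers, the second counts low-frequency (subwoofer) channels, and an optional third counts overhead speakers. A small helper that decodes the convention, offered as a sketch for checking speaker counts:

```python
def parse_layout(layout: str) -> dict:
    """Split an 'X.Y' or 'X.Y.Z' speaker-layout name into speaker counts.

    X = ear-level full-range speakers, Y = LFE/subwoofer channels,
    Z = overhead (height) speakers; legacy 'X.Y' names have no overheads.
    """
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3) or any(p < 0 for p in parts):
        raise ValueError(f"unrecognized layout: {layout!r}")
    ear, lfe = parts[0], parts[1]
    overhead = parts[2] if len(parts) == 3 else 0
    return {"ear_level": ear, "lfe": lfe, "overhead": overhead,
            "total": ear + lfe + overhead}


for name in ("5.1", "7.1", "7.1.4", "9.1.6"):
    print(name, parse_layout(name))
```

A 7.1.4 system, for instance, needs 12 speakers in total, twice the count of a 5.1 setup.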

Other Issues

For appreciators of pure audio, localized sources of noise can be frustrating. In many cases, you can just turn up the volume. If speech, music, and sound effects are in balance, it should not be much of an issue unless you have complaining neighbors.

For virtual-reality enthusiasts, manufacturers could build noise cancellation into the headset if they take on the expense and work of adding microphones and more signal processing for each ear. This is doable but probably not economically feasible.
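The signal processing involved can be illustrated with the classic least-mean-squares (LMS) adaptive noise canceller: an outward-facing microphone supplies a noise reference, and an adaptive filter learns how that noise reaches the ear so it can be subtracted. This is a simplified, swapped-in illustration rather than any manufacturer’s method; a real headset would use a variant such as filtered-x LMS that also models the speaker-to-ear path, and the two-tap "leak" path below is hypothetical:

```python
import math
import random


def lms_cancel(primary, reference, taps=8, mu=0.01):
    """Classic LMS adaptive noise canceller (Widrow-Hoff update).

    primary:   what an in-ear mic hears (wanted audio + leaked noise)
    reference: the same noise as picked up by an outward-facing mic
    Returns the error signal: primary minus the filter's noise estimate,
    which converges toward the wanted audio alone.
    """
    w = [0.0] * taps    # adaptive FIR weights
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y                                    # residual
        w = [wi + 2 * mu * e * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out


random.seed(0)
n_samples, fs = 5000, 48_000
noise = [random.uniform(-1, 1) for _ in range(n_samples)]
music = [0.1 * math.sin(2 * math.pi * 440 * n / fs) for n in range(n_samples)]
# Hypothetical acoustic path from the outside noise to the ear (2 taps).
leak = [0.8 * noise[n - 2] - 0.3 * noise[n - 3] if n >= 3 else 0.0
        for n in range(n_samples)]
cleaned = lms_cancel([m + lk for m, lk in zip(music, leak)], noise)
tail = slice(-1000, None)
residual = sum((c - m) ** 2 for c, m in zip(cleaned[tail], music[tail]))
print(f"residual noise energy over last 1000 samples: {residual:.4f}")
```

Once the filter converges, the output is close to the wanted audio, because the music is uncorrelated with the reference noise and so is not cancelled.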

For multiple-speaker setups, tools and apps that automate calibration are needed so that an expensive installer isn’t required. If setup is easy and somewhat automated, everyone can enjoy real surround sound quickly and economically, especially as more speakers are added.
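One building block an automated setup tool can use is measuring each speaker’s distance: play a known test signal, record it at the listening position, and find the lag where the recording best matches the signal (cross-correlation). A brute-force sketch, assuming the playback and recording chain adds no latency of its own (a real tool would subtract it) and using a simulated recording:

```python
import random

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def estimate_delay(reference, recording, max_lag):
    """Return the lag (samples) where recording best matches reference.

    Brute-force cross-correlation: slide the known test signal along the
    microphone recording and keep the lag with the largest dot product.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * recording[n + lag]
                    for n, r in enumerate(reference)
                    if n + lag < len(recording))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def distance_from_delay(lag_samples, fs, c=SPEED_OF_SOUND):
    """Convert an acoustic delay in samples to a distance in meters."""
    return lag_samples / fs * c


# Simulated measurement: a noise burst arrives 120 samples late, attenuated,
# with background noise added by the (hypothetical) microphone.
random.seed(0)
fs = 48_000
test_signal = [random.uniform(-1, 1) for _ in range(2000)]
mic = [0.0] * 120 + [0.3 * s for s in test_signal] + [0.0] * 200
mic = [m + 0.05 * random.uniform(-1, 1) for m in mic]
lag = estimate_delay(test_signal, mic, max_lag=300)
print(lag, f"{distance_from_delay(lag, fs):.2f} m")
```

At 48 kHz, each sample of delay corresponds to about 7 mm of distance, so even this simple approach locates speakers precisely enough to set delays and levels.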

Conclusion

Keep in mind that if you are a content creator, whether writing and recording music, screenplays, seminars, events, or simply Zoom learning sessions, you will want a recording system that effectively captures audio in a surround-sound format that most listeners can use. So far, that is 5.1, especially because converters can take it up to full 128-channel systems.

About the Author

After completing his studies in electrical engineering, Jon Gabay has worked with defense, commercial, industrial, consumer, energy, and medical companies as a design engineer, firmware coder, system designer, research scientist, and product developer. As an alternative energy researcher and inventor, he has been involved with automation technology since he founded and ran Dedicated Devices Corp. up until 2004. Since then, he has been doing research and development, writing articles, and developing technologies for next-generation engineers and students.
