Issue 28

Audio & Video Challenges in eSports

Cam O’Neill’s whitepaper outlines the audiovisual challenges presented by eSports events.


2 September 2020

eSports is essentially competitive video gaming, and it’s one of the fastest-growing areas in live production. The genre has existed in some form or another for 10-15 years, but as top-level, live global events on par with other professional sports, it’s relatively new. And that newness introduces challenges that have never existed until now.


Let’s start by taking a look at what a modern eSports event is like today. There are typically three different groups involved, which are fairly analogous to what you’d see at a traditional sporting event.

  • The first group is what I like to call the sports group. This includes the competitors, who are augmented by a support team that includes backup players and coaches, similar to any other professional sports team. And since there are rules that have to be followed, just like any other legitimate sport, there are also judges in place to enforce those rules.
  • The next group is made up of the audience, announcers and live production crew. The audience needs to see and hear what’s going on, so a screen and a sound system are required. And to run the screen and speakers, we need a front of house position. The announcers are called shoutcasters in most eSports, because they maintain a high level of tension during the fast-paced event (and they’re generally screaming).
  • The last group includes the broadcast and streaming personnel. Streaming is much more important in the eSports world than broadcasting is, and in general, it’s managed in-house. In most traditional sports, you’d have a broadcast team that would roll up next to the stadium, plug in their equipment and create the broadcast. But because a lot of these eSports have come from the online streaming world, they’re used to handling their own production.


eSports typically fit into one of four categories, which we’ll talk about in order of technical difficulty.

  • Sports simulation games. These games simulate something real like a soccer game or martial arts match.
  • First-person shooters. In these games, you’re looking directly through a character’s eyes.
  • Top-down real-time strategy games. These are the strategy games where you’re looking down from a high angle.
  • MOBAs. These are multiplayer online battle arenas, or battle royales, and they generally pose the most difficulty in terms of production requirements.

So now that we’ve had an overview, let’s dive into each category of games and discuss potential challenges.


Sports Simulator games are based around a ‘real’ sport and typically have two teams of 3-5 players. All players can see the same screen and there is a balance between realism and game play.

Sports simulators are what a lot of people think of when they imagine an eSports event. Anyone watching can see basically everything that’s going on in the game, so it doesn’t really matter if you place the screen behind those players or in front.

The shoutcasters can be as loud and as active as they want, and the audience is similar to an audience in a stadium: calling out to your team is actually encouraged.

This is a fairly typical sports environment, and everyone from the broadcast world would be used to this kind of event.


In first-person shooter games, you’re looking through the first-person view of the character that you’re playing, but each player in the game has their own individual view. Because you have your own point of view of the map, you can’t see everything that’s going on, and part of the challenge is to sneak around behind the other team so you can attack them before they see you.

Because of that, communication between the players is critical. You have to explain what you’re seeing to your teammates so they know to come and help you.

Also, the game gives audio cues to players. If you’re walking along and you hear someone stepping up behind you, you want to turn around and shoot them before they can attack. So the audio challenge here becomes a lot more important.

Team A and Team B can’t see what the other team sees, so the first thing we need to do is make sure they can’t see the big screen. We also have to make sure that the audience can’t yell out to the teams on stage. Since the shoutcasters are explaining everything blow-by-blow, they have information before the teams on stage do, because they’re able to see all the screens plus the individual view of each player. So you need to isolate the players from the main audio system.


Top-down real-time strategy games (RTS) were the first eSports, and are still the most popular. The defining factor is that you’re looking down on the map from a high angle, and usually controlling multiple units. Instead of first-person, which involves one character, you’ve got one person controlling an army.

As the commander, you can only see what your units can see. Everything else is obscured by what’s called the Fog of War. Different games have different rules about this, but in general, you can’t see anything that’s in the Fog of War. Everyone’s Fog of War is different, so even if you’re on the same team, you and your teammate may have different views.
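To make the Fog of War idea concrete, here is a minimal sketch of how per-player visibility could be computed. It assumes a simple grid map and a circular sight radius per unit; the `Unit` class, the `sight` field and the `visible_cells` helper are all hypothetical and not taken from any particular game engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    owner: str   # player who controls this unit
    x: int
    y: int
    sight: int   # assumed sight radius, in map cells


def visible_cells(units, owner, width, height):
    """Return the set of (x, y) cells that `owner` can currently see.

    Everything outside this set is hidden by that player's Fog of War.
    """
    seen = set()
    for u in units:
        if u.owner != owner:
            continue
        # Only scan the bounding box around the unit, clipped to the map.
        for x in range(max(0, u.x - u.sight), min(width, u.x + u.sight + 1)):
            for y in range(max(0, u.y - u.sight), min(height, u.y + u.sight + 1)):
                # Circular sight range, compared using squared distance.
                if (x - u.x) ** 2 + (y - u.y) ** 2 <= u.sight ** 2:
                    seen.add((x, y))
    return seen


# Two teammates with units in different corners of a 10x10 map.
units = [
    Unit("alice", 2, 2, 1),
    Unit("bob", 7, 7, 2),
]
alice_view = visible_cells(units, "alice", 10, 10)
bob_view = visible_cells(units, "bob", 10, 10)

# The teammates see entirely different parts of the map.
print(alice_view.isdisjoint(bob_view))
```

In this toy model, a shoutcaster or spectator feed would be the whole map rather than any single player's set, which is exactly why that feed has to be kept away from the players.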

Some of these games, like League of Legends or StarCraft, have been running for 10 or more years. Because of that, there’s a really deep culture and fan base that knows what’s going on and can understand what people are doing even when it looks like the game is moving fast. That makes the interaction between players and fans much deeper. But at the same time, it also means that there can be a lot more information gleaned if the players are able to hear the audience.


The last group includes battle royale or MOBA games — multiplayer online battle arenas. These are similar to a first-person game, but instead of 10 people, we have up to 100 players competing simultaneously. The two that you’ve probably heard of if you’re following any eSports are Fortnite and PlayerUnknown’s Battlegrounds. Both of them are called Battle Royale because you start with 100 players and the one who is standing at the end is the winner.

So, you’ve got a lot to follow, and it’s very hard to know which player is going to come out on top. It’s not like you can say, “Let’s just focus on the favourite and two other people,” because they’re moving at such a pace that things could change really quickly.

The only problem is, with 100 players, you don’t just have Team A and Team B anymore; you may have all the way down to Team Z. With four or five people per team, that means you may have 20 to 25 teams that need their own individual communications and audio, all set up in such a way that the producer of the event can access any of it.
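The arithmetic behind that channel count can be sketched in a few lines. This is only an illustration: the seat and team naming scheme and the `build_party_lines` and `producer_monitor_list` helpers are invented here, and a real intercom system would be configured through its own tools.

```python
def build_party_lines(total_players, team_size):
    """Group player seats into teams, one private talk channel per team."""
    lines = {}
    for seat in range(total_players):
        team = f"team-{seat // team_size + 1:02d}"
        lines.setdefault(team, []).append(f"seat-{seat + 1:03d}")
    return lines


def producer_monitor_list(lines):
    """The producer needs to be able to listen in on every team channel."""
    return sorted(lines)


for team_size in (4, 5):
    lines = build_party_lines(100, team_size)
    # 25 team channels at four per team, 20 at five per team.
    print(team_size, "per team ->", len(lines), "private channels")
```

Each entry in the resulting dictionary stands for one isolated team channel, and the producer's monitor list grows with the number of teams rather than staying at two.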


No matter what category of game is being played, the first thing the player needs is their game audio. They’re also going to want to talk to their teammates. Then, we’ll have the judges on a separate system. So, everything on that side of the ‘audio firewall’ is what we want the player to hear.

On the other side, we have the audience and the shoutcasters. And the main sound system in the venue might not only transmit the shoutcasters and the game audio, but also have some other kind of event program audio.

The key challenge here is to keep audio that’s on the other side of the audio firewall from reaching the players. Nowadays you’ll often see participants using big, heavy headsets that were originally designed for Formula One or airports. The problem is, the audio quality is so low that you don’t get the fidelity you need to hear the audio cues of the game.

So how are these challenges currently being addressed? We’ve tried a lot of things to isolate the players from the audience. Active noise cancellation doesn’t really work, because it’s designed to filter out low background noise while allowing speech to go through, which defeats our purpose: the thing we’re trying to block is people yelling out from the audience.

That’s why it’s common now for players to wear a set of in-ear monitors, then a set of headphones on top. The audio engineers will pump in white noise so it’s harder for anyone to say anything to the players. But there are two problems with this. Firstly, when someone is listening to white noise at a high level for a long time, there’s a health and safety issue. Secondly, it’s quite annoying and you lose sensitivity to your game audio.

Another thing that we’re seeing is that intercom systems are now becoming an integrated part of the main audio mix. We often see big digital matrices that are mixing all the audio from the players and from the games, giving the players what they’re supposed to hear and making sure that they don’t hear what they’re not supposed to. Then we’re stem mixing from that into the main front of house mixer.
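One way to picture the audio firewall inside such a matrix is as a rule on crosspoints: a source from the venue side may never be routed to a player-side destination, while player-side stems may still flow out to front of house. The sketch below is an assumption-laden illustration only. The source and destination names, the `AudioMatrix` class and the `FirewallViolation` exception are all hypothetical, and it leaves out per-team isolation (Team A not hearing Team B) for brevity.

```python
PLAYER_SIDE = "player"
VENUE_SIDE = "venue"

# Hypothetical sources, tagged by which side of the firewall they live on.
SOURCES = {
    "game_audio_a": PLAYER_SIDE,
    "team_a_comms": PLAYER_SIDE,
    "judges": PLAYER_SIDE,
    "shoutcasters": VENUE_SIDE,
    "audience_mics": VENUE_SIDE,
    "program_audio": VENUE_SIDE,
}

# Hypothetical destinations, tagged the same way.
DESTINATIONS = {
    "player_a1_headset": PLAYER_SIDE,
    "foh_mixer": VENUE_SIDE,
}


class FirewallViolation(Exception):
    """Raised when a route would leak venue audio to a player."""


class AudioMatrix:
    def __init__(self):
        self.crosspoints = set()

    def connect(self, source, dest):
        # The firewall is one-way: venue audio must never reach a player.
        if SOURCES[source] == VENUE_SIDE and DESTINATIONS[dest] == PLAYER_SIDE:
            raise FirewallViolation(f"{source} must not reach {dest}")
        self.crosspoints.add((source, dest))

    def mix_for(self, dest):
        """List the sources currently routed to a destination."""
        return sorted(s for s, d in self.crosspoints if d == dest)


matrix = AudioMatrix()
for src in ("game_audio_a", "team_a_comms", "judges"):
    matrix.connect(src, "player_a1_headset")
# Game audio is sent out as a stem to front of house, alongside venue sources.
for src in ("game_audio_a", "shoutcasters", "program_audio"):
    matrix.connect(src, "foh_mixer")

try:
    matrix.connect("shoutcasters", "player_a1_headset")
except FirewallViolation as err:
    print("blocked:", err)
```

Enforcing the rule at the point where routes are made, rather than trusting each operator to remember it, is the design choice this sketch is meant to show.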


The judges, shoutcasters, front of house team and broadcast team need to be able to see all of the video sources in order to make educated guesses on where to make cuts. But latency is a big issue for players, so a lot of games now have software that will let you choose to watch a player rather than splitting off the video between their computer and their monitor.

With first-person games, the engine is built to give you a view through someone’s eyes, so you can have spectators basically playing in the game. These ‘virtual camera operators’ are a common part of eSports, because you can have someone moving around the map and showing you the most interesting perspective, rather than a player’s point of view, which is generally hopping around so fast that it’s impossible to follow.

You also sometimes have real cameras there. You’ll have face cameras or team cameras to show the audience what the team’s actually doing or what their reactions are. We usually have a face camera for each of the players going to the judges so they can watch everyone to make sure there’s no cheating.

The shoutcasters need to see everything so they can explain to the audience what’s going on, and then all of that has to be combined and then put into the web stream. And of course, if the player can’t see the full map, we don’t want them to see the big screen in any way, shape or form.

The challenge is finding a way to have all these diverse systems controlled by one system that makes it easy for the operators to switch between them, but at the same time, deal with all of these different video sources and demands on the equipment. No easy task!


Because of all these different requirements, we see a lot of innovation happening in eSports right now, especially in remote production. One of the poster children for remote production is NEP Broadcast. They have remote facilities in Australia and Europe, so the stadium basically just has camera operators and sound engineers. All the mixing, video switching, graphics and such is done at a centralised offsite location.

There are also some advances in areas like Google Stadia. While it’s not at a point where it can be used in eSports yet, the concept is that you’d use Google servers to run the game with lower-powered computers at the venue running the audio and video only. This means that anyone could do an eSports event, or you could distribute it across everyone’s homes.

And it’s not fantasy. It happened last year at the League of Legends Mid-Season Invitational. Half the team couldn’t get into Vietnam because of visa problems, and a lot of the equipment was stuck in customs, so they were forced to do a full remote production. The only things that were on site were the game machines, the game server and the local projection screens. Everything else was remote-controlled from LA.


That was a quick introduction to eSports and the challenges that we face on the audio/video side. Like all sports, it’s all about fairness. We want to make sure the game is played by top-level competitors at the peak of their abilities, and that they aren’t being distracted or influenced by anything. As soon as that legitimacy is threatened, just as with a doping or cheating scandal, everything falls apart.

That’s why AV is such a key factor to the leagues and the publishers who are making the games. There’s a lot of detail put into the AV, and while it used to be the last thing anyone would think of at any event, it feels great to be involved in a space where it’s actually first and foremost on everyone’s mind.

MadisonAV (Install Products): 1800 00 77 80


Cameron O’Neill is a 20-year veteran of the event industry, having worked at the Sydney Opera House and for Riedel throughout Asia. Recently, he has helped many eSports companies in China build their AV systems, including major events, installed facilities, and a major company’s network studio system. Cameron is a Sales Director in the Professional Solutions division of Harman.

