8 min read
How to use AI to improve your in-game voice-overs
Sarah Impey
Content Creator at GameAnalytics
It's clear that AI is entering every aspect of gaming. So where is the technology when it comes to voice-overs? Does it work? And is it worth using? We looked at the various tools out there to see the best ways to use it and whether the quality was up to scratch.
AI voice-overs won't replace voice actors
First of all, we don't believe the quality of AI voice-acting is anywhere close to a real actor. Even AI tools that are focused on the gaming sector hover in that uncanny valley, where the voice just sounds robotic and stilted. Not awful. But it just doesn't have that same cadence a real person would give.
True, the technology will advance. But we don't see it replacing the need for a real person. For one, you need to model the AI after someone. But even then, an AI can't decide when to pause to show emotion or emphasise a point. It doesn't have that same awareness of the context of the situation.
A YouTube comment on Sonantic's video about AI voices.
As models improve, it'll get better. But it'll never be perfect, and it'll take a lot of work from the developer to make it believable. A well-written scene, filled with character development and poignant moments, will always need a real actor to do it justice.
The results are rather stilted
We experimented with a few different voice-over AI tools to see how well they performed, such as ReadSpeaker, PlayHT, Resemble.AI, Lovo.AI, and Replica Studios.
Even just listening to the highlight reels on their websites, the examples sound robotic and somewhat lifeless. They might be passable for minor moments or tutorial text, but they're certainly not good enough for emotional scenes or believable characters.
Replica Studios’ digital voice studio.
There are more specialised tools, like Replica Studios, which allow you to change the emotion behind the text and adjust the settings. But even these fall flat when the text gets longer or more nuanced. Small snippets of text, like one-liners, tutorial hints or narration, can be okay. But some words seem to completely mystify the computer and it can't make the whole paragraph… flow.
So if the quality isn't up to scratch, what's the point of using it?
AI can speed up prototypes
There aren't many studios using AI for voice-over work. At least, not work that's out in the wild. It seems that most are using it to help speed up their development process, rather than using it for their final release.
Obsidian uses AI to make sure that the story is flowing properly and that the characters are behaving believably. And, as games become more and more customizable, it's impractical to record those lines until the very end. AI can improve the quality of the prototype and testing build.
This seems to be a trend with most studios.
"We use Replica's software to test scripts, dialogue, and gameplay sequences before engaging human voice actors to record the final lines," said Chris O'Neill, the senior audio designer at PlaySide Studios.
Likewise, Ninja Theory said on X (Twitter):
"We use this AI only to help us understand things like timing and placement in early phases of development. We then collaborate with real actors whose performances are at the heart of bringing our stories to life."
This seems like a good way to think about AI in general. Use it as a placeholder or a way to brief your creative team. It can help your director communicate what they want and speed the process along.
AI allows for "generated" content
There are already hundreds of thousands of lines of dialogue in modern games. Bethesda's Starfield has around 250,000 lines. Baldur's Gate 3, even during early access, had well over 45,000 lines – and that was just the first act. Red Dead Redemption 2 reportedly had over 500,000 across 1,000 voice actors.
Games are just getting bigger and bigger. AI probably won't replace human actors for the main bulk of that dialogue. But it can help tidy up the quality after it's been recorded.
With so many lines of dialogue, it's not always practical to record it all at once. Baldur's Gate 3 has great writing and quality actors. But sometimes it's clear the lines were recorded at different times. Using AI to tidy them up and make them consistent could really help.
But that's just the written dialogue. The intentional dialogue. What players want is interactivity – to be able to talk to characters and have unique responses.
The next step is inevitably more "generated" or "dynamic" dialogue. Dialogue that's powered by AI language models to respond to the player in real-time.
Replica Studios is already working on this, with their Smart NPCs plugin for Unreal Engine. And it's pretty impressive.
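To make the idea concrete, here's a minimal sketch of what a dynamic dialogue loop might look like. Everything in it is hypothetical: the model call is a stub standing in for a real language model, and none of this reflects Replica's actual Smart NPCs API.

```python
# Sketch of a dynamic NPC dialogue loop. The "language model" is a
# stubbed placeholder, not a real API; a game would swap in an actual
# LLM call and pass the reply string on to an AI voice.

def fake_language_model(persona: str, player_line: str) -> str:
    """Stand-in for a real LLM call; returns a canned in-character reply."""
    return f"({persona}) You said: '{player_line}'. Interesting."

def npc_reply(persona: str, memory: list, player_line: str) -> str:
    """Generate one NPC reply, keeping a rolling transcript for consistency."""
    memory.append(f"Player: {player_line}")
    reply = fake_language_model(persona, player_line)
    memory.append(f"NPC: {reply}")
    return reply  # this string would then be spoken by an AI voice

transcript = []
print(npc_reply("grumpy blacksmith", transcript, "Do you sell swords?"))
```

The transcript list is the key design point: each exchange is appended so a real model could be prompted with the conversation so far, keeping the NPC's responses consistent over time.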
AI will soon respond to players – and it can't all be acted
The idea is simple. Imagine you could walk around a world and talk to any NPC and they'd respond like a real human being. It seems fantastical, but it's within reach. We wouldn't be surprised if we see a game with AI NPCs in the next couple of years.
Replica Studios did a demo with Matrix Awakens using their Smart NPCs. Their official demo is a bit lacklustre, so here's a better example from YouTuber TmarTn2 trying it out.
As you can see, it's pretty impressive. But janky. The novelty of saying anything to an NPC would likely wear thin after a little while, and the responses aren't world-shattering. Mix in a real writer, coming up with scenarios and stories that the NPCs could draw from – and we're sure it'll be mind-blowing.
The problem is that it's all unique content. It needs an AI voice actor to speak the lines, because it's impossible to record the dialogue in advance.
We predict that studios will need to licence an actor's voice to allow for this dynamic content. Pay the actor normally for the "real" dialogue, and then an extra fee to model their voice for the generated content.
Sure, the generated content will never be as good as the parts the voice actor actually performed. But, you know what? That's fine. As a player, I'm willing to accept a bit of janky dialogue as an extra. I suspend my disbelief. It feels like the old days when the graphics weren't particularly good. After a while, your mind fills in the blanks.
AI could help accessibility
Text-heavy games are always a problem for players who can't read them. Whether the player is completely blind or just struggles to see the tiny font, having a computer read out the text can be incredibly helpful.
Developers could use AI as a tool for accessibility. For example, you could have it narrate actions for blind players, like "Frank enters the room." Or just have it read out the in-game text and menus.
This is particularly useful for ports of old games. A game like Final Fantasy VII was purely text-based. Imagine if Square Enix, when they ported it to PC, could just slap on an AI tool to read out all that text. It'd open the game up to so many more players.
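As a rough sketch of how the narration side might work, the snippet below turns structured game events into spoken-style strings that could be handed to any TTS engine. The event names and templates are entirely illustrative, not from any real game or engine.

```python
# Turn structured game events into narration strings for a TTS engine.
# Event names, fields, and templates here are hypothetical examples.

NARRATION_TEMPLATES = {
    "enter_room": "{actor} enters the room.",
    "pick_up": "{actor} picks up the {item}.",
    "menu_focus": "Menu: {item} selected.",
}

def narrate(event: str, **fields: str) -> str:
    """Build a narration line; fall back to a generic description."""
    template = NARRATION_TEMPLATES.get(event)
    if template is None:
        return "Something happens."
    return template.format(**fields)

# Each string would then be passed to whatever TTS voice the game uses.
print(narrate("enter_room", actor="Frank"))  # Frank enters the room.
print(narrate("pick_up", actor="Tifa", item="potion"))
```

Keeping narration as structured events rather than raw screen text means the same data could also drive subtitles or a screen reader, not just an AI voice.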
It's possible to embrace AI and be ethical
Using only AI for voice acting isn't really viable for a developer right now. Even in the future, it's going to take a lot of effort to reach the quality you'd expect from an actor. There's still a price to pay – time. For the most part, we imagine that developers will need a mix of AI and real people.
But how do we balance the two? Society, in general, has a lot to learn about how to work with AI. Regulations need to be set. Standards need to be made. Questions need answering.
With the right licences for voice actors, which pay them fairly for their talent, we can see a bright future for gaming. AI has the potential to become the private Game Master, helping run unique games for every individual player. Even if the voices do all sound the same.
But, then again, isn't that every Game Master?
If you'd like to stay in the loop about the latest news from the gaming industry, make sure you subscribe to our newsletter.