Behind the Scenes: How Film Dubs Are Made More Convincing

Dubbed films are a great option for anyone who wants to watch a movie in a language other than the one it was filmed in. However, the audio is often misaligned with the actors’ mouth movements. Fortunately, better dubbing systems may be coming soon.

Stony Brook University collaborated with Amazon Prime Video on the paper “LipNeRF: What is the right feature space to lip-sync a NeRF?” The Stony Brook researchers include PhD candidate Aggelina Chatziagapi, PhD candidate ShahRukh Athar, and Professor Dimitris Samaras of the Stony Brook Department of Computer Science.

The researchers describe LipNeRF as a “novel method that performs lip-syncing in the expression space of a 3D morphable model.” In other words, LipNeRF uses 3D modeling techniques to make the actors in a film appear as if they are actually speaking the language the film is dubbed in. This work falls under the field of computer vision.

For example, consider a movie such as The Shining (1980) dubbed in Spanish. When a film is dubbed in Spanish, the Spanish audio is usually just overlaid on the original video. The result is unnatural-looking mouth movements that don’t align with the words.

With a system like LipNeRF, the original video is altered so that the actors’ mouths move in time with the dubbed audio. As a result, the lips are actually synced with the speech, and the dubbing is less obvious.

“We are targeting situations when you have an English video from a movie and you have the audio from a different language, and you want to lip sync the lips of the original video to the different language, let’s say Spanish. When you watch a dubbed movie, you always notice a misalignment of the lips, right? So, we want to make it more photorealistic,” says Chatziagapi. 

LipNeRF differs from previous lip-syncing technology in that it aims to tackle one of the more difficult aspects of dubbing: the cinematic nature of film.

“In movies they use cinematic lighting: basically the face is half lit to give depth, so you don’t have this uniform lighting that we have on TV, in the news, etc.,” says Chatziagapi. “Actors are also much more expressive; they want to convey their emotions. So all of this is much more challenging when you try to generate these deep fakes and make it look convincing.”

Reeling back to The Shining, consider the scene in which Jack (Jack Nicholson) breaks into the bathroom where Wendy (Shelley Duvall) has locked herself in. It’s an intense scene with a lot of movement and emotion. We see Jack on a terrifying, almost supernatural rampage as he swings his ax. Meanwhile, we see Wendy’s anxiety build as Jack gets closer to breaking open the door. LipNeRF is designed with scenes like these in mind. It aims to realistically dub the film while maintaining the powerful cinematic artistry.

“This is a very important part of the paper we are working on: the cinematic data. Basically no one else has done this,” says Chatziagapi. 

Though 3D modeling enables impressive technology, it also raises the ethical concerns surrounding deep fakes: AI-generated videos that use images of a person’s face to make it seem like they are saying or doing something that they actually aren’t. Essentially, this is how Jack can appear to mouth “¡Aquí está Johnny!” instead of “Here’s Johnny!” While this is great for dubbing, it can be a problem when the same technique is used for political slander, for instance.

The LipNeRF research team acknowledges this. “It is important to develop accurate methods for fake content detection and forensics. In addition, appropriate procedures must be followed to ensure fair and safe use of videos if used for training or inference.” 

LipNeRF improves on previous dubbing systems by tackling the challenges of 3D modeling in cinematic film, and further improvements are in development. The research team also acknowledges the dangers of deep fakes and urges proper detection of false media.

For more information on the workings of LipNeRF and to view video comparisons with other dubbing systems, visit the link here. These comparisons include speeches from movies such as Scent of a Woman and A Few Good Men.

-Sara Giarnieri, Communications Assistant