Videogames and cinema are merging.
Interactive experiences and straight-up Hollywood blockbusters still have their differences, but they share one vital element – one that sits at the heart of every good story ever told: the characters.
3D Artist goes beneath the skin with Remedy Entertainment, DI4D and Audiomotion to figure out exactly how motion capture is evolving to fit into this new mixed media industry, and how on earth you make digital doubles feel more human.
A year or so ago, a strike threatened the videogames industry, organised by voice actors who felt they weren’t getting the recognition they deserved. Whilst the strike was never carried out, it marked a shift in the way things work: we’re now seeing actors themselves appear on screen, in one way or another, through motion capture and the creation of digital doubles.
Popularised by games like Call Of Duty: Advanced Warfare (which featured a lifelike rendition of Kevin Spacey), Star Citizen and the most recent triple-A release, Quantum Break, from Remedy Entertainment, we’re seeing more and more studios put A-list actors centre-stage in their games.
Motion capture and lifelike performances aid storytelling by letting the viewer or player identify with their on-screen avatar. They help the actor reach an audience that otherwise wouldn’t see what they’re capable of. But most of all, they’re a great excuse for motion-capture studios and game developers to show off the new technology they’ve been perfecting. But how exactly does it work?
“Most facial-performance capture systems either capture a single video stream, which is interpreted to drive a rigged character model, or capture the 3D trajectories of only a sparse set of discrete facial markers,” explains Dr Colin Urquhart, founder and CEO of DI4D – the motion-capture studio behind the newest game to focus on lifelike digital doubles: Quantum Break.
“By contrast, the DI4D facial performance capture systems acquire a ‘3D scan’ per frame using passive stereo photogrammetry, then a dense mesh is tracked through the sequence using optical flow. Every vertex in the tracked mesh then effectively becomes a motion-capture marker. The result is much higher fidelity data, capturing much more of the subtlety and nuances of facial performance and expression.”
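The pipeline Urquhart describes can be caricatured in a few lines of Python. This is purely an illustrative sketch, not DI4D’s actual code: real systems compute dense optical flow in the camera images, whereas here a simple nearest-neighbour match stands in for the correspondence step. The key idea survives, though – the mesh topology stays fixed, so the same vertex can be followed from frame to frame and treated as a motion-capture marker.

```python
import math

def track_mesh(reference_mesh, scans):
    """Propagate a fixed-topology mesh through a sequence of per-frame scans.

    reference_mesh: list of (x, y, z) vertex tuples in artist topology.
    scans: one unordered point cloud (list of tuples) per captured frame.
    A nearest-neighbour match stands in for optical flow here.
    """
    def nearest(p, cloud):
        # Find the scan point closest to the vertex's previous position.
        return min(cloud, key=lambda q: math.dist(p, q))

    tracked = [list(reference_mesh)]
    current = list(reference_mesh)
    for scan in scans:
        # Same vertex order every frame: each index is a 'marker' over time.
        current = [nearest(v, scan) for v in current]
        tracked.append(current)
    return tracked
```

Because every output frame shares the reference mesh’s vertex order, vertex *i*’s trajectory across `tracked` is exactly the dense “marker” data Urquhart describes.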
As games and VFX-driven TV shows continue to push for quality, delivering lifelike performances from actors is becoming a priority, and the coarse methods of capturing facial data that used to be the norm are becoming obsolete.
“Many projects we work on involve ‘digital doubles’ – virtual versions of real people. In order to create a ‘digital double’ of a real-life person it is necessary to obtain a realistic static 3D graphic representation of them. In theory an artist could sculpt such a 3D graphic representation, but in practice some kind of 3D scanning process is required to obtain the necessary level of detail, realism and likeness,” explains Urquhart.
“We partnered with Dimensional Imaging early on for facial animation since we were concerned about the quality and turnaround time of the FACS rigs we had looked into during the greenlight stage of Quantum Break,” explains Antti Herva, lead character technical artist at Remedy Entertainment.
“Given the quality bar that we set after the criticised facial animation in Alan Wake, we knew we had to up our game and we knew that it would be hard to afford the facial animation polish bandwidth usually needed for FACS-based approaches involving digital doubles in games.”
Alan Wake was released back in 2010, and Remedy Entertainment was already in the market for a new kind of mocap pipeline: this is a fast-moving industry, after all, where time is money. To that end, capture solutions that can grab as much high-quality data as possible in a short amount of time are incredibly valuable.
“[Using DI4D’s technology] means there is no need to generate FACS shapes, which cuts down modelling time,” continues Antti Herva.
“Time needs to be spent tracking the point caches, though. But facial capture data lessens the amount of facial animation and modelling needed in projects involving digital doubles, with some restrictions on accuracy.”
Using this kind of workflow saves time, at the expense of some of the super-specific animation you’ll see in hand-animated scenes.
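To see why skipping FACS shapes saves modelling time, it helps to compare the two representations in miniature. The sketch below is hypothetical, not Remedy’s pipeline code: a FACS-style rig rebuilds each frame by blending sculpted expression shapes with solved weights, while a point cache simply stores the tracked vertex positions for every frame and plays them back.

```python
def facs_frame(neutral, shape_deltas, weights):
    """FACS-style rig: each frame is the neutral mesh plus a weighted sum
    of sculpted expression shapes. The deltas must be modelled up front,
    and the weights solved from the captured performance."""
    frame = [list(v) for v in neutral]
    for name, w in weights.items():
        for i, delta in enumerate(shape_deltas[name]):
            for axis in range(3):
                frame[i][axis] += w * delta[axis]
    return frame

def cache_frame(point_cache, f):
    """Point-cache playback: tracked vertex positions are stored per frame,
    so there are no shapes to sculpt and no weights to solve -- at the cost
    of heavier data and less scope for editing after the fact."""
    return point_cache[f]
```

The trade-off Herva describes falls straight out of this: the cache side needs no shape modelling at all, but the data is baked, which is why edits risk fighting the captured performance.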
But the time artists save up front by scanning faces into DI4D’s pipeline leaves animators more room to fine-tune their actor’s face in Maya later on, as Henri Blåfield (senior technical animator at Remedy) describes: “Being able to augment the facial data with both animator and runtime-systems-driven eye direction animation was very important, and for this we built a soft tissue eye rig and controls to sculpt and tune some of the key shapes on the fly.”
Both Remedy and Urquhart warned us that tampering with the data once it had been captured could impact the realism of the performance, risking undermining the work of both the actor and the technology.
“The whole point of using DI4D is to capture as much as possible of the detail, subtlety and nuance of the actor’s actual facial performance, and to do little or no facial animation in post-production,” explains Urquhart. “If attempts are made to edit or adjust the performance in post-production it can be very easy to impact realism and believability, and risk falling into the uncanny valley.”
“When treading the steep edges of the uncanny valley we found that often very little extra facial animation polish was warranted, suggesting that less was more,” continues Blåfield. “Most of the work [in post] was iterating on the stabilisation of the point caches, the orientation of the character’s head to ensure proper alignment and ensuring lip contacts in plosive phonemes.”
But that’s not to say the job is done once the camera systems have finished and the data is exported – there’s still a need for an art team, and the animators still need to look closely at every frame of what they’ll be shipping. It’s worth noting, too, that the vast majority of mocap data exports easily to the most common 3D software used in the industry.
“Artists usually use tools like ZBrush or Mudbox to retopologise and add detail to the static hi-res 3D mesh that is created using 3D scanning,” explains Urquhart. “They may often use tools such as Photoshop if they want to add extra detail from high-resolution photographs of the subject.”
Bearing in mind that Quantum Break was being shipped as an experimental project in the videogames industry – with part of it being live-action TV-inspired drama and the other part a game played in a real-time engine – the team at Remedy Entertainment had to get creative with how it shaped its workflow and utilised the assets that DI4D provided.
“The scale of the project suggested very early on that tracking throughout was key to our success,” Herva tells us.
“We experimented with having multiple levels of detail in our tracking meshes while upscaling the results, but found that using a standard tracking mesh density had the least overhead. Ultimately, we shipped just under five hours of facial data with four trackers working full time, hitting up to 30 minutes of data per week for in-game lines and 20 minutes for cinematics.
“We optimised the process with deformers that allowed us to track fewer points for the lips and eyes [for] content we knew would likely not be seen up close. Despite these optimisations, this approach guaranteed us an unprecedented motion quality tier throughout the game compared to a FACS-based approach where the rig blendshape complexity and solve fidelity are usually a lot more varied for various reasons.”
So what does the future hold for DI4D and the wider motion capture industry as a whole? We return to Urquhart for the answer: “The industry is always looking to capture more and more detail,” he explains.
“The ability to capture very fine detail – for example fine wrinkles at the corners of the eyes – is fundamentally limited by the resolution of available cameras. This is a particular issue if the cameras also need to be small and light enough to be mounted on a helmet.
“We have only recently been able to mount 2K cameras on a helmet, but in future it will be 4K or even 8K cameras, and the level of the detail captured will increase correspondingly.”
As for where the most focused improvements are needed, it’s actually the eyes – it’s hard for cameras to image the tear film’s lipid layer and the transparency of the eyeball surface as clearly as they capture skin.
“Long eyelashes can make it difficult to capture the eyelids precisely,” Urquhart reveals. “Eyes themselves are shiny and have transparency, [which] makes them difficult to capture.
“It can be difficult to see into the mouth to capture the tongue and even if the tongue is visible it is very difficult to track because it tends to appear, disappear and change shape rapidly.”
Motion capture has long been an important tool in creating powerful stories in television, film and games, and because of the relationship it requires between actor, animator and artist, it’s defied the standard pipeline any film studio or game developer is used to. And it’s still evolving, still changing.
But now that we’re seeing more A-list stars actively request motion capture work to round out their portfolios, we can be sure of one thing: it’s only ever going to get more important.