Behind the system Wētā FX invented for ‘The Way of Water’.
One of the hardest things to do when filming a scene that features live-action and CG characters is ensuring you can visualize how they interact together. It can be difficult to place real-world objects in the correct screen space in what will eventually be a CG environment.
On James Cameron’s Avatar: The Way of Water, Wētā FX was able to solve this problem, and allow the director to more accurately frame and block scenes that had both live-action and CG characters, by implementing an on-set real-time depth compositing solution.
The goals of the tool were to provide a real-time composite in camera, in 3D space, with pixel-perfect occlusions, without the need for bluescreen or greenscreen. This was made possible via a link-up between two computer vision cameras calibrated with the main cinema stereo cameras, and the development of a deep learning model trained on thousands of synthetic images generated from character and set scans, which processes the stereo images and generates a usable depth map.
Through Wētā FX’s proprietary real-time compositing tool Live Comp, the resulting depth map could be combined with live-action photography and composited together with CG elements, correctly occluded, right there for the filmmakers on set.
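To make that summary concrete, here is a minimal, purely illustrative sketch of what supervised training on synthetic stereo pairs with ground-truth depth might look like. The dataset, the tiny network, and the loss below are assumptions for illustration only; they are not Wētā FX's actual model or pipeline.

```python
# Illustrative sketch only: supervised training of a stereo-to-depth network on
# synthetic image pairs with ground-truth depth. Dataset layout, model, and
# hyperparameters are assumptions, not Wētā FX's production setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class SyntheticStereoDataset(Dataset):
    """Hypothetical dataset of rendered stereo pairs with ground-truth depth."""

    def __init__(self, num_samples=1000, height=256, width=512):
        self.num_samples, self.height, self.width = num_samples, height, width

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # In practice these would be images rendered from character/set scans;
        # random tensors stand in here so the sketch runs end to end.
        left = torch.rand(3, self.height, self.width)
        right = torch.rand(3, self.height, self.width)
        depth = torch.rand(1, self.height, self.width)
        return left, right, depth


class TinyStereoNet(nn.Module):
    """Toy stand-in for a stereo depth network: concatenate the pair, regress depth."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, left, right):
        return self.net(torch.cat([left, right], dim=1))


def train():
    model = TinyStereoNet()
    loader = DataLoader(SyntheticStereoDataset(), batch_size=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.L1Loss()
    for left, right, depth_gt in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(left, right), depth_gt)
        loss.backward()
        optimizer.step()
```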
In this excerpt from issue #11 of befores & afters magazine, Wētā FX virtual production supervisor Dejan Momcilovic and senior researcher Tobias Schmidt break down the system for befores & afters.
b&a: I want to go back to the beginning. What were the earliest incarnations of a real-time depth compositing system at Wētā FX?
Dejan Momcilovic: On The BFG we had some of the first attempts at this in terms of depth compositing and the correct sorting of the objects. BFG was unique for us then because we had a live-action character and a CG character with a lot of interaction. We were capturing the BFG, scaling him up, and then having him interact with the character Sophie. At some point you might be looking behind the BFG, which meant we shouldn’t be seeing Sophie through him. We were just making it work by, say, attaching the plate onto his hand and placing it into the scene, and then compositing so he would properly occlude her with his hand. It was something we did to just make it work.
Fast forward to Way of Water performance capture, which started in 2017, on September 26th (I happen to remember the date!). Even years before that we were talking about a lot of these things and testing redundancy and camera tracking. We knew that the whole thing would be heavy on our tool Live Comp and we needed to update all this tooling, but there was no solution yet for depth compositing.
Jim was pretty determined that he wanted a depth comp, and there was no such thing. We were kind of phasing in and out considering what could be done, and did a few tests with the Z CAM, but the traditional methods weren’t fast enough and it was too easy to break.
Then we started considering an AI approach. The live-action shoot was to begin in March/April 2019. We had an idea that needed to be evaluated and we got Tobias to look into it for us.
Tobias Schmidt: It was very interesting. We actually considered some other approaches first, such as just detecting humans to work out which one is in front or behind, but this would of course only work for humans. In the end, the depth approach with AI was the holy grail. If that worked, and worked fast enough, it would not just be limited to certain objects, but could work out of the box for anything.
b&a: Where did that idea and research take you next?
Dejan Momcilovic: There was a paper that we based our initial work on called StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction. At some point Toby said, ‘I think it can work.’
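For readers unfamiliar with that paper, the rough idea is a low-resolution cost volume that yields a coarse disparity map, followed by an edge-aware refinement network guided by the colour image. The sketch below is an assumed, heavily simplified reading of that idea in PyTorch, not Wētā FX's trained model.

```python
# Very rough sketch of the coarse-to-fine idea in StereoNet: a low-resolution
# cost volume gives a coarse disparity, which an edge-aware network then refines
# using the full-resolution image as guidance. Layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseToFineStereo(nn.Module):
    def __init__(self, max_disp=16, scale=8):
        super().__init__()
        self.max_disp, self.scale = max_disp, scale
        # Shared feature extractor that downsamples by `scale` (three stride-2 convs).
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1),
        )
        # Refinement network guided by the colour image plus the coarse disparity.
        self.refine = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, left, right):
        fl, fr = self.features(left), self.features(right)
        # Build a coarse cost volume by shifting right features over candidate disparities.
        costs = []
        for d in range(self.max_disp):
            shifted = F.pad(fr, (d, 0))[..., : fr.shape[-1]] if d > 0 else fr
            costs.append((fl - shifted).abs().mean(dim=1))
        cost_volume = torch.stack(costs, dim=1)          # B x D x h x w
        # Soft-argmin over candidates gives a differentiable coarse disparity.
        prob = F.softmax(-cost_volume, dim=1)
        disps = torch.arange(self.max_disp, device=left.device).view(1, -1, 1, 1)
        coarse = (prob * disps).sum(dim=1, keepdim=True) * self.scale
        # Upsample and refine with the left image as edge guidance.
        coarse_up = F.interpolate(coarse, size=left.shape[-2:],
                                  mode="bilinear", align_corners=False)
        return coarse_up + self.refine(torch.cat([left, coarse_up], dim=1))
```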
Tobias Schmidt: We were told by a couple of experts in the field that this can’t work, so it was kind of nice to disprove that.
Dejan Momcilovic: We got something ready and went to LA to show Lightstorm. That worked well, but then we also needed to show Jim. [Senior visual effects supervisor] Joe Letteri had a laptop while they were doing all this other kind of setup and said, ‘Hey Jim, take a look at this.’ It was a clip of me sitting there and then one of the Avatars hugging me. His arms were in front of me, his body was behind me. Jim was like, ‘What’s happening…?’ He said, ‘Alright, this is what I’m asking for!’ From there on it was sold to him. We never looked back.
b&a: Take me through, ultimately, what you devised to enable the live depth comp?
Dejan Momcilovic: On set, we had two computer vision cameras that were calibrated together with the stereo pair of cinema cameras. We put a grid in front of the cinema camera, it took a minute or two to process, and then you have the calibration. This way we knew the relationship between the cinema cameras and our computer vision cameras.
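As a sketch of what such a grid-based calibration step can look like, here is a hedged example using OpenCV's standard chessboard routines. The pattern size, square size, the fixed-intrinsics assumption, and the function name are all illustrative placeholders; the interview doesn't describe Wētā FX's actual calibration tooling beyond the grid itself.

```python
# Illustrative sketch: recovering the pose of a witness (computer vision) camera
# relative to the hero cinema camera from shared views of a calibration grid.
# Assumes per-camera intrinsics (K, distortion) are already known.
import cv2
import numpy as np

PATTERN = (9, 6)        # inner corners of the assumed chessboard grid
SQUARE_SIZE = 0.025     # grid square edge length in metres (assumed)


def calibrate_witness_to_hero(hero_images, witness_images,
                              K_hero, dist_hero, K_witness, dist_witness,
                              image_size):
    """hero_images / witness_images: lists of synchronised greyscale frames
    showing the same grid. Returns (R, T) taking hero-camera coordinates
    into witness-camera coordinates."""
    # 3D grid corner positions in the grid's own coordinate frame.
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE

    obj_points, hero_points, witness_points = [], [], []
    for hero_img, witness_img in zip(hero_images, witness_images):
        ok_h, corners_h = cv2.findChessboardCorners(hero_img, PATTERN)
        ok_w, corners_w = cv2.findChessboardCorners(witness_img, PATTERN)
        if ok_h and ok_w:
            obj_points.append(objp)
            hero_points.append(corners_h)
            witness_points.append(corners_w)

    # With intrinsics fixed, stereoCalibrate solves only for the relative pose.
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_points, hero_points, witness_points,
        K_hero, dist_hero, K_witness, dist_witness,
        image_size, flags=cv2.CALIB_FIX_INTRINSIC)
    return R, T
```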
What we derive from the hero camera is the position, or the relative position to our system. For other data like the focus distance and zoom, because most of the film was shot on zoom lenses, we were getting this data streamed by a systemizer, part of the Cameron/Pace rig.
Our network was then trained based off the views from those two cameras to generate the depth in the left or right camera. We were picking and choosing which lens, depending on whether we were upside down or above the camera. We had trained models for all these configurations, and the depth would be generated in one of the computer vision cameras, then re-projected into the hero camera. It then filled the gaps caused by stereo shadows, and we would obtain depth for every pixel of the hero camera image.
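The re-projection step he describes, taking depth predicted in one of the witness cameras and moving it into the hero camera's view, can be sketched as a simple pinhole re-projection. The function below is an assumed, simplified illustration (no lens distortion, no proper hole filling for the stereo shadows), not the production code.

```python
# Illustrative sketch: re-projecting a depth map predicted in a witness camera
# into the hero camera's view using simple pinhole models. All names assumed.
import numpy as np


def reproject_depth(depth_witness, K_witness, K_hero, R, t, hero_shape):
    """depth_witness: (H, W) metric depth in the witness camera.
    R, t: pose taking witness-camera coordinates into hero-camera coordinates.
    Returns a hero-resolution depth map with NaN where no point landed."""
    H, W = depth_witness.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project witness pixels to 3D points in the witness camera frame.
    rays = np.linalg.inv(K_witness) @ pix
    pts_witness = rays * depth_witness.reshape(1, -1)

    # Move the points into the hero camera frame and project them.
    pts_hero = R @ pts_witness + t.reshape(3, 1)
    proj = K_hero @ pts_hero
    z = proj[2]
    valid = z > 0
    x = np.round(proj[0, valid] / z[valid]).astype(int)
    y = np.round(proj[1, valid] / z[valid]).astype(int)
    z = z[valid]

    hero_H, hero_W = hero_shape
    depth_hero = np.full((hero_H, hero_W), np.nan)
    inside = (x >= 0) & (x < hero_W) & (y >= 0) & (y < hero_H)
    # Keep the nearest surface when several points land on the same hero pixel:
    # write far points first so nearer ones overwrite them.
    order = np.argsort(-z[inside])
    depth_hero[y[inside][order], x[inside][order]] = z[inside][order]
    return depth_hero
```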
We’d then send this into our compositing software or into our scene. If we wanted to do transparencies, we had to turn this into a full 3D object so we could see through transparencies and so on. Or we could do a simpler ‘depth sorting’ in an external application. We had MotionBuilder with our renderer and all our scene in there, and then we had this external module that was grabbing the images from the render and mixing them by the distance to the camera. That meant we could just very efficiently stick a hero image with the full CG content and basically intertwine them correctly based on the depth.
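The simpler ‘depth sorting’ he mentions, mixing the live-action plate and the CG render by each pixel's distance to camera, reduces to a per-pixel nearest-wins merge. Here is a tiny assumed sketch of that idea; transparencies, as he notes, would need the full 3D treatment instead.

```python
# Illustrative sketch of depth sorting: for each pixel, show whichever of the
# live-action plate or the CG render is closer to the camera. Array shapes and
# the handling of transparency are assumptions.
import numpy as np


def depth_sort_composite(plate_rgb, plate_depth, cg_rgb, cg_depth):
    """plate_rgb, cg_rgb: (H, W, 3) images; plate_depth, cg_depth: (H, W) depth.
    Returns the composite in which the nearer element occludes the farther one."""
    plate_in_front = plate_depth <= cg_depth
    return np.where(plate_in_front[..., None], plate_rgb, cg_rgb)
```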
Read more in issue #11.