Clothing animation
May 31, 2021

Did you know that Luxo Jr., the lamp in the Pixar logo, was first shown at a computer science conference? Even as Pixar has grown to become much more than a computer hardware company, the company continues to present work at SIGGRAPH. In 2020, Pixar previewed Loop, a short film that subsequently won best in show. The year before that, they presented a talk on light pruning in Toy Story 4.

Other famous (and not-so-famous) companies showcase work and research at SIGGRAPH – Microsoft, Electronic Arts, and NVIDIA to name a few. In a sense, it’s a chance to learn about how the shows we watch and the games we play are made.

Today, I want to talk about clothing animation. People animate clothes in one of two ways: using physical simulation or using keyframes. In the simulation approach, an animator tunes several parameters – the stretchiness of the material, the amount of air resistance, the effect of gravity – and then simulates how the clothes would fall on the character model, iterating until the result achieves the desired effect.
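To make that concrete, here's a tiny, made-up cloth simulator (nothing to do with the paper's actual simulator): a strip of cloth modeled as a chain of particles connected by springs, where `stiffness`, `drag`, and `gravity` are the kind of knobs an animator would be tuning.

```python
# A minimal sketch, not a production simulator: a strip of cloth as a chain of
# particles connected by springs, stepped with semi-implicit Euler.
import numpy as np

def simulate_strip(n=20, steps=500, dt=1e-3,
                   stiffness=500.0,   # how stretchy the material is
                   drag=0.5,          # crude stand-in for air resistance
                   gravity=9.8,       # downward pull
                   rest_len=0.05):
    # particles start in a horizontal line, pinned at one end
    pos = np.stack([np.arange(n) * rest_len, np.zeros(n)], axis=1)
    vel = np.zeros_like(pos)
    for _ in range(steps):
        force = np.zeros_like(pos)
        force[:, 1] -= gravity          # gravity on every particle
        force -= drag * vel             # air resistance opposes motion
        # spring forces between neighboring particles
        d = pos[1:] - pos[:-1]
        length = np.linalg.norm(d, axis=1, keepdims=True)
        f = stiffness * (length - rest_len) * d / np.maximum(length, 1e-9)
        force[:-1] += f
        force[1:] -= f
        vel += dt * force
        vel[0] = 0.0                    # pin the first particle to the "character"
        pos += dt * vel
    return pos

# Tuning stiffness, drag, and gravity until the draped shape "looks right"
# is exactly the trial-and-error loop described above.
final_shape = simulate_strip(stiffness=200.0, drag=1.0)
```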

In the keyframe approach, the animator works out how the clothes should look at several key points (i.e., key frames) in an animation, and manually positions the clothes at each of those points. They then rely on the computer to smoothly transition from one keyframe to the next.
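Here's a minimal sketch of that idea: the animator supplies vertex positions at a few key frames, and everything in between is blended automatically. (Real tools use smoother interpolation curves than the linear blend below; this is just to illustrate.)

```python
# A minimal sketch of keyframe interpolation between artist-posed cloth shapes.
import numpy as np

def interpolate(keyframes, frame):
    """keyframes: dict mapping frame number -> (num_vertices, 3) array of positions."""
    times = sorted(keyframes)
    # find the two keyframes surrounding the requested frame
    t0 = max(t for t in times if t <= frame)
    t1 = min(t for t in times if t >= frame)
    if t0 == t1:
        return keyframes[t0]
    w = (frame - t0) / (t1 - t0)            # blend weight in [0, 1]
    return (1 - w) * keyframes[t0] + w * keyframes[t1]

# e.g. cloth posed by hand at frames 0 and 24; frame 12 is blended automatically
poses = {0: np.zeros((100, 3)), 24: np.ones((100, 3))}
midpoint = interpolate(poses, 12)
```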

Neither of these approaches is easy. Simulation is hard because figuring out the exact parameters requires substantial trial-and-error, and these parameters tend to change over time. Keyframe animation is hard because clothing is complex, so one needs many, many keyframes to ensure that the transitions between keyframes don’t end up looking unnatural. For example, a 100-frame animation lasting mere seconds may take more than ten hours to complete.

Recent research aims to make clothing animation far more efficient, cutting the work required by more than a hundred-fold.

To oversimplify, the researchers borrow from the simulation approach to make the keyframe approach far more efficient: the system automatically figures out the simulation parameters that produce the look of a given keyframe, and those parameters can then be applied to the rest of the animation. Judging from the demo video, the results are pretty darned impressive, at least to my untrained eye.
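To make the idea concrete, here's a deliberately naive sketch: a brute-force search over a handful of candidate parameter settings for whichever one best reproduces the artist's keyframe. The paper's actual method is learning-based (see note 1), but the goal is the same: recover parameters from a keyframe, then reuse them everywhere else.

```python
# A rough, naive illustration of "fit parameters to a keyframe, then reuse them".
import numpy as np
from itertools import product

def fit_parameters(target_keyframe, simulate):
    """Grid-search a few candidate settings.
    `simulate(stiffness=..., drag=..., gravity=...)` returns cloth vertex positions;
    `target_keyframe` is the artist-posed cloth at the same frame."""
    best, best_err = None, np.inf
    for stiffness, drag in product([100.0, 300.0, 500.0], [0.1, 0.5, 1.0]):
        result = simulate(stiffness=stiffness, drag=drag, gravity=9.8)
        err = np.mean((result - target_keyframe) ** 2)  # distance to the keyframe
        if err < best_err:
            best, best_err = (stiffness, drag), err
    return best  # apply these parameters to every other frame

# e.g. params = fit_parameters(artist_pose, simulate_strip), using the earlier sketch
```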

This unified approach lets an animator improve the animation iteratively. One keyframe alone is sufficient to generate realistic clothing animation, but more keyframes can be added to get the clothes to move in specific ways.

What’s the catch? The system is designed to work with a single piece of clothing on a single character model. Switching clothes or models would require the system to be re-trained, a process that takes 50 hours (!). So it's more useful in the later stages of the animation process, and less useful when one is experimenting with different models or different garments.


  1. Of course it's deep learning-based: a model learns, from a given keyframe, a latent representation of simulation parameters that is independent of body motion, and then applies those parameters to other body motions. (A very rough sketch of what such a model might look like follows these notes.)
  2. The character model used throughout the paper is that of an anime character with two pigtail buns. Her hair is animated, even though it has nothing to do with the research. Perhaps this should have been expected, given that two of the researchers are from miHoYo, a Chinese game company founded by three students who "decided to develop games based on their love of anime".
  3. Taking a macro view, this approach embodies a couple of good principles around improving an existing workflow: (1) it doesn’t require people to change what they’re doing very much or at all – an animator is still composing keyframes, and (2) it guarantees that some time will be saved – in the worst case, an animator needs to animate the same number of keyframes as before.
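For the curious, here's a purely illustrative sketch of the kind of model hinted at in note 1: an encoder that maps an artist-posed keyframe to a latent code standing in for simulation parameters, which can then be reused across body motions. The paper's actual architecture isn't described in this post, so treat every layer size and name below as made up.

```python
# An illustrative (not the paper's) encoder: keyframe -> latent "simulation parameters".
import torch
import torch.nn as nn

class KeyframeEncoder(nn.Module):
    def __init__(self, num_vertices, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_vertices * 3, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),        # latent code, independent of body motion
        )

    def forward(self, keyframe_vertices):      # (batch, num_vertices, 3)
        return self.net(keyframe_vertices.flatten(start_dim=1))

encoder = KeyframeEncoder(num_vertices=1000)
latent = encoder(torch.randn(1, 1000, 3))      # one code, reused across other motions
```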