Nvidia Created a Face Swapper for Pets That Learns From Just a Few Examples

Most AI that manipulates or morphs images requires a large amount of training data for every kind of image it handles. Nvidia found a way to build a model that, once trained, needs only one input image of any pet and a few example images of another animal to translate one into the other.

In a recent paper, Nvidia’s researchers explain the motivation behind this new approach:

While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design.

Nvidia calls the framework “Few-shot Unsupervised Image-to-Image Translation” (FUNIT) because it needs only a few examples of a new image class (e.g. beagles, polar bears) and learns without direct human guidance. Needing far less data for each new class should give this method a big advantage over current ones as it improves. At the moment, it requires an unobstructed pet face to yield the desired results. While it does technically function with human faces, the results often fall comfortably into the “creepy” category.

Although Nvidia’s method serves only a very specific purpose at the moment, and its limitations result in notable flaws, the work shows the promise of much more impressive results a few iterations down the line. With open-sourced code available for anyone to build upon (and a public demo for everyone else), FUNIT has a better chance of reaching usable quality a lot sooner.

Looking toward the bigger picture, FUNIT fits in nicely with the category of problems Nvidia has approached with artificial intelligence. Taken together with its tools for turning objectively terrible sketches into complete landscapes and generating complete 3D urban environments for game development, it suggests that Nvidia hopes to create a toolset that will greatly reduce the time and expense required to create video game assets. When character assets for AAA titles can cost around $80,000, game studios stand to cut costs significantly with the help of artificial intelligence.

Of course, FUNIT will need more than a reasonable aptitude for dog-swapping before it can help make up an AI toolkit capable of generating detailed and dynamic game assets on par with human talent.

Image credit: Adam Dachis (translated results by Nvidia)