OpenAI’s ‘DALL-E’ Generates Images From Text Descriptions
Artificial intelligence has gotten very good at some things — it’s even approaching human capability when it comes to recognizing objects and generating text. What about art? OpenAI has devised a new neural network called DALL-E (the name is a mashup of Salvador Dalí and beloved Pixar robot WALL-E). All you need to do is give DALL-E some instructions, and it can draw an image for you. Sometimes the renderings are little better than fingerpainting, but other times they’re startlingly accurate portrayals.
OpenAI has made news lately for its GPT neural networks, which are sometimes referred to as “fake news generators” because of how convincingly they can fabricate text to match a prompt. GPT-3 showed that large neural networks can complete complex linguistic tasks. The team wanted to see how well such an AI could move between text and images. Like GPT-3, DALL-E supports “zero-shot reasoning,” allowing it to generate an answer from a description and cue without any additional training. Like GPT, DALL-E is a transformer language model, but unlike GPT it can accept both text and images as input. DALL-E doesn’t need precise values and instructions the way a 3D rendering engine does; its training allows it to fill in the blanks, adding details that aren’t stated in the request.
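The key idea behind accepting both text and images is that the model treats them as one token stream: text is tokenized, the image is represented as a grid of discrete codes, and both are fed to the transformer as a single sequence. Here is a toy sketch of that input layout — the vocabulary sizes, token IDs, and function name are illustrative assumptions, not the real model’s:

```python
# Toy sketch of a DALL-E-style single-stream input: text tokens and
# discrete image tokens share one sequence, with image IDs offset past
# the text vocabulary. All sizes here are illustrative, not the model's.

TEXT_VOCAB = 1000   # hypothetical text token vocabulary size
IMAGE_VOCAB = 512   # hypothetical discrete image-code vocabulary size

def build_sequence(text_tokens, image_tokens):
    """Concatenate text and image tokens into one flat stream.

    Image tokens are shifted by TEXT_VOCAB so the transformer sees a
    single vocabulary of TEXT_VOCAB + IMAGE_VOCAB possible IDs.
    """
    assert all(0 <= t < TEXT_VOCAB for t in text_tokens)
    assert all(0 <= t < IMAGE_VOCAB for t in image_tokens)
    return list(text_tokens) + [TEXT_VOCAB + t for t in image_tokens]

seq = build_sequence([12, 407, 3], [90, 11])
print(seq)  # [12, 407, 3, 1090, 1011]
```

Because everything lives in one sequence, the same autoregressive machinery that predicts the next word in GPT can predict the next patch of an image here.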
Case in point: See below for some baby penguins wearing Christmas sweaters and playing the guitar. You don’t need to say the penguin has a Santa hat — DALL-E just comes up with that detail on its own in several renderings.
DALL-E also has a better understanding of objects in context compared with other AI artists. For example, you can ask DALL-E for a picture of a phone or vacuum cleaner from a specified period of time, and it understands how those objects have changed. Well, at least generally. Some of the images will have buttons in the wrong place or a bizarre shape. But all of these images are rendered from scratch by the AI.
That whimsical streak helps DALL-E combine multiple concepts in fascinating ways. When asked to merge a snail and a harp, it comes up with some clever variations on the theme. With more straightforward instructions such as “draw an emoji of a lovestruck avocado,” you get some artful and rather adorable options that the Unicode Consortium should consider adding to the official emoji list.
The team also showed that DALL-E can combine text instructions and a visual prompt. You can feed it an image and ask for a modification of that same image. For instance, you could show DALL-E a cat and ask for a sketch of the cat. You can also have DALL-E add sunglasses to the cat or make it a different color.
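Under the single-stream view, conditioning on an existing image amounts to putting the text instruction plus part of that image’s codes into the prompt, then letting the model fill in the remaining positions. A toy sketch of building such a prompt — grid size, vocabulary size, and names are again illustrative assumptions:

```python
# Toy sketch of image-conditioned prompting: the text instruction plus a
# prefix of an image's discrete codes form the prompt; the model would
# then autoregressively sample the remaining image positions.
# Sizes and names here are illustrative, not the real model's.

TEXT_VOCAB = 1000      # hypothetical text token vocabulary size
IMAGE_POSITIONS = 16   # hypothetical 4x4 grid of image codes

def build_prompt(text_tokens, partial_image_tokens):
    """Return the prompt sequence and how many image slots remain to fill."""
    remaining = IMAGE_POSITIONS - len(partial_image_tokens)
    assert remaining >= 0, "partial image is larger than the grid"
    prompt = list(text_tokens) + [TEXT_VOCAB + t for t in partial_image_tokens]
    return prompt, remaining

prompt, to_sample = build_prompt([5, 42], [7, 7, 9, 1])
print(to_sample)  # 12 image positions left for the model to generate
```

This is how one model can handle “add sunglasses to the cat”: the cat image supplies the prompt’s image prefix, and the text steers what gets generated for the rest.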
OpenAI has a page where you can play around with some of the more interesting input values. The model is still fairly limited, but this is just the start. OpenAI plans to study how DALL-E could impact the economy (add illustrators to the list of jobs threatened by AI) and the potential for bias in its outputs.