Dall-E: new challenges in artificial intelligence
It creates realistic images from concepts expressed in natural language, such as “an astronaut on horseback” or “a bowl of soup that looks like a monster,” or anything else you can imagine, however surreal. That is what Dall-E 2 does, the latest artificial intelligence (AI) system announced by OpenAI, a research and development company co-founded by Elon Musk.
We have seen similar apps and AI systems that generate images from text or keywords before. But the images generated by the latest Dall-E demo stand out for their quality and realism, as well as their surreal style.
The name Dall-E combines the names of the Pixar character Wall-E and the surrealist master Salvador Dalí. The tool was recently opened to the public: you just have to register and ask it to create all kinds of images.
The company shared examples of images that Dall-E creates by combining concepts, attributes, and styles from a short sentence. Thus, the phrase “a bowl of soup that looks like a monster made of clay” gives rise to this image and its variations.
Image generated by the AI Dall-E when interpreting the phrase “a bowl of soup that looks like a monster made of clay.” Image: OpenAI
Whereas “a bowl of soup that looks like a monster knitted with wool” results in this other image and its variants.
Image generated by the artificial intelligence Dall-E in response to the phrase “a bowl of soup that looks like a monster knitted with wool.” Image: OpenAI
How Dall-E works
Dall-E’s neural network “has learned the relationship between images and the text that describes them,” the researchers explained. It understood “not only individual objects, such as horses or astronauts,” they said, but also “how objects and actions relate to each other.” This is how Dall-E “knew” how to realistically depict astronauts riding horses. To generate an image, Dall-E uses a process called “diffusion,” which starts from a pattern of random dots and gradually modifies it until the desired result is achieved, creating an image “that didn’t exist before.”
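The diffusion process described above can be sketched as a toy loop: start from pure noise and repeatedly nudge it toward a target image. This is only an illustration of the idea, not how Dall-E is implemented; a real diffusion model uses a trained neural network to predict and remove the noise at each step, and the function name and the 0.2 step factor below are invented for this sketch.

```python
import random

def toy_reverse_diffusion(target, steps=50, seed=0):
    """Toy illustration of diffusion-style refinement: begin with
    random noise and, at every step, move each pixel a little
    closer to the target. (A real model has no access to the
    target; it learns to denoise from training data.)"""
    rng = random.Random(seed)
    # Start from a pattern of random dots (pure noise).
    image = [rng.uniform(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # Each step removes a fraction of the remaining "noise" gap.
        image = [px + 0.2 * (t - px) for px, t in zip(image, target)]
    return image

target = [0.1, 0.5, 0.9]
result = toy_reverse_diffusion(target)
print([round(x, 3) for x in result])
```

After enough steps the noise converges to the target values, which is the intuition behind refining random dots into a coherent image.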
For the researchers, the development of Dall-E meets three basic conditions for building “useful and safe” AI:
It allows the public to express themselves in a way that was previously impossible. This reveals whether the AI system “understood” what was asked of it in writing or, on the contrary, simply repeated what it had learned.
It helps to understand how AI systems see and understand the world. Compared to the first version of Dall-E, released more than a year ago, Dall-E 2 improves the comprehension, quality, and complexity of the images, as well as the speed at which they are generated.
It can take existing photos and create complex variations, such as changing the angle and style of a portrait.
It allows you to edit an existing image to replace one object with another or add objects that are not in the original, taking into account styles, shadows, reflections, and textures. It can even change the meaning of the image.
In academic contexts, it can be useful for producing infographic and presentation images that closely match what we want to express, avoiding tedious web searches for images that may be subject to copyright.
Limitations on the use of Dall-E
Until recently, in addition to limiting access (now open to the general public upon registration), OpenAI has imposed some restrictions on the use of its new AI models. These restrictions are intended to prevent harmful or abusive uses of the tool.