This is certainly not the first time we have heard of artificial intelligence systems that create images from a textual description. One of the most famous is OpenAI's DALL-E (the name is an irresistible homage to Dalí and the tender WALL-E), which generates images from captions using a specially trained neural network. Too bad it is not available to the public.
There are other services, more or less effective at producing images through AI and text interpretation, such as Hotpot, which creates works of art from a description for commercial purposes (among other things, it is possible to create NFTs). But nothing seems to compare to Imagen, which Google describes as a system capable of offering "an unprecedented level of photorealism and a deep understanding of language".
Who can resist a cat on a skate with an AI hat? - Source: Imagen
The researchers created DrawBench, a benchmark of 200 text descriptions that were fed to Imagen and to other models for comparison, including DALL-E 2 and VQGAN+CLIP. The resulting images were evaluated by a group of people who, according to Google, preferred Imagen to the other models, both for the quality of the samples and for how closely the image matched the text.
Imagen is not available to the public at the moment, mainly because the model, like many others, was trained on large datasets collected from the web without being reviewed by curators. As a result, some of those datasets brought in unwanted material, including pornographic images, offensive language, negative social stereotypes and racism. To prevent misuse of the model, the researchers have therefore decided not to make it publicly available, at least for now.
In the future, the group hopes to create a framework that allows responsible use of the model, one that balances external checks against the possible risks of open and unrestricted access.