A homebrew re-creation of OpenAI’s DALL·E model is the latest internet sensation.
What’s new: Craiyon has been generating around 50,000 user-prompted images daily, thanks to its ability to produce visual mashups like Darth Vader ice fishing and photorealistic Pokemon characters, Wired reported. You can try it here.
How it works: U.S. machine learning consultant Boris Dayma built Craiyon, formerly known as DALL·E Mini, from scratch last summer. It went viral in early June following upgrades that improved its output quality.
- Dayma fine-tuned a pretrained VQGAN encoder/decoder to reproduce input images and to generate images that fooled a separate discriminator (a convolutional neural network) into classifying them as real images.
- He trained a BART model to generate, given a caption, a sequence of tokens that matched VQGAN’s representation of the corresponding image. The training set comprised 30 million captioned images from public datasets that were filtered to remove sexual and violent imagery.
- At inference, given input text, BART’s encoder produces a sequence of tokens. Given that sequence, its decoder predicts a probability distribution for each successive image token. The system samples from those distributions to generate several candidate image representations.
- Given the representations, the VQGAN decoder generates images. CLIP ranks them by how closely they match the text, and the system outputs the top nine.
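
The inference steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Craiyon's actual code: `next_token_probs` is a hypothetical stand-in for the BART decoder conditioned on the encoded prompt, the embedding vectors stand in for CLIP's text and image embeddings, and cosine similarity is assumed as the ranking score. The VQGAN decoding step is omitted.

```python
import math
import random


def sample_token(probs, rng):
    """Draw one token index from a probability distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_candidates(next_token_probs, seq_len, num_candidates, rng):
    """Sample several image-token sequences, one token at a time.

    next_token_probs(prefix) is a hypothetical callable standing in for the
    decoder: it returns a distribution over the image-token vocabulary given
    the tokens generated so far.
    """
    candidates = []
    for _ in range(num_candidates):
        tokens = []
        for _ in range(seq_len):
            tokens.append(sample_token(next_token_probs(tokens), rng))
        candidates.append(tokens)
    return candidates


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_similarity(text_vec, image_vecs, k=9):
    """Return indices of the k image embeddings most similar to the text."""
    ranked = sorted(
        range(len(image_vecs)),
        key=lambda i: cosine(text_vec, image_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In a real system, each sampled token sequence would be decoded into an image before being embedded and scored. The default `k=9` mirrors the nine images the article says are shown.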
Behind the news: Fans of the word-guessing game Wordle may enjoy Wordalle, which shows players six images generated by Craiyon and asks them to guess the prompt.
Why it matters: Advances in machine learning are unlocking new ways for people to amuse themselves, from generating images of imaginary pizzas to making superheroes lip-synch popular songs. Enabling the internet audience to remix popular culture in unprecedented ways unleashes imagination and good humor worldwide.
We’re thinking: OpenAI says it controls access to DALL·E out of concern that people might use it to indulge their worst impulses. Craiyon’s deluge of delightful output is an encouraging counterpoint.