
Is it AI art if a human designed it?

A generative artwork that both is and is not AI art

This article was previously published on fx(text).

I made an end-run around an AI generative art platform to release art that doesn't actually use AI for its image generation. However, because I couldn't entirely turn off the AI, it leaves a trace in the form of visible image artifacts. Is the result AI art? Is it not? Should you collect it if you collect AI art? Should you collect it if you don't collect AI art but do collect code-based art? Read on to learn everything about this collection.

The long-form generative AI art platform EmProps recently opened its OpenMarket Beta, allowing a larger number of artists to explore the platform, mint their projects, and see what they can do within the framework of long-form generative AI. In many ways, the OpenMarket Beta feels like the early days of fx(hash): tons of releases priced extremely cheaply, many collections not minting out or minting slowly, and plenty of platform bugs that lead to slow reveals, occasionally missing tokens, and everything else you'd expect when a new platform first starts scaling.

I got my hands on a beta membership and started exploring as well. And while I was exploring and talking to people, I was asked whether EmProps mandates that all released art must be AI art. Thinking about this for a bit, I realized that even though the EmProps minting pipeline tightly integrates the AI model Stable Diffusion and we can't turn it off, we can actually choose Stable Diffusion settings that mostly (but not entirely!) make an end-run around the AI. And then, in combination with EmProps' ability to generate code-based images using the p5.js framework, we can make traditional code-based long-form generative art on EmProps. This line of thought led me to create and release the collection "Is it AI art if a human designed it?"

Here, I'll explain in detail the ideas behind this collection. To start, let's review the basics of generative AI art. There are two main approaches: text-to-image and image-to-image. In text-to-image, we provide the AI with a written description of the type of output we want, and the AI generates an image based on this description. In image-to-image, we provide a written description and, in addition, an image that serves as a visual guide for the output. You can see in the example I have provided how in the text-to-image approach the AI freely chooses the colors and geometric arrangement, whereas when we use the same prompt in image-to-image the output is very different and clearly constrained by the provided image.

Text-to-image and image-to-image generation modes of generative diffusion models. (AI generated images were created with Stable Diffusion SDXL v1.0 on the EmProps OpenStudio platform.)
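If you want to see these two modes in code, the sketch below is a minimal illustration using the open-source diffusers library and the public SDXL weights. It is not the EmProps pipeline itself, which I don't have access to as code; the prompt, file names, and parameter values are just reasonable placeholders.

```python
# Minimal text-to-image vs. image-to-image sketch using Hugging Face diffusers.
# Illustrative only: public SDXL weights, not the EmProps pipeline.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
from PIL import Image

prompt = "an abstract oil painting with bold geometric shapes"

# Text-to-image: the AI freely chooses colors and arrangement.
t2i = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
image_t2i = t2i(prompt=prompt).images[0]

# Image-to-image: same prompt, but now constrained by a guide image.
i2i = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
guide = Image.open("photo.jpg").convert("RGB").resize((1024, 1024))
image_i2i = i2i(prompt=prompt, image=guide, strength=0.6).images[0]

image_t2i.save("text_to_image.png")
image_i2i.save("image_to_image.png")
```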

Importantly, the image generation process is in part driven by random noise, which means the AI can generate many different outputs from the same input prompt. The EmProps platform enables artists to create long-form collections of AI art by creating many different images from the same prompts on the fly as the user mints the art. (EmProps also allows artists to use different text or image prompts within the same collection, for added variety, but this is less relevant for the present topic.)
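To sketch how this variation arises in practice, again with diffusers rather than the actual EmProps infrastructure: fixing the prompt and varying only the random seed yields a different output per iteration. EmProps does the equivalent server-side at mint time.

```python
# Same prompt, different random seeds: each seed yields a different output.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "an abstract oil painting with bold geometric shapes"
for seed in range(5):  # five iterations of a hypothetical long-form collection
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt=prompt, generator=generator).images[0]
    image.save(f"iteration_{seed}.png")
```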

In the image-to-image mode, there doesn't need to be any particular relationship between the type of prompt image used and the desired output. For example, above I used a photograph as the prompt to generate abstract oil paintings. Similarly, I could use a hand-drawn sketch to generate photorealistic images. Because the AI can produce interesting outputs on the basis of virtually any prompt image, a popular technique is to use code-based generative art as the prompt. The code-based prompt images don't have to be particularly complex or sophisticated, as the AI will handle textures and subtle color variations. This technique is particularly attractive for long-form AI art, where we can use a different code-based image prompt for each output. I call this "generative image-to-image," and the EmProps platform supports this mode as well. We can upload any sketch written in the p5.js framework, and EmProps will run the code with a different random seed for each iteration, capture the generated image, and use it as the image prompt for the AI.

Generative image-to-image generation available on the EmProps platform. The artist provides a text prompt and p5.js code, and the platform runs the code for each iteration, produces a different output image, and uses it as image prompt for the AI. (AI generated images were created with Stable Diffusion SDXL v1.0 on the EmProps OpenStudio platform.)
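In rough pseudo-form, the generative image-to-image loop looks like the sketch below. I'm assuming here that the p5.js sketch has already been rendered to PNG files (for instance with a headless browser), since that rendering step happens in JavaScript; on EmProps, running the code and capturing the image is handled by the platform. File names and settings are hypothetical.

```python
# Generative image-to-image, sketched with diffusers: one pre-rendered p5.js
# output per iteration serves as the image prompt for the AI.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "an abstract oil painting with bold geometric shapes"
for i in range(10):
    # p5_output_{i}.png: captured output of the p5.js sketch run with seed i
    guide = Image.open(f"p5_output_{i}.png").convert("RGB").resize((1024, 1024))
    generator = torch.Generator(device="cuda").manual_seed(i)
    out = pipe(prompt=prompt, image=guide, strength=0.6, generator=generator).images[0]
    out.save(f"final_{i}.png")
```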

Whenever we use image-to-image, we need to set a parameter called "denoising strength" that determines how closely the AI-generated image adheres to the image prompt. At a denoising strength of zero, the output image looks nearly (but not exactly) identical to the prompt image. At a denoising strength of one, on the other hand, the AI has wide latitude to deviate from the prompt image. Colors, arrangements, textures, the types of objects shown, anything can change at a denoising strength of one. The term "denoising strength" may sound weird, but it makes a lot of sense once you look into what diffusion models such as Stable Diffusion (and Midjourney, DALL-E, etc.) actually do. They generate novel images by attempting to successively remove noise from initially random input. So, a denoising strength of one means the maximum amount of noise removal relative to the input image, which means maximum creativity. By contrast, a denoising strength of zero means no noise gets removed, and we get the input image back as our output.

Effect of the denoising strength. At a denoising strength of zero the output image looks nearly identical to the input image, whereas at a denoising strength of one the AI has wide leeway in its output generation. (AI generated images were created with Stable Diffusion SDXL v1.0 on the EmProps OpenStudio platform.)
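In the diffusers library this parameter is simply called `strength`, and a sweep like the one below reproduces the effect shown in the figure. Again, this is a sketch with public SDXL weights and placeholder file names, not the EmProps settings.

```python
# Sweep the denoising strength for a fixed prompt image and a fixed seed.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

guide = Image.open("prompt_image.png").convert("RGB").resize((1024, 1024))
for strength in [0.05, 0.25, 0.5, 0.75, 1.0]:
    generator = torch.Generator(device="cuda").manual_seed(0)
    out = pipe(
        prompt="an abstract oil painting with bold geometric shapes",
        image=guide,
        strength=strength,  # low: stay close to the prompt image; high: deviate freely
        generator=generator,
    ).images[0]
    out.save(f"strength_{strength:.2f}.png")
```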

You may wonder why I said that at a denoising strength of zero the output image is not exactly identical to the prompt image. In my example provided here, they certainly look the same! To understand this, we need to know a little more about how diffusion models operate. They all have a compressed internal image representation called the "latent space." This latent space does not store every single color of every single pixel of the output image, but instead stores something we can think of as a description of what the image should look like. You can imagine it providing descriptions such as "there are sharp blue-yellow edges in the bottom right of the image, and a green-blue gradient across the top, and something circular and red in the middle." The AI then uses a system called a decoder to turn this latent space description into the actual output image.

When Stable Diffusion generates images, it operates entirely in the latent space. Because the latent space is much smaller than a typical image (think about 50 times smaller), computations in latent space are much more efficient. In fact, the idea of operating in latent space rather than on images directly was the main novel contribution of the original Stable Diffusion model. So how do we get our image prompts into the latent space? For that, Stable Diffusion uses a module called an "encoder," which is the inverse of the decoder. The encoder takes an image and converts it into the latent space, and the decoder takes the latent space representation and converts it back into an image. The combined system of encoder, latent space, and decoder is also called a "variational autoencoder" in the machine learning literature. You can learn more about this latent space and how Stable Diffusion works in detail from this article.
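To make the encoder/decoder round trip concrete, here is a sketch using the publicly released SDXL VAE via diffusers (the input file name is a placeholder). The roughly 48x figure in the comments comes straight from the tensor shapes and matches the "about 50 times smaller" estimate above.

```python
# Encode an image into SDXL's latent space and decode it back.
# Illustrative only; uses the public SDXL VAE, not the EmProps pipeline.
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae").to("cuda").eval()

img = Image.open("p5_output.png").convert("RGB").resize((1024, 1024))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale pixels to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0).to("cuda")              # shape (1, 3, 1024, 1024)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()            # shape (1, 4, 128, 128)
    decoded = vae.decode(latents).sample                    # back to (1, 3, 1024, 1024)

print("image values: ", x.numel())        # 3 * 1024 * 1024 = 3,145,728
print("latent values:", latents.numel())  # 4 * 128 * 128   =    65,536  (~48x smaller)

out = ((decoded[0].permute(1, 2, 0).clamp(-1, 1) + 1) * 127.5).to(torch.uint8).cpu().numpy()
Image.fromarray(out).save("roundtrip.png")  # close to the input, but not pixel-identical
```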

If you've followed along so far, you should ask yourself whether any arbitrary image can be accurately represented in latent space. If the latent space is 50x smaller than the actual image, surely something must get lost in the encoding? This is indeed the case. While latent spaces are surprisingly good at encoding images, some fine details or subtle aspects of the image may get altered or distorted. This is very similar to what happens in lossy image compression methods such as JPEG. JPEG is great for storing images, but when you turn the quality slider all the way down to get a small file size, you can end up with ugly artifacts. For more detail, see this Wikipedia article or the chapter on compression of bitmap graphics in my dataviz book.

Illustration of JPEG artifacts. (a) The same image is reproduced multiple times using increasingly severe JPEG compression. The resulting file size is shown in red text above each image. A reduction in file size by a factor of 10, from 432KB in the original image to 43KB in the compressed image, results in only minor perceptible reduction in image quality. However, a further reduction in file size by a factor of 2, to a mere 25KB, leads to numerous visible artifacts. (b) Zooming in to the most highly compressed image reveals the various compression artifacts. Reproduced from "Fundamentals of Data Visualization" by Claus Wilke.
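This effect is easy to reproduce yourself. The snippet below saves the same image at several JPEG quality settings and prints the resulting file sizes (the input file name is hypothetical).

```python
# Save the same image at decreasing JPEG quality and report file sizes.
import os
from PIL import Image

img = Image.open("p5_output.png").convert("RGB")
for quality in [95, 75, 50, 25, 10, 5]:
    fname = f"compressed_q{quality}.jpg"
    img.save(fname, format="JPEG", quality=quality)
    print(f"quality {quality:3d}: {os.path.getsize(fname) / 1024:.0f} KB")
```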

So now we can understand what happens when we use the image-to-image generation method and set the denoising strength to zero. The image prompt gets encoded into the latent space, Stable Diffusion retains this exact latent space representation, and then the decoder converts it back into the output image. So in practice we haven't actually done any diffusion or image generation. We have just compressed the image using the AI's latent space.
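In other words, the end-run is just image-to-image with the denoising strength turned (almost) all the way down. With diffusers that might look like the sketch below; exactly zero may not be accepted by every implementation, so I use a very small value here, and as before the model name and file names are placeholders rather than the EmProps configuration.

```python
# Image-to-image at near-zero denoising strength: the output is essentially the
# prompt image after a round trip through the latent space.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

guide = Image.open("p5_output.png").convert("RGB").resize((1024, 1024))
out = pipe(
    prompt="an abstract oil painting",  # largely irrelevant at near-zero strength
    image=guide,
    strength=0.02,                      # near zero: almost no denoising is performed
    num_inference_steps=50,
).images[0]
out.save("latent_roundtrip.png")        # visually the prompt image, plus latent-space artifacts
```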

When we combine this latent encoding step with traditional code-based long-form generative outputs, the result is code-based art that uses AI, but the AI has no meaningful influence on the design of the final outputs. It just adds latent space artifacts. Conceptually, this is similar to taking the code-based outputs and storing them as JPEGs at low quality settings. This is exactly what my art project "Is it AI art if a human designed it?" does. All the key visual features of the outputs, such as colors, arrangement, background textures, blurriness, etc., are decided by the p5.js code. The AI only adds minor artifacts. If you look closely, you can see that the AI's contribution consists mostly of distorting small letter shapes and making some lines sharper than they were in the p5.js output. The following animation shows the before/after of one whole image and of various parts of the image zoomed in.

Unfortunately, on EmProps we cannot see the p5.js output for individual iterations, only the final outputs. So you have to take my word for it, to some extent, that the p5.js outputs look just the same. To make this argument plausible, I have provided here a set of ten outputs straight from p5.js. At first glance, they look just like what you see on the EmProps platform for the final mints. However, if you zoom in, you can see that all the letters are properly shaped and look identical. There are no AI artifacts in these images.

Ten example outputs of the p5.js code used in "Is it AI art if a human designed it?"

There is one additional step in the AI pipeline that I haven't mentioned yet. Diffusion models typically generate relatively low-resolution images, on the order of 1000x1000 pixels. So to obtain images with the resolution we would expect for modern displays or for printing, we take the generated images and upscale them, using yet another AI model. How exactly upscaler models work is beyond the scope of this article. Suffice it to say that they aim to reconstruct what a higher-resolution version of the same image would have looked like, and this works well if the image is similar to the images the upscaler was trained on. For my particular artwork, I find that most artifacts are introduced at the Stable Diffusion stage, and the effect of the upscaler is mostly limited to making the overall image sharper.

Zoomed-in comparison of the p5.js output, the Stable Diffusion output before upscaling (labeled SDXL v1.0), and the final output after upscaling. We can see that Stable Diffusion introduces substantial artifacts, mostly by deforming the small letters, whereas upscaling mostly sharpens the image, which amplifies the visual appearance of the artifacts introduced by Stable Diffusion.
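I don't know exactly which upscaler model EmProps uses, but to give an idea of what this step looks like, here is a sketch with one openly available diffusion-based upscaler from the diffusers library (again with placeholder file names and prompt).

```python
# Upscaling a generated image with an openly available diffusion-based upscaler.
# EmProps' actual upscaler may differ; this only illustrates the step.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("sdxl_output.png").convert("RGB")
high_res = upscaler(
    prompt="abstract generative artwork with small letters",  # a short description guides the upscaler
    image=low_res,  # large inputs may need tiling or a GPU with plenty of memory
).images[0]
high_res.save("upscaled_4x.png")
```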

Now you know everything you need to know to understand the collection. What you do with it is up to you. Is this AI art? Is it not? Should people interested in AI art care? Should people interested in code-based art care? Should anybody care? I'll leave this to you. As of this writing the collection is only about half-minted, so if you like it you can still mint your very own iteration right now, over here.

Tags: AI art, generative art, compression artifacts, EmProps