Introduction
In today’s digital world, AI-driven text-to-image technologies have become increasingly important for businesses. Text-to-image technology lets people create complex visuals from unstructured text, so they can communicate their message more effectively, showcase their products, or even develop AI-driven visual stories.
In this tutorial, we’ll walk through the basics of text-to-image technology, from understanding the core concepts to implementing it in practical scenarios. First, let’s look at the definition of text-to-image technology:
Text-to-image (T2I) is a type of artificial intelligence technology that uses natural language processing (NLP) to convert textual data into images. It allows users to communicate visually with the help of AI. AI-generated images can be used for many purposes, such as product showcases, marketing campaigns, storytelling, and more. The process is similar to ChatGPT, but in this case the result of the text prompt is an image rather than a conversation.
The process of creating a text-to-image solution includes gathering relevant data and inputting it into an AI-powered tool. This tool then processes the text and creates a custom image based on the data. There are various AI-driven tools available for text-to-image conversion, which we will describe below.
You can use a text-to-image prompting tool to structure the text prompt that you want to convert to a picture. Depending on the tool, you may need to provide additional information such as subject, language, or specific visuals. After providing the inputs, another tool can be used to process the text and generate the image.
Step 0 — Learning and tutorials
There are lots of guides, documentation, and tutorials available:
Stable Diffusion: Tutorials, Resources, and Tools
For the purposes of this tutorial, I will use Stable Diffusion as an example, but all of the concepts apply to the other AI models as well. I chose Stable Diffusion because it is free and open source, but I encourage you to experiment with the others if you want to compare. It is also possible to install Stable Diffusion locally on your own machine or on a cloud server, but unless you have a very powerful NVIDIA-compatible graphics card, plus a lot of time and expertise, you are much better off starting with the process and services that I describe below. We won’t get into development tools like huggingface.com, which are for a much more technical audience; you don’t need them to experience the basics.
Here are the most popular models with easy-to-use front ends:
Stable Diffusion (many free, easy-to-use front ends — see below)
Midjourney — a little more complicated to get up and running, and it requires a monthly fee after the free trial. But it is probably the best overall in quality.
Blue Willow — Similar to Midjourney, and uses the same prompts, but it is free for now
DALL-E — Good quality, but requires you to purchase credits to use
There are many other variants, each with its own feature set and pricing model. I won’t cover them in this tutorial because there are far too many.
Step 1 — Creating the prompt
There are lots of guides that explain the prompts in more detail for each type of model. Here are some for Stable Diffusion:
I. Promptmania is the easiest and least expensive (free) tool to use for crafting sophisticated prompts. I will describe how to use it for Stable Diffusion:
Main prompt builder sections
This is where you put in the main subject of your proposed image, whatever it happens to be, for example “a painting of a musician”. The finished prompt is built in the box above this section; it updates automatically as you make your choices.
You can also add another image for reference or inspiration, but you will have to upload it to a web service such as Imgur and enter the URL. I suggest you start off without adding other images.
Base image
Here, you choose the class of image: a face, a sphere, or a landscape.
Add details
Here you choose from various details such as medium, camera, color, etc. As you pick options, you will get various sub-options to choose from; there are many alternatives. Don’t choose too many for now, and see what effect each one has. You can always add more.
Styles from known artists
Here you can add an artist to base your style on. You can choose more than one, but don’t pick too many for now. This step is optional. For example:
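Under the hood, the prompt a builder like Promptmania assembles is essentially just a comma-separated string of your choices. As a rough illustration (the function, field names, and example values here are hypothetical, not Promptmania’s actual output format), the idea is something like this:

```python
# Assemble a text-to-image prompt from the kinds of choices a prompt
# builder offers: main subject, base image class, details, artist styles.
# All values below are illustrative examples, not a fixed vocabulary.

def build_prompt(subject, base=None, details=None, artists=None):
    parts = [subject]
    if base:
        parts.append(base)                        # e.g. "portrait", "landscape"
    parts.extend(details or [])                   # e.g. medium, camera, color
    parts.extend(f"by {a}" for a in artists or [])
    return ", ".join(parts)

prompt = build_prompt(
    "a painting of a musician",
    base="portrait",
    details=["oil on canvas", "dramatic lighting"],
    artists=["Rembrandt"],
)
print(prompt)
# a painting of a musician, portrait, oil on canvas, dramatic lighting, by Rembrandt
```

The final string is what you paste into the image-generation services described in step 2, which is why you can freely hand-edit a generated prompt before using it.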
II. Another way to generate prompts is to ask ChatGPT. For example, I asked ChatGPT for “stable diffusion prompts for portraits”, and it gave me this:
“Here are some prompts for creating portraits using Stable Diffusion:
A classical portrait of a majestic monarch
An intense and powerful close-up portrait of a strong warrior
A captivating portrait of a beautiful mermaid
A striking portrait of a daring pirate
A haunting portrait of a mysterious wizard
An intimate portrait of a wise sage
A vibrant portrait of a romantic artist
A delightful portrait of a cheerful jester
A surreal portrait of a mysterious creature
A vivid portrait of an inspiring hero”
You can copy and paste any of these prompts into any of the services below, or you can modify them any way you like. Here is a more detailed article on how to use ChatGPT to generate prompts.
Step 2 — Entering and running the prompt
After you have built the prompt, it is time to try it out. I suggest using one of the services below for Stable Diffusion, in no particular order. Each of them has tutorials on YouTube that show how it works in more detail, but for now we are just going to copy and paste the prompt that we created above and watch the magic. You can experiment freely, since they all have free versions; eventually you might have to pay, but only when you need lots of images or premium features. None of these require you to register and learn Discord like Midjourney, which is a more complex tool. Stick with Stable Diffusion, especially if you are a beginner.
Here is a tutorial if you need it. It’s pretty easy to use and produces great results. Paste your prompt from step 1, or use one from a sample image.
Paste in your prompt from step 1 and try it out. There are a lot of options to choose from, and the quality is very good. For now you can use the default settings, or experiment if you are more adventurous.
Similar to the ones above, but it lets you do a text search through the images. You can try out your prompt from step 1, or use one from an image that you like.
Similar to Lexica, it allows you to search existing generated images with text. You can use prompts from these images, or the ones that you generated in the previous step.
This one is similar to the others, but it also has a Photobooth (similar to Dreambooth) which allows you to upload your own photos to create a model with your personal pictures. This will be the subject of another article. I would stick to the main model and features for now. This site also has interesting preset templates which you can use for prompting image generation.
You.com search engine — This is yet another alternative. It has a search engine that also offers AI image generation for SD and Midjourney, and it is a way to use Midjourney for free. It doesn’t have all of the features of the paid version, but it is simpler to use since you don’t have to learn Discord on top of a lot of arcane commands. It’s good enough for most beginners and is a good way to try things out. The search engine itself is also really interesting; I talked about it in my previous blog post.
The nice thing about using free services like these is that you can experiment, modifying your prompt to see what results you get. It’s also a good idea to try negative prompts, such as the one below, to exclude weird and deformed images.
Sample negative prompt: disfigured, kitsch, ugly, oversaturated, grain, low-res, deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, ugly, disgusting, poorly drawn, childish, mutilated, mangled, old, surreal
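Long negative prompts like this tend to accumulate duplicate terms as you copy them between sources (notice that “ugly”, “blurry”, and “disfigured” each appear more than once in the sample above). Duplicates are harmless, but if you want to tidy a prompt before pasting it, here is a small sketch (the function name is my own, not part of any service):

```python
def dedupe_prompt(prompt):
    """Remove duplicate terms from a comma-separated prompt,
    keeping the first occurrence of each (case-insensitive)."""
    seen = set()
    terms = []
    for term in prompt.split(","):
        term = term.strip()
        key = term.lower()
        if term and key not in seen:   # skip empty terms and repeats
            seen.add(key)
            terms.append(term)
    return ", ".join(terms)

negative = "disfigured, ugly, blurry, deformed, blurry, ugly, mutated"
print(dedupe_prompt(negative))
# disfigured, ugly, blurry, deformed, mutated
```

The same helper works for positive prompts, which also tend to grow redundant terms as you iterate.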
Step 3 — Edit, Upscale, download
Most of these services allow you to edit, upscale and download the image. Just hover over the generated image and you should see these options.
Step 4 — Display images in a gallery or sell in a marketplace
If you want to display the generated images publicly, or even privately, there are lots of galleries available, like DeviantArt, Unsplash, wallhaven.cc, artstation.com, and others. Many allow you to sell images as well. These images don’t include prompts.
I prefer to mint the pictures that I create as NFTs. You also have the option to list them for sale, with payment in cryptocurrency. I prefer to use these marketplaces to display the collections and images, even if I am not going to sell them. Here are the ones I like most:
Opensea.io (ETH and Polygon)
Rarible.com (ETH and Polygon)
Solsea.io (Solana)
As you can see, there are markets for each blockchain, but ETH and Polygon are the biggest in terms of traffic. Most of them don’t require a wallet for you to view the collections and images of various creators; you will only need a wallet if you want to buy (collect) or sell.
Conclusion
This has been a brief introduction to Stable Diffusion (SD) and how to generate prompts and images. It’s impossible to cover everything, but this is a good place to start, and you can get more advanced as you improve. New services and features are coming out at a rapid pace; it’s hard to keep up, so I would stick to the major trends.
There are also lots of alternatives and approaches. You can experiment and find out what works best for you. You could even create a unique style of your own. Stable diffusion is a powerful tool that can be used in many different ways, limited only by your imagination. Above all — have fun.