From Interaction Station Wiki


While VQGAN+CLIP has its own aesthetic, a newer, "better" algorithm named Stable Diffusion has been released to generate images from a sentence. You can find a tutorial for Stable Diffusion on this page.

Generate Images from a text prompt

In this tutorial we will create images by typing text.

If you want to do this from home, you can use this Colab notebook: https://github.com/justin-bennington/somewhere-ml/blob/main/S2_GAN_Art_Generator_(VQGAN%2C_CLIP%2C_Guided_Diffusion).ipynb
How to use it is described here: https://docs.google.com/document/d/1Lu7XPRKlNhBQjcKr8k8qRzUzbBW7kzxb5Vu72GMRn2E/edit

However, it is much faster to work on the computers at the Interaction Station, where we installed everything you need on the PCs in WH.02.110.

Step 1: Boot the PC to Ubuntu

We need to start the computer in the Ubuntu Linux operating system, so keep an eye on the screen when you start the computer. When it lists some options, select "Ubuntu". If the computer starts Windows, you missed it; just restart the computer and try again. When Ubuntu starts and asks you for a login, select the user "InteractionStation" and type the password "toegangstud".

Step 2: Start the conda environment in a terminal

Click the "Show applications" icon in the bottom left of the screen and type "terminal" in the search box. Select the terminal icon that pops up. This will open a black window in which you can type commands. Type the following:

cd MachineLearning/VQGAN_CLIP-Docker

and then

conda activate vqgan-clip

You are now ready to run things, but first we must modify the configuration file to your wishes.

Step 3: Modify the configuration file

Click on the "Files" icon in the top left of the Ubuntu screen and navigate to MachineLearning - VQGAN_CLIP-Docker - configs. Now open the file called "local.json". Some of the settings you can change here are a bit mysterious, but let's have a look at the ones that are most interesting to us.


In "prompts" you can type what image to generate.
For example:
"prompts": ["Wild roses near a river"],

1 wild roses near the river.png

You can enter multiple text prompts like this:
"prompts": ["Wild roses near a river", "Giger"],

2 wild roses near the river giger.png

It is also possible to give weights to each text. All weights should add up to 100. e.g.:
"prompts": ["Wild roses near a river:80", "Giger:20"],

3 wild roses near the river-80 giger-20.png

This will result in less Giger and more wild roses near a river.
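
If you prefer not to type the weights by hand, the weighted prompt line can be produced and checked with a few lines of Python. This is only a minimal sketch and not part of VQGAN-CLIP-Docker: the helper name weighted_prompts is made up for illustration, and the only rule it enforces is the one stated above, that all weights add up to 100.

```python
import json


def weighted_prompts(weights):
    """Turn {"text": weight} into the "text:weight" strings used in local.json.

    Hypothetical helper for illustration; it only enforces the rule from
    this tutorial that all weights add up to 100.
    """
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights add up to {total}, expected 100")
    return [f"{text}:{weight}" for text, weight in weights.items()]


prompts = weighted_prompts({"Wild roses near a river": 80, "Giger": 20})
# Print the line in the form it takes inside local.json.
print('"prompts": ' + json.dumps(prompts) + ",")
```

The printed line can be pasted into local.json in place of the existing "prompts" line.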

Choosing keywords wisely for your prompts can make a huge difference!
You can see this clearly in the grid posted here: https://imgur.com/a/SnSIQRu

Image Prompts

You can also give the algorithm one or more image prompts. The computer will try to make images that are similar to the image prompts specified.
Copy the images that you use into the VQGAN-CLIP-Docker folder.

A drawing in the style of Giger imgprompt 250 it.png

This was generated after 250 iterations using prompt: "A drawing in the style of Giger" and the following image prompt:


Max Iterations

Here you specify how many steps to take. More iterations will lead to more detail, but will also take longer to process. 250 is usually a nice number to see if it is going in a direction you like.
A low number can be nice if you also specify an init_image; the result will then work like a style transfer.
"max_iterations": 250,

Save frequency

This determines after how many steps the output image will be updated. So if you want to update your generated image every 50 steps, put:
"save_freq": 50,


With "size" you specify the resolution of the image you are generating. Currently on our computers this can be a maximum of 576 x 320. If you specify higher numbers, the script will crash with an out-of-memory error.
"size": [576, 320],
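
To avoid finding out about a too-large size through a crash, you could check the value before you start generating. The sketch below is an illustration, not part of the scripts: the function name size_fits is made up, and it assumes the 576 x 320 maximum applies to the width and the height separately.

```python
import json

# Largest size that works on the Interaction Station PCs, according to this page.
MAX_WIDTH, MAX_HEIGHT = 576, 320

# Example settings as they could appear in local.json (only keys discussed here).
example = json.loads('{"max_iterations": 250, "save_freq": 50, "size": [576, 320]}')


def size_fits(config):
    """Return True when the requested size stays within the stated maximum.

    Assumption: the limit is applied to width and height separately.
    """
    width, height = config["size"]
    return width <= MAX_WIDTH and height <= MAX_HEIGHT


print(size_fits(example))                # True
print(size_fits({"size": [1024, 576]}))  # False
```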

Init Image

You can specify an image to start from. This will give you some control of where things will be placed.
"init_image": "./hanhoogerbrugge.jpg",
Copy the images that you use into the VQGAN-CLIP-Docker folder.
An init image combined with a low "max_iterations" will result in a sort of style transfer.

A drawing in the style of Giger init img 80 it.png

This was generated after 80 iterations with the prompt "A drawing in the style of Giger" and the image below as init_image.


Step 4: Save the configuration and start generating

Hit ctrl+s on the keyboard or click "save" in the top right of the window. Go to the terminal window and type the following command:

python3 -m scripts.generate -c ./configs/local.json

If all is well, output will come scrolling by as the program runs, until it is done.
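
If you find yourself changing the same settings over and over, the edit-and-run cycle from Steps 3 and 4 could also be scripted. The following is a rough sketch under a few assumptions: update_config is a made-up helper, it rewrites the whole JSON file (so the layout of the file will change), and the demonstration runs on a throwaway file so that the real local.json is left alone.

```python
import json
import tempfile
from pathlib import Path

# The command from Step 4, split into arguments for subprocess.run.
GENERATE_COMMAND = ["python3", "-m", "scripts.generate", "-c", "./configs/local.json"]


def update_config(path, **changes):
    """Load a JSON config file, overwrite the given keys, and save it again."""
    path = Path(path)
    config = json.loads(path.read_text())
    config.update(changes)
    path.write_text(json.dumps(config, indent=2))
    return config


# Demonstration on a temporary file instead of the real configs/local.json.
with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "local.json"
    demo.write_text('{"prompts": ["Wild roses near a river"], "max_iterations": 250}')
    updated = update_config(demo, prompts=["Giger"], max_iterations=80)
    print(updated)
```

To use it for real, point update_config at ./configs/local.json and then call subprocess.run(GENERATE_COMMAND, check=True) from the folder you entered in Step 2, with the conda environment active.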

Step 5: Check the results

The final image will be saved in the folder named "outputs" in VQGAN-CLIP-Docker. Whatever you put in "prompts" will be used to name the file. In the folder "steps" you can find the image for each iteration. If you ran a lot of steps, please delete these!
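
Cleaning up the step images can be done in the file browser, but a small script works too. This is a sketch, not something that ships with the scripts: clear_steps is a made-up name, and it takes the folder as an argument because the exact location of "steps" on your machine is for you to check. It deletes every file directly inside that folder, so make sure you point it at the right one. The demonstration below uses a temporary folder.

```python
import tempfile
from pathlib import Path


def clear_steps(steps_dir):
    """Delete all files directly inside steps_dir and return how many were removed."""
    removed = 0
    # Build the list first so we do not delete while still reading the folder.
    for item in list(Path(steps_dir).iterdir()):
        if item.is_file():
            item.unlink()
            removed += 1
    return removed


# Demonstration on a temporary folder with three empty stand-in files.
with tempfile.TemporaryDirectory() as folder:
    steps = Path(folder) / "steps"
    steps.mkdir()
    for number in range(3):
        (steps / f"{number}.png").write_bytes(b"")
    removed_count = clear_steps(steps)
    print(removed_count)  # 3
```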

Bonus Step: Upscaling the Image

The website https://bigjpg.com/ lets you upload your image and uses machine learning to enlarge it to 4 times the original size for free.

Note to make life easier

If you are playing around and running the script again and again, you don't have to type the command every time. By hitting the Up Arrow on the keyboard you can scroll through the terminal history and select your earlier command.

Original Config file

You can find the original config file at https://github.com/kcosta42/VQGAN-CLIP-Docker/blob/main/configs/local.json