Editing Images Via A Prompt With Python And PyTorch


Hello! 😃 In this tutorial I will show you how to use a pre-trained machine learning model to modify an image based on the user's input prompt. The model uses an image editing technique called "instruct-pix2pix", and we will run it from Python using PyTorch.

Well then let's get started. 😎


Requirements

  • Basic knowledge of Python
  • A computer with decent specs (a GPU with plenty of VRAM helps, but the CPU works too)

Creating The Virtual Environment

First we need to create a virtual Python environment for the project. Open up the terminal and run the following command in the project's root directory:

python3 -m venv env

Next we need to activate the environment, which can be done via the following command:

source env/bin/activate
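
If you want to confirm that the environment is really active before installing anything, Python itself can tell you. This is a minimal sketch; the function name in_virtualenv is just an illustrative choice and not part of the tutorial's source code:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints False, the activate command was most likely run in a different terminal session or with the wrong directory name.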

Next we need to install the dependencies. 💫

Installing The Dependencies

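To install the dependencies, create a file called "requirements.txt" and add the modules the code relies on. Going by the imports used in this tutorial, that means at least the following (Pillow is imported as PIL, and diffusers needs transformers for its Stable Diffusion pipelines):

torch
diffusers
transformers
Pillow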

Next run the following command:

pip install -r requirements.txt
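
Once pip has finished, a quick check can confirm that the modules are importable before the (much slower) model download starts. The sketch below assumes the import names torch, diffusers, transformers and PIL, which is my reading of what the code in this tutorial needs; missing_modules is a hypothetical helper name:

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for this tutorial; adjust to match your requirements.txt
    required = ["torch", "diffusers", "transformers", "PIL"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```

Anything listed as missing can be added to requirements.txt and installed with the same pip command.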

Now we can finally start coding! ☺️

Coding The Application

Create a file called "main.py" and add the following imports:

import argparse
import PIL.Image
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

Next, we need to initialize some constant variables:

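MODEL_ID = "timbrooks/instruct-pix2pix"
# GPU (recommended if you have enough VRAM): load in half precision and move to CUDA
# PIPE = StableDiffusionInstructPix2PixPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda")
# CPU fallback: slower, but works without a GPU
PIPE = StableDiffusionInstructPix2PixPipeline.from_pretrained(MODEL_ID).to("cpu")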

Here we define the model to use (in this case instruct-pix2pix). The repo for it can be found here: https://github.com/timothybrooks/instruct-pix2pix

We also initialize the pipeline. If your machine has a decent amount of GPU VRAM, I highly recommend using the commented-out line instead. My machine isn't that well specced, so I opted to use the CPU over the GPU. 🥺
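
Rather than commenting lines in and out, the choice between GPU and CPU could also be made at run time. The following is only a sketch under that assumption: pick_device is a hypothetical helper that is not in the tutorial's source code, and it deals in plain strings so that it can be tried without PyTorch installed:

```python
def pick_device(cuda_available: bool) -> dict:
    """Return the device name and dtype name to load the pipeline with."""
    if cuda_available:
        # Half precision roughly halves the VRAM needed on the GPU
        return {"device": "cuda", "dtype": "float16"}
    # Full precision is the safe choice on the CPU
    return {"device": "cpu", "dtype": "float32"}


if __name__ == "__main__":
    print(pick_device(False))
```

In main.py this could be called as pick_device(torch.cuda.is_available()), with the dtype looked up via getattr(torch, settings["dtype"]) and passed as torch_dtype to from_pretrained, followed by .to(settings["device"]).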

Next we will create the main method:

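def main(prompt, image_path):
    # Load the input image and make sure it is three-channel RGB
    image = PIL.Image.open(image_path).convert("RGB")

    # Run the edit: image_guidance_scale controls how closely the result follows
    # the input image, guidance_scale how closely it follows the prompt
    images = PIPE(prompt, image=image, num_inference_steps=30, image_guidance_scale=1.5, guidance_scale=7).images

    # Place the original (left) and the edited image (right) side by side
    new_image = PIL.Image.new("RGB", (image.width * 2, image.height))
    new_image.paste(image, (0, 0))
    new_image.paste(images[0], (image.width, 0))

    # Save the comparison image
    new_image.save("output.png")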

This method opens the image at the path that was passed to it and uses the pre-trained model to modify it according to the provided prompt.

Finally, we place the original image and the new image side by side so that we can compare them, and save the result to a file called "output.png".
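
The side-by-side layout is simple arithmetic on the two image sizes. The sketch below spells it out for the case where the edited image does not come back at exactly the same size as the input, which I am assuming can happen and is worth checking on your setup. The helper name side_by_side_layout is hypothetical:

```python
def side_by_side_layout(left_size, right_size):
    """Return (canvas_size, left_offset, right_offset) for two images.

    Sizes are (width, height) tuples, as returned by PIL's Image.size.
    """
    left_w, left_h = left_size
    right_w, right_h = right_size
    # The canvas is as wide as both images together and as tall as the taller one
    canvas_size = (left_w + right_w, max(left_h, right_h))
    # The left image starts at the origin; the right one starts where the left ends
    return canvas_size, (0, 0), (left_w, 0)


if __name__ == "__main__":
    print(side_by_side_layout((512, 384), (512, 384)))
```

The returned canvas size can be passed to PIL.Image.new and the two offsets to paste, in place of the hard-coded values.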

Next we add the following in order to call the main method:

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required = True, help = "Path to image file")
    ap.add_argument("-p", "--prompt", required = True, help = "Prompt for image editing")

    args = vars(ap.parse_args())

    main(args["prompt"], args["image"])

All the above does is take an image file path and a prompt from the command line and pass them both to the main method.
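
To see what the args dictionary looks like, the same parser can be fed a list of arguments instead of the real command line. This is just an illustration; the file name "cat.png" and the prompt are made-up values, and build_parser is a hypothetical wrapper around the parser shown above:

```python
import argparse


def build_parser():
    """Create the same parser that main.py uses."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True, help="Path to image file")
    ap.add_argument("-p", "--prompt", required=True, help="Prompt for image editing")
    return ap


# Passing a list to parse_args() stands in for the real command line
args = vars(build_parser().parse_args(["-i", "cat.png", "-p", "make it snowy"]))
print(args)
```

The keys of the dictionary come from the long option names, which is why the code reads args["prompt"] and args["image"].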

All done! 😄

You can try the program with the following command (wrap the prompt in quotes if it contains spaces):

python main.py -i [path to image file] -p [prompt]

Depending on the spec of your machine you may need to wait a while for the image to be processed. If you run into any out-of-memory issues, try decreasing the size of the image. Lowering num_inference_steps shortens the run time, although it does not reduce memory use. 👀
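
One way to decrease the image size is to scale it down before handing it to the pipeline. The sketch below computes the target size; the 512 pixel limit and the rounding to a multiple of 8 are my assumptions about what suits Stable Diffusion style models, and fit_within is a hypothetical helper name:

```python
def fit_within(width, height, max_side=512, multiple=8):
    """Scale (width, height) so the longer side is at most `max_side`.

    Both dimensions are rounded down to a multiple of `multiple`
    (but never below `multiple` itself) and the image is never upscaled.
    """
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic keeps the aspect ratio without float rounding surprises
        width = width * max_side // longest
        height = height * max_side // longest
    new_w = max(multiple, width // multiple * multiple)
    new_h = max(multiple, height // multiple * multiple)
    return new_w, new_h


if __name__ == "__main__":
    print(fit_within(1024, 768))
```

Inside the main method this could be applied with image = image.resize(fit_within(image.width, image.height)) right after the image is opened.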


Conclusion

Here I have shown how to edit images with Python and PyTorch by using a pre-trained model.

I hope you learned as much from this tutorial as I did writing it. 😆

You can find the source code for the tutorial on my GitHub: https://github.com/ethand91/python-pytorch-image-editor

As always, happy coding! 😎
