Make StableDiffusionXL 50% faster on RTX 4090


The Problem

PyTorch 2 brings many performance optimizations, but the default upstream install

pip3 install torch

pulls PyTorch 2.0.1 built against CUDA 11.7 (cu117).

The problem is that cu117 does not properly support newer GPUs such as the RTX 4090 or H100; in fact, cu117 won't even run on an H100.

Popular cloud GPU providers used for deep learning often ship cu117 by default, which hurts performance on these cards.
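To see whether an environment is affected, one option is to compare the CUDA version the installed torch build targets against 11.8. The sketch below is only an illustration: the helper names `parse_cuda_version` and `cuda_build_is_recent` are made up for this post, and it assumes `torch.version.cuda` holds a string such as "11.7" (or None on CPU-only builds).

```python
def parse_cuda_version(text):
    """Turn a string such as '11.8' into a tuple of ints, e.g. (11, 8)."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_build_is_recent(cuda_version, minimum=(11, 8)):
    """Return True if the CUDA version a torch build targets is >= minimum.

    cuda_version is expected to look like torch.version.cuda: a string such
    as '11.7', or None for builds without CUDA support.
    """
    if cuda_version is None:
        return False
    return parse_cuda_version(cuda_version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # torch.version.cuda is the CUDA toolkit version the wheel was built with.
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        print("CUDA >= 11.8:", cuda_build_is_recent(torch.version.cuda))
```

If the last line prints False on an RTX 4090 or H100 machine, the build is likely the cu117 default described above.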

The Fix

To fix this, we should target cu118 until it is adopted in the upstream pip packages.

This means moving to an image with a CUDA version of at least 11.8. In Docker, this can be done by using

docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04

as our base image. Alternatively, upgrade the NVIDIA driver and CUDA version directly.

We also need to install the cu118 build of PyTorch with

pip3 install torch==2.0.1+cu118 \
--extra-index-url https://download.pytorch.org/whl/cu118

We can install torchvision and torchaudio at the same time:

pip3 install torch==2.0.1+cu118 \
torchvision==0.15.2+cu118 \
torchaudio \
--extra-index-url https://download.pytorch.org/whl/cu118
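After installing, it can be worth confirming that all three packages ended up on the same CUDA build. The following is a rough sketch using the standard library's `importlib.metadata`; the helpers `local_tag` and `installed_tags` are hypothetical names, and it assumes the wheels from the cu118 index carry a `+cu118` suffix in their version, as in the pins above.

```python
from importlib.metadata import PackageNotFoundError, version


def local_tag(version_string):
    """Return the part after '+' in a version such as '2.0.1+cu118', or None."""
    _, sep, tag = version_string.partition("+")
    return tag if sep else None


def installed_tags(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its local build tag, or None if absent or untagged."""
    tags = {}
    for name in packages:
        try:
            tags[name] = local_tag(version(name))
        except PackageNotFoundError:
            # Package is not installed in this environment.
            tags[name] = None
    return tags


if __name__ == "__main__":
    for name, tag in installed_tags().items():
        print(f"{name}: {tag}")
```

Seeing cu118 for every package suggests the install above took effect. A value of None means the package is missing or has no build tag in its version, which is probably what a default-index install looks like.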

The Reward

Enjoy a speedup of 50% or more on Ada Lovelace workloads. In the sample run below, throughput rises from 4.88 it/s on cu117 to 7.78 it/s on cu118:

-- cu117 -- 
68%|███████▏  | 34/50 [00:04<00:01,  4.88it/s]
70%|███████▍  | 35/50 [00:04<00:01,  4.88it/s]
-- cu118 -- 
72%|███████▏  | 36/50 [00:04<00:01,  7.78it/s]
74%|███████▍  | 37/50 [00:04<00:01,  7.78it/s]
76%|███████▌  | 38/50 [00:04<00:01,  7.78it/s]
78%|███████▊  | 39/50 [00:05<00:01,  7.78it/s]
80%|████████  | 40/50 [00:05<00:01,  7.78it/s]
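For reference, the relative speedup implied by the two progress bars can be worked out directly from the it/s figures. This is just arithmetic on the numbers shown above, and `speedup_percent` is a helper name invented for the example.

```python
def speedup_percent(before, after):
    """Relative throughput gain, in percent, going from `before` to `after`."""
    return (after - before) / before * 100.0


cu117_its = 4.88  # it/s from the cu117 log lines
cu118_its = 7.78  # it/s from the cu118 log lines

print(f"{speedup_percent(cu117_its, cu118_its):.1f}% faster")  # prints "59.4% faster"
```

So this particular run comes out a little above the 50% in the title.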
