
How to Install Stable Diffusion GUI on ARM64 Nvidia RTX platform

This article walks through the process of installing Stable Diffusion WebUI on an ARM64 + Nvidia RTX platform.
This hardware combination has some compatibility issues, so this article describes both the installation steps and the fixes for those issues.

Platform

CPU: Ampere Altra ARM64 CPU
Board: AsRock ALTRAD8UD
OS: Ubuntu 22.04

This is among the most powerful ARM64 platforms available for a homelab today. Some server models are available, but they come with high-speed cooling fans that are not suitable for home use.

Nvidia provides comprehensive library support for the ARM64 platform, including drivers, CUDA, and cuDNN. Despite these efforts, however, gaps remain. Even Nvidia’s Jetson product line lacks adequate support for certain applications such as PyTorch. While prebuilt CUDA-accelerated PyTorch packages are available for direct download on x64 platforms, the same cannot be said for ARM64. Although one such package technically exists, its availability is severely limited, rendering it practically useless.

The only viable solution to overcome these limitations is to utilize Nvidia’s Docker service to install Stable Diffusion.

Installation

Install Ubuntu 22.04 without selecting the HWE (Hardware Enablement) support. It appears that the current Nvidia driver version has issues, leading to potential soft-lock bugs. A general kernel version (5.15) is preferable for stability.
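As a quick sanity check that you are on the GA 5.15 kernel rather than an HWE kernel, a small helper like the following can be used (this check is my own sketch, not part of the original installation steps; the release string in the comment is only an example):

```python
import platform

def is_ga_kernel(release=None):
    """Return True when the kernel release belongs to the Ubuntu 22.04
    GA (non-HWE) 5.15 series."""
    release = release or platform.release()  # e.g. "5.15.0-97-generic"
    return release.startswith("5.15")

print(is_ga_kernel())
```

Running `uname -r` in a terminal gives the same information; this is just a scriptable variant.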

Next, install the necessary Ubuntu packages.

sudo apt-get install build-essential

Download the Nvidia ARM64 (aarch64) Display Driver. The current version is fine; at the time of writing, it is 535.154.05.

Docker


Install Docker by following the Docker guide “Install Docker Engine on Ubuntu”. The necessary installation steps are also provided here. If you have already installed Docker, or if these steps don’t work, please refer to the original website for updated instructions.

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Nvidia Container Toolkit

Follow the instructions for installing the NVIDIA Container Toolkit. The toolkit can modify the Docker configuration for you. Alternatively, editing the daemon.json file directly might achieve the same result, although I haven’t personally tested it. Below are the instructions:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

You can check the /etc/docker/daemon.json file; the configuration should look like this:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
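If you want to verify this configuration programmatically rather than by eye, a short sketch (path and key names taken from the config above) could look like this:

```python
import json

def nvidia_runtime_configured(path="/etc/docker/daemon.json"):
    """Return True when daemon.json registers the "nvidia" runtime."""
    try:
        with open(path) as f:
            cfg = json.load(f)
    except (OSError, ValueError):
        return False
    return "nvidia" in cfg.get("runtimes", {})

print(nvidia_runtime_configured())
```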

Run the Nvidia PyTorch Docker image

The following is an example command to run the Nvidia PyTorch Docker image.

sudo docker run --gpus all -it --rm \
        -v /host/work:/container/work \
        -p 7860:7860 \
        nvcr.io/nvidia/pytorch:XX.XX-py3

/host/work is the work directory on the host, and /container/work is the work directory inside the container. -p 7860 is the GUI’s default port, which needs to be forwarded to the host machine. XX.XX is the container version; please see the tags at [NGC Pytorch] for the available versions. The following is an example command.

sudo docker run --gpus all -it --rm \
        -v /work:/work \
        -p 7860:7860 \
        nvcr.io/nvidia/pytorch:24.01-py3

Stable Diffusion WebUI

First, clone the Stable Diffusion WebUI source code.

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

Then modify the settings. Edit webui.sh and adjust these parameters to the following values:

use_venv=0
can_run_as_root=1

As we are operating within a container, there’s no need for a virtual environment (venv), and the script can be run as root. Modify the file webui-user.sh and add COMMANDLINE_ARGS accordingly.

export COMMANDLINE_ARGS="--no-half --precision full --listen"

Or run it directly:

export COMMANDLINE_ARGS="--no-half --precision full --listen" ; bash webui.sh

Currently, Stable Diffusion will download some packages, but errors may occur during the installation process. We’ll address and fix these errors in the next few steps.

CV2

Error message :

    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/cv2/typing/__init__.py", line 168, in <module>
    LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

Modify the file “/usr/local/lib/python3.10/dist-packages/cv2/typing/__init__.py” and comment out the LayerId line as follows. [ref]

# LayerId = cv2.dnn.DictValue
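If you prefer not to edit the file by hand (for example, when you recreate the container often), a small patch script of my own can comment the line out for you; the path below assumes the container’s Python 3.10 layout shown in the traceback above:

```python
from pathlib import Path

CV2_TYPING = "/usr/local/lib/python3.10/dist-packages/cv2/typing/__init__.py"

def comment_out_dictvalue(path=CV2_TYPING):
    """Comment out the 'LayerId = cv2.dnn.DictValue' line.
    Returns True if the file was changed, False if it was already patched."""
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    fixed = [
        "# " + line if line.lstrip().startswith("LayerId = cv2.dnn.DictValue") else line
        for line in lines
    ]
    if fixed == lines:
        return False
    p.write_text("".join(fixed))
    return True
```

The check makes the script idempotent, so running it twice is harmless.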

CodeFormer

Error message:

"/work/stable-diffusion-webui/repositories/CodeFormer/facelib/utils/face_restoration_helper.py", line 7, in <module>
        from facelib.detection import init_detection_model
      File "/work/stable-diffusion-webui/repositories/CodeFormer/facelib/detection/__init__.py", line 11, in <module>
        from .yolov5face.face_detector import YoloDetector
      File "/work/stable-diffusion-webui/repositories/CodeFormer/facelib/detection/yolov5face/face_detector.py", line 20, in <module>
        IS_HIGH_VERSION = tuple(map(int, torch.__version__.split('+')[0].split('.'))) >= (1, 9, 0)
    ValueError: invalid literal for int() with base 10: '0a0'

This issue is caused by torch.__version__ returning a string like 2.2.0a0+81ea7a4, which cannot be parsed as plain integers. A temporary solution is to strip the non-numeric parts. Edit the file “stable-diffusion-webui/repositories/CodeFormer/facelib/detection/yolov5face/face_detector.py” and change the IS_HIGH_VERSION check on line 20 to the following: [ref]

version_string = torch.__version__.split('+')[0]  # "2.2.0a0+81ea7a4" -> "2.2.0a0"
version_numbers = [''.join(filter(str.isdigit, p)) for p in version_string.split('.')]
IS_HIGH_VERSION = tuple(map(int, version_numbers)) >= (1, 9, 0)
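To see why this works, here is the same parsing logic as a standalone function you can try against the container’s version string (the example string is taken from the error above):

```python
def parse_torch_version(version):
    """Parse an NGC-style torch version like '2.2.0a0+81ea7a4' into ints."""
    version = version.split("+")[0]  # drop the local build id -> "2.2.0a0"
    # keep only the digits of each dotted component: "0a0" -> "00" -> 0
    parts = ["".join(filter(str.isdigit, p)) for p in version.split(".")]
    return tuple(map(int, parts))

print(parse_torch_version("2.2.0a0+81ea7a4"))               # (2, 2, 0)
print(parse_torch_version("2.2.0a0+81ea7a4") >= (1, 9, 0))  # True
```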

Now the Stable Diffusion GUI should be working as expected.
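As a final, optional check from the host (assuming the default port mapping used earlier), you can poll the WebUI endpoint with a few lines of Python; this probe is my own addition, not part of the WebUI itself:

```python
import urllib.request

def webui_up(url="http://localhost:7860"):
    """Return True when the Stable Diffusion WebUI answers on the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

print(webui_up())
```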
