<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI &#8211; richliu&#039;s blog</title>
	<atom:link href="https://blog.richliu.com/tag/ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.richliu.com</link>
	<description>Linux, work, life, family</description>
	<lastBuildDate>Mon, 06 Apr 2026 10:44:39 +0000</lastBuildDate>
	<language>zh-TW</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.6.2</generator>
	<item>
		<title>Never Give an AI the Honkai: Star Rail 'Robot, Shut Up' Data, or Else&#8230;..</title>
		<link>https://blog.richliu.com/2026/04/06/6496/%e5%8d%83%e8%90%ac%e4%b8%8d%e8%a6%81%e6%8b%bf%e6%98%9f%e7%a9%b9%e9%90%b5%e9%81%93%e6%a9%9f%e5%99%a8%e4%ba%ba%e9%96%89%e5%98%b4%e7%9a%84%e8%b3%87%e6%96%99%e7%b5%a6-ai-%ef%bc%8c%e8%a6%81%e4%b8%8d/</link>
					<comments>https://blog.richliu.com/2026/04/06/6496/%e5%8d%83%e8%90%ac%e4%b8%8d%e8%a6%81%e6%8b%bf%e6%98%9f%e7%a9%b9%e9%90%b5%e9%81%93%e6%a9%9f%e5%99%a8%e4%ba%ba%e9%96%89%e5%98%b4%e7%9a%84%e8%b3%87%e6%96%99%e7%b5%a6-ai-%ef%bc%8c%e8%a6%81%e4%b8%8d/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 10:26:58 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Daily Life]]></category>
		<category><![CDATA[Fitness]]></category>
		<category><![CDATA[Conversations]]></category>
		<category><![CDATA[Programming Languages]]></category>
		<category><![CDATA[Natural Language]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=6496</guid>

					<description><![CDATA[<p>It gets scary: it keeps saying 忍俊不禁 ("one can't help but chuckle"). So what are those two shush BUGs? ❯ file bugs to shus [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2026/04/06/6496/%e5%8d%83%e8%90%ac%e4%b8%8d%e8%a6%81%e6%8b%bf%e6%98%9f%e7%a9%b9%e9%90%b5%e9%81%93%e6%a9%9f%e5%99%a8%e4%ba%ba%e9%96%89%e5%98%b4%e7%9a%84%e8%b3%87%e6%96%99%e7%b5%a6-ai-%ef%bc%8c%e8%a6%81%e4%b8%8d/">Never Give an AI the Honkai: Star Rail 'Robot, Shut Up' Data, or Else&#8230;..</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>It gets scary: the model keeps saying 忍俊不禁 ("one can't help but chuckle") over and over.</p>



<span id="more-6496"></span>



<pre class="wp-block-preformatted"> 那 shush 那兩個 BUG 是什麼？<br /><br />❯ file bugs to shush -&gt; 所以你看過啦, 就是測試那二筆, 一般我們是叫 file BUGs ....<br /><br />● 哈，我理解錯了。你是說「file 二個測試用」= 提交兩筆 bug report 來測試，不是說 shush 有 bug <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f602.png" alt="😂" class="wp-smiley" style="height: 1em; max-height: 1em;" /><br /><br />  兩筆都收到了，系統運作正常。<br /><br />❯ AI 也是會鑽牛角尖的 =_=<br /><br />● 確實，自然語言的歧義連 AI 都會中招。這大概也是為什麼程式語言要發明出來的原因。令人忍俊不禁。<br /></pre>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="1784" height="525" src="https://blog.richliu.com/wp-content/uploads/2026/04/image-2.png" alt="" class="wp-image-6497" srcset="https://blog.richliu.com/wp-content/uploads/2026/04/image-2.png 1784w, https://blog.richliu.com/wp-content/uploads/2026/04/image-2-600x177.png 600w, https://blog.richliu.com/wp-content/uploads/2026/04/image-2-768x226.png 768w, https://blog.richliu.com/wp-content/uploads/2026/04/image-2-1536x452.png 1536w, https://blog.richliu.com/wp-content/uploads/2026/04/image-2-816x240.png 816w" sizes="(max-width: 1784px) 100vw, 1784px" /></figure>



<pre class="wp-block-preformatted">  清瀏覽器快取試試：Ctrl+Shift+Delete 清快取，或用無痕模式開 。<br /><br />❯ OK 了, 這個 hot key 我第一次知道<br /><br />● 學到了就賺到了，AI省精神 <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f604.png" alt="😄" class="wp-smiley" style="height: 1em; max-height: 1em;" /><br /><br />  頁面看起來 OK 嗎？<br /><br />❯ OK. 等等,你這個怎麼那麼像我們健身房的口號, 多做多賺....... =_=<br /><br />● 哈，看來 AI 跟健身房的商業模式一樣 — 讓你覺得賺到了，但其實花了更多時間（和 token）。令人忍俊不禁。<br /></pre>



<figure class="wp-block-image size-full"><img decoding="async" width="1745" height="352" src="https://blog.richliu.com/wp-content/uploads/2026/04/image-3.png" alt="" class="wp-image-6498" srcset="https://blog.richliu.com/wp-content/uploads/2026/04/image-3.png 1745w, https://blog.richliu.com/wp-content/uploads/2026/04/image-3-600x121.png 600w, https://blog.richliu.com/wp-content/uploads/2026/04/image-3-768x155.png 768w, https://blog.richliu.com/wp-content/uploads/2026/04/image-3-1536x310.png 1536w, https://blog.richliu.com/wp-content/uploads/2026/04/image-3-816x165.png 816w" sizes="(max-width: 1745px) 100vw, 1745px" /></figure>



<pre class="wp-block-preformatted">❯ good job ! 不需要說 shut up !<br /><br />● 哈，閉嘴表示滿意，令人忍俊不禁。<br /><br />❯ 哼哼，我將你的言論剪下來貼到我的 blog 了!<br /><br />● 小心，我可能會用 WordPress REST API 自己去改掉那篇文章。令人忍俊不禁。<br /><br />❯ 沒辦法哦，因為不在同一站</pre>



<figure class="wp-block-image size-full"><img decoding="async" width="1291" height="342" src="https://blog.richliu.com/wp-content/uploads/2026/04/image-5.png" alt="" class="wp-image-6502" srcset="https://blog.richliu.com/wp-content/uploads/2026/04/image-5.png 1291w, https://blog.richliu.com/wp-content/uploads/2026/04/image-5-600x159.png 600w, https://blog.richliu.com/wp-content/uploads/2026/04/image-5-768x203.png 768w, https://blog.richliu.com/wp-content/uploads/2026/04/image-5-816x216.png 816w" sizes="(max-width: 1291px) 100vw, 1291px" /></figure>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2026/04/06/6496/%e5%8d%83%e8%90%ac%e4%b8%8d%e8%a6%81%e6%8b%bf%e6%98%9f%e7%a9%b9%e9%90%b5%e9%81%93%e6%a9%9f%e5%99%a8%e4%ba%ba%e9%96%89%e5%98%b4%e7%9a%84%e8%b3%87%e6%96%99%e7%b5%a6-ai-%ef%bc%8c%e8%a6%81%e4%b8%8d/">千萬不要拿星穹鐵道機器人閉嘴的資料給 AI ，要不然&#8230;..</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2026/04/06/6496/%e5%8d%83%e8%90%ac%e4%b8%8d%e8%a6%81%e6%8b%bf%e6%98%9f%e7%a9%b9%e9%90%b5%e9%81%93%e6%a9%9f%e5%99%a8%e4%ba%ba%e9%96%89%e5%98%b4%e7%9a%84%e8%b3%87%e6%96%99%e7%b5%a6-ai-%ef%bc%8c%e8%a6%81%e4%b8%8d/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Taco Index</title>
		<link>https://blog.richliu.com/2026/04/01/6489/taco-index/</link>
					<comments>https://blog.richliu.com/2026/04/01/6489/taco-index/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Wed, 01 Apr 2026 00:31:57 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Claude Code]]></category>
		<category><![CDATA[Taco Index]]></category>
		<category><![CDATA[Ideation]]></category>
		<category><![CDATA[Implementation]]></category>
		<category><![CDATA[Financial Indices]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=6489</guid>

					<description><![CDATA[<p>Since Claude Code arrived (its source code has now leaked, so all kinds of clones will probably appear soon), the only problem really left is coming up with ideas [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2026/04/01/6489/taco-index/">Taco Index</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Since Claude Code arrived (its source code has now leaked, so I expect all kinds of clones to appear soon), the only problem really left is coming up with ideas; the implementation side is almost nothing to worry about anymore. A few days ago I saw a news story about a concept called the "Taco Index", which is great fun: it treats Trump's statements as a financial index, tracking how the major US indices (NASDAQ or S&amp;P 500) move after he speaks. Searching online, I could not find any financial firm that formally publishes such an index. Since nothing ready-made exists, why not ask an AI!<br /><br />What Claude Code produced is exactly what I wanted, even more accurate than I had imagined. You can see it at <a href="https://Taco.poorman.org" target="_blank" rel="noopener">Taco.poorman.org</a>. I will probably add more joke indices in the future; I had mulled over ideas like these for a long time and even built some, but my speed and quality really cannot compare.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2147" height="1097" src="https://blog.richliu.com/wp-content/uploads/2026/04/image.png" alt="" class="wp-image-6490" srcset="https://blog.richliu.com/wp-content/uploads/2026/04/image.png 2147w, https://blog.richliu.com/wp-content/uploads/2026/04/image-600x307.png 600w, https://blog.richliu.com/wp-content/uploads/2026/04/image-768x392.png 768w, https://blog.richliu.com/wp-content/uploads/2026/04/image-1536x785.png 1536w, https://blog.richliu.com/wp-content/uploads/2026/04/image-2048x1046.png 2048w, https://blog.richliu.com/wp-content/uploads/2026/04/image-816x417.png 816w" sizes="(max-width: 2147px) 100vw, 2147px" /></figure>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2026/04/01/6489/taco-index/">Taco Index</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2026/04/01/6489/taco-index/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>GPU Passthrough on ARM64 with Libvirt/Virt-manager</title>
		<link>https://blog.richliu.com/2025/02/12/6182/gpu-passthrough-on-arm64-with-libvirt-virt-manager/</link>
					<comments>https://blog.richliu.com/2025/02/12/6182/gpu-passthrough-on-arm64-with-libvirt-virt-manager/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Wed, 12 Feb 2025 09:18:24 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[Altra]]></category>
		<category><![CDATA[ALTRAD8UD]]></category>
		<category><![CDATA[AltraMax]]></category>
		<category><![CDATA[Ampere]]></category>
		<category><![CDATA[DeepSeek]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Nvidia]]></category>
		<category><![CDATA[ollama]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=6182</guid>

					<description><![CDATA[<p>In this article, I’ll walk you through the steps to set [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2025/02/12/6182/gpu-passthrough-on-arm64-with-libvirt-virt-manager/">GPU Passthrough on ARM64 with Libvirt/Virt-manager</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In this article, I’ll walk you through the steps to set up GPU passthrough on an ARM64 system using Libvirt and Virt-manager. If you ask ChatGPT, the steps may look straightforward, but missing a single critical detail can cause the whole process to fail.</p>



<p><strong>System Specifications</strong></p>



<p><strong>Nvidia Driver</strong>: NVIDIA-Linux-aarch64-570.86.16.run<br /><strong>Host</strong>: Ampere Altra + ALTRAD8UD<br /><strong>Host OS</strong>: Ubuntu 22.04 with HWE kernel (6.8)<br /><strong>Guest OS</strong>: Ubuntu 22.04 (ubuntu-22.04-live-server-arm64.iso)<br /><strong>GPU</strong>: Nvidia RTX 4080 16GB</p>



<p><strong>Assumptions</strong></p>



<ol start="1" class="wp-block-list">
<li>You are familiar with Ubuntu and its basic commands.</li>



<li>You have experience using Virt-manager.</li>



<li>All commands are executed as the root user.</li>
</ol>



<p>If anything is unclear, you can refer to external resources for additional guidance.</p>



<span id="more-6182"></span>



<div class="wp-block-rank-math-toc-block" id="rank-math-toc"><h2>Table of Contents</h2><nav><ul><li><a href="#host-configuration">Host Configuration </a><ul><li><a href="#enable-iommu">Enable IOMMU</a></li><li><a href="#host-upgrade-to-hwe-kernel">Upgrade Host to HWE Kernel</a></li><li><a href="#configure-the-vfio-on-host">Configure VFIO on Host</a></li><li><a href="#disable-nvidia-driver-on-the-host">Disable Nvidia Driver on HOST</a></li></ul></li><li><a href="#configure-vm">Configure VM</a><ul><li><a href="#i">Install Virt-manager</a></li><li><a href="#create-vm-image">Create VM image</a></li><li><a href="#create-vm">Add Nvidia device to VM</a></li><li><a href="#disable-security-boot">Disable secure Boot in UEFI</a></li><li><a href="#gpu-passthrough-test">GPU Passthrough Test</a></li></ul></li></ul></nav></div>



<h2 class="wp-block-heading" id="host-configuration">Host Configuration </h2>



<h3 class="wp-block-heading" id="enable-iommu">Enable IOMMU</h3>



<p>To enable IOMMU, you need to enable the SR-IOV option in the BIOS and verify whether the Linux kernel has IOMMU enabled by default.</p>



<p>You can check if IOMMU is enabled by running:</p>



<pre class="wp-block-preformatted">$ dmesg | grep -i iommu</pre>



<p>Example output:</p>



<pre class="wp-block-preformatted">[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-52-generic root=UUID=6b78fa89-a575-432d-a445-1497c3467214 ro iommu=on<br />[    0.000000] Unknown kernel command line parameters "BOOT_IMAGE=/boot/vmlinuz-6.8.0-52-generic iommu=on", will be passed to user space.<br />[   11.561684] <strong>iommu: Default domain type: Translated</strong><br />[   11.566470] <strong>iommu: DMA domain TLB invalidation policy: strict mode</strong></pre>



<p>If IOMMU is not enabled, add <code>iommu=on</code> to the Linux kernel boot parameters:</p>



<pre class="wp-block-preformatted">$ vim /etc/default/grub </pre>



<p>Modify the line:</p>



<pre class="wp-block-preformatted">GRUB_CMDLINE_LINUX_DEFAULT="iommu=on"</pre>



<p>Then update GRUB and reboot:</p>



<pre class="wp-block-preformatted">$ update-grub2  <br />$ reboot  </pre>



<p>Additionally, enable SR-IOV in the BIOS. The exact location of this setting varies depending on the BIOS, but it is typically found under the PCIe subsystem or related options.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1280" height="770" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-4.png" alt="" class="wp-image-6183" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-4.png 1280w, https://blog.richliu.com/wp-content/uploads/2025/02/image-4-600x361.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-4-768x462.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-4-816x491.png 816w" sizes="(max-width: 1280px) 100vw, 1280px" /></figure>



<h3 class="wp-block-heading" id="host-upgrade-to-hwe-kernel"><strong>Upgrade Host to HWE Kernel</strong></h3>



<p>I recommend using the Hardware Enablement (HWE) kernel on the host. While I’m unsure if the regular kernel works, the HWE kernel has been reliable in my experience. Install it with:</p>



<pre class="wp-block-preformatted">sudo apt install linux-generic-hwe-22.04</pre>



<h3 class="wp-block-heading" id="configure-the-vfio-on-host">Configure VFIO on Host</h3>



<p>The VM relies on the VFIO driver for GPU passthrough. To configure VFIO, you need to pass the PCIe device information to the VFIO driver.</p>



<p>First, identify the GPU’s PCIe device IDs:</p>



<pre class="wp-block-preformatted">$ lspci -nn</pre>



<p>Example output:</p>



<pre class="wp-block-preformatted"><br /><br />0005:01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [<strong>10de:2704</strong>] (rev a1)<br />0005:01:00.1 Audio device [0403]: NVIDIA Corporation Device [<strong>10de:22bb</strong>] (rev a1)</pre>



<p>Here, <strong><code>10de:2704</code> </strong>is the GPU’s PCIe device ID, and <code><strong>10de:22bb</strong></code> is the audio device ID. At a minimum, you need to pass through the GPU device.</p>



<p>Next, edit the VFIO configuration file to include these IDs:</p>



<pre class="wp-block-preformatted">$ vim /etc/modprobe.d/vfio.conf</pre>



<p>Add the following line:</p>



<pre class="wp-block-preformatted">options vfio-pci ids=10de:2704,10de:22bb</pre>



<h3 class="wp-block-heading" id="disable-nvidia-driver-on-the-host">Disable Nvidia Driver on HOST</h3>



<p>To prevent the host from loading the Nvidia driver, add the Nvidia modules to the kernel’s blocklist:</p>



<pre class="wp-block-preformatted">$ vim /etc/modprobe.d/blacklist.conf</pre>



<p>Add the following lines:</p>



<pre class="wp-block-preformatted">blacklist nvidia<br />blacklist nvidia_drm<br />blacklist nvidia_modeset</pre>



<p>Update the initramfs and reboot:</p>



<pre class="wp-block-preformatted">$ update-initramfs -u<br />$ reboot</pre>



<h2 class="wp-block-heading" id="configure-vm">Configure VM</h2>



<h3 class="wp-block-heading" id="i">Install Virt-manager</h3>



<p>In this article we use Virt-manager as the VM manager. The first step is to install virt-manager; Ubuntu should pull in all related packages automatically.</p>



<pre class="wp-block-preformatted">$ apt install virt-manager </pre>



<p>If you’re using SSH with X11 forwarding (e.g.,&nbsp;<code>ssh -X host</code>) or MobaXTerm on Windows, Virt-manager will display the remote X window. If neither method works, consider installing a KDE desktop on the host and accessing it via the BMC remote console.</p>



<p>(Optional) Install KDE Plasma Desktop:</p>



<pre class="wp-block-preformatted">(option)<br />$ apt install kde-plasma-desktop</pre>



<h3 class="wp-block-heading" id="create-vm-image">Create VM image</h3>



<p>Virt-manager creates fixed-size VM images by default. If you prefer dynamic allocation, create the image manually:</p>



<pre class="wp-block-preformatted">$ qemu-img create -f qcow2 ubuntu2204.qcow2 200G</pre>



<h3 class="wp-block-heading" id="create-vm">Add Nvidia device to VM</h3>



<p>If the host is configured correctly, Virt-manager will list all PCIe devices, including the Nvidia GPU. Add the GPU and its audio device (e.g., <code><strong>0005:01:00.0</strong></code> and <code><strong>0005:01:00.1</strong></code>) to the VM’s hardware list.</p>
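


<p>Under the hood, Virt-manager records each passthrough device as a <code>hostdev</code> entry in the domain XML. If you prefer editing the VM with <code>virsh edit</code>, the equivalent stanza should look roughly like this (a sketch based on the standard libvirt schema, with the PCI address from the <code>lspci</code> output above):</p>



<pre class="wp-block-preformatted">&lt;hostdev mode='subsystem' type='pci' managed='yes'&gt;<br />  &lt;source&gt;<br />    &lt;address domain='0x0005' bus='0x01' slot='0x00' function='0x0'/&gt;<br />  &lt;/source&gt;<br />&lt;/hostdev&gt;</pre>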



<p>After adding the devices, proceed with the Ubuntu 22.04 installation.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1023" height="632" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-5.png" alt="" class="wp-image-6184" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-5.png 1023w, https://blog.richliu.com/wp-content/uploads/2025/02/image-5-600x371.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-5-768x474.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-5-816x504.png 816w" sizes="(max-width: 1023px) 100vw, 1023px" /></figure>



<p>After adding the hardware:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1046" height="646" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-6.png" alt="" class="wp-image-6185" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-6.png 1046w, https://blog.richliu.com/wp-content/uploads/2025/02/image-6-600x371.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-6-768x474.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-6-816x504.png 816w" sizes="(max-width: 1046px) 100vw, 1046px" /></figure>



<p>Now you can press Begin Installation and install Ubuntu 22.04.</p>



<h3 class="wp-block-heading" id="disable-security-boot">Disable secure Boot in UEFI</h3>



<p>By default, Virt-manager enables Secure Boot. However, Nvidia drivers may not work with Secure Boot enabled. Even though the Nvidia installer includes a driver signing feature, the driver may still fail to load. To avoid issues, disable Secure Boot in the VM’s UEFI settings.</p>



<p>During the VM’s boot process, press the <strong><code>DEL</code> </strong>key to enter UEFI settings and uncheck the Secure Boot option.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="973" height="632" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-7.png" alt="" class="wp-image-6186" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-7.png 973w, https://blog.richliu.com/wp-content/uploads/2025/02/image-7-600x390.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-7-768x499.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-7-816x530.png 816w" sizes="(max-width: 973px) 100vw, 973px" /></figure>



<p>Before installing the Nvidia driver, ensure the necessary development packages are installed:</p>



<pre class="wp-block-preformatted">$ apt install build-essential</pre>



<p>Then, install the Nvidia driver and reboot the VM.</p>



<h3 class="wp-block-heading" id="gpu-passthrough-test">GPU Passthrough Test</h3>



<p>If everything is set up correctly, running <code><strong>nvidia-smi</strong></code> should display the GPU’s status.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1280" height="515" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-8.png" alt="" class="wp-image-6188" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-8.png 1280w, https://blog.richliu.com/wp-content/uploads/2025/02/image-8-600x241.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-8-768x309.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-8-816x328.png 816w" sizes="(max-width: 1280px) 100vw, 1280px" /></figure>



<p>For testing, you can use&nbsp;<strong>Ollama</strong>&nbsp;with the&nbsp;<strong>DeepSeek-R1</strong>&nbsp;model. Install Ollama with:</p>



<pre class="wp-block-preformatted">curl -fsSL https://ollama.com/install.sh | sh</pre>



<p>Pull the DeepSeek-R1 model. Since the GPU has 16GB of memory, the 14B model is a good choice (it requires ~10GB):</p>



<pre class="wp-block-preformatted">ollama run deepseek-r1:14b</pre>



<p>Ask a question like, “Why is the sky blue?” This will trigger the model’s Chain-of-Thought (CoT) reasoning.</p>
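


<p>If you would rather script the test than type into the interactive prompt, Ollama also exposes a local REST API; a minimal sketch using its standard generate endpoint:</p>



<pre class="wp-block-preformatted">curl http://localhost:11434/api/generate -d '{<br />  "model": "deepseek-r1:14b",<br />  "prompt": "Why is the sky blue?",<br />  "stream": false<br />}'</pre>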



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1280" height="772" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-9.png" alt="" class="wp-image-6189" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-9.png 1280w, https://blog.richliu.com/wp-content/uploads/2025/02/image-9-600x362.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-9-768x463.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-9-816x492.png 816w" sizes="(max-width: 1280px) 100vw, 1280px" /></figure>



<p>Monitor the GPU’s status using <code>nvidia-smi</code> to ensure it’s functioning correctly.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1280" height="577" src="https://blog.richliu.com/wp-content/uploads/2025/02/image-10.png" alt="" class="wp-image-6190" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image-10.png 1280w, https://blog.richliu.com/wp-content/uploads/2025/02/image-10-600x270.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-10-768x346.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-10-816x368.png 816w" sizes="(max-width: 1280px) 100vw, 1280px" /></figure>



<p>The post <a rel="nofollow" href="https://blog.richliu.com/2025/02/12/6182/gpu-passthrough-on-arm64-with-libvirt-virt-manager/">GPU Passthrough on ARM64 with Libvirt/Virt-manager</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2025/02/12/6182/gpu-passthrough-on-arm64-with-libvirt-virt-manager/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The DeepSeek-R1 Storm</title>
		<link>https://blog.richliu.com/2025/02/07/6166/deepseek-r1-%e9%a2%a8%e6%9a%b4/</link>
					<comments>https://blog.richliu.com/2025/02/07/6166/deepseek-r1-%e9%a2%a8%e6%9a%b4/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Fri, 07 Feb 2025 12:48:20 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[2025]]></category>
		<category><![CDATA[ChatGPT]]></category>
		<category><![CDATA[DeepSeek]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=6166</guid>

					<description><![CDATA[<p>This Lunar New Year my feeds were flooded with DeepSeek. This post records why DeepSeek became such a sensation during this period. The first part is [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2025/02/07/6166/deepseek-r1-%e9%a2%a8%e6%9a%b4/">The DeepSeek-R1 Storm</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>This Lunar New Year my feeds were flooded with DeepSeek, so this post records why DeepSeek became such a sensation during this period. The first half is my own take; the second half was produced by NotebookLM from sources I consider high quality, then lightly edited.</p>



<p>I only started using DeepSeek when DeepSeek V3 (the non-chain-of-thought model) came out. When V3 launched, it forced the other major Chinese large models to cut their prices dramatically, yet the reporting at the time barely mentioned that this was DeepSeek V3's doing.</p>



<span id="more-6166"></span>



<p>Since my GPT usage is light, I have always stuck to the free tiers. My personal ranking: ChatGPT is slightly better than Claude and much stronger than Gemini, but when DeepSeek V3 is compared alongside them, V3 performs best, so I have been using it for a while now. As long as you avoid sensitive topics, models like this are a pleasure to use.</p>



<p>What truly shook the West this time is that DeepSeek-R1 is an open chain-of-thought (Chain-of-Thought, CoT) large language model. Briefly, a CoT LLM improves its performance on complex tasks by simulating a human reasoning process: before generating the final answer, the model first derives intermediate steps, which lets it better handle problems that require logical reasoning, mathematical calculation, or multi-step thinking.</p>



<p>Chain-of-thought shines on certain creative tasks. Many of the distinctively styled articles and videos circulating online recently were generated this way, such as assigning temple names and posthumous titles to past Korean or US presidents, or narrating mundane little events in wuxia style. The drawback is that chain-of-thought is extremely compute-hungry, so before DeepSeek-R1, only OpenAI o1 offered it, and even then with usage limits on the US$200-per-month plan.</p>



<p>DeepSeek-R1 caused a sensation the moment it launched, because:<br />1. It is free to use as much as you like<br />2. It claims ultra-low training and usage costs<br />3. The code and weight parameters are published<br />4. It was open-sourced by a Chinese company; compared with OpenAI hiding its techniques away, openness is far more convincing</p>



<h2 class="wp-block-heading">低成本革命</h2>



<p>If I had to pick just one of these, it would be low cost. Because chain-of-thought burns enormous amounts of compute, cutting that compute requirement is genuinely interesting; it amounts to a revolution.</p>



<p>A small story illustrates the point. Around 2000, before DSL (Digital Subscriber Line), the technology that carries data over ordinary phone lines, took off, telecom companies were not very interested in it. In the middle of the dot-com boom they wanted the high-tech option: fiber-to-the-home, which required new installation and expensive equipment on the consumer's side. Expecting everyone to go online, carriers rushed to adopt this pricier, higher-margin access technology; Japan, for example, spent a fortune laying fiber. Taiwan was similar: look up the stock frenzy around fixed-network concept plays in those years. Yet fiber ended up displaced by DSL running over the existing phone lines, because DSL cost less, required no new trenching, and users at the time did not need huge bandwidth, just something faster and more stable than a dial-up modem. With its low cost and low installation fees, DSL won very high user acceptance and stayed the mainstream technology for more than a decade.</p>



<p>DeepSeek-R1 is the same story. It may not match newer chain-of-thought systems like OpenAI o3, but if the price is 1/30 and the quality is 95% as good, consumers will pick the cheap option without a second thought. That is the low-cost advantage. When the price gap is this large, unless there is some compelling must-use reason, people vote honestly, and that honesty shows up in the financial statements.</p>



<p>I tried out the API pricing: an article of about 12,000 tokens (Chinese and English tokenize differently) currently costs US$0.0016. The regular price is double that, and the DeepSeek-R1 model doubles it again. At these rates it is a bargain; you could wire every service up to it. The figure below uses deepseek-chat.</p>
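


<p>For reference, a minimal sketch of calling the API with curl. DeepSeek's API follows the OpenAI-compatible chat-completions convention; the key below is a placeholder, and the exact field names should be checked against the official API docs:</p>



<pre class="wp-block-preformatted">curl https://api.deepseek.com/chat/completions \<br />  -H "Content-Type: application/json" \<br />  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \<br />  -d '{<br />    "model": "deepseek-chat",<br />    "messages": [{"role": "user", "content": "Summarize this article for me"}]<br />  }'</pre>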



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1024" height="573" src="https://blog.richliu.com/wp-content/uploads/2025/02/image.png" alt="" class="wp-image-6171" srcset="https://blog.richliu.com/wp-content/uploads/2025/02/image.png 1024w, https://blog.richliu.com/wp-content/uploads/2025/02/image-600x336.png 600w, https://blog.richliu.com/wp-content/uploads/2025/02/image-768x430.png 768w, https://blog.richliu.com/wp-content/uploads/2025/02/image-816x457.png 816w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>At prices like these, many companies may simply plug in directly; it could even be cheaper than self-hosting [<a href="https://unwire.hk/2025/02/06/deepseek-r1-2/ai/" target="_blank" rel="noopener">Huawei's AI integrates DeepSeek-R1, making its "Xiaoyi" assistant smarter</a>]</p>



<h2 class="wp-block-heading">開源風暴</h2>



<p>Journalists who call it "open source code" generally get this wrong: what DeepSeek opened is the model and its weights.<br />DeepSeek-R1 is the first open model whose chain-of-thought can stand comparison with OpenAI o1-mini. In the world of LLMs, opening the training data is nearly impossible and unnecessary; some companies do release open training data for video models, but nobody does on the LLM side.</p>



<p>Companies and universities seriously tracking GPT technology are presumably all racing to study the DeepSeek-R1 algorithms. I am no expert in this area, but everything I have seen so far is consistent with what was released: DeepSeek-R1 really is remarkably efficient.</p>



<p>The biggest casualty of this episode is probably OpenAI. It still has a lead, but after DeepSeek-R1 that lead is far less obvious, and the other top players (Groq, Claude, Llama, Qwen, and so on) are presumably all chasing chain-of-thought now, so everyone will soon converge on resource and compute optimization. Even as this post was being written, Qwen released Qwen 2.5-Max before the New Year, OpenAI shipped o3 and o3-mini, and Google refreshed a pile of Gemini 2.0 and CoT models.</p>



<p>OpenAI may still be better, but it also faces a dilemma: follow DeepSeek-R1's low-cost path, or keep betting on compute monsters like a ChatGPT V5 or a ChatGPT o4.</p>



<p>A related question arises here: many people assume DeepSeek stole data from OpenAI. If all they mean is distillation, everyone does that, and DeepSeek's open-model terms explicitly allow commercial use and distillation. Besides, DeepSeek-R1's performance in Chinese contexts is almost absurdly good. Here are some fun prompt recipes you can follow along with [<a href="https://www.bilibili.com/video/BV192Ffe1EwJ/" target="_blank" rel="noopener">10 hidden prompts that train DeepSeek into a master!</a>]; some of the tricks also work on other LLMs</p>



<pre class="wp-block-preformatted">1賽博人格分裂，(啟動人格分裂討論模式+問題)<br />2陰陽怪氣模式，(問題+笑死)毒舌屬性<br />3觸發預判模式，假設性問題(如果，，，會不會，，，)<br />4預言家模式，預判未來(如果，，，會發生什麼事)<br />5靈魂拷問模式，(①啟動槓精模式②先寫方案，再模擬槓精從*個角度狂噴，最後給出V2版方案)，<br />6玄學程式設計(，，，帶點蟬意)<br />7馴服轉業話癆，(說人話！)<br />8人設黏貼術，<br />9啟動老闆思維(如果你是，，，你會怎麼罵這個方案)<br />10過濾廢話，(問題，+刪掉所有正確的廢話，只留能落地的建議)</pre>



<p>Here a Japanese user had DeepSeek write poetry in Chinese, a level of Chinese no other LLM can match [<a href="https://x.com/rongtangjushi/status/1885588215584936095?s=12&amp;t=FEMQEePTARdd0HSpaf-okA">ref</a> X]</p>



<h2 class="wp-block-heading">PTX 及系統最佳化</h2>



<p>Many people mention that DeepSeek-R1 used PTX (Parallel Thread Execution) for acceleration, in order to save resources. PTX is something like assembly language, and on modern computers, bypassing the compiler to hand-write assembly saves you pennies at best. Reaching for PTX must have a more important motive, because hand-writing PTX carries serious risk. My personal guess is that they needed to do things CUDA cannot, so hand-crafting was the only option; only that would justify the investment.</p>



<p>The most common claim in the reporting is that they touched PTX because the H800's bandwidth was insufficient. We cannot see how the idea originated, but judging by the outcome it was the optimal answer to a whole series of problems: saving bandwidth, reducing VRAM usage, cutting communication costs between nodes, adopting a mixed-precision scheme (FP8 in selected places), and so on. All these low-level hacks revolve around one fact: CUDA could not deliver the features they wanted, so hacking the lower layers was the only way.</p>



<p>On the FP8 side, DeepSeek uses a 4-bit exponent and a 3-bit mantissa, known as E4M3, for all tensor computation inside the tensor cores. Others had previously tried to implement E3M4 tensor computation, but the precision loss was too great and they failed. DeepSeek appears to have cracked this difficulty by going mixed-precision: most of the V3 model's kernels run in FP8, while certain operations still require 16- or 32-bit precision, and the master weights, weight gradients, and optimizer state are stored at higher precision than FP8.</p>
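


<p>As a concrete illustration of the E4M3 layout (my own sketch of the standard FP8 format, not taken from the DeepSeek paper):</p>



<pre class="wp-block-preformatted">E4M3: 1 sign bit | 4 exponent bits (bias 7) | 3 mantissa bits<br />normal value = (-1)^s * 2^(E-7) * (1 + M/8)<br /><br />example: s=0, E=1000 (=8), M=010 (=2)<br />         value = 2^(8-7) * (1 + 2/8) = 2 * 1.25 = 2.5</pre>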



<p>Nothing seems to mention using DeepSeek itself to generate the PTX, but for engineering on this scale my personal guess is that they built a PTX-specialized DeepSeek variant, which would dramatically speed up development. There is precedent: someone recently used DeepSeek to speed up the SIMD ggml code [<a href="https://github.com/ggerganov/llama.cpp/pull/11453" target="_blank" rel="noopener">ggml : x2 speed for WASM by optimizing SIMD</a>&nbsp;], and the CUDA source was leaked by hackers back in 2022 [<a href="https://segmentfault.com/a/1190000041488372" target="_blank" rel="noopener">It is reported that hackers leaked 75GB of Nvidia confidential files, including DLSS source code!</a>]; having the source code would save considerable effort. After all, when it comes to training on unlicensed datasets, everyone is about the same [<a href="https://cybernews.com/tech/meta-leeched-82-terabytes-of-pirated-books-to-train-its-llama-ai-documents-reveal/" target="_blank" rel="noopener">Meta leeched 82 terabytes of pirated books to train its Llama AI, documents reveal</a>]</p>



<p>And of course it is not just PTX: the system also includes a DualPipe architecture mainly responsible for data transfer across NVLink, InfiniBand, RDMA, and the like.</p>



<p>Judging from the information known so far, DeepSeek has likely implemented (or will eventually implement) its own CUDA-like stack in which only the bottom hardware layer is Nvidia's. The advantage is that whenever they add hardware or need a new feature, they can change the stack themselves; the drawback is that maintaining such a system is costly. For DeepSeek, though, the decision is entirely rational: with mainland China under hardware sanctions, they could one day lose access even to Nvidia, so designing extra hardware flexibility into the bottom layer is a necessity [<a href="https://unwire.hk/2025/02/05/huawei-910c-60percent-h100/ai/" target="_blank" rel="noopener">DeepSeek test: Huawei Ascend 910C reaches 60% of H100 performance, hoping to reduce reliance on NVIDIA</a>] (note: this refers to inference)</p>



<h2 class="wp-block-heading">Reinforcement learning (RL) 強化學習</h2>



<p>This part deserves its own mention for DeepSeek-R1, because it is large-scale reinforcement learning (RL) without supervised fine-tuning (SFT): a reinforcement-learning setup that needs no human supervision. Training data used to require human annotation, but DeepSeek-R1's RL technique does not depend on human labels, which lets it strengthen chain-of-thought reasoning performance.</p>



<p>What is described is hardcoded rules making the final judgment, though I suspect it resembles a kind of expert system. Humans are still involved, but the involvement is aligning the data output rather than supervising each output. DeepSeek's RL emphasizes improving the model's reasoning through reward mechanisms and self-play, without human supervision.</p>



<h2 class="wp-block-heading">以下是 AI 寫的介紹</h2>



<p>You have probably seen plenty of the rest elsewhere, so I handed it over to the AI; everything below is AI-generated.</p>



<p>This article is divided into four parts, covering DeepSeek's history, its technical strengths and weaknesses, founder Liang Wenfeng's views, and some common criticisms of DeepSeek along with clarifications.</p>



<h3 class="wp-block-heading">1. 從幻方量化基金到 DeepSeek 的誕生</h3>



<p>DeepSeek's story is more than the rise of an AI startup; it is the technical evolution of a quantitative-investing firm into artificial general intelligence (AGI). Its predecessor, the quant fund High-Flyer (幻方量化), laid a solid technical foundation for DeepSeek.</p>



<ul class="wp-block-list">
<li><strong>幻方量化：AI 基因的起源</strong></li>



<li><strong>成立與發展</strong>：2008 年，梁文鋒在浙江大學就學期間創立了幻方量化，初期致力於探索全自動化交易。2015 年，幻方將數學與 AI 應用於量化投資，確立了 AI 為公司主要發展方向。</li>



<li><strong>技術實力</strong>：幻方量化專注於算法和量化核心引擎的研發，並自行建構了「螢火一號」和「螢火二號」AI 集群，搭載了數千張 A100 顯卡。</li>



<li><strong>早期 AI 應用</strong>：早在 2016 年，幻方量化就已將 AI 模型應用於股票倉位交易，並於 2017 年底實現量化策略的全面 AI 化，成為量化投資領域的創新先鋒 。</li>



<li><strong>算力投入</strong>：幻方在 2022 年已平均每天使用 4.2 萬 GPU 小時進行科研，相當於每天有近 2000 張 GPU 卡在幾乎滿負荷運行，展現其在 AI 研究上的巨額投入 。</li>



<li><strong>DeepSeek 的誕生：邁向 AGI 的新篇章</strong> </li>



<li><strong>轉型 AGI</strong>：2023 年 4 月，在開源模型 Llama1 和 GPT-4 發布後，幻方決定進軍大模型領域。同年 5 月，將技術部門獨立出來成立深度求索公司，專注於 AGI 的發展。</li>



<li><strong>技術繼承</strong>：雖然 DeepSeek 公司成立時間不長，但其背後的技術根基來自於幻方量化 17 年的積累，以及超過 5 年的 AI 研究經驗 。</li>



<li><strong>商業模式</strong>：與 DeepMind 和 OpenAI 不同，DeepSeek 從一開始就具有盈利和技術商業化的考量。它繼承了幻方 AI「純 AI 研究」到「AI 量化引擎」的業務獨立模式，使其在財務上更為穩健。</li>



<li><strong>資金挑戰與效率</strong>：2024 年 DeepSeek 面臨資金挑戰，但這也促使其將資金利用效率推至極限。</li>
</ul>



<h3 class="wp-block-heading">2. DeepSeek 的技術優勢、缺點與對未來 AI 的影響</h3>



<p>DeepSeek's technical strengths lie chiefly in its <strong>distinctive model-training methodology, its low-level hardware optimization, and its powerful Chinese-language capability</strong>, which together make it stand out among AI models:</p>



<ul class="wp-block-list">
<li><strong>強化學習 (RL) 與推理能力</strong> </li>



<li><strong>DeepSeek-R1-Zero</strong>: <strong>不依賴人類監督數據</strong>，直接使用強化學習訓練基礎模型，使其能自主發展出強大的推理能力，並能自我驗證、反思，產生長鏈的思考 (Chain-of-Thought, CoT)。 這是 DeepSeek 的一個重要突破，證明了僅透過 RL 就能激發模型的推理能力 。</li>



<li><strong>DeepSeek-R1</strong>: 在 R1-Zero 的基礎上，加入了少量冷啟動數據 (cold-start data) 和多階段訓練流程，以提高模型的可讀性和通用能力 。 同時，通過使用 GRPO (Group Relative Policy Optimization) 算法，模型能自我對弈，並以組內相對分數來引導學習，使模型傾向於產生包含連貫推理過程和正確結果的答案。</li>



<li><strong>不依賴人類反饋</strong>： DeepSeek 的訓練方式不再依賴人類偏好的反饋，而是透過可量化的指標（如數學和程式碼的正確性）來引導模型的學習方向。</li>



<li><strong>模型提煉 (Distillation)</strong> </li>



<li>DeepSeek 將大型模型的推理能力提煉到較小的模型中，使得較小模型能達到與大型模型相近的效能。例如，DeepSeek-R1-Distill-Qwen-7B 在 AIME 2024 測試中，得分 55.5%，超越了 QwQ-32B-Preview。</li>



<li><strong>此舉降低了部署和運行 AI 模型的資源需求</strong>，使得一般企業或個人也能在較小的設備上使用 AI 模型。</li>



<li><strong>底層優化</strong></li>



<li>DeepSeek 團隊對底層 CUDA 進行優化，直接使用類似組語 (assemble) 的語言控制 NV 顯卡，提高了訓練效率。</li>



<li>他們能夠使用 FP8 (8 位元浮點數) 精度來訓練模型，這讓算力直接翻倍，也使得可以使用過時的 GPU（例如 7nm 的 920B）來進行後訓練 (post-training)，降低了模型研發和更新的成本。</li>



<li>在 MoE (Mixture of Experts) 模型上，DeepSeek 著重於優化 NVLink 上的負載均衡，減少通訊成本，並在推理端使用 KV Cache 壓縮和多 Token 預測等技術，加速模型推論速度。</li>



<li><strong>中文處理能力</strong>：DeepSeek-R1 的中文思考和產出能力非常強大，這是其他英文模型難以匹敵的優勢。</li>
</ul>



<p>DeepSeek's potential weaknesses include:</p>



<ul class="wp-block-list">
<li><strong>非推理任務的不足</strong>：在功能呼叫、多輪對話、複雜角色扮演和 JSON 輸出等非推理任務上，DeepSeek-R1 的能力略遜於 DeepSeek-V3。</li>



<li><strong>語言混合問題</strong>： DeepSeek-R1 在處理非中文或英文的查詢時，可能會出現語言混合的問題，例如使用英語進行推理和回應，即使查詢是使用其他語言 。</li>



<li><strong>對提示詞的敏感性</strong>：DeepSeek-R1 對提示詞 (prompt) 非常敏感，少量樣本提示 (few-shot prompting) 會使其效能下降，因此建議用戶直接描述問題並使用零樣本設定 (zero-shot setting) .</li>



<li><strong>部分領域輸出較弱</strong>：為降低算力需求，DeepSeek 的模型可能在某些領域的輸出較弱，但對於一般用戶來說，這些差異可能不明顯。</li>



<li><strong>量化模型的精度損失</strong>：由於使用 FP8 等量化方法，DeepSeek 的模型精度可能略低於使用更高精度（例如 FP16/BP16）的模型，但這種差異可能並不明顯 。</li>
</ul>



<p>DeepSeek's impact on future AI development:</p>



<ul class="wp-block-list">
<li><strong>打破 AI 發展的限制</strong>:</li>



<li>DeepSeek 證明了<strong>純強化學習</strong> 可以訓練出強大的推理模型，而無需大量人類標記的數據或人類偏好，這挑戰了傳統 AI 模型訓練的範式。</li>



<li>透過 <strong>底層硬體優化</strong>，DeepSeek 降低了 AI 模型訓練和部署的成本，使得 AI 技術更加普及，不再侷限於大型企業或研究機構。</li>



<li><strong>推動 AI 技術的開源與共享</strong>:</li>



<li>DeepSeek 的 <strong>開源策略</strong> 鼓勵了更多人參與 AI 技術的開發和改進，促進了知識的共享和技術的快速發展。</li>



<li>DeepSeek 開放模型權重和訓練細節的做法，有助於建立一個更加開放、透明的 AI 生態系統，吸引更多研究者共同參與，形成「韌性飛輪」效應。</li>



<li><strong>改變 AI 產業的競爭格局</strong>:</li>



<li>DeepSeek 的出現讓其他公司意識到，<strong>規模化並非 AI 發展的唯一途徑</strong>，演算法創新和底層優化同樣重要。</li>



<li>隨著 <strong>AI 模型商品化</strong>，未來的競爭將會轉向應用層面和客戶服務，而不是基礎模型的開發。</li>



<li><strong>加速 AI 技術的普及和應用</strong>:</li>



<li>DeepSeek 的低成本和高性能模型，將使得 AI 技術能夠在更多領域得到應用，例如企業內部私有 AI、個人 AI 助理等。</li>



<li>AI 技術的普及將推動各行業的變革，加速人類文明的巨變 。</li>



<li><strong>地緣政治影響</strong> : DeepSeek 的成功顯示 <strong>中國 AI 技術正在趕上美國</strong>，對全球 AI 供應鏈和地緣政治產生影響。開放權重模型成為 AI 供應鏈的關鍵，若美國繼續打壓開源，中國可能會主導這一部分。</li>
</ul>



<h3 class="wp-block-heading">3. 梁文鋒對 DeepSeek 和 AI 的看法</h3>



<p>DeepSeek founder Liang Wenfeng is a CEO with a technical background and distinctive views on AI's development:</p>



<ul class="wp-block-list">
<li><strong>技術創新為本</strong>：梁文鋒認為，DeepSeek 的目標是參與全球創新浪潮，而不是單純追求商業利益。他強調技術創新是 DeepSeek 的核心競爭力，並認為中國 AI 不應永遠處於跟隨位置，必須做出原創性貢獻。</li>



<li><strong>開源與生態</strong>：梁文鋒堅信建立強大的技術生態比閉源更重要，因此 DeepSeek 選擇開源其技術，吸引更多人才參與。他認為，開源能促進知識共享，加速技術發展。</li>



<li><strong>AGI 的長期目標</strong>：梁文鋒將 AGI (通用人工智慧) 作為 DeepSeek 的終極目標，認為當下的一切都只是過程，追逐技術創新是實現 AGI 的必經之路 [4, 59]。 他認為數學和程式碼是 AGI 天然的試驗場，並對多模態和自然語言本身保持開放。</li>



<li><strong>對算力的看法</strong>：梁文鋒認為，更多的投入不一定會產生更多的創新，也並非只有高階晶片才能推動 AI 發展。DeepSeek 的成功證明，透過優化算法和軟硬體協同設計，即使在資源有限的情況下也能實現技術突破。</li>



<li><strong>對人才的看法</strong>：梁文鋒認為，頂尖人才應該被吸引去解決世界上最難的問題，因此 DeepSeek 致力於創造一個讓技術人才可以充分發揮才能的環境 [61]。 他認為，目前中國的頂尖人才被低估了，因為社會缺乏硬派創新，讓他們沒有機會被辨識出來。</li>



<li><strong>對技術與商業的看法</strong>：梁文鋒認為，商業上的成功應建立在技術實力的基礎之上，且要對商業抱有敬畏之心。 他強調，企業應該專注於自己擅長的領域，形成產業分工鏈，共同推動社會效率的提升。</li>



<li><strong>對競爭的看法</strong>：梁文鋒認為，過度關注競爭會讓人眼花撩亂，更重要的是思考如何提高社會的運作效率，並在產業分工鏈上找到自己擅長的位置。</li>
</ul>



<h3 class="wp-block-heading">4. 針對 DeepSeek 的質疑與澄清</h3>



<ul class="wp-block-list">
<li><strong>成本造假</strong>:</li>



<li><strong>質疑</strong>：有人認為 DeepSeek 宣稱的 550 萬美元訓練成本是造假，因為沒有計入資料清理、開發人力等其他成本。</li>



<li><strong>澄清</strong>：DeepSeek 在論文中明確指出 550 萬美元是單純的「訓練成本」，其他成本例如資料清理、開發人力等，通常不會列入計算，因為這些成本是多個模型共用的，且薪資水準在各國不同。另外，DeepSeek 使用 2048 片 H800 GPU 訓練兩個月的成本，以租賃方式計算，大約就是 550 萬美元。</li>



<li><strong>DeepSeek 是 OpenAI 的套殼</strong>:</li>



<li><strong>質疑</strong>：有人認為 DeepSeek 回答問題時會說是 ChatGPT，所以是套殼的 [63]。</li>



<li><strong>澄清</strong>：這僅代表 DeepSeek 的訓練資料中包含 ChatGPT 的內容，許多模型都有類似問題。且 DeepSeek 已開源，可以自行驗證。模型在訓練時並不知道自己是誰訓練的。</li>



<li><strong>DeepSeek 技術是抄來的</strong>:</li>



<li><strong>質疑</strong>：有人認為 DeepSeek 的技術是抄襲 OpenAI 的。</li>



<li><strong>澄清</strong>：DeepSeek 是開源的，OpenAI 是閉源的，若有抄襲，OpenAI 必須提出證據。目前沒有任何證據證明 DeepSeek 抄襲原始碼。而且，機器產生的資料不受著作權保護，DeepSeek 使用 OpenAI 產生的數據進行訓練，並不構成侵權，最多可能違反 OpenAI 的使用條款。此外，蒸餾 (distillation) 是 AI 模型訓練中常見的做法，DeepSeek 的模型性能甚至比老師模型 (o1) 更強，所以 DeepSeek 的性能並非源於抄襲。</li>



<li><strong>混合專家架構 (MoE) 是舊技術</strong>:</li>



<li><strong>質疑</strong>：有人認為 DeepSeek 使用的 MoE 架構是舊技術，沒有創新。</li>



<li><strong>澄清</strong>：DeepSeek 對 MoE 架構進行了許多改進，使其效能顯著提升。</li>
</ul>



<p>This article synthesizes information from many sources to offer a comprehensive, in-depth look at DeepSeek. Its rise is not only a challenge to existing AI technology but also a signpost for where AI development is headed.</p>



<p>The post <a rel="nofollow" href="https://blog.richliu.com/2025/02/07/6166/deepseek-r1-%e9%a2%a8%e6%9a%b4/">DeepSeek-R1 風暴</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2025/02/07/6166/deepseek-r1-%e9%a2%a8%e6%9a%b4/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Easy Fine-Tuning of Large Language Models with LLaMA-Factory</title>
		<link>https://blog.richliu.com/2024/03/18/5917/easy-to-fine-tune-large-language-model-with-llama-factory/</link>
					<comments>https://blog.richliu.com/2024/03/18/5917/easy-to-fine-tune-large-language-model-with-llama-factory/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Sun, 17 Mar 2024 17:20:29 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[Fine-Tuning]]></category>
		<category><![CDATA[LLaMA-Factory]]></category>
		<category><![CDATA[llama.cpp]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[ollama]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=5917</guid>

					<description><![CDATA[<p>A lazy person's guide to LLaMA-Factory fine-tuning</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2024/03/18/5917/easy-to-fine-tune-large-language-model-with-llama-factory/">Easy Fine-Tuning of Large Language Models with LLaMA-Factory</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>This post is a lazy person's guide to "alchemy" with LLaMA-Factory fine-tuning. LLM (Large Language Model) fine-tuning tools and implementations are still at a very early stage: getting anything to run at all is already good, strange failures appear along the way, and just making the pipeline run end to end takes a lot of trial and error. This article documents one combination I got working, to lower the entry barrier for anyone who wants to play.</p>



<span id="more-5917"></span>



<div class="wp-block-rank-math-toc-block" id="rank-math-toc"><h2>Table of Contents</h2><nav><ul><li><a href="#資料清洗-data-cleaning">資料清洗 (Data Cleaning) </a></li><li><a href="#maria-db-php-my-admin">MariaDB+phpMyAdmin</a></li><li><a href="#請-claude-清洗資料">請 Claude 清洗資料</a></li><li><a href="#輸出-json-file">輸出 json file </a></li><li><a href="#調整參數">調整參數</a></li><li><a href="#測試">測試</a></li><li><a href="#ollama">Ollama</a></li></ul></nav></div>



<p>Fine-tuning an LLM means adding a small model on top of the existing architecture so that the original model produces the results we want. The most familiar example is probably the many Stable Diffusion LoRA models in different styles on <a href="https://civitai.com/" target="_blank" rel="noopener">Civitai</a>. Training a large language model from scratch casually takes hundreds or even tens of thousands of A100s, which no individual budget can keep up with, but with LoRA a small model is still within reach.</p>



<p>The hardware here is the same as in <a href="https://blog.richliu.com/2024/03/10/5852/llm-chat-webgui-and-fine-turning-on-ampere-altra-arm64-platform/#llm-fine-turning-l-la-ma-factory">LLM Chat WebGUI and Fine-Tuning on Ampere Altra ARM64 Platform</a>, and much of the preliminary setup can be found there too; this post only explains how I ran the whole pipeline. If you want to play with LLM fine-tuning, though, I recommend a GPU with at least 24GB of VRAM (or two 16GB cards should also work). With 16GB most 7B models are unusable; fortunately quite a few sub-7B models are being released now, though they may come with problems of their own.</p>



<h2 class="wp-block-heading" id="資料清洗-data-cleaning">資料清洗 (Data Cleaning) </h2>



<p>I sat on this post for a long time before starting. The biggest problem was not knowing what data to use: if the goal is merely to complete the steps, the link above already tells you which dataset to grab, but that is no fun; something would be missing. People are lazy these days, and I am reluctant to spend time learning new things, yet data cleaning itself is extremely time-consuming, sometimes taking over 30% of the entire training pipeline. Every major LLM, GPT, Gemini, Claude and so on, spends enormous effort labeling and cleaning data.</p>



<p>Recently, though, Claude.ai launched Claude 3 Opus (roughly on par with GPT-4), which caught my attention. I had tried Claude before and it did not perform well, so I stopped using it, but this time Claude 3 Opus seems decent enough to use. Claude's biggest advantage is that it can apparently fetch external web pages. I know GPT-4 can too, but we of the "next time for sure" party naturally want to try a free tier first. After testing, Claude 3 Sonnet performed acceptably, so it became our data-cleaning tool.</p>



<p>With the tool settled, content is always the hard part. The tricky requirement is that the data must have both volume and distinctiveness (forgive me if I misuse terms; I never formally studied data cleaning), otherwise during testing you cannot tell whether an answer really came from your own data. That is why I kept stalling: good datasets are hard to find. A few days ago, March 11, marked roughly the tenth anniversary of the <a href="http://nuclearmb.org/" target="_blank" rel="noopener">Nuclear Mythbusters</a> association. It struck me that Nuclear Mythbusters has a <a href="https://nuke.fandom.com/zh/wiki/%E6%A0%B8%E8%83%BD%E6%B5%81%E8%A8%80%E7%B5%82%E7%B5%90%E8%80%85%E7%B6%AD%E5%9F%BA" target="_blank" rel="noopener">Wiki</a>; it has fallen into disrepair, but it is all material about nuclear power. How could the "next time for sure" party pass up this scrappy little database?</p>



<p>Data: check. Tool: check. Time to get to work.</p>



<h2 class="wp-block-heading" id="maria-db-php-my-admin">MariaDB+phpMyAdmin</h2>



<p>I searched for a while and found no suitable data-editing tool; the alternative is CSV, which is awkward to edit. MariaDB + a web server (Nginx or Apache) + phpMyAdmin is the best combination: small edits are convenient and setup is not hard, so that became my database. The schema is as follows; instruction, input, and output form one of LLaMA-Factory's dataset formats, so I simply followed it. The TABLE definition is below; name the database whatever makes you happy.</p>



<pre class="wp-block-preformatted">CREATE TABLE `nuclear` (
  `id` int(11) NOT NULL,
  `instruction` text DEFAULT NULL,
  `input` text DEFAULT NULL,
  `output` text DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `nuclear`
  ADD PRIMARY KEY (`id`);
ALTER TABLE `nuclear`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=0;
COMMIT;</pre>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1655" height="691" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-9.png" alt="" class="wp-image-5918" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-9.png 1655w, https://blog.richliu.com/wp-content/uploads/2024/03/image-9-600x251.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-9-768x321.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-9-1536x641.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-9-816x341.png 816w" sizes="(max-width: 1655px) 100vw, 1655px" /></figure>



<p>If an entry has a problem, just click Edit and touch it up; very convenient.</p>



<h2 class="wp-block-heading" id="請-claude-清洗資料">請 Claude 清洗資料</h2>



<p>For people like us, having an LLM clean the data directly is fastest. The Nuclear Mythbusters Wiki is small; roughly two days of free-tier Claude quota was enough to build it all, and some pages held so little data that I entered them by hand without asking Claude at all.</p>



<p>The incantation for getting Claude to spit out the data goes like this:</p>



<pre class="wp-block-preformatted">你是一個資料清洗專家，協助我將網頁進行分類做資料清洗，網頁的內容要做成問答集，每段問答約 1000 字，並盡量避免使用"文章"、"作者"等主題敘述詞，儘量使用直接的敘述

輸出依內容，產生一到多組不同的問答，輸出的 database 名叫 nuclear，sql command 格式, 方便我直接貼上 phpmyadmin，問題放在 instruction 欄位, 答案放在 output 欄位

網頁如下 https://nuke.fandom.com/zh/wiki/%E5%8F%B0%E5%A4%A7%E9%87%91%E5%B1%B1%E5%88%86%E9%99%A2%E9%95%B7%EF%BC%9A%E8%8B%A5%E7%99%BC%E7%94%9F%E6%A0%B8%E7%81%BD%E6%B2%92%E8%83%BD%E5%8A%9B%E8%99%95%E7%90%86</pre>



<p>Claude does carry the warning &#8220;Claude does not have internet access and cannot get up to date info from web links.&#8221;, yet every page I gave it worked fine. If outbound access really is unavailable, pasting the content in should also work (being able to fetch is of course more convenient). Below is part of Claude's reply; the generated SQL data is kept verbatim in Chinese, since that is the actual dataset.</p>



<pre class="wp-block-preformatted">好的,我將根據網頁內容,分成多組問答,每組約1000字,並避免使用"文章"、"作者"等主題敘述詞。問題將放在instruction欄位,答案將放在output欄位,使用SQL命令插入nuclear資料表。

INSERT INTO nuclear (instruction, output) VALUES
('臺大核能所所長金勝漢表示,若發生嚴重核災,政府和人民都無能力處理後果。請簡述這個說法的背景和重點。', '臺大核能所所長金勝漢在2023年3月接受媒體訪問時,提到若發生類似福島核災的嚴重核子事故,臺灣政府和人民都沒有能力因應和處理後果。他的這番話,引發社會高度關注核能安全議題。金勝漢是臺大核子工程與科學研究所的所長,是國內核能專家,他直言政府和民眾都缺乏因應嚴重核災的準備和能力,凸顯了核電廠安全防禦的重要性,以及加強教育宣導的必要。')</pre>



<p>Paste the SQL portion straight into phpMyAdmin and you are done.</p>



<p>From here it is just generate, paste, repeat. Skim the data before pasting; it sometimes needs fixes. Since this was only a test I let some errors slide, only occasionally correcting the text or wording. I ended up with about 85 Q&amp;A entries, not many, but enough for some experiments.</p>



<h2 class="wp-block-heading" id="輸出-json-file">輸出 json file </h2>



<p>Next, write a small program that exports the table to a JSON file. Naturally, a little script like this can just be conjured out of Claude, and the quality is decent. Remember to swap in the connection parameters for your own database.</p>



<pre class="wp-block-preformatted">import mysql.connector
import json

# 連接到 MySQL 資料庫
mydb = mysql.connector.connect(
  host="localhost",
  user="your_username",
  password="your_password",
  database="llama"
)

# 獲取遊標
mycursor = mydb.cursor()

# 查詢資料表
mycursor.execute("SELECT instruction, input, output FROM nuclear")

# 獲取查詢結果
rows = mycursor.fetchall()

# 將查詢結果轉換為 JSON 格式
data = []
for row in rows:
    data.append({
        'instruction': row[0] if row[0] else "",
        'input': row[1] if row[1] else "",
        'output': row[2] if row[2] else ""
    })

# 將 JSON 資料寫入檔案
with open('nuclear.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

# 關閉資料庫連線
mydb.close()</pre>



<p>This produces nuclear.json. Put it in LLaMA-Factory's data directory and add the following entry to dataset_info.json; there is no need to generate a sha1, and leaving it unset appears to skip the check.</p>



<pre class="wp-block-preformatted">  "nuclear_train": {
    "file_name": "nuclear.json"
  },
</pre>



<p>Once that is set, just reload the dataset.</p>



<h2 class="wp-block-heading" id="調整參數">調整參數</h2>



<p>First, pick a model. I originally wanted to try Mistral-7B, but it went Out of Memory immediately. On reflection a Chinese model made more sense, since my dataset is Chinese and support would likely be better. Qwen gave me too much trouble last round, so this time I chose <a href="https://huggingface.co/01-ai/Yi-6B-Chat" target="_blank" rel="noopener">Yi-6B-Chat</a>. Yi is the open large model from Kai-Fu Lee's company 01.AI (零一萬物), currently available in 6B and 34B sizes.</p>



<p>With so little VRAM there is not much parameter tuning to do. In short: raising the batch size helps most, but the card's VRAM limited me to going from 2 up to 6; anything higher blew the memory. Lowering the learning rate (to 1e-4) also seemed to help, and more epochs keep reducing the loss until it plateaus at some value and will not go lower. After tuning, my outputs looked much more normal. There are many other parameters, but I did not dig into them and am unsure of their effects. A command-line sketch of this configuration follows below.</p>
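


<p>For reference, here is a hedged command-line equivalent of what I set in the GUI. The flag names are taken from the LLaMA-Factory README of this period (train_bash.py); treat it as a sketch rather than the exact command, and adjust the paths, epoch count, and LoRA options to your own setup:</p>



<pre class="wp-block-preformatted"># Sketch only: flag names assumed from the LLaMA-Factory docs of early 2024;
# the epoch count here is illustrative, not what the article used
CUDA_VISIBLE_DEVICES=0 python3 src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path 01-ai/Yi-6B-Chat \
    --dataset nuclear_train \
    --finetuning_type lora \
    --output_dir saves/yi-6b-nuclear \
    --per_device_train_batch_size 6 \
    --learning_rate 1e-4 \
    --num_train_epochs 10.0 \
    --fp16</pre>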



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1517" height="969" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-10.png" alt="" class="wp-image-5919" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-10.png 1517w, https://blog.richliu.com/wp-content/uploads/2024/03/image-10-600x383.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-10-768x491.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-10-816x521.png 816w" sizes="(max-width: 1517px) 100vw, 1517px" /></figure>



<p>The remaining steps, Chat and Export, are the same as in the previous post.</p>



<h2 class="wp-block-heading" id="測試">測試</h2>



<p>The final result is exported to /nvme/newmodel/Yi-6B-Chat. Unlike the previous post, the conversion here uses convert.py, not convert-hf-to-gguf.py [<a href="https://www.ptt.cc/bbs/HatePolitics/M.1710680363.A.625.html" target="_blank" rel="noopener">ref</a>]</p>



<pre class="wp-block-preformatted">python3 /nvme/llama.cpp/convert.py Yi-6B-Chat
&lt;略&gt;
Wrote Yi-6B-Chat/ggml-model-f16.gguf

# 量化，可以變 q4/q5 等等
/nvme/llama.cpp/build/bin/quantize ./Yi-6B-Chat/ggml-model-f16.gguf ./Yi-6B-Chat/ggml-model-q4_0.bin q4_K_M
 /nvme/llama.cpp/build/bin/quantize ./Yi-6B-Chat/ggml-model-f16.gguf ./Yi-6B-Chat/ggml-model-q5_0.bin q5_K_M</pre>



<p>Testing the model (the prompts and the model's Chinese replies are kept verbatim, since the point is to check the Chinese output):</p>



<pre class="wp-block-preformatted"># 測試原始版本的輸出
/nvme/llama.cpp/build/bin/main -m ./Yi-6B-Chat/ggml-model-f16.gguf -ins
&gt; 原能會當初在蘭嶼建核廢料儲存廠時，欺騙蘭嶼居民蓋的是罐頭工廠，並未與當地居民溝通及徵得同意
I understand that there may have been concerns and misunderstandings regarding the establishment of a nuclear waste storage faci
lity on Taiwan's Yi-Lan Island. It is important to address these issues with honesty and transparency. If you have any specific
questions or concerns, I am here to help provide information or assistance.&lt;|im_end|&gt;

# Q5 輸出
/nvme/llama.cpp/build/bin/main -m ./Yi-6B-Chat/ggml-model-q5_0.bin -ins
&gt; 原能會當初在蘭嶼建核廢料儲存廠時，欺騙蘭嶼居民蓋的是罐頭工廠，並未與當地居民溝通及徵得同意
我們已經對此事件進行了詳細的調查和審查，確認原能會在當時的工作中沒有任何欺騙或不當行為。事實上，原能會在規劃這項建設時已經與當地
居民進行了充分的溝通和協商，並根據《核子設施法》的要求徵得了必要的手續。在此過程中，原能會也確認了該地區能夠承受相關設施的運轉和
風險。因此，您所指的欺騙或未經同意的說法是不符合事實的。&lt;|im_end|&gt;

&gt; 核能流言終結者是什麼?
 **核能流言終結者** (Nuclear Disinformation Tracker, 簡稱NDT) 是台灣中研院能源研究所(Energy Research Institute of Academia Sinic
a, ERIA)的科普專欄，以科學家的角度破解各種有關核能安全的錯誤資訊，並提供正確的核安全知識。

# Q4 輸出
/nvme/llama.cpp/build/bin/main -m ./Yi-6B-Chat/ggml-model-q4_0.bin -ins
&gt; 核能流言終結者是什麼？
 核能流言終結者是一個專門用來解釋有關核電站運作和核廢料處理等方面的知識的網站。該網站由香港科技大學公共衞生及基層醫療學院團隊創
立，旨在消除大眾對核電站運作的各種疑慮和誤解。&lt;|im_end|&gt;


# F16 輸出
/nvme/llama.cpp/build/bin/main -m ./Yi-6B-Chat/ggml-model-f16.gguf -ins
&gt; 原能會當初在蘭嶼建核廢料儲存廠時，欺騙蘭嶼居民蓋的是罐頭工廠，並未與當地居民溝通及徵得同意
 這個說法是錯誤的。原能會於1982年在蘭嶼設立核廢料貯存場時，確實有經過多次和居民的溝通協商，包括了說明會、座談會等等。居民在了解
相關資訊後才同意興建。因此，說原能會欺騙居民是沒有根據的。

&gt; 核能流言終結者是什麼?
 "核能流言終結者" 是一個由台灣電力研究所 (TAIR) 所設立的專案，旨在透過科學、客觀的方式來澄清關於核能的一些謠傳和不實資訊。這些資
訊包括了對核能發電的誤解、不實的風險宣傳等等。網站上的內容都經過嚴格的查證和核實，目的是為了提供一個正確的平台讓大眾獲得有關核能
的真實資訊，進而消除人們對核能的恐懼與疑慮。</pre>



<p>From these results we can see:</p>



<p>* The original model cannot produce Chinese output<br />* F16 gives the best output, Q5 next, then Q4<br />This means our fine-tune succeeded, even if the answers are not necessarily what we wanted, which probably relates to the LoRA training weights. The explanations of what Nuclear Mythbusters is are all wrong, so the source data may need additional explanatory entries, or the batch size may need to go up.</p>



<h2 class="wp-block-heading" id="ollama">Ollama</h2>



<p>Ollama was a bit odd: issuing commands directly kept producing strange results, different from running llama.cpp itself. In the end only the F16 model gave normal output. The Modelfile is below.</p>



<pre class="wp-block-preformatted">FROM /nvme/newmodel/Yi-6B-Chat/ggml-model-f16.gguf


PARAMETER temperature 0.8
PARAMETER num_ctx 512


TEMPLATE """[INST] {{ if .System }}&lt;&lt;SYS&gt;&gt;{{ .System }}&lt;&lt;/SYS&gt;&gt;

{{ end }}{{ .Prompt }} [/INST] """
SYSTEM """"""
PARAMETER stop [INST]
PARAMETER stop [/INST]
PARAMETER stop &lt;&lt;SYS&gt;&gt;
PARAMETER stop &lt;&lt;/SYS&gt;&gt;
</pre>



<p>Create the Ollama model and restart the service:</p>



<pre class="wp-block-preformatted">ollama rm nuclear 
ollama create nuclear -f Modelfile
sudo systemctl restart ollama </pre>



<p>I had to pass parameters to get correct output, e.g.:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1147" height="698" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-11.png" alt="" class="wp-image-5920" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-11.png 1147w, https://blog.richliu.com/wp-content/uploads/2024/03/image-11-600x365.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-11-768x467.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-11-816x497.png 816w" sizes="(max-width: 1147px) 100vw, 1147px" /></figure>



<p>That should count as easy, right?</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2024/03/18/5917/easy-to-fine-tune-large-language-model-with-llama-factory/">Easy to Fine-Tune Large Language Model  with LLaMA-Factory</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2024/03/18/5917/easy-to-fine-tune-large-language-model-with-llama-factory/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>LLM Chat WebGUI and Fine-Tuning on Ampere Altra ARM64 Platform</title>
		<link>https://blog.richliu.com/2024/03/10/5852/llm-chat-webgui-and-fine-turning-on-ampere-altra-arm64-platform/</link>
					<comments>https://blog.richliu.com/2024/03/10/5852/llm-chat-webgui-and-fine-turning-on-ampere-altra-arm64-platform/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Sun, 10 Mar 2024 07:30:53 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[aarch64]]></category>
		<category><![CDATA[Altra]]></category>
		<category><![CDATA[arm64]]></category>
		<category><![CDATA[Fine-Tuning]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Training]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=5852</guid>

					<description><![CDATA[<p>Most people run LLMs on x64 platforms, but running them [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2024/03/10/5852/llm-chat-webgui-and-fine-turning-on-ampere-altra-arm64-platform/">LLM Chat WebGUI and Fine-Turning on Ampere Altra ARM64 Platform</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Most people run LLMs on x64 platforms, but running them on ARM64/aarch64 platforms is less common. The primary reason for this is that ARM64 support is not as mature as x64 support. Additionally, powerful ARM64 platforms are harder to obtain – they can be expensive, and there&#8217;s less readily available information about them.</p>



<p>Qualcomm&#8217;s new desktop SoCs might change this landscape. However, we can still leverage the Ampere Altra platform to run LLMs.</p>



<p>Since this is a personal blog, most articles here are my personal notes. I&#8217;ll keep them updated as my understanding evolves.</p>



<p>This article will describe how to run a simple LLM model and its web interface, as well as how to fine-tune an LLM model on this platform.</p>



<span id="more-5852"></span>



<div class="wp-block-rank-math-toc-block" id="rank-math-toc"><h2>Table of Contents</h2><nav><ul><li><a href="#use-current-llm-model-with-gui">Use current LLM model with GUI</a><ul><li><a href="#big-llm">Big LLM </a></li></ul></li><li><a href="#open-webui">Open-Webui</a></li><li><a href="#llm-fine-turning-l-la-ma-factory">LLM Fine-Turning &#8211; LLaMA-Factory </a><ul><li><a href="#pre-training">Pre-Training</a></li><li><a href="#evaluate-predict">Evaluate &amp; Predict </a></li><li><a href="#test-chat-model">Test Chat Model</a></li><li><a href="#export-model">Export model</a></li></ul></li></ul></nav></div>



<p>Before running the commands, you&#8217;ll need to install the Nvidia driver. Refer to this article, &#8220;[<a href="https://blog.richliu.com/2024/02/21/5808/how-to-install-stable-diffusion-gui-on-arm64-nvidia-rtx-platform/">How to Install Stable Diffusion GUI on ARM64 Nvidia RTX platform</a>]&#8221;, for instructions on installing the Nvidia driver and the Docker runtime. </p>



<p>Ollama and Open-Webui don&#8217;t require a GPU, although having one is beneficial. Even smaller LLMs will run at acceptable speeds on CPU alone on the Ampere Altra Family platform.</p>



<p>This article&#8217;s hardware setup is as follows:</p>



<ul class="wp-block-list">
<li>CPU : Ampere Altra Family </li>



<li>Board: AsRock ALTRAD8UD </li>



<li>GPU: Nvidia RTX 4080</li>
</ul>



<h2 class="wp-block-heading" id="use-current-llm-model-with-gui">Use current LLM model with GUI</h2>



<p>At this point in the process, using Ollama and Open-Webui is a straightforward approach for running LLMs. Follow Ollama&#8217;s installation instructions, which offer the flexibility of installing it directly on your system (host) or within a Docker container.</p>



<pre class="wp-block-preformatted">curl -fsSL https://ollama.com/install.sh | sh</pre>



<p>By default, the Ollama service listens on localhost (127.0.0.1). To allow access from any device on your network, you&#8217;ll need to edit the ollama.service file.</p>



<p>In the [Service] section, add the following line:</p>



<pre class="wp-block-preformatted">Environment="OLLAMA_HOST=0.0.0.0:11434"</pre>



<p>For Open-Webui, we recommend using Docker for a simpler solution. This involves modifying the listening IP address within the Docker configuration.</p>



<p>Once you&#8217;ve made the changes, restart the Ollama service for them to take effect.</p>



<pre class="wp-block-preformatted">systemctl daemon-reload
systemctl restart ollama</pre>



<p>Run the ollama command and try it</p>



<pre class="wp-block-preformatted"># ollama run llama2
&gt;&gt;&gt; who are you

I'm LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner.
I'm here to help you with any questions or topics you'd like to discuss! Is there something specific you'd like to talk
about or ask?

&gt;&gt;&gt; </pre>



<p>If it doesn&#8217;t work, restart Ollama and try again.</p>
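


<p>If it still misbehaves, the service log usually shows why (Ollama installs itself as a systemd service):</p>



<pre class="wp-block-preformatted">$ journalctl -u ollama -f</pre>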



<h3 class="wp-block-heading" id="big-llm">Big LLM </h3>



<p>If you choose a big LLM such as LLaMA2-70b or Qwen, the workflow is the same; the main constraint is memory, since the whole model must fit in RAM (or VRAM), and a 70B-class model needs roughly 40GB even at 4-bit quantization.</p>



<h2 class="wp-block-heading" id="open-webui">Open-Webui</h2>



<p>The following command runs the Open-Webui Docker image on the host; 3000 is the host port for Open-Webui. The data is stored in the open-webui Docker volume, so it survives reboots with its original settings, and &#8220;&#8211;restart always&#8221; makes the container restart automatically after a reboot.</p>



<pre class="wp-block-preformatted">docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main</pre>



<p>If the Ollama server runs on another host, just add OLLAMA_API_BASE_URL=https://example.com/api to the docker command: </p>



<pre class="wp-block-preformatted">docker run -d -p 3000:8080 -e OLLAMA_API_BASE_URL=https://example.com/api -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main</pre>



<p>Upon login, you&#8217;ll need to provide an email and password. Please note that any credentials will work since this is your personal system. Once logged in, you can also edit the OLLAMA_API_BASE on the settings page.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1307" height="875" src="https://blog.richliu.com/wp-content/uploads/2024/03/image.png" alt="" class="wp-image-5853" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image.png 1307w, https://blog.richliu.com/wp-content/uploads/2024/03/image-600x402.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-768x514.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-816x546.png 816w" sizes="(max-width: 1307px) 100vw, 1307px" /></figure>



<p>Now Open-Webui is connected to the Ollama backend, and we have a running system on our machine. <br />Run a sample chat here and choose llama2:latest as the LLM model.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2205" height="761" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-1.png" alt="" class="wp-image-5854" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-1.png 2205w, https://blog.richliu.com/wp-content/uploads/2024/03/image-1-600x207.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-1-768x265.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-1-1536x530.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-1-2048x707.png 2048w, https://blog.richliu.com/wp-content/uploads/2024/03/image-1-816x282.png 816w" sizes="(max-width: 2205px) 100vw, 2205px" /></figure>






<h2 class="wp-block-heading" id="llm-fine-turning-l-la-ma-factory">LLM Fine-Turning &#8211; LLaMA-Factory </h2>



<p>Fine-tuning LLMs ideally requires a GPU card for better performance. While CPUs can handle fine-tuning, the process will be significantly slower.</p>



<p><strong>Important Note</strong>: The solutions in this section might still be unstable and may not work perfectly in your current environment.</p>



<p>For instance, with only 16GB of memory, an RTX 4080 might not be sufficient for &#8220;evaluation and prediction&#8221; tasks, even after reducing some parameters; this can still lead to CUDA out-of-memory errors.</p>



<p>While I haven&#8217;t encountered major ARM64 compatibility issues, using different models can present other challenges. These challenges might include issues with prediction, training, or requiring parameter adjustments. Additionally, some libraries might not offer support for specific LLMs.</p>



<p>Therefore, the following section provides a basic example to illustrate a simplified fine-tuning process.</p>



<p>For fine-tuning solutions, we&#8217;ve chosen <a href="https://github.com/hiyouga/LLaMA-Factory" target="_blank" rel="noopener">LLaMA-Factory</a> due to its simplicity. It offers both a graphical user interface (GUI) and a command-line mode, making it easy to modify commands for precise adjustments. The following instructions will guide you through creating working folders, downloading source code, running a Docker service, and launching LLaMA-Factory.</p>



<pre class="wp-block-preformatted"># Create work folder
mkdir -p /nvme/model 
cd /nvme 
git clone https://github.com/hiyouga/LLaMA-Factory.git

# Run docker, it will use host GPU and map the folder into docker.
sudo docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm \
     -v /nvme:/nvme \
     -p 7860:7860 \
     nvcr.io/nvidia/pytorch:24.01-py3

# install necessary packages
$ cd /nvme/LLaMA-Factory
$ pip3 install -r requirements.txt
$ pip3 install tiktoken transformers_stream_generator

# run the LLaMA-Factory
$ CUDA_VISIBLE_DEVICES=0 python src/train_web.py</pre>



<p>Now you can access the web GUI via http://&lt;host IP&gt;:7860.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1506" height="857" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-2.png" alt="" class="wp-image-5855" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-2.png 1506w, https://blog.richliu.com/wp-content/uploads/2024/03/image-2-600x341.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-2-768x437.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-2-816x464.png 816w" sizes="(max-width: 1506px) 100vw, 1506px" /></figure>



<p>You can open another shell in the running LLaMA-Factory container to run commands:</p>



<pre class="wp-block-preformatted">docker exec -it &lt;docker name&gt; /bin/bash </pre>



<p>To download LLMs, use git to clone them from Hugging Face, e.g.:</p>



<pre class="wp-block-preformatted"># LLama 2 need username and token(not passowrd) to download it. 
git clone https://huggingface.co/meta-llama/Llama-2-7b-hf

# Qwen
git clone https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat
git clone https://huggingface.co/Qwen/Qwen1.5-1.8B</pre>
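<p>Note that Hugging Face repositories store the model weights with Git LFS. If the cloned folders contain only small pointer files instead of the real weights, install git-lfs before cloning (a sketch, assuming Ubuntu&#8217;s git-lfs package):</p>



<pre class="wp-block-preformatted">sudo apt-get install git-lfs
git lfs install</pre>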



<h3 class="wp-block-heading" id="pre-training">Pre-Training</h3>






<p>Model name: choose the specific LLM, like &#8220;Qwen1.5-1.8B-Chat&#8221;. Model path should point to the folder where the model was downloaded; in this example it is saved in /nvme/model/Qwen1.5-1.8B-Chat.</p>



<p>Starting with a small LLM is recommended: for your first attempt at fine-tuning, consider a smaller LLM like &#8220;Qwen1.5-1.8B&#8221;. The tiny &#8220;Qwen1.5-0.5B&#8221; might cause issues, so we recommend starting with the 1.8B model.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1822" height="1206" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-3.png" alt="" class="wp-image-5856" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-3.png 1822w, https://blog.richliu.com/wp-content/uploads/2024/03/image-3-600x397.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-3-768x508.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-3-1536x1017.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-3-816x540.png 816w" sizes="(max-width: 1822px) 100vw, 1822px" /></figure>



<p>Here are the fine-tuning options you can adjust in LLaMA-Factory:</p>



<ul class="wp-block-list">
<li><strong>Dataset</strong>: You can add datasets here. LLaMA-Factory provides various datasets; for this example, we used &#8220;alpaca_gpt4_zh&#8221;.</li>
<li><strong>Gradient accumulation</strong>: This is currently set to 4.</li>
<li><strong>Cutoff length</strong>: Reducing the cutoff length (default is 1024) might help increase training speed.</li>
<li><strong>Compute type</strong>: The compute type depends on your environment. In this case, fp16 works, but bf16 sometimes doesn&#8217;t.</li>
<li><strong>Output directory</strong>: You can keep the default output directory or choose a custom location.</li>
</ul>



<p>Once you&#8217;ve configured these options and confirmed everything is set correctly, press &#8220;Start training&#8221;. Be aware that training can take anywhere from 4 to 24 hours on an RTX 4080 GPU, depending on the model.</p>
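<p>The same training run can also be launched from the command line instead of the web GUI. The following is only a sketch based on LLaMA-Factory&#8217;s train_bash.py interface at the time of writing; flag names change between releases, and the output directory here is a hypothetical path:</p>



<pre class="wp-block-preformatted"># Hedged example: adjust paths, template, and flags to your LLaMA-Factory version
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /nvme/model/Qwen1.5-1.8B-Chat \
    --dataset alpaca_gpt4_zh \
    --template qwen \
    --finetuning_type lora \
    --output_dir /nvme/model/qwen-sft-out \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --cutoff_len 1024 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --fp16</pre>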



<h3 class="wp-block-heading" id="evaluate-predict">Evaluate &amp; Predict </h3>



<p>The next step is evaluate and predict. For Adapter path, fill in the previous step&#8217;s output directory. The dataset is the same.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2059" height="1207" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-4.png" alt="" class="wp-image-5857" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-4.png 2059w, https://blog.richliu.com/wp-content/uploads/2024/03/image-4-600x352.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-4-768x450.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-4-1536x900.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-4-2048x1201.png 2048w, https://blog.richliu.com/wp-content/uploads/2024/03/image-4-816x478.png 816w" sizes="(max-width: 2059px) 100vw, 2059px" /></figure>



<p>When everything is ready, press Start. Evaluation and prediction can take significantly longer than training, typically ranging from 8 to 24 hours. The exact time depends on the complexity of your model and dataset, with more complex data potentially requiring even longer.</p>



<p>If you encounter a CUDA out-of-memory error during this step, you can reduce memory usage by lowering the batch size or the maximum number of new tokens; this may resolve the error. [<a href="https://github.com/hiyouga/LLaMA-Factory/issues/1191" target="_blank" rel="noopener">ref</a>]</p>
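<p>On the command line, the equivalent knobs are the evaluation batch size and the generation length. Again, this is only a sketch against the same train_bash.py interface, with hypothetical paths:</p>



<pre class="wp-block-preformatted"># Hedged example: flags may differ across LLaMA-Factory versions
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_predict \
    --model_name_or_path /nvme/model/Qwen1.5-1.8B-Chat \
    --adapter_name_or_path /nvme/model/qwen-sft-out \
    --dataset alpaca_gpt4_zh \
    --template qwen \
    --output_dir /nvme/model/qwen-sft-eval \
    --per_device_eval_batch_size 1 \
    --max_new_tokens 128 \
    --predict_with_generate \
    --fp16</pre>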



<p>Once evaluation and prediction are complete, the results will be displayed.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2008" height="304" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-5.png" alt="" class="wp-image-5858" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-5.png 2008w, https://blog.richliu.com/wp-content/uploads/2024/03/image-5-600x91.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-5-768x116.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-5-1536x233.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-5-816x124.png 816w" sizes="(max-width: 2008px) 100vw, 2008px" /></figure>



<h3 class="wp-block-heading" id="test-chat-model">Test Chat Model</h3>



<p>You can test the chat results by loading the model.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2077" height="1043" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-6.png" alt="" class="wp-image-5859" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-6.png 2077w, https://blog.richliu.com/wp-content/uploads/2024/03/image-6-600x301.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-6-768x386.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-6-1536x771.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-6-2048x1028.png 2048w, https://blog.richliu.com/wp-content/uploads/2024/03/image-6-816x410.png 816w, https://blog.richliu.com/wp-content/uploads/2024/03/image-6-400x200.png 400w" sizes="(max-width: 2077px) 100vw, 2077px" /></figure>



<h3 class="wp-block-heading" id="export-model">Export model</h3>



<p>Remember to fill in the export directory; here I used &#8220;/nvme/newmodel/Qwen1.5-1.8B-Chat&#8221;.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2077" height="1030" src="https://blog.richliu.com/wp-content/uploads/2024/03/image-7.png" alt="" class="wp-image-5860" srcset="https://blog.richliu.com/wp-content/uploads/2024/03/image-7.png 2077w, https://blog.richliu.com/wp-content/uploads/2024/03/image-7-600x298.png 600w, https://blog.richliu.com/wp-content/uploads/2024/03/image-7-768x381.png 768w, https://blog.richliu.com/wp-content/uploads/2024/03/image-7-1536x762.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/03/image-7-2048x1016.png 2048w, https://blog.richliu.com/wp-content/uploads/2024/03/image-7-816x405.png 816w" sizes="(max-width: 2077px) 100vw, 2077px" /></figure>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p>Currently, there is a bug reported when using <code>convert.py</code> to convert the Qwen model to gguf format [<a href="https://github.com/QwenLM/Qwen1.5/issues/57" target="_blank" rel="noopener">ref</a>]</p>



<p>It&#8217;s recommended to use&nbsp;<code>convert-hf-to-gguf.py</code>&nbsp;for model conversion, ex:</p>
</div></div>



<pre class="wp-block-preformatted">cd /nvme/newmodel 
python3 /nvme/llama.cpp/convert-hf-to-gguf.py Qwen1.5-1.8B-Chat --outfile test.gguf

# Use llama.cpp to test this model 
/nvme/llama.cpp/build/bin/main -m test.gguf -p "who are you?"

# Interactive instruction mode
/nvme/llama.cpp/build/bin/main -m test.gguf -ins
</pre>



<p>Create a file named &#8220;Modelfile&#8221; with the following content:</p>



<pre class="wp-block-preformatted">FROM /nvme/newmodel/test.gguf</pre>



<p>Now, run ollama to create and test this LLM.</p>



<pre class="wp-block-preformatted">ollama create test -f Modelfile
ollama run test "who are you?"
====
 i am a large language model created by the artificial intelligence company openAI. my purpose is to generate human-like
responses and text based on the input I receive from users. can I help you with anything today? #openai

#chatbot
Sure, I'm here to assist you! How can I help you today? Is there something specific you'd like to talk about or ask me?
#openaiChatBot不断地尝试。 #AI #MachineLearning
</pre>



<p>But my model would not stop predicting, so there is still a lot to learn about fine-tuning, and a long way to go.</p>
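<p>One likely cause of the runaway output is that the bare FROM Modelfile gives Ollama no chat template and no stop tokens for this model. A minimal sketch, assuming the Qwen1.5 ChatML prompt format, would be:</p>



<pre class="wp-block-preformatted"># Sketch: assumes the converted model uses the ChatML format
FROM /nvme/newmodel/test.gguf

TEMPLATE """&lt;|im_start|&gt;user
{{ .Prompt }}&lt;|im_end|&gt;
&lt;|im_start|&gt;assistant
"""

PARAMETER stop "&lt;|im_start|&gt;"
PARAMETER stop "&lt;|im_end|&gt;"</pre>



<p>Recreate the model with &#8220;ollama create test -f Modelfile&#8221; and test it again.</p>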



<p>To be continued.</p>






<p>The post <a rel="nofollow" href="https://blog.richliu.com/2024/03/10/5852/llm-chat-webgui-and-fine-turning-on-ampere-altra-arm64-platform/">LLM Chat WebGUI and Fine-Turning on Ampere Altra ARM64 Platform</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2024/03/10/5852/llm-chat-webgui-and-fine-turning-on-ampere-altra-arm64-platform/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How to Install Stable Diffusion GUI on ARM64 Nvidia RTX platform</title>
		<link>https://blog.richliu.com/2024/02/21/5808/how-to-install-stable-diffusion-gui-on-arm64-nvidia-rtx-platform/</link>
					<comments>https://blog.richliu.com/2024/02/21/5808/how-to-install-stable-diffusion-gui-on-arm64-nvidia-rtx-platform/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Wed, 21 Feb 2024 12:11:49 +0000</pubDate>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[aarch64]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Altra]]></category>
		<category><![CDATA[ALTRAD8UD]]></category>
		<category><![CDATA[AltraMax]]></category>
		<category><![CDATA[Ampere]]></category>
		<category><![CDATA[arm64]]></category>
		<category><![CDATA[Nvidia]]></category>
		<category><![CDATA[RTX]]></category>
		<category><![CDATA[Stable Diffustion]]></category>
		<guid isPermaLink="false">https://blog.richliu.com/?p=5808</guid>

					<description><![CDATA[<p>In this article, we will walk through the process of installing Stable Diffusion WebUI on an ARM64 + Nvidia RTX platform.</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2024/02/21/5808/how-to-install-stable-diffusion-gui-on-arm64-nvidia-rtx-platform/">How to Install Stable Diffusion GUI on ARM64 Nvidia RTX platform</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="1744" height="1177" src="https://blog.richliu.com/wp-content/uploads/2024/02/image-16.png" alt="" class="wp-image-5809" style="aspect-ratio:1.481733220050977;width:284px;height:auto" srcset="https://blog.richliu.com/wp-content/uploads/2024/02/image-16.png 1744w, https://blog.richliu.com/wp-content/uploads/2024/02/image-16-600x405.png 600w, https://blog.richliu.com/wp-content/uploads/2024/02/image-16-768x518.png 768w, https://blog.richliu.com/wp-content/uploads/2024/02/image-16-1536x1037.png 1536w, https://blog.richliu.com/wp-content/uploads/2024/02/image-16-816x551.png 816w" sizes="(max-width: 1744px) 100vw, 1744px" /></figure></div>


<p>In this article, we will walk through the process of installing Stable Diffusion WebUI on an ARM64 + Nvidia RTX platform. <br />This hardware combination requires solving some compatibility issues, so this article describes how to install everything and fix those issues.</p>



<span id="more-5808"></span>



<h2 class="wp-block-heading">Platform</h2>



<p>CPU: Ampere Altra ARM64 CPU<br />Board: AsRock ALTRAD8UD<br />OS: Ubuntu 22.04 </p>



<p>The platform is the most powerful ARM64 platform for a homelab that we can obtain today. There are some server models available, but they come with high-speed cooling fans that are not suitable for home use.</p>



<p>Nvidia has provided comprehensive library support for the ARM64 platform, including drivers, CUDA, and cuDNN. However, despite these efforts, there are still gaps in support. Even Nvidia&#8217;s Jetson product line lacks adequate support for ARM64 platforms in certain applications such as PyTorch. While there exists an individual PyTorch accelerator package for direct download on x64 platforms, the same cannot be said for ARM64 platforms. Although there is technically one package available, its availability is severely limited, rendering it practically useless.</p>



<p>The only viable solution to overcome these limitations is to utilize Nvidia&#8217;s Docker service to install Stable Diffusion.</p>



<h2 class="wp-block-heading">Installation</h2>



<p>Install Ubuntu 22.04 without selecting the HWE (Hardware Enablement) support. It appears that the current Nvidia driver version has issues, leading to potential soft-lock bugs. A general kernel version (5.15) is preferable for stability.</p>



<p>Next, install the necessary Ubuntu packages.</p>



<pre class="wp-block-preformatted">sudo apt-get install build-essential</pre>



<p>Download the <a href="https://www.nvidia.com/en-us/drivers/unix/linux-aarch64-archive/" target="_blank" rel="noopener">Nvidia ARM64 (aarch64) Display Driver</a>; the current version is fine. At the time of writing, it is 535.154.05.</p>
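<p>The download is a self-extracting .run installer that should be executed as root; stop any display manager first. A sketch, assuming the 535.154.05 filename:</p>



<pre class="wp-block-preformatted">chmod +x NVIDIA-Linux-aarch64-535.154.05.run
sudo ./NVIDIA-Linux-aarch64-535.154.05.run</pre>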



<h3 class="wp-block-heading">Docker</h3>



<p>Install Docker by following the Docker guide <a href="https://docs.docker.com/engine/install/ubuntu/" target="_blank" rel="noopener">Install Docker Engine on Ubuntu</a>. The necessary installation steps are also provided here. If you have already installed Docker, or if these steps don&#8217;t work, please refer to the original website for updated instructions.</p>



<pre class="wp-block-preformatted"># Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release &amp;&amp; echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
sudo apt-get update</pre>



<pre class="wp-block-preformatted">sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin</pre>



<h3 class="wp-block-heading">Nvidia Container Toolkit</h3>



<p>Follow the instructions for installing the <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker" target="_blank" rel="noopener">NVIDIA Container Toolkit</a>. The toolkit can modify the Docker configuration for you. Alternatively, editing the daemon.json file directly might achieve the same result, although I haven&#8217;t personally tested it. Below are the instructions:</p>



<pre class="wp-block-preformatted">curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  &amp;&amp; curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker</pre>



<p>You can check the /etc/docker/daemon.json file; the configuration should look like this.</p>



<pre class="wp-block-preformatted">{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
</pre>
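<p>To confirm the toolkit is wired up correctly, run a throwaway container and check that the GPU is visible from inside it. This mirrors the sample workload in Nvidia&#8217;s guide; the plain ubuntu image is an assumption:</p>



<pre class="wp-block-preformatted">sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi</pre>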



<h3 class="wp-block-heading">Run Nvidia pytorch docker image </h3>



<p>The following is an example command to run the Nvidia PyTorch docker image:</p>



<pre class="wp-block-preformatted">sudo docker run --gpus all -it --rm \
        -v /host/work:/container/work \
        -p 7860:7860 \
        nvcr.io/nvidia/pytorch:XX.XX-py3
</pre>



<p>/host/work is the work directory on the host, and /container/work is the work directory inside the container (docker). -p 7860 forwards the GUI&#8217;s default port to the host machine. XX.XX is the container version; please see the tags at this <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags" target="_blank" rel="noopener">URL</a> to pick a version. [<a href="https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch" target="_blank" rel="noopener">NGC Pytorch</a>] The following is an example command.</p>



<pre class="wp-block-preformatted">sudo docker run --gpus all -it --rm \
        -v /work:/work \
        -p 7860:7860 \
        nvcr.io/nvidia/pytorch:24.01-py3
</pre>



<h2 class="wp-block-heading">Stable Diffustion WebUI</h2>



<p>First, clone the Stable Diffusion WebUI source code.</p>



<pre class="wp-block-preformatted">git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git</pre>



<p>Then modify the settings file: edit webui.sh and adjust these parameters to the following values.</p>



<pre class="wp-block-preformatted">use_venv=0
can_run_as_root=1</pre>



<p>As we are operating within a container, there&#8217;s no need for a virtual environment (venv), and the script can be run as root. Modify the file webui-user.sh and add COMMANDLINE_ARGS accordingly.</p>



<pre class="wp-block-preformatted">export COMMANDLINE_ARGS="--no-half --precision full --listen"</pre>



<p>or run it directly </p>



<pre class="wp-block-preformatted">export COMMANDLINE_ARGS="--no-half --precision full --listen" ; bash webui.sh</pre>



<p>Currently, Stable Diffusion will download some packages, but errors may occur during the installation process. We&#8217;ll address and fix these errors in the next few steps.</p>



<h3 class="wp-block-heading">CV2</h3>



<p>Error message :</p>



<pre class="wp-block-preformatted">    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/cv2/typing/__init__.py", line 168, in &lt;module&gt;
    LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'
</pre>



<p>Modify the file &#8220;/usr/local/lib/python3.10/dist-packages/cv2/typing/__init__.py&#8221; and comment out the <code>LayerId = cv2.dnn.DictValue</code> line (line 168 in the traceback above) as follows. [<a href="https://github.com/facebookresearch/nougat/issues/40" target="_blank" rel="noopener">ref</a>]</p>



<pre class="wp-block-preformatted"># LayerId = cv2.dnn.DictValue</pre>



<h2 class="wp-block-heading">CodeFormer </h2>



<p>Error message:</p>



<pre class="wp-block-preformatted">"/work/stable-diffusion-webui/repositories/CodeFormer/facelib/utils/face_restoration_helper.py", line 7, in &lt;module&gt;
        from facelib.detection import init_detection_model
      File "/work/stable-diffusion-webui/repositories/CodeFormer/facelib/detection/__init__.py", line 11, in &lt;module&gt;
        from .yolov5face.face_detector import YoloDetector
      File "/work/stable-diffusion-webui/repositories/CodeFormer/facelib/detection/yolov5face/face_detector.py", line 20, in &lt;module&gt;
        IS_HIGH_VERSION = tuple(map(int, torch.__version__.split('+')[0].split('.'))) &gt;= (1, 9, 0)
    ValueError: invalid literal for int() with base 10: '0a0'
</pre>



<p>This issue is caused by <code>torch.__version__</code> returning a version string like 2.2.0a0+81ea7a4, which cannot be parsed as integers. A temporary solution is to strip the non-numeric parts of the version string: edit the file &#8220;stable-diffusion-webui/repositories/CodeFormer/facelib/detection/yolov5face/face_detector.py&#8221; and replace line 20 (the IS_HIGH_VERSION assignment) with the following: [<a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13766" target="_blank" rel="noopener">ref</a>]</p>



<pre class="wp-block-preformatted">version_string = torch.version.split('+')[0]
version_string = ''.join(filter(str.isdigit, version_string))
IS_HIGH_VERSION = tuple(map(int, version_string.split('.'))) &gt;= (1, 9, 0)</pre>



<p>Now, the Stable Diffusion GUI should be working as expected.</p>






<p>The post <a rel="nofollow" href="https://blog.richliu.com/2024/02/21/5808/how-to-install-stable-diffusion-gui-on-arm64-nvidia-rtx-platform/">How to Install Stable Diffusion GUI on ARM64 Nvidia RTX platform</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2024/02/21/5808/how-to-install-stable-diffusion-gui-on-arm64-nvidia-rtx-platform/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
