Description
I don't know yet what is going wrong; whether it is ROCm or llama.cpp is, for now, an open question.
I have two builds of llama.cpp, one with ROCm 7.9 and one with ROCm 7.11 (both from TheRock). When I bench with Mistral-Small-2506-Q6_K.gguf I get:
Hardware: a Ryzen AI MAX+ with 128 GB
OS: Fedora 43
- With ROCm 7.9 (same results with ROCm 6.4.4):
⬢ [philou@toolbx LLM]$ GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON ./build_ref/${THEROCK_VER}/bin/llama-bench -ngl 999 --mmap 0 -ub 4096 -b 8192 -fa 1 -r 3 -p "1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16" -n 16 -pg "512,64" -m ${LLM_MODEL}
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | mmap | test | t/s |
|---|---|---|---|---|---|---|---|---|---|---|
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp1 | 11.34 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp1 | 11.34 ± 0.00 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp2 | 22.20 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp3 | 32.85 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp4 | 42.87 ± 0.05 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp5 | 51.47 ± 0.03 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp6 | 57.74 ± 0.06 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp7 | 62.06 ± 0.06 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp8 | 66.04 ± 0.03 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp9 | 77.15 ± 0.05 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp10 | 85.57 ± 0.10 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp11 | 94.00 ± 0.11 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp12 | 102.18 ± 0.04 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp13 | 110.20 ± 0.11 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp14 | 118.40 ± 0.08 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp15 | 126.76 ± 0.09 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp16 | 135.08 ± 0.13 |
- With ROCm 7.11:
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | mmap | test | t/s |
|---|---|---|---|---|---|---|---|---|---|---|
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp1 | 11.41 ± 0.00 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp1 | 11.41 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp2 | 22.35 ± 0.00 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp3 | 33.01 ± 0.04 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp4 | 43.16 ± 0.02 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp5 | 52.03 ± 0.04 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp6 | 58.99 ± 0.08 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp7 | 62.81 ± 0.10 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp8 | 66.02 ± 0.07 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp9 | 26.60 ± 0.02 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp10 | 29.53 ± 0.02 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp11 | 32.47 ± 0.02 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp12 | 35.39 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp13 | 38.35 ± 0.02 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp14 | 41.27 ± 0.02 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp15 | 44.21 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp16 | 47.16 ± 0.03 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | tg16 | 11.40 ± 0.01 |
| llama 13B Q6_K | 18.31 GiB | 23.57 B | ROCm | 999 | 8192 | 4096 | 1 | 0 | pp512+tg64 | 81.44 ± 0.04 |
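To diff the two runs side by side, a small awk helper can pull the `test` and `t/s` columns out of llama-bench's markdown table (a sketch; the file names in the example are placeholders, not files from this report):

```shell
# extract_tps: print "<test> <t/s>" pairs from llama-bench markdown output.
extract_tps() {
  # Fields split on '|': column 11 is "test", column 12 is "t/s".
  awk -F'|' '$11 ~ /pp|tg/ {
    gsub(/^ +| +$/, "", $11)       # trim the test name
    gsub(/^ +| +$/, "", $12)       # trim the t/s cell
    sub(/ ±.*/, "", $12)           # drop the "± stddev" part
    print $11, $12
  }' "$@"
}
# e.g.: diff <(extract_tps bench_rocm79.md) <(extract_tps bench_rocm711.md)
```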
Now I know it is not a runtime problem: if I run the binary built with ROCm 7.9 against the ROCm 7.11 runtime, I get the same (good) results as with ROCm 7.9...
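The runtime-swap check above can be done with `LD_LIBRARY_PATH` plus `ldd` to confirm which HIP runtime the loader actually resolves (a generic sketch; the paths in the example are assumptions, adjust to wherever TheRock put each install):

```shell
# which_runtime: show which copy of a shared library a binary will load
# when the loader is pointed at an alternate runtime directory.
which_runtime() {
  # $1 = binary, $2 = alternate runtime lib dir, $3 = soname to look for
  LD_LIBRARY_PATH="$2" ldd "$1" | grep "$3"
}
# e.g.: which_runtime ./build_ref/7.9/bin/llama-bench /path/to/rocm-7.11/lib libamdhip64
```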
With BF16/FP16 models everything looks fine on both releases...
I don't know whether this comes from a different build path being selected with ROCm 7.11, or from the compiler (hipcc/clang):
- ROCm 7.9 looks to use LLVM 20
- ROCm 7.11 looks to use LLVM 22
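Since hipcc is a clang wrapper, each toolchain's LLVM version can be read straight out of its `--version` output; this helper just extracts that line (the hipcc paths in the example are placeholders for your two TheRock installs):

```shell
# llvm_of: print the "clang version X.Y.Z" string a compiler reports.
llvm_of() {
  "$@" --version | grep -i -o 'clang version [0-9][0-9.]*' | head -n1
}
# e.g.: llvm_of /path/to/rocm-7.11/bin/hipcc
```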
For now I'm trying to figure out what's going on, so I can report the right bugs in the right place.
Does anyone have an idea? What happens between pp8 and pp9, a different code path? (Note: I'll have a look, but I haven't seen anything so far.)
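One hedged guess for the pp8/pp9 boundary: in the CUDA/HIP backend the quantized mat-vec kernels only cover small batches up to a compile-time cutoff (from memory, `MMVQ_MAX_BATCH_SIZE`, which is 8), after which a different kernel path takes over; that would also fit quantized models regressing while BF16/FP16 stay fine. The identifier may have moved or been renamed in a current checkout, so this is just a sketch of where to start grepping:

```shell
# find_batch_cutoff: look for the small-batch kernel cutoff in a llama.cpp tree.
find_batch_cutoff() {
  # $1 = path to a llama.cpp (or ggml) source tree
  grep -rn "MMVQ_MAX_BATCH_SIZE" "$1"
}
# e.g.: find_batch_cutoff ~/src/llama.cpp/ggml/src
```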
If anyone wants to test, I can provide more build details.