## Build llama.cpp
```bash
git clone git@github.com:ggml-org/llama.cpp.git
cd llama.cpp
```
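This assumes a `riscv64-unknown-linux-gnu` clang cross toolchain is already installed and on `PATH`; the names below match the `CMAKE_C_COMPILER`/`CMAKE_CXX_COMPILER` values used in the configure step. A quick sanity check:

```bash
# Both cross compilers referenced by the cmake configure step must be on PATH.
riscv64-unknown-linux-gnu-clang --version
riscv64-unknown-linux-gnu-clang++ --version
```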
Current llama.cpp only supports RVV on VLEN-128, so we disable RVV for now. OpenMP might not be supported in the target environment either, so we also disable it. Cross-compile the `llama-cli` binary:
```bash
cmake -B build-riscv-nov \
    -DCMAKE_SYSTEM_NAME=Linux \
    -DCMAKE_SYSTEM_PROCESSOR=riscv64 \
    -DCMAKE_C_COMPILER=riscv64-unknown-linux-gnu-clang \
    -DCMAKE_CXX_COMPILER=riscv64-unknown-linux-gnu-clang++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_RVV=OFF \
    -DGGML_OPENMP=OFF \
    -DBUILD_SHARED_LIBS=OFF \
    .
cmake --build build-riscv-nov -j64
```
The resulting `llama-cli` binary is in `build-riscv-nov/bin/`.
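To double-check that the cross-build actually produced a RISC-V binary before copying it anywhere, `file` is a quick sanity check (the exact output wording varies by `file` version):

```bash
# Expect something like: "ELF 64-bit LSB executable, UCB RISC-V, ..."
file build-riscv-nov/bin/llama-cli
```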
## Get model
llama.cpp uses GGUF-format weights, so look for a GGUF model repository on Hugging Face such as DeepSeek-R1-Distill-Qwen-7B-GGUF. In the "Files and versions" tab, we can find the weight file `DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf`.
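One way to fetch the file is with the `huggingface-cli` tool. The `<org>` below is a placeholder, since several organizations host GGUF conversions of this model; substitute the repository owner you actually found:

```bash
# <org> is hypothetical -- replace it with the real repository owner.
huggingface-cli download <org>/DeepSeek-R1-Distill-Qwen-7B-GGUF \
    DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf --local-dir .
```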
## Run model
Copy `llama-cli` and `DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf` to the target machine (HAPS, QEMU, etc.). If the target is reachable over SSH, something like the sketch below works; the host name and destination path are placeholders:
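```bash
# user@target and the destination directory are hypothetical -- adjust for your setup.
scp build-riscv-nov/bin/llama-cli DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf user@target:~/
```

Then, on the target, run the model with this command: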
```bash
./llama-cli -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf --threads 16 -st --prompt 'What is 1+1?'
```
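If the "target" is QEMU user-mode emulation rather than real hardware, the binary can also be run from the host through `qemu-riscv64` (from the qemu-user package). This is a sketch under one assumption: `-DBUILD_SHARED_LIBS=OFF` makes the llama.cpp libraries static, but if the binary still links libc dynamically you will need to point QEMU's `-L` at a RISC-V sysroot.

```bash
# Runs directly if the binary is fully static;
# otherwise add: -L /path/to/riscv64-sysroot
qemu-riscv64 ./llama-cli -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf \
    --threads 16 -st --prompt 'What is 1+1?'
```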