Replies: 2 comments
-
The path to llama-server should point to the one inside the container. It looks like the paths to the binary and the models are ones on the host machine.
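For illustration only, a config.yaml entry that uses container-side paths could look like the sketch below. The model name, the .gguf filename, the /models mount point, and the /app/llama-server location are all assumptions to check against the image and llama-swap's README, and the host model directory would need a matching volume mount (e.g. -v /home/grand/llm/models:/models, also an assumption):

# minimal sketch; "my-model" and the .gguf filename are placeholders,
# /app/llama-server and /models are assumed container-side paths
models:
  "my-model":
    cmd: >
      /app/llama-server
      --port ${PORT}
      -m /models/my-model-q4_k_m.gguf

One way to confirm where the binary actually lives is to list the image contents, for example: docker run --rm --entrypoint ls ghcr.io/mostlygeek/llama-swap:cuda /app (assuming ls is available in the image).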
-
This issue is stale because it has been open for 2 weeks with no activity.
-
Describe the bug
I can't run any model even though I verified that the paths I provided are correct.
Expected behaviour
Model running.
Operating system and version
My Configuration
Proxy Logs
Upstream Logs
Here is the command I launched llama-swap with:
sudo docker run -it --rm -p 9292:8080 -v /home/grand/llm/llama_swap_config.yaml:/app/config.yaml ghcr.io/mostlygeek/llama-swap:cuda
I removed the --runtime nvidia flag because the container would not run at all with it.
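For reference, and not a verified fix: GPU access from the :cuda image normally requires the NVIDIA Container Toolkit on the host, after which --gpus all is the usual alternative to --runtime nvidia. The sketch below also mounts an assumed host model directory so that container-side paths in config.yaml can resolve:

# assumes the NVIDIA Container Toolkit is installed on the host and that
# the models live in /home/grand/llm/models (an assumed host path)
sudo docker run -it --rm --gpus all \
  -p 9292:8080 \
  -v /home/grand/llm/llama_swap_config.yaml:/app/config.yaml \
  -v /home/grand/llm/models:/models \
  ghcr.io/mostlygeek/llama-swap:cuda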