Replies: 5 comments
-
Share the proxy/upstream logs from llama-swap as well. llama-swap doesn't have a 60s request timeout. My guess is that it may be a limit in nginx to prevent long-running requests.
-
I can share them later when I have access; however, I did not see anything in there pointing to a timeout.
-
On my box I have llama-swap running and clients talk directly to it. I have plenty of requests that run for several minutes and do not abort.
-
Pretty sure this is caused by proxy_buffering in nginx. Try turning it off: proxy_buffering off;
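For anyone hitting the same thing, a minimal sketch of what that could look like in an nginx location block; the path, upstream address, and port are assumptions for illustration, not from this thread:

```nginx
# Hypothetical reverse-proxy block in front of llama-swap; adjust to your setup.
location / {
    proxy_pass http://127.0.0.1:8080;  # assumed llama-swap listen address
    proxy_buffering off;               # stream tokens to the client instead of buffering the whole response
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
```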
-
ok got it. will give it a try thanks!
-
Describe the bug
I have some models running behind llama-swap, and if generation of the first token takes longer than a minute I get an initial 504 timeout error and only later start receiving the response.
I've confirmed my proxy in front of llama-swap has a long timeout configured.
Expected behaviour
A timeout should not occur for long generation times.
Operating system and version
Asahi Linux
Mac Studio (M1)
Proxy Logs
Example log from my nginx proxy
Notice: "request_time": "60.006"