[Question]: Increase completion timeout to prevent crash? #1417
Comments
Did this timeout occur while your LLM was responding, or while you had a session open but had not yet sent a chat to the LLM? If your LLM is taking 10 minutes to reply, that is kind of an insane latency, but yes, it should not crash the server.
Kind of in-between. Continuing a previous chat session/thread, I restarted the back-end (LocalAI), and the first chat message to LocalAI loads the model into memory -- so it's not responding to messages yet, and yes, on CPU it takes a while to load a 70b model into RAM -- but the subtle difference is that it isn't inferencing yet. Take a look at my screenshot here to see the events: by the time the model has loaded, the AnythingLLM server has told me it has better things to do than wait (crashed) 😆. I should mention that after the model is loaded, it works fine, no issue. Thanks for getting back to me 👍
Ah, so it's just the model taking a long time to load and the request moves on. The 10 minutes is no coincidence either: for LocalAI we use a 10-minute completion timeout. I would be nervous to have this be infinite, because then you can hang the entire call. Would it be unreasonable to simply increase that timeout? I'm not super excited to accidentally lead to infinitely hanging requests for LocalAI!
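To make the trade-off concrete, here is a minimal sketch of a completion call guarded by a finite, configurable timeout rather than hanging forever. `withTimeout` and the `COMPLETION_TIMEOUT_MS` environment variable are illustrative names, not AnythingLLM's actual API; the default of 10 minutes mirrors the value discussed above.

```javascript
// Hypothetical sketch: race a completion promise against a configurable
// deadline so a slow model load rejects cleanly instead of hanging.
const COMPLETION_TIMEOUT_MS = Number(
  process.env.COMPLETION_TIMEOUT_MS || 10 * 60 * 1000 // default: 10 minutes
);

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so a pending
  // timeout does not keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Example: a slow "completion" that finishes before the deadline.
withTimeout(new Promise((resolve) => setTimeout(() => resolve('ok'), 50)), 1000)
  .then((result) => console.log(result)) // prints "ok"
  .catch((err) => console.error(err.message));
```

The key point is that the rejection is an ordinary `Error` the caller can catch, not something that should ever take down the process.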
I'll add that, regardless, this should not exit the process -- so that needs to be patched for sure.
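As a last line of defense against a crash like this, Node.js exposes process-level hooks for errors nothing else caught. This is a hedged sketch, not AnythingLLM's actual patch, and the usual advice is to treat `uncaughtException` as a chance to log and shut down gracefully rather than to run indefinitely; the event names are core Node.js, the logging is illustrative.

```javascript
// Last-resort guards: log an unhandled socket timeout instead of letting
// it kill the server outright.
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception (server kept alive):', err.message);
});
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection (server kept alive):', reason);
});
```

The real fix is still to catch the timeout where it is thrown; these handlers just stop one missed spot from exiting the container.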
Yeah, I'd agree with that. I'm not too worried about the specific handling for LocalAI, just that the AnythingLLM server doesn't crash. I can handle model preloading and such, but when I'm downloading various models, trying them out, and loading them on the fly -- just not having a crash would be good enough.
This is interesting. I am trying to replicate this right now and I can't get that exact timeout to occur. It is always handled, which has me thinking this exception is being thrown somewhere else that is not being caught. Any exception during streaming would be caught and would prevent an outright crash. Right now I'm having trouble reproducing the exact error so I can locate its full stack trace and handle it.
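For context on why a mid-stream exception should already be handled, here is a minimal sketch of the pattern described above: the streaming loop sits inside a `try/catch`, so a socket timeout thrown while iterating surfaces as an error chunk rather than an uncaught exception. `safeStream` and the chunk shape are hypothetical, not AnythingLLM's actual code.

```javascript
// Hypothetical sketch: any error thrown while consuming the completion
// stream is caught and forwarded to the client as an "abort" chunk.
async function safeStream(streamCompletion, onChunk) {
  try {
    for await (const chunk of streamCompletion()) {
      onChunk(chunk);
    }
  } catch (err) {
    // A socket timeout thrown mid-stream lands here, not in the
    // process-level crash path.
    onChunk({ type: 'abort', error: err.message });
  }
}
```

If the crash happens anyway, the exception is presumably escaping outside a guard like this, which matches the hunch that it is thrown somewhere else.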
I'll see if I can run this locally in debug mode and give you more info. Also, I was 11 commits behind master, so let me grab the latest and try again as well.
From what I saw in the lockfile, the
How are you running AnythingLLM?
Docker (local)
What happened?
Using LocalAI as the backend: while loading the llama3 70b model, the AnythingLLM container crashed with a socket timeout.
A timeout, I think, is fine; the model takes a while to load. However, I didn't expect the container to crash -- I kind of expected just to re-initiate the thread.
Are there known steps to reproduce?
I think this should be reproducible with any model load time greater than 10 minutes.