
[Question]: Increase completion timeout to prevent crash? #1417

Open
rsdmike opened this issue May 16, 2024 · 9 comments
Labels: investigating (Core team or maintainer will or is currently looking into this issue); needs info / can't replicate (Issues that require additional information and/or cannot currently be replicated, but possible bug)

Comments


rsdmike commented May 16, 2024

How are you running AnythingLLM?

Docker (local)

What happened?

Using LocalAI as the backend -- while loading the llama3 70b model, the AnythingLLM container crashed with a socket timeout.

2024-05-15 22:24:20 [TELEMETRY SENT] {
2024-05-15 22:24:20   event: 'sent_chat',
2024-05-15 22:24:20   distinctId: 'be3ac3d9-aa83-4458-ae1a-583a3fcc909b',
2024-05-15 22:24:20   properties: {
2024-05-15 22:24:20     multiUserMode: false,
2024-05-15 22:24:20     LLMSelection: 'localai',
2024-05-15 22:24:20     Embedder: 'openai',
2024-05-15 22:24:20     VectorDbSelection: 'lancedb',
2024-05-15 22:24:20     runtime: 'docker'
2024-05-15 22:24:20   }
2024-05-15 22:24:20 }
2024-05-15 22:24:20 [Event Logged] - sent_chat
2024-05-15 23:04:01 Cannonball results 3511 -> 470 tokens.
2024-05-15 23:04:01 Cannonball results 356 -> 286 tokens.
2024-05-15 23:04:53 [TELEMETRY SENT] {
2024-05-15 23:04:53   event: 'sent_chat',
2024-05-15 23:04:53   distinctId: 'be3ac3d9-aa83-4458-ae1a-583a3fcc909b',
2024-05-15 23:04:53   properties: {
2024-05-15 23:04:53     multiUserMode: false,
2024-05-15 23:04:53     LLMSelection: 'localai',
2024-05-15 23:04:53     Embedder: 'openai',
2024-05-15 23:04:53     VectorDbSelection: 'lancedb',
2024-05-15 23:04:53     runtime: 'docker'
2024-05-15 23:04:53   }
2024-05-15 23:04:53 }
2024-05-15 23:04:53 [Event Logged] - sent_chat
2024-05-15 23:25:00 node:internal/process/promises:288
2024-05-15 23:25:00             triggerUncaughtException(err, true /* fromPromise */);
2024-05-15 23:25:00             ^
2024-05-15 23:25:00 
2024-05-15 23:25:00 Error: Socket timeout
2024-05-15 23:25:00     at Socket.onTimeout (/app/server/node_modules/agentkeepalive/lib/agent.js:350:23)
2024-05-15 23:25:00     at Socket.emit (node:events:529:35)
2024-05-15 23:25:00     at Socket._onTimeout (node:net:598:8)
2024-05-15 23:25:00     at listOnTimeout (node:internal/timers:569:17)
2024-05-15 23:25:00     at process.processTimers (node:internal/timers:512:7) {
2024-05-15 23:25:00   code: 'ERR_SOCKET_TIMEOUT',
2024-05-15 23:25:00   timeout: 601000
2024-05-15 23:25:00 }
2024-05-15 23:25:00 
2024-05-15 23:25:00 Node.js v18.19.1

A timeout itself is fine, I think -- the model takes a while to load. However, I didn't expect the container to crash; I expected to just re-initiate the thread.

Are there known steps to reproduce?

I think this should be reproducible with any model load time greater than 10 minutes.

@rsdmike rsdmike added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label May 16, 2024
@timothycarambat (Member)

Did this timeout occur while your LLM was responding, or while you had a session open but had not yet sent a chat to the LLM? If your LLM is taking 10 minutes to reply, that is kind of insane latency -- but yes, it should not crash the server.


rsdmike commented May 17, 2024

Kind of in between. I was continuing a previous chat session/thread and had restarted the backend (LocalAI). The first chat message to LocalAI loads the model into memory -- so it isn't responding to messages yet, and yes, on CPU a 70b model takes a while to load into RAM -- but the subtle difference is that it isn't inferencing yet. Take a look at my screenshot here to see the events: by the time the model had loaded, the AnythingLLM server had decided it had better things to do than wait (crashed) 😆.

[screenshot: LocalAI event timeline]

I should mention that after the model is loaded, it works fine -- no issues.

Thanks for getting back to me 👍

@timothycarambat (Member)

Ah, so it's just the model taking a long time to load, and the request moves on. The 10 minutes is no coincidence either: for LocalAI we use OpenAI's NPM package, which has a 10-minute timeout.

I would be nervous to make this infinite, because then you can hang the entire call. Is it unreasonable to ask you to mlock the model and basically prime it before using it, to prevent this 😬?

I'm not super excited to accidentally cause infinitely hanging requests for LocalAI!
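The 10-minute figure matches the `timeout: 601000` in the stack trace. One generic way to make such a ceiling configurable without risking an infinite hang is to race the request against an explicit timer. This is a stdlib-only sketch, not AnythingLLM's actual code; `withTimeout` and the durations are illustrative:

```javascript
// Race a promise (e.g. a completion request) against a configurable timer,
// so a slow model load rejects cleanly instead of hanging forever.
async function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after the race settles
  }
}

// Usage: a slow "request" that finishes within the allowed window resolves normally.
withTimeout(new Promise((res) => setTimeout(() => res("ok"), 50)), 1000)
  .then((v) => console.log(v)); // prints "ok"
```

The key point for this issue is that the rejection path is an ordinary catchable error, so the caller can surface "model still loading, try again" instead of crashing.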

@timothycarambat timothycarambat added enhancement New feature or request investigating Core team or maintainer will or is currently looking into this issue and removed possible bug Bug was reported but is not confirmed or is unable to be replicated. labels May 17, 2024
@timothycarambat timothycarambat changed the title [BUG]: Socket Timeout crashes app [Question]: Increase completion timeout to prevent crash? May 17, 2024
@timothycarambat timothycarambat added the bug Something isn't working label May 17, 2024
@timothycarambat (Member)

I will add that, regardless, this should not exit the process -- so that needs to be patched for sure.
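The stack trace above shows `triggerUncaughtException(err, true /* fromPromise */)`, i.e. an unhandled promise rejection, which Node 15+ treats as fatal by default. A process-level safety net can keep the server alive while the real source is tracked down. This is a generic mitigation sketch, not AnythingLLM's actual code, and it does not replace catching the rejection at its source:

```javascript
// Log unhandled promise rejections instead of letting Node call
// triggerUncaughtException and kill the server process.
process.on("unhandledRejection", (reason) => {
  console.error("Unhandled rejection (suppressed):", reason);
});

// Without the handler above, this rejection would crash Node 15+:
Promise.reject(new Error("Socket timeout"));
```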


rsdmike commented May 17, 2024

Yeah, I'd agree with that. I'm not too worried about the specific handling for LocalAI, just that the AnythingLLM server doesn't crash. I can handle model preloading and such, but when I'm downloading various models, trying them out, and loading them on the fly, just not having a crash would be good enough.

@timothycarambat (Member)

This is interesting: I am trying to replicate this right now and I can't get that exact timeout to occur. It is always handled, which has me thinking this exception is being thrown somewhere else where it is not being caught. Any exception during streaming would be caught and would prevent an outright crash.

Right now I'm having trouble reproducing the exact error so I can locate its full stack trace and handle it.

@timothycarambat timothycarambat added needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug and removed bug Something isn't working enhancement New feature or request labels May 17, 2024

rsdmike commented May 17, 2024

I'll see if I can run this locally in debug mode and give you more info. Also, I was 11 commits behind master, so let me grab the latest and try again as well.


rsdmike commented May 17, 2024

Not sure if this adds any more info, but I'm on the latest now and am still able to reproduce. It looks like the error originates from agentkeepalive in node_modules.
[screenshots: debugger stack trace pointing into agentkeepalive]

I'll keep playing around with this over the weekend. The workaround is easy enough: pre-load the model.
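The pre-load workaround can be scripted: send a tiny completion request to LocalAI's OpenAI-compatible API so the model is resident in RAM before AnythingLLM's first real chat. This is a hypothetical sketch; the base URL, model name, and helper names are placeholders, not anything from AnythingLLM or LocalAI's codebase:

```javascript
// Build a minimal 1-token chat payload; we only care about forcing the model load.
function preWarmPayload(model) {
  return {
    model,
    messages: [{ role: "user", content: "ping" }],
    max_tokens: 1,
  };
}

// Fire a pre-warm request using Node 18+'s global fetch.
// baseUrl is assumed to be a LocalAI instance, e.g. "http://localhost:8080".
async function preWarm(baseUrl, model) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(preWarmPayload(model)),
  });
  return res.ok;
}

// Example (model name is a placeholder):
// preWarm("http://localhost:8080", "llama3-70b").then((ok) => console.log(ok));
```

Run once after (re)starting LocalAI; subsequent chats from AnythingLLM then hit an already-loaded model and stay well under the 10-minute timeout.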

@timothycarambat (Member)

From what I saw in the lockfile, the openai npm module requires that sub-dependency. It's just frustrating because I can't determine where we call the library before it aborts, so we can handle it!
