[Question]: Increase completion timeout to prevent crash? #1417

rsdmike · 2024-05-16T06:38:07Z

How are you running AnythingLLM?

Docker (local)

What happened?

Using LocalAI for backend -- loading the llama3 70b model, Anything LLM container crashed with a socket timeout.

2024-05-15 22:24:20 [TELEMETRY SENT] {
2024-05-15 22:24:20   event: 'sent_chat',
2024-05-15 22:24:20   distinctId: 'be3ac3d9-aa83-4458-ae1a-583a3fcc909b',
2024-05-15 22:24:20   properties: {
2024-05-15 22:24:20     multiUserMode: false,
2024-05-15 22:24:20     LLMSelection: 'localai',
2024-05-15 22:24:20     Embedder: 'openai',
2024-05-15 22:24:20     VectorDbSelection: 'lancedb',
2024-05-15 22:24:20     runtime: 'docker'
2024-05-15 22:24:20   }
2024-05-15 22:24:20 }
2024-05-15 22:24:20 [Event Logged] - sent_chat
2024-05-15 23:04:01 Cannonball results 3511 -> 470 tokens.
2024-05-15 23:04:01 Cannonball results 356 -> 286 tokens.
2024-05-15 23:04:53 [TELEMETRY SENT] {
2024-05-15 23:04:53   event: 'sent_chat',
2024-05-15 23:04:53   distinctId: 'be3ac3d9-aa83-4458-ae1a-583a3fcc909b',
2024-05-15 23:04:53   properties: {
2024-05-15 23:04:53     multiUserMode: false,
2024-05-15 23:04:53     LLMSelection: 'localai',
2024-05-15 23:04:53     Embedder: 'openai',
2024-05-15 23:04:53     VectorDbSelection: 'lancedb',
2024-05-15 23:04:53     runtime: 'docker'
2024-05-15 23:04:53   }
2024-05-15 23:04:53 }
2024-05-15 23:04:53 [Event Logged] - sent_chat
2024-05-15 23:25:00 node:internal/process/promises:288
2024-05-15 23:25:00             triggerUncaughtException(err, true /* fromPromise */);
2024-05-15 23:25:00             ^
2024-05-15 23:25:00 
2024-05-15 23:25:00 Error: Socket timeout
2024-05-15 23:25:00     at Socket.onTimeout (/app/server/node_modules/agentkeepalive/lib/agent.js:350:23)
2024-05-15 23:25:00     at Socket.emit (node:events:529:35)
2024-05-15 23:25:00     at Socket._onTimeout (node:net:598:8)
2024-05-15 23:25:00     at listOnTimeout (node:internal/timers:569:17)
2024-05-15 23:25:00     at process.processTimers (node:internal/timers:512:7) {
2024-05-15 23:25:00   code: 'ERR_SOCKET_TIMEOUT',
2024-05-15 23:25:00   timeout: 601000
2024-05-15 23:25:00 }
2024-05-15 23:25:00 
2024-05-15 23:25:00 Node.js v18.19.1

A timeout I think is fine , it takes a while to load. However I didn't expect the container to crash, I kind of expected just to re-initiate the thread.

Are there known steps to reproduce?

I think this should be reproducable with any load time of greater than 10 minutes.

The text was updated successfully, but these errors were encountered:

timothycarambat · 2024-05-16T15:50:49Z

Did this timeout occur while your LLM was responding or while you had a session open but had not yet sent a chat to the LLM? If you LLM is taking 10 minutes to reply that is kind of an insane latency, but yes it should not crash the server

rsdmike · 2024-05-17T07:10:24Z

Kind of in-between. Continuing a previous chat session/thread, restarted back-end (LocalAI), the first chat message to LocalAI loads the model into memory -- so its not responding yet to messages, and yes on CPU it takes a while to load into RAM for 70b - but subtle difference being that its not inferencing yet. Take a look at my screenshot here to see the events, by the time the model has loaded, the AnythingLLM server has told me it has better than things to do than wait (crashed) 😆 .

Should mention, after its loaded, its works good no issue.

Thanks for getting back to me 👍

timothycarambat · 2024-05-17T17:55:50Z

Ah, so its just the model taking a long time to load the request moves on. The 10 minutes is no coincidence either. For LocalAI we use openai's NPM package which has a 10-minute timeout

I would be nervous to have this be infinity because then you can hang the entire call. Is is unreasonable to ask to mlock the model and basically prime i before using it to prevent this 😬 ?

I'm not super excited to accidentally lead to infinitely hanging requests for LocalAI!

timothycarambat · 2024-05-17T17:57:15Z

Will add that regardless this should not exit the process - so that needs to be patched for sure

rsdmike · 2024-05-17T21:01:43Z

Yeah, I'd agree with that. Not too worried about the specific handling for LocalAI, just that the server doesn't crash for anythingllm. I can handle model preloading and such, but when I'm downloading various models and trying them out and loading them on the fly -- just to not have a crash would be good enough.

timothycarambat · 2024-05-17T21:23:17Z

This is interesting, i am trying to replicate this right now and i cant get that exact timeout to occur. It is always handled, which have me thinking this exception is being thrown somewhere else that is not being caught. Any exception during streaming would be caught and prevent an outright crash.

Right now I'm having trouble reproducing the exact error so I can locate its full stacktrace and handle it

rsdmike · 2024-05-17T22:07:47Z

I'll see if i can run this locally, in debug mode, and give ya any more info. Also, I was 11 commits behind master, so, lemme grab latest and try again as well.

rsdmike · 2024-05-17T22:36:38Z

Not sure if this adds any more info, but using latest, still able to reproduce though. Looks like it originated from agentkeepalive in node_modules.

I'll keep playing around with this over the weekend. Workaround is easy enough to pre-load the model

timothycarambat · 2024-05-17T23:24:23Z

From what i saw in the lockfile, the openai npm module requires that sub-dependency, its just frustrating because i cant determine where we call the library before it aborts so we can handle it!

rsdmike added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label May 16, 2024

timothycarambat mentioned this issue May 17, 2024

Patch WSS upgrade for manual HTTPS certs #1429

Merged

10 tasks

timothycarambat added enhancement New feature or request investigating Core team or maintainer will or is currently looking into this issue and removed possible bug Bug was reported but is not confirmed or is unable to be replicated. labels May 17, 2024

timothycarambat changed the title ~~[BUG]: Socket Timeout crashes app~~ [Question]: Increase completion timeout to prevent crash? May 17, 2024

timothycarambat added the bug Something isn't working label May 17, 2024

timothycarambat assigned shatfield4 May 17, 2024

timothycarambat closed this as completed in #1429 May 17, 2024

timothycarambat reopened this May 17, 2024

timothycarambat added needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug and removed bug Something isn't working enhancement New feature or request labels May 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question]: Increase completion timeout to prevent crash? #1417

[Question]: Increase completion timeout to prevent crash? #1417

rsdmike commented May 16, 2024 •

edited

timothycarambat commented May 16, 2024

rsdmike commented May 17, 2024 •

edited

timothycarambat commented May 17, 2024

timothycarambat commented May 17, 2024

rsdmike commented May 17, 2024

timothycarambat commented May 17, 2024

rsdmike commented May 17, 2024

rsdmike commented May 17, 2024 •

edited

timothycarambat commented May 17, 2024

[Question]: Increase completion timeout to prevent crash? #1417

[Question]: Increase completion timeout to prevent crash? #1417

Comments

rsdmike commented May 16, 2024 • edited

How are you running AnythingLLM?

What happened?

Are there known steps to reproduce?

timothycarambat commented May 16, 2024

rsdmike commented May 17, 2024 • edited

timothycarambat commented May 17, 2024

timothycarambat commented May 17, 2024

rsdmike commented May 17, 2024

timothycarambat commented May 17, 2024

rsdmike commented May 17, 2024

rsdmike commented May 17, 2024 • edited

timothycarambat commented May 17, 2024

rsdmike commented May 16, 2024 •

edited

rsdmike commented May 17, 2024 •

edited

rsdmike commented May 17, 2024 •

edited