How are multiple API clients handled? #306
geordie2020 started this conversation in General
Replies: 1 comment 1 reply
-
I have a branch that supports continuous batching. For now it will use the vLLM backend, but yes, we are working on supporting this for the general use case.
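To make the idea concrete, here is a toy sketch of what continuous batching means in general (this is an illustration of the technique, not OpenLLM's or vLLM's actual scheduler): finished requests are retired and waiting requests are admitted on every decode step, so new requests never wait for the whole batch to drain.

```python
# Toy simulation of continuous batching: requests join the running batch
# as soon as a slot frees up, instead of waiting for a full batch to finish.
# All names here (Request, run_continuous_batching) are hypothetical.
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    tokens_left: int  # decode steps this request still needs


def run_continuous_batching(requests, max_batch_size):
    """Simulate decode steps; return request ids in completion order."""
    waiting = deque(requests)
    running = []
    finished = []
    while waiting or running:
        # Admit waiting requests whenever the batch has free slots.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step for every request currently in the batch.
        for r in running:
            r.tokens_left -= 1
        # Retire finished requests immediately, freeing their slots
        # mid-flight for the next waiting request.
        still_running = []
        for r in running:
            (finished if r.tokens_left == 0 else still_running).append(r)
        running = still_running
    return [r.rid for r in finished]


reqs = [Request(0, 5), Request(1, 2), Request(2, 4), Request(3, 1)]
print(run_continuous_batching(reqs, max_batch_size=2))  # → [1, 0, 2, 3]
```

Note how request 2 starts decoding as soon as request 1 finishes, rather than waiting for request 0; that is the difference from static batching.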
-
Hi, I'm investigating ways to serve LLMs to a small group of people. For example, 10 clients would need to connect via a web interface to the OpenLLM-served API and be able to query the LLM at the same time.
How does OpenLLM handle multiple (potentially concurrent) requests? Is there some queueing with round robin going on, or maybe time multiplexing?
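From the client side, the scenario above looks roughly like this minimal sketch: 10 clients issue requests concurrently and the server is free to batch whatever is in flight. The function names and the simulated round trip are assumptions for illustration; a real client would POST to the server's HTTP API (e.g. with httpx or aiohttp) instead of sleeping.

```python
# Sketch of 10 clients querying a served LLM concurrently.
# query_llm is a hypothetical stand-in for an HTTP request to the server.
import asyncio


async def query_llm(client_id: int) -> str:
    # Simulated network round trip; a real client would await an
    # HTTP POST to the serving endpoint here.
    await asyncio.sleep(0.01)
    return f"client-{client_id}: ok"


async def main() -> list[str]:
    # All 10 requests are in flight at the same time; with continuous
    # batching, the server can decode them together rather than
    # serving them one after another.
    return await asyncio.gather(*(query_llm(i) for i in range(10)))


results = asyncio.run(main())
print(len(results))  # → 10
```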