How are multiple API clients handled? #306
geordie2020 started this conversation in General
Replies: 1 comment 1 reply
-
I have a branch that supports continuous batching. For now it will use the vLLM backend, but yes, we are working on supporting this for the general use case.
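To make the idea concrete, here is a toy sketch of what continuous batching means in general (this is an illustration of the technique, not OpenLLM's or vLLM's actual scheduler): finished requests are retired and waiting requests are admitted on every decode step, so new requests never wait for the whole batch to drain.

```python
# Toy simulation of continuous batching: requests join the running batch
# as soon as a slot frees up, instead of waiting for a full batch to finish.
# All names here (Request, run_continuous_batching) are hypothetical.
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    tokens_left: int  # decode steps this request still needs


def run_continuous_batching(requests, max_batch_size):
    """Simulate decode steps; return request ids in completion order."""
    waiting = deque(requests)
    running = []
    finished = []
    while waiting or running:
        # Admit waiting requests whenever the batch has free slots.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step for every request currently in the batch.
        for r in running:
            r.tokens_left -= 1
        # Retire finished requests immediately, freeing their slots
        # mid-flight for the next waiting request.
        still_running = []
        for r in running:
            (finished if r.tokens_left == 0 else still_running).append(r)
        running = still_running
    return [r.rid for r in finished]


reqs = [Request(0, 5), Request(1, 2), Request(2, 4), Request(3, 1)]
print(run_continuous_batching(reqs, max_batch_size=2))  # → [1, 0, 2, 3]
```

Note how request 2 starts decoding as soon as request 1 finishes, rather than waiting for request 0; that is the difference from static batching.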
-
Hi, I'm investigating ways to serve LLMs to a small group of people. For example, 10 clients would need to connect via a web interface to the OpenLLM-served API and be able to query the LLM at the same time.
How does OpenLLM handle multiple (potentially concurrent) requests? Is there some queueing with round robin going on, or maybe time multiplexing?
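From the client side, the scenario above looks roughly like this minimal sketch: 10 clients issue requests concurrently and the server is free to batch whatever is in flight. The function names and the simulated round trip are assumptions for illustration; a real client would POST to the server's HTTP API (e.g. with httpx or aiohttp) instead of sleeping.

```python
# Sketch of 10 clients querying a served LLM concurrently.
# query_llm is a hypothetical stand-in for an HTTP request to the server.
import asyncio


async def query_llm(client_id: int) -> str:
    # Simulated network round trip; a real client would await an
    # HTTP POST to the serving endpoint here.
    await asyncio.sleep(0.01)
    return f"client-{client_id}: ok"


async def main() -> list[str]:
    # All 10 requests are in flight at the same time; with continuous
    # batching, the server can decode them together rather than
    # serving them one after another.
    return await asyncio.gather(*(query_llm(i) for i in range(10)))


results = asyncio.run(main())
print(len(results))  # → 10
```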