triton-inference-server / server Public


Issues: triton-inference-server/server


444 Open 3,101 Closed


Issues list

Triton Tensorrt-LLM 24.04 and 24.05 are very large

#7335 opened Jun 8, 2024 by yaysummeriscoming

Does Triton Server support Dynamic Request Batching for models which have sparse tensors as inputs

#7333 opened Jun 7, 2024 by MorrisMLZ

Segmentation fault when multi-requests to triton-vllm

#7332 opened Jun 7, 2024 by tricky61

Segmentation fault (core dumped) - Server version 2.46.0

#7330 opened Jun 6, 2024 by rahchuenmonroe

CUDA runtime API error raised when using only cpu on Mac M3

#7324 opened Jun 5, 2024 by SunXuan90

Building and developing with libtritonserver.so

#7320 opened Jun 4, 2024 by asaff1

Triton Server 24.05 can't detect CUDA drivers if host system has installed Nvidia driver 555.85

#7319 opened Jun 4, 2024 by romanvelichkin

Uneven QPS leads to low throughput and high latency as well as low GPU utilization

#7318 opened Jun 4, 2024 by SunnyGhj

When the request is large, the Triton server has a very high TTFT.

#7316 opened Jun 4, 2024 by Godlovecui

Memory over 100% with decoupled dali video model

#7315 opened Jun 3, 2024 by wq9

Single docker layer is too large

#7314 opened Jun 3, 2024 by ShuaiShao93

Low QPS with momentary traffic surges cause significant increases in inference TP99 latency.

#7313 opened Jun 3, 2024 by a1342772

triton malloc fail

#7308 opened May 31, 2024 by MouseSun846

unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32

#7307 opened May 31, 2024 by CallmeZhangChenchen

Add TT-Metalium as a backend

#7305 opened May 30, 2024 by jvasilje

Why is my model in ensemble receiving out-of-order input

#7303 opened May 30, 2024 by Joenhle

ONNX backend with TensorRT optimizer sometimes fails to start

#7296 opened May 29, 2024 by ShuaiShao93

How does Triton implement one instance to handle multiple requests simultaneously?

Label: investigating (the development team is investigating this issue)

#7295 opened May 29, 2024 by SeibertronSS

triton-inference-server cannot be started

#7293 opened May 29, 2024 by tuninger

Backend support for .keras files?

#7289 opened May 28, 2024 by chriscarollo

Support histogram custom metric in Python backend

Label: enhancement (new feature or request)

#7287 opened May 28, 2024 by ShuaiShao93

What is the correct way to run inference in parallel in Triton?

#7283 opened May 28, 2024 by sandesha-hegde

A Confusion about prefetch

Labels: performance (a possible performance tune-up), question (further information is requested)

#7282 opened May 28, 2024 by SunnyGhj

Windows 10 docker build Error "Could not locate a complete Visual Studio instance"

Label: investigating (the development team is investigating this issue)

#7281 opened May 28, 2024 by jinkilee

Specific structure for ensemble model may cause deadlock

#7280 opened May 28, 2024 by ukus04
