
[Misc]: can vllm support long content inference like 800k #4909

Closed
yunll opened this issue May 19, 2024 · 1 comment

yunll commented May 19, 2024

Anything you want to discuss about vllm.

We will fine-tune a 70B model that supports a long context of 800k tokens. Can vLLM run inference for this model?

yunll added the misc label May 19, 2024
mgoin (Collaborator) commented May 20, 2024

Hi @yunll, yes, if you have enough GPU memory available for a context length that large, it will run. I tested Mistral 128k last week and was able to use its full length. Model for reference: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k
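For reference, here is a minimal sketch of how one might load a long-context model in vLLM with an explicit context window. It uses the Yarn-Mistral checkpoint mentioned above; the specific parameter values (`gpu_memory_utilization`, `tensor_parallel_size`) are illustrative assumptions, not tested settings for an 800k-token 70B model.

```python
from vllm import LLM, SamplingParams

# Sketch: load a long-context model and cap the context window explicitly.
# The KV cache dominates GPU memory at very long context lengths, so a 70B
# model at 800k tokens would likely need tensor parallelism across many GPUs.
llm = LLM(
    model="NousResearch/Yarn-Mistral-7b-128k",
    max_model_len=131072,           # full 128k context of this checkpoint
    gpu_memory_utilization=0.95,    # leave a small margin for activations
    # tensor_parallel_size=8,       # shard a larger model across GPUs
)

sampling_params = SamplingParams(max_tokens=256, temperature=0.0)
outputs = llm.generate(["<a very long prompt goes here>"], sampling_params)
print(outputs[0].outputs[0].text)
```

If vLLM reports that the requested `max_model_len` does not fit in the available KV-cache memory, the practical options are more GPUs (larger `tensor_parallel_size`), a higher `gpu_memory_utilization`, or a smaller context window.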

mgoin closed this as completed May 20, 2024