[Usage]: why can't I set GPU nums while using "tensor_parallel_size"? #4882

Open
GodHforever opened this issue May 17, 2024 · 0 comments
Labels: usage (How to use vllm)

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I noticed the annotation below, but why can't `num_gpus` be set together with `tensor_parallel_size`?

# Apply batch inference for all input data.
ds = ds.map_batches(
    LLMPredictor,
    # Set the concurrency to the number of LLM instances.
    concurrency=10,
    # Specify the number of GPUs required per LLM instance.
    # NOTE: Do NOT set `num_gpus` when using vLLM with tensor-parallelism
    # (i.e., `tensor_parallel_size`).
    num_gpus=1,
    # Specify the batch size for inference.
    batch_size=32,
)
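For context, a minimal sketch of the GPU accounting behind the NOTE in the snippet above. With tensor parallelism, each vLLM engine spans `tensor_parallel_size` GPUs, which vLLM's own Ray workers request internally, so the total cluster demand is `concurrency × tensor_parallel_size` rather than the actor-level `num_gpus`. The helper name and the `tensor_parallel_size=2` value are hypothetical, not taken from the issue:

```python
# Hypothetical helper illustrating the GPU accounting: each LLM instance
# spans `tensor_parallel_size` GPUs, claimed by vLLM's internal Ray workers,
# not by `num_gpus` on the Ray Data actor itself.
def total_gpus_needed(concurrency: int, tensor_parallel_size: int) -> int:
    return concurrency * tensor_parallel_size

# With the snippet's concurrency=10 and a hypothetical tensor_parallel_size
# of 2, the cluster must provide 20 GPUs in total.
print(total_gpus_needed(10, 2))
```

This is why setting `num_gpus=1` on the actor while also using tensor parallelism would double-count resources: Ray would reserve one GPU for the actor, and vLLM would then try to schedule its tensor-parallel workers on top of that reservation.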
@GodHforever GodHforever added the usage How to use vllm label May 17, 2024
@GodHforever GodHforever changed the title [Usage]: why I can not sey gpu nums while use "tensor_parallel_size"? [Usage]: why can't I set gpu nums while use "tensor_parallel_size"? May 17, 2024