[Usage]: why can't I set GPU nums while using "tensor_parallel_size"? #4882

Open
GodHforever opened this issue May 17, 2024 · 0 comments
Labels: usage (How to use vllm)

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I noticed the annotation below, but why can't `num_gpus` be set together with `tensor_parallel_size`?

# Apply batch inference for all input data.
ds = ds.map_batches(
    LLMPredictor,
    # Set the concurrency to the number of LLM instances.
    concurrency=10,
    # Specify the number of GPUs required per LLM instance.
    # NOTE: Do NOT set `num_gpus` when using vLLM with tensor-parallelism
    # (i.e., `tensor_parallel_size`).
    num_gpus=1,
    # Specify the batch size for inference.
    batch_size=32,
)
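For context, a minimal sketch of the GPU accounting behind the NOTE in the snippet above. With tensor parallelism, each vLLM engine spans `tensor_parallel_size` GPUs, which vLLM's own Ray workers request internally, so the total cluster demand is `concurrency × tensor_parallel_size` rather than the actor-level `num_gpus`. The helper name and the `tensor_parallel_size=2` value are hypothetical, not taken from the issue:

```python
# Hypothetical helper illustrating the GPU accounting: each LLM instance
# spans `tensor_parallel_size` GPUs, claimed by vLLM's internal Ray workers,
# not by `num_gpus` on the Ray Data actor itself.
def total_gpus_needed(concurrency: int, tensor_parallel_size: int) -> int:
    return concurrency * tensor_parallel_size

# With the snippet's concurrency=10 and a hypothetical tensor_parallel_size
# of 2, the cluster must provide 20 GPUs in total.
print(total_gpus_needed(10, 2))
```

This is why setting `num_gpus=1` on the actor while also using tensor parallelism would double-count resources: Ray would reserve one GPU for the actor, and vLLM would then try to schedule its tensor-parallel workers on top of that reservation.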
@GodHforever GodHforever added the usage How to use vllm label May 17, 2024
@GodHforever GodHforever changed the title [Usage]: why I can not sey gpu nums while use "tensor_parallel_size"? [Usage]: why can't I set gpu nums while use "tensor_parallel_size"? May 17, 2024