Describe the bug

Run the test script:

intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_benchmark.sh

It fails with the following error:
Namespace(model_id='/home/llm/disk/llm/meta-llama/Llama-2-7b-hf', sub_model_name='llama2-7b', device='xpu', dtype='float16', input_tokens='1024', max_new_tokens=128, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', num_beams=4, num_iter=10, num_warmup=3, batch_size=1, token_latency=True, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='', accuracy_only=False, acc_tasks=['lambada_standard'])
/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''. If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s]
xpu optimize_transformers function is activated
Warning: we didn't find deepspeed package in your environment, all deepspeed related feature will be disabled
tp size less than 2, tensor parallel will be disabled
*** Starting to generate 128 tokens for 1024 tokens with num_beams=4
---- Prompt size: 1024
Traceback (most recent call last):
File "/home/llm/intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_generation.py", line 454, in <module>
run_generate(o, i, g)
File "/home/llm/intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_generation.py", line 390, in run_generate
output = model.generate(
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1282, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1155, in _validate_model_kwargs
raise ValueError(
ValueError: The following model_kwargs are not used by the model: ['token_latency'] (note: typos in the generate arguments will also show up in this list)
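The error originates in transformers' `_validate_model_kwargs`, which rejects any keyword that the model does not consume; `token_latency` is a benchmark-only flag understood only by IPEX's patched `generate()`, so the stock transformers code path raises. As a minimal sketch of a possible workaround (the helper name and its deny list are assumptions, not part of the benchmark script), the offending key could be stripped before calling an unpatched `generate()`:

```python
def strip_benchmark_kwargs(generation_kwargs, patched_generate=False):
    """Hypothetical guard: drop benchmark-only keys stock transformers rejects.

    'token_latency' is consumed only by IPEX's patched generate();
    transformers' _validate_model_kwargs raises ValueError on unknown
    keys, so remove it when the IPEX patch is not active.
    """
    if patched_generate:
        return dict(generation_kwargs)
    cleaned = dict(generation_kwargs)
    cleaned.pop("token_latency", None)
    return cleaned


# Example: benchmark kwargs as built by the run script's argument parser
kwargs = {"token_latency": True, "num_beams": 4, "max_new_tokens": 128}
safe_kwargs = strip_benchmark_kwargs(kwargs)  # no 'token_latency' key
```

Alternatively, pinning transformers to the version the 2.1.10+xpu examples were validated against avoids the stricter kwarg validation altogether.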
Versions
intel-extension-for-pytorch==2.1.10+xpu