
token_latency parameter not supported in the xpu-main branch run_generation.py #556

Closed
oldmikeyang opened this issue Mar 9, 2024 · 5 comments
Labels: LLM, XPU/GPU (XPU/GPU specific issues)

@oldmikeyang

Describe the bug

Running the test script
intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_benchmark.sh
fails with the following error:

Namespace(model_id='/home/llm/disk/llm/meta-llama/Llama-2-7b-hf', sub_model_name='llama2-7b', device='xpu', dtype='float16', input_tokens='1024', max_new_tokens=128, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', num_beams=4, num_iter=10, num_warmup=3, batch_size=1, token_latency=True, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='', accuracy_only=False, acc_tasks=['lambada_standard'])
/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s]
xpu optimize_transformers function is activated
Warning: we didn't find deepspeed package in your environment, all deepspeed related feature will be disabled
tp size less than 2, tensor parallel will be disabled
*** Starting to generate 128 tokens for 1024 tokens with num_beams=4
---- Prompt size: 1024
Traceback (most recent call last):
File "/home/llm/intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_generation.py", line 454, in
run_generate(o, i, g)
File "/home/llm/intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_generation.py", line 390, in run_generate
output = model.generate(
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1282, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1155, in _validate_model_kwargs
raise ValueError(
ValueError: The following model_kwargs are not used by the model: ['token_latency'] (note: typos in the generate arguments will also show up in this list)
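For context, a sketch of what is likely happening (hedged; the exact cause depends on the installed version): `token_latency` is an IPEX-specific `generate()` kwarg that only works once `ipex.optimize_transformers` has patched the model's generation path. Stock transformers' `_validate_model_kwargs` rejects any kwarg the model does not consume, which produces exactly the ValueError above. A minimal, illustrative workaround, assuming an XPU build of IPEX, is to time decode steps manually instead of passing `token_latency`:

```python
# Minimal sketch (not the repo's script): measure per-token latency by hand,
# avoiding the IPEX-only `token_latency` kwarg that stock transformers rejects.
import time

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 — registers torch.xpu
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "/home/llm/disk/llm/meta-llama/Llama-2-7b-hf"  # path from the report
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = (
    AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    .to("xpu")
    .eval()
)

inputs = tokenizer("An example prompt", return_tensors="pt").to("xpu")

# Greedy decode loop; recomputes the full sequence each step (no KV cache)
# to keep the sketch short — for latency illustration only.
latencies = []
generated = inputs.input_ids
with torch.no_grad():
    for _ in range(16):
        start = time.time()
        logits = model(generated).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)
        torch.xpu.synchronize()  # wait for the XPU kernel before timing
        latencies.append(time.time() - start)

print(
    f"first token: {latencies[0]:.4f}s, "
    f"avg next-token: {sum(latencies[1:]) / len(latencies[1:]):.4f}s"
)
```

Note that the log above says "xpu optimize_transformers function is activated", so this may simply be a mismatch between the xpu-main example scripts and the installed 2.1.10+xpu wheel; running the example scripts that ship with the installed release is the safer pairing.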

Versions

intel-extension-for-pytorch==2.1.10+xpu

@YuningQiu (Contributor)

Hello, thanks for reporting this issue. Could you please share your platform hardware information with us?

@YuningQiu (Contributor)

You can use the collect_env.py script.

@YuningQiu (Contributor)

Also, could you please post the commands you used to run the workload here?

@YuningQiu (Contributor)

Could you please let us know from which branch or which release you got the sample codes?

@ZhaoqiongZ added the XPU/GPU (XPU/GPU specific issues) and LLM labels on Apr 24, 2024
@YuningQiu (Contributor)

Let me close this issue. Feel free to reopen it or create a new one if you are still facing problems. Thanks a lot!
