Describe the bug

Run the test script:

intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_benchmark.sh

It fails with the following error:
Namespace(model_id='/home/llm/disk/llm/meta-llama/Llama-2-7b-hf', sub_model_name='llama2-7b', device='xpu', dtype='float16', input_tokens='1024', max_new_tokens=128, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', num_beams=4, num_iter=10, num_warmup=3, batch_size=1, token_latency=True, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='', accuracy_only=False, acc_tasks=['lambada_standard'])
/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''. If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s]
xpu optimize_transformers function is activated
Warning: we didn't find deepspeed package in your environment, all deepspeed related feature will be disabled
tp size less than 2, tensor parallel will be disabled
*** Starting to generate 128 tokens for 1024 tokens with num_beams=4
---- Prompt size: 1024
Traceback (most recent call last):
File "/home/llm/intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_generation.py", line 454, in <module>
run_generate(o, i, g)
File "/home/llm/intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_generation.py", line 390, in run_generate
output = model.generate(
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1282, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/home/llm/miniconda3/envs/ipex-gpu-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1155, in _validate_model_kwargs
raise ValueError(
ValueError: The following model_kwargs are not used by the model: ['token_latency'] (note: typos in the generate arguments will also show up in this list)
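The error originates in transformers' `_validate_model_kwargs`, which rejects any keyword that the model does not consume; `token_latency` is a benchmark-only flag understood only by IPEX's patched `generate()`, so the stock transformers code path raises. As a minimal sketch of a possible workaround (the helper name and its deny list are assumptions, not part of the benchmark script), the offending key could be stripped before calling an unpatched `generate()`:

```python
def strip_benchmark_kwargs(generation_kwargs, patched_generate=False):
    """Hypothetical guard: drop benchmark-only keys stock transformers rejects.

    'token_latency' is consumed only by IPEX's patched generate();
    transformers' _validate_model_kwargs raises ValueError on unknown
    keys, so remove it when the IPEX patch is not active.
    """
    if patched_generate:
        return dict(generation_kwargs)
    cleaned = dict(generation_kwargs)
    cleaned.pop("token_latency", None)
    return cleaned


# Example: benchmark kwargs as built by the run script's argument parser
kwargs = {"token_latency": True, "num_beams": 4, "max_new_tokens": 128}
safe_kwargs = strip_benchmark_kwargs(kwargs)  # no 'token_latency' key
```

Alternatively, pinning transformers to the version the 2.1.10+xpu examples were validated against avoids the stricter kwarg validation altogether.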
Versions
intel-extension-for-pytorch==2.1.10+xpu