RuntimeError: tensor does not have a device when training PixArt-alpha lora on Arc A770 #552

congdm · 2024-03-04T22:58:05Z

Describe the bug

I was trying to run the LoRA training script of PixArt-alpha: https://github.com/PixArt-alpha/PixArt-alpha/blob/master/train_scripts/train_pixart_lora_hf.py but got "RuntimeError: tensor does not have a device" from the C++ backend. I believe this error comes from the XPU backend, because when I reconfigured accelerate to use the CPU, the training script ran without problems.

(C:\Data\PixArt-alpha\env) C:\Data\PixArt-alpha>accelerate launch --num_processes=1 --main_process_port=36667  train_scripts/train_pixart_lora_hf.py --mixed_precision="fp16" --pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS --dataset_name=Fazzie/Teyvat --caption_column="text" --resolution=1024 --random_flip --train_batch_size=1 --num_train_epochs=200 --checkpointing_steps=100 --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 --seed=74332 --output_dir="pixart-teyvat-lora" --validation_prompt="teyvat" --report_to="tensorboard" --gradient_checkpointing --checkpoints_total_limit=10 --validation_epochs=5 --rank=16
C:\Data\PixArt-alpha\env\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
 warn(
C:\Data\PixArt-alpha\env\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
 warn(
C:\Data\PixArt-alpha\env\lib\site-packages\torch\cuda\amp\grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
 warnings.warn(
2024-03-05 01:18:06,412 - __main__ - INFO - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: xpu:0

Mixed precision type: fp16

{'clip_sample', 'rescale_betas_zero_snr', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Downloading shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1951.75it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:26<00:00, 13.02s/it]
Some weights of the model checkpoint were not used when initializing Transformer2DModel:
['caption_projection.y_embedding']
trainable params: 13,810,432 || all params: 625,159,584 || trainable%: 2.2091050594850996
2024-03-05 01:18:54,890 - __main__ - INFO - ***** Running training *****
2024-03-05 01:18:54,890 - __main__ - INFO -   Num examples = 234
2024-03-05 01:18:54,891 - __main__ - INFO -   Num Epochs = 200
2024-03-05 01:18:54,892 - __main__ - INFO -   Instantaneous batch size per device = 1
2024-03-05 01:18:54,893 - __main__ - INFO -   Total train batch size (w. parallel, distributed & accumulation) = 1
2024-03-05 01:18:54,893 - __main__ - INFO -   Gradient Accumulation steps = 1
2024-03-05 01:18:54,894 - __main__ - INFO -   Total optimization steps = 46800
Steps:   0%|                                                                                 | 0/46800 [00:00<?, ?it/s]Traceback (most recent call last):
 File "C:\Data\PixArt-alpha\train_scripts\train_pixart_lora_hf.py", line 1010, in <module>
   main()
 File "C:\Data\PixArt-alpha\train_scripts\train_pixart_lora_hf.py", line 855, in main
   accelerator.backward(loss)
 File "C:\Data\PixArt-alpha\env\lib\site-packages\accelerate\accelerator.py", line 1964, in backward
   self.scaler.scale(loss).backward(**kwargs)
 File "C:\Data\PixArt-alpha\env\lib\site-packages\torch\_tensor.py", line 492, in backward
   torch.autograd.backward(
 File "C:\Data\PixArt-alpha\env\lib\site-packages\torch\autograd\__init__.py", line 251, in backward
   Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: tensor does not have a device
Steps:   0%|                                                                                 | 0/46800 [00:12<?, ?it/s]
Traceback (most recent call last):
 File "C:\Data\PixArt-alpha\env\lib\runpy.py", line 196, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "C:\Data\PixArt-alpha\env\lib\runpy.py", line 86, in _run_code
   exec(code, run_globals)
 File "C:\Data\PixArt-alpha\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
 File "C:\Data\PixArt-alpha\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
   args.func(args)
 File "C:\Data\PixArt-alpha\env\lib\site-packages\accelerate\commands\launch.py", line 1023, in launch_command
   simple_launcher(args)
 File "C:\Data\PixArt-alpha\env\lib\site-packages\accelerate\commands\launch.py", line 643, in simple_launcher
   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Data\\PixArt-alpha\\env\\python.exe', 'train_scripts/train_pixart_lora_hf.py', '--mixed_precision=fp16', '--pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS', '--dataset_name=Fazzie/Teyvat', '--caption_column=text', '--resolution=1024', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=100', '--learning_rate=1e-06', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--seed=74332', '--output_dir=pixart-teyvat-lora', '--validation_prompt=teyvat', '--report_to=tensorboard', '--gradient_checkpointing', '--checkpoints_total_limit=10', '--validation_epochs=5', '--rank=16']' returned non-zero exit status 1.
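
As a sanity check on the log above, the reported "Total optimization steps = 46800" follows from the other logged values (234 examples, batch size 1, gradient accumulation 1, 200 epochs). The sketch below reproduces that arithmetic; the function name and the exact rounding are assumptions for illustration, not code taken from the training script.

```python
import math


def total_optimization_steps(num_examples, batch_size, grad_accum_steps,
                             num_epochs, num_processes=1):
    """Estimate the number of optimizer updates for a full training run.

    Hypothetical helper: it assumes the last partial batch of each epoch is
    kept, so both divisions round up.
    """
    # Batches each process sees per epoch.
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_processes))
    # Optimizer updates per epoch after gradient accumulation.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return updates_per_epoch * num_epochs


# Values taken from the log: 234 examples, batch size 1, no accumulation, 200 epochs.
print(total_optimization_steps(234, 1, 1, 200))  # 46800
```

The failure happens at step 0 of these 46800, that is, in the very first backward pass.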

Accelerate launch params:

accelerate launch --num_processes=1 --main_process_port=36667  train_scripts/train_pixart_lora_hf.py --mixed_precision="fp16" --pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS --dataset_name=Fazzie/Teyvat --caption_column="text" --resolution=1024 --random_flip --train_batch_size=1 --num_train_epochs=200 --checkpointing_steps=100 --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 --seed=74332 --output_dir="pixart-teyvat-lora" --validation_prompt="teyvat" --report_to="tensorboard" --gradient_checkpointing --checkpoints_total_limit=10 --validation_epochs=5 --rank=16

Packages installed aside from torch:

pip install accelerate transformers diffusers tensorboard peft==0.6.2 datasets sentencepiece

Versions

Collecting environment information...
PyTorch version: 2.1.0a0+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.1.10+xpu
IPEX commit: a12f9f6
Build type: Release

OS: Microsoft Windows 10 IoT Enterprise LTSC
GCC version: N/A
Clang version: N/A
IGC version: 2024.0.2 (2024.0.2.20231213)
CMake version: version 3.28.3
Libc version: N/A

Python version: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _DeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=15930MB, max_compute_units=512, gpu_eu_count=512)
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture=9
CurrentClockSpeed=2401
DeviceID=CPU0
Family=179
L2CacheSize=2560
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2401
Name=Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
ProcessorType=3
Revision=20225

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.10+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0a0+cxx11.abi
[pip3] torchaudio==2.1.0a0+cxx11.abi
[pip3] torchvision==0.16.0a0+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.10+xpu pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0a0+cxx11.abi pypi_0 pypi
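
The list above pairs torch 2.1.0a0 with IPEX 2.1.10+xpu. Since IPEX releases are built against a specific PyTorch release line, one quick check is that the two share the same major.minor version. The helper names below are hypothetical, and the sketch only compares version strings; it does not import either package.

```python
import re


def major_minor(version):
    """Return (major, minor) parsed from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_aligned(torch_version, ipex_version):
    """True when both version strings share the same major.minor prefix."""
    return major_minor(torch_version) == major_minor(ipex_version)


# Version strings copied from the environment report above.
print(versions_aligned("2.1.0a0+cxx11.abi", "2.1.10+xpu"))  # True
```

By this check the reported environment is consistent, so a plain version mismatch does not look like the cause.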

YuningQiu · 2024-03-11T19:19:23Z

Thanks for reporting this issue. I will try reproducing this issue on our side and then get back to you soon.

YuningQiu · 2024-03-12T22:30:39Z

Hello! I am trying to reproduce this issue. Could you please let me know the command that you used to install the intel extension for pytorch and the other required packages? Thanks a lot!

congdm · 2024-03-13T02:18:25Z

Hi. First I installed the Intel oneAPI Base Toolkit 2024 for Windows. Then, in the PixArt-alpha repo folder (for example C:\Data\repos\PixArt-alpha), I created a new conda environment:

conda create --prefix .\env
conda activate .\env
conda install python=3.10
conda install pkg-config libuv --freeze-installed
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

call "C:\Program Files (x86)\Intel\oneAPI\compiler\2024.0\env\vars.bat"
call "C:\Program Files (x86)\Intel\oneAPI\mkl\2024.0\env\vars.bat"
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"

After that, IPEX is verified to be installed successfully. By the way, each time that environment is reopened, it is necessary to run:

call "C:\Program Files (x86)\Intel\oneAPI\compiler\2024.0\env\vars.bat"
call "C:\Program Files (x86)\Intel\oneAPI\mkl\2024.0\env\vars.bat"

again before running any script that depends on IPEX.
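
Because the two vars.bat calls are easy to forget in a fresh shell, a small pre-flight check can help rule that out before launching training. This is only a sketch: it assumes the compiler and MKL scripts export CMPLR_ROOT and MKLROOT respectively, which should be confirmed against the local oneAPI installation, and the helper name is hypothetical.

```python
import os

# Assumption: variable names exported by the oneAPI compiler and MKL vars.bat
# scripts. Verify them in your own shell (for example with `set`) before
# relying on this check.
ASSUMED_ONEAPI_VARS = ("CMPLR_ROOT", "MKLROOT")


def missing_oneapi_vars(env=None):
    """Return the assumed oneAPI variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ASSUMED_ONEAPI_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_oneapi_vars()
    if missing:
        print("oneAPI environment may not be initialized; missing:", missing)
    else:
        print("oneAPI environment variables found.")
```

Running this in the activated conda environment before `accelerate launch` would show whether the shell was initialized.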

Then in that conda environment:

pip install accelerate transformers diffusers tensorboard peft==0.6.2 datasets sentencepiece
accelerate config
accelerate launch --num_processes=1 --main_process_port=36667  train_scripts/train_pixart_lora_hf.py --mixed_precision="fp16" --pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS --dataset_name=Fazzie/Teyvat --caption_column="text" --resolution=1024 --random_flip --train_batch_size=1 --num_train_epochs=200 --checkpointing_steps=100 --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 --seed=74332 --output_dir="pixart-teyvat-lora" --validation_prompt="teyvat" --report_to="tensorboard" --gradient_checkpointing --checkpoints_total_limit=10 --validation_epochs=5 --rank=16

The accelerate configuration was as follows:

------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
Do you want to use XPU plugin to speed up training on XPU? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all
------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
fp16

YuningQiu · 2024-04-01T19:15:48Z

Hello, sorry for the late response. Could you please try again with the latest oneAPI 2024.1? Download: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?operatingsystem=window&distributions=offline

Also, please install the corresponding GPU drivers and latest intel extension for pytorch at https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.20%2bxpu&os=windows&package=pip.

congdm · 2024-04-09T00:46:45Z

Thank you very much for the reminder.
I did as you said: I installed the latest oneAPI 2024.1, IPEX v2.1.20+xpu, and the corresponding driver version 31.0.101.5085. Sadly, the problem still persists.
May I ask whether this problem has been reproduced on your side, or does it only happen in my case?

YuningQiu · 2024-04-16T15:29:10Z

Hello, so far we have not been able to reproduce the exact issue you are seeing, but we have been working on it since it was first reported. We will share a status update as soon as we have something meaningful. Thank you for your patience.

YuningQiu · 2024-06-12T16:11:47Z

Let me close this issue. Feel free to reopen or create a new issue if you are still facing issues. Thanks a lot!

YuningQiu self-assigned this Apr 16, 2024

YuningQiu added Bug Something isn't working ARC ARC GPU Windows labels Apr 16, 2024

YuningQiu closed this as completed Jun 12, 2024
