RuntimeError: tensor does not have a device when training PixArt-alpha lora on Arc A770 #552
Comments
Thanks for reporting this issue. I will try to reproduce it on our side and get back to you soon.
Hello! I am trying to reproduce this issue. Could you please let me know the commands you used to install the Intel Extension for PyTorch and the other required packages? Thanks a lot!
Hi. First I installed the Intel oneAPI Base Toolkit 2024 for Windows. Then, in the PixArt-alpha repo folder (for example C:\Data\repos\PixArt-alpha), I created a new conda environment:
After that, I verified that IPEX was installed successfully. By the way, each time that environment is reopened, it is necessary to run:
again before running any script that depends on IPEX. Then, in that conda environment:
The Accelerate configuration was as follows:
Hello, sorry for the late response. Could you please try the latest oneAPI 2024.1, available at https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?operatingsystem=window&distributions=offline ? Also, please install the corresponding GPU drivers and the latest Intel Extension for PyTorch as described at https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.20%2bxpu&os=windows&package=pip
Thank you very much for the reminder.
Hello, so far we have not been able to reproduce the exact issue you encountered, but it has been worked on continuously since it was first reported. We will share the status as soon as we have a meaningful update. Thank you for your patience.
Let me close this issue. Feel free to reopen it or create a new issue if you are still facing problems. Thanks a lot!
Describe the bug
I was trying to run the LoRA training script of PixArt-alpha, https://github.com/PixArt-alpha/PixArt-alpha/blob/master/train_scripts/train_pixart_lora_hf.py, but got `RuntimeError: tensor does not have a device` from the C++ backend. I believe this error comes from the XPU backend, because when I reconfigured Accelerate to use the CPU, the training script ran without problems.
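To help narrow down whether a failure like this is specific to the XPU backend, a small device check can be run before launching training. The following is only a minimal sketch and not part of the original report: `pick_device` is a hypothetical helper, and it assumes that importing `intel_extension_for_pytorch` makes `torch.xpu.is_available()` usable. It falls back to the CPU whenever that assumption does not hold.

```python
import importlib.util


def pick_device(prefer_xpu: bool = True) -> str:
    """Return "xpu" when an Intel GPU looks usable, otherwise "cpu".

    Hypothetical helper for illustration; it is not taken from the
    PixArt-alpha training script or from Accelerate.
    """
    if not prefer_xpu or importlib.util.find_spec("torch") is None:
        return "cpu"
    try:
        import torch
        # Assumption: importing IPEX registers the XPU backend with PyTorch.
        import intel_extension_for_pytorch  # noqa: F401
    except Exception:
        # torch or IPEX is missing or failed to load.
        return "cpu"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


if __name__ == "__main__":
    print(f"selected device: {pick_device()}")
```

Calling `pick_device(prefer_xpu=False)` forces the CPU path, which mirrors the comparison described above: the same script on the CPU versus on the XPU.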
Accelerate launch params:
Packages installed aside from torch:
Versions
Collecting environment information...
PyTorch version: 2.1.0a0+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.1.10+xpu
IPEX commit: a12f9f6
Build type: Release
OS: Microsoft Windows 10 IoT Enterprise LTSC
GCC version: N/A
Clang version: N/A
IGC version: 2024.0.2 (2024.0.2.20231213)
CMake version: version 3.28.3
Libc version: N/A
Python version: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _DeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=15930MB, max_compute_units=512, gpu_eu_count=512)
Intel OpenCL ICD version: N/A
Level Zero version: N/A
CPU:
Architecture=9
CurrentClockSpeed=2401
DeviceID=CPU0
Family=179
L2CacheSize=2560
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2401
Name=Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
ProcessorType=3
Revision=20225
Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.10+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0a0+cxx11.abi
[pip3] torchaudio==2.1.0a0+cxx11.abi
[pip3] torchvision==0.16.0a0+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.10+xpu pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0a0+cxx11.abi pypi_0 pypi
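The version list above can also be checked mechanically. The sketch below assumes, as a rule of thumb that is not stated in this thread, that `torch` and `intel-extension-for-pytorch` should share the same major.minor version. The helpers `major_minor` and `versions_match` are hypothetical names introduced for illustration.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.10+xpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_version: str, ipex_version: str) -> bool:
    """True when both packages report the same major.minor version."""
    return major_minor(torch_version) == major_minor(ipex_version)


if __name__ == "__main__":
    # Version strings copied from the environment report above.
    print(versions_match("2.1.0a0+cxx11.abi", "2.1.10+xpu"))  # True
```

Under that assumption, the reported pair (torch 2.1.0a0 and IPEX 2.1.10+xpu) is consistent, so a major.minor mismatch would not explain the error.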