Using XPU training actually increases the training time and sharply reduces the accuracy. #565
Comments
Hi @SoldierWz, thanks for reporting this issue. Could you please help provide the env info via the following commands:
Also, please provide a minimal code reproducer for this issue when training on the A770, as well as the dataset. Thanks.
If possible, please also share the torch profiler output so that we can see the breakdown and which ops take the most time.
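To illustrate the profiler request above, here is a minimal sketch of collecting a per-op breakdown with `torch.profiler` on CPU (the tiny model is a stand-in; on XPU the activity list and sort key would differ):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model and input; replace with the actual training step.
model = torch.nn.Linear(16, 4)
x = torch.randn(8, 16)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Table of ops sorted by total CPU time, which shows where time goes.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

Attaching this table to the issue lets the maintainers see whether time is spent in compute kernels or in host/device transfers.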
This is the environment information:
OS: Ubuntu 22.04.4 LTS (x86_64)
Python version: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] (64-bit runtime)
CPU:
Versions of relevant libraries:
class MLP(nn.Module):
X_processed = preprocessor.fit_transform(X)
kf = KFold(n_splits=20, shuffle=True, random_state=42)
I think these are enough to run it.
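Since the real dataset and preprocessor were not shared, the `KFold` line above can be exercised standalone with stand-in arrays (the shapes and class count here are assumptions, chosen to match the "few hundred samples" described later in the issue):

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in data: ~300 samples, 10 features, 3 classes (assumed sizes).
X_processed = np.random.rand(300, 10).astype(np.float32)
y = np.random.randint(0, 3, size=300)

# Same split configuration as in the issue.
kf = KFold(n_splits=20, shuffle=True, random_state=42)
fold_sizes = [len(test_idx) for _, test_idx in kf.split(X_processed)]
```

Note that with only a few hundred samples, 20 folds leaves roughly 15 test samples per fold, which makes per-fold accuracy estimates quite noisy.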
I want to provide the information, but a new problem occurred when I ran it today. When I started the computer, I found that the oneAPI suite seemed to have been updated. After updating it, I ran my code and hit the following error:

ImportError Traceback (most recent call last)
File ~/mambaforge/envs/pytorch-arc/lib/python3.11/site-packages/torch/__init__.py:235
ImportError: /home/wangzhen/mambaforge/envs/pytorch-arc/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
@SoldierWz The XPU does not seem to be detected by IPEX in your environment. IPEX 2.1.10+xpu works with oneAPI 2024.0; please downgrade the dpcpp and mkl versions.
No, the GPU was detected before, and I successfully ran the code on the XPU. I think the GPU became unavailable this time because of this morning's oneAPI update. I'll try downgrading, thanks.
Hi @SoldierWz - did downgrading work? I literally just got my Arc A770 today, spun it up on Ubuntu 23, and faced the same issue you did. Thanks in advance.
@SoldierWz @Shr1ftyy
@SoldierWz BTW, the dataset you shared does not seem to have uploaded successfully. Please check.
There has been no response on this issue for over a month, so I am closing it now. If you still have issues, feel free to reopen it. Thanks.
Describe the issue
When I tried using the graphics card to train my classification model, I changed the following code:
device = 'xpu'
X_tensor = torch.tensor(X_processed, dtype=torch.float).to(device)
y_tensor = torch.tensor(y, dtype=torch.long).to(device)
model = FCN(X_train.shape[1], len(np.unique(y))).to(device)
model, optimizer = ipex.optimize(model, optimizer=optimizer)
features, labels = features.to(device), labels.to(device)
X_test = X_test.to(device)
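The changed lines above can be combined into a runnable sketch. One detail worth checking when debugging the accuracy drop: the model must be moved to the device before the optimizer is built, since an optimizer created earlier would still reference the old CPU parameters. The fallback to CPU and the small stand-in model here are assumptions for machines without an XPU:

```python
import torch
import torch.nn as nn

# Fall back to CPU when no XPU is present (assumption for portability).
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

# Stand-in for the FCN in the issue; its definition was not shared.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3)).to(device)

# Build the optimizer AFTER .to(device) so it holds the moved parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

if device == "xpu":
    import intel_extension_for_pytorch as ipex  # assumed installed
    model, optimizer = ipex.optimize(model, optimizer=optimizer)
```

If the optimizer had been created before the `.to(device)` call, training would silently update parameters the model no longer uses, which is one plausible cause of the reported 0.34 accuracy.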
What I am doing is a prediction and classification task on a small dataset; the sample size is only a few hundred. I know this task is not well suited to a GPU, but I just wanted to try it. I am using an A770 graphics card, and the processor is a 12400. I successfully installed everything required according to the tutorial.
My training time using the CPU was 24 seconds, with an accuracy of 0.94.
But when I sent all the data to the XPU, the same training took 1 minute 40 seconds, and the accuracy was only 0.34.
This is not an important issue, but I still want to report this unusual phenomenon.