Memory usage is low #626
Comments
What exactly is the problem with "faster"? Is the memory consumption you are reporting on the host or on the device? It would be odd if it were the device. Can you share your training script?
Sorry for the confusion. My code uses a very simple transformer encoder layer plus some linear layers. I originally trained it on a Xeon 6146 with 24 cores, but that takes a very long time. I was hoping that an A770 GPU would be much faster (5x or more), but it turns out not to be faster at all. As for resource usage, in intel_gpu_top the "Blitter" usage for the python3 process sits at around 10%, which does not look right to me.
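For context, here is a minimal sketch of the kind of IPEX/XPU training setup being described. This is not the reporter's actual script: the model shape, batch size, and synthetic data are illustrative placeholders, and it assumes a working intel_extension_for_pytorch install with XPU support.

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

# Placeholder model: one transformer encoder layer plus a linear head,
# standing in for the (unshared) training script.
model = nn.Sequential(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    nn.Linear(512, 10),
).to("xpu")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ipex.optimize applies device-specific optimizations; with an optimizer
# passed in, it returns the (possibly rewritten) model and optimizer.
model, optimizer = ipex.optimize(model, optimizer=optimizer)

criterion = nn.CrossEntropyLoss()
for step in range(100):
    x = torch.randn(64, 128, 512, device="xpu")   # (batch, seq, features)
    y = torch.randint(0, 10, (64,), device="xpu")
    optimizer.zero_grad()
    logits = model(x).mean(dim=1)                  # pool over the sequence
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()
```

If a setup like this still shows near-zero GPU utilization, the usual suspects are data staying on the CPU or the per-step workload being too small to keep the device busy.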
May I ask why you have to use Fedora 38?
I don't have to use Fedora 38, but I installed it since it is compatible with RHEL. I have to pick one among RHEL, SUSE, and CentOS.
In case it's helpful, here is the memory summary:
[memory summary attachment not captured in this text]
I tried Aliyun OS and it runs oneAPI and IPEX fine. I think Aliyun OS is CentOS-based.
You could try running your code under Intel VTune to see CPU/GPU compute and memory usage and to find possible bottlenecks.
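If it helps, PyTorch can emit ITT annotations that VTune correlates with its timeline. A minimal sketch, assuming a recent PyTorch where torch.autograd.profiler.emit_itt and torch.profiler.itt are available; train_one_step is a hypothetical stand-in for one forward/backward/update:

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

# Inside emit_itt(), PyTorch emits ITT task markers that VTune can display;
# range_push/range_pop name each profiled region on the timeline.
with torch.autograd.profiler.emit_itt():
    for step in range(10):
        torch.profiler.itt.range_push(f"train_step_{step}")
        train_one_step()  # hypothetical helper: one forward/backward/update
        torch.profiler.itt.range_pop()
```

The script would then be launched under VTune, e.g. with a GPU analysis such as `vtune -collect gpu-hotspots -- python3 train.py`.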
Thanks. I'll try.
Oh, I use my own machine for this task. My problem is not that it doesn't run; it's that the results are suspicious.
We have not verified on Fedora 38.
Describe the issue
After two weeks of struggle, I finally got my A770 working on Fedora 38.
But training seems barely faster than on my 24-core CPU machine.
I tried increasing the batch size, but memory consumption stayed the same at 1.7 GB. Is that expected?
What can I do to improve training performance and increase memory usage?
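One way to sanity-check whether device memory actually scales with batch size is to read the allocator counters directly. A minimal probe, assuming IPEX exposes the torch.xpu memory accounting APIs (memory_allocated, max_memory_allocated, synchronize); the tensor shapes are arbitrary:

```python
import time
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

def probe(batch_size):
    # One synthetic matmul "step" at the given batch size.
    x = torch.randn(batch_size, 128, 512, device="xpu")
    w = torch.randn(512, 512, device="xpu")
    torch.xpu.synchronize()
    t0 = time.time()
    y = (x @ w).relu()
    torch.xpu.synchronize()
    mib = 1024 * 1024
    print(f"batch={batch_size:4d}  step={time.time() - t0:.4f}s  "
          f"allocated={torch.xpu.memory_allocated() / mib:7.1f} MiB  "
          f"peak={torch.xpu.max_memory_allocated() / mib:7.1f} MiB")

for bs in (16, 64, 256):
    probe(bs)
```

If allocated/peak grow with batch size here while the externally observed 1.7 GB does not, the 1.7 GB figure is likely host-side memory rather than device memory.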