Avoid calling torch.cuda.synchronize in precompile_config #126624

Open
ezyang opened this issue May 18, 2024 · 0 comments
Labels
module: inductor, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


ezyang commented May 18, 2024

🐛 Describe the bug

Internal xref: https://fb.workplace.com/groups/6829516587176185/posts/7228787720582401/?comment_id=7231763463618160&reply_comment_id=7235527919908381

I was debugging a deadlock and I noticed one of our threads was deadlocked on this stack:

[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/cuda/__init__.py", line 803 in synchronize
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_inductor/runtime/triton_heuristics.py", line 422 in _precompile_config
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_inductor/runtime/triton_heuristics.py", line 231 in precompile
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_inductor/codecache.py", line 3087 in triton
[trainer1|1]:  File "/tmp/torchinductor_shuaiyang/u6/cu6smvlhusvxdug2bu7lrz3zofdzwkt27gnjniqk2ti6u5jgvm4h.py", line 33 in <module>
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_inductor/runtime/compile_tasks.py", line 44 in _reload_python_module
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_inductor/codecache.py", line 2576 in load_by_key_path
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_inductor/graph.py", line 1680 in compile_to_module
[trainer1|1]:  File "/mnt/xarfuse/uid-236622/e595b196-seed-nspid4026531836_cgpid37271928-ns-4026531841/torch/_dynamo/utils.py", line 273 in time_wrapper

The deadlock doesn't technically have anything to do with this synchronize; the real problem is that this rank issued an all_to_all earlier and it has deadlocked. But why are we stuck here? Well, we've asked for a full synchronize, so of course we have to wait for all the comms to finish. This seems... bad for compile time? Like, if we've issued comms, there's no reason to wait for them to all finish before we can run some Triton tuning?!
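
To make the reasoning concrete, here is a toy model in plain Python (no GPU or PyTorch needed). It is only a sketch: `ToyStream` and `device_synchronize` are hypothetical stand-ins, not Inductor or PyTorch code. The point it illustrates is that `torch.cuda.synchronize()` waits for work on all streams of the device, so a stuck collective on a comms stream blocks it, while a wait scoped to one stream (in real PyTorch, something like `torch.cuda.current_stream().synchronize()`) would not. Whether a stream-scoped wait is the right fix for `_precompile_config` is an assumption here, not something this issue settles.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class ToyStream:
    """Hypothetical stand-in for a CUDA stream: runs queued work in order."""

    def __init__(self, name):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.pending = []

    def launch(self, fn):
        # Queue work on this stream and return immediately (asynchronous launch).
        self.pending.append(self._pool.submit(fn))

    def synchronize(self, timeout):
        # Stream-scoped wait: only looks at work queued on this stream.
        _, not_done = wait(self.pending, timeout=timeout)
        return not not_done

    def shutdown(self):
        self._pool.shutdown(wait=True)


def device_synchronize(streams, timeout):
    # Device-wide wait: has to wait for work on every stream.
    futures = [f for s in streams for f in s.pending]
    _, not_done = wait(futures, timeout=timeout)
    return not not_done


# The "all_to_all" never completes because a peer never shows up.
peer_arrived = threading.Event()

comm = ToyStream("comm")
compute = ToyStream("compute")

comm.launch(peer_arrived.wait)            # stuck collective on the comms stream
compute.launch(lambda: sum(range(1000)))  # stand-in for a Triton benchmark kernel

stream_ok = compute.synchronize(timeout=2.0)
device_ok = device_synchronize([comm, compute], timeout=0.2)

print("stream-scoped wait finished:", stream_ok)
print("device-wide wait finished:", device_ok)

# Unblock the fake collective so the worker threads can exit cleanly.
peer_arrived.set()
comm.shutdown()
compute.shutdown()
```

In the toy model the stream-scoped wait returns as soon as the compute work is done, while the device-wide wait times out behind the stuck comms work, which mirrors the stack above.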

Versions

main

cc @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire

xmfan added the module: inductor and triaged labels on May 20, 2024