2D TP+FSDP with device mesh #126548

Closed
ad8e opened this issue May 17, 2024 · 1 comment
Labels: oncall: distributed · release notes: distributed (fsdp) · triaged


ad8e (Contributor) commented May 17, 2024

📚 The doc issue

The FSDP documentation at https://pytorch.org/docs/stable/fsdp.html does not describe the device_mesh argument. This argument is required when combining FSDP with DTensor-based Tensor Parallel (TP), because TP support was added for device_mesh but not for process_group. Passing a process_group instead fails with the following error: RuntimeError: Attempted to call resize_() on an invalid python storage.

Suggest a potential alternative/fix

Document device_mesh, and note in the process_group description that it does not work with Tensor Parallel.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k

wz337 self-assigned this May 17, 2024
wz337 added the oncall: distributed, release notes: distributed (fsdp), and triaged labels May 17, 2024
wz337 (Contributor) commented May 17, 2024

Thanks for reporting the issue. We will submit a PR to fix this soon.
