-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable dumping raw prof files in AdvancedProfiler
#19703
base: master
Are you sure you want to change the base?
Enable dumping raw prof files in AdvancedProfiler
#19703
Conversation
For your consideration, @awaelchli |
cf1947c
to
0c21547
Compare
Are there any concerns about this change, @awaelchli? Thanks! |
Just bumping this one in case it got lost in notifications, @awaelchli. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clumsy Looks great to me. Just a few minor comments
@override | ||
def summary(self) -> str: | ||
recorded_stats = {} | ||
for action_name, pr in self.profiled_actions.items(): | ||
self._maybe_dump_stats(action_name, pr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I would do the if self.dump_stats
check outside and remove the maybe_
prefix from the name (general consistency in code base)
def _maybe_dump_stats(self, action_name: str, pr: cProfile.Profile) -> None: | ||
if not self.dump_stats: | ||
return | ||
assert self.dirpath # redundant, but needed for mypy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we have this in several places of the code base. I suggest to remove the comment to be consistent in the code base
assert self.dirpath # redundant, but needed for mypy | |
assert self.dirpath |
@@ -75,10 +83,28 @@ def stop(self, action_name: str) -> None: | |||
raise ValueError(f"Attempting to stop recording an action ({action_name}) which was never started.") | |||
pr.disable() | |||
|
|||
def _maybe_dump_stats(self, action_name: str, pr: cProfile.Profile) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _maybe_dump_stats(self, action_name: str, pr: cProfile.Profile) -> None: | |
def _maybe_dump_stats(self, action_name: str, profile: cProfile.Profile) -> None: |
maybe we can spell out the name of the argument for clarity here :)
dst_fs = get_filesystem(dst_filepath) | ||
dst_fs.mkdirs(self.dirpath, exist_ok=True) | ||
# temporarily save to local since pstats can only dump into a local file | ||
with tempfile.TemporaryDirectory(prefix="test", suffix="test", dir=os.getcwd()) as tmp_dir, dst_fs.open( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd need to include the rank in the prefix. Otherwise we could run into rare race conditions, right?
Raises: | ||
ValueError: | ||
If you attempt to stop recording an action which was never started. | ||
""" | ||
super().__init__(dirpath=dirpath, filename=filename) | ||
self.profiled_actions: Dict[str, cProfile.Profile] = {} | ||
self.line_count_restriction = line_count_restriction | ||
self.dump_stats = dump_stats | ||
assert not self.dump_stats or self.dirpath is not None, "dirname must be provided for dump_states to work" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this a ValueError rather than an assertion error? That's because the error would be caused by user input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs:
If ``dirpath`` is ``None`` but ``filename`` is present, the
``trainer.log_dir`` (from :class:`~lightning.pytorch.loggers.tensorboard.TensorBoardLogger`)
will be used.
So a None check alone is not enough. We'd have to check that either dirname or filename is set.
AdvancedProfiler
What does this PR do?
Adding a new
dump_stats
flag toAdvanedProfiler
to persist.prof
files collected during profiling. These files can be visualized by SnakeViz like so:snakeviz fit-run-[Strategy]DDPStrategy.on_train_end.prof
A
.prof
file is created for each profiled action, e.g.:Fixes #19698
No breaking changes, adding a new backward-compatible flag, with the new functionality disabled by default.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--19703.org.readthedocs.build/en/19703/