Resuming from a checkpoint that ended before the epoch ended #160

Open
rkskekzzz opened this issue May 9, 2022 · 2 comments
@rkskekzzz

❓ Questions & Help

My runtime was disconnected before a single epoch finished. Is it impossible to load a checkpoint in this situation?
I got the message below!


UserWarning: You're resuming from a checkpoint that ended before the epoch ended. This can cause unreliable results if further training is done. Consider using an end-of-epoch checkpoint or enabling fault-tolerant training: https://pytorch-lightning.readthedocs.io/en/stable/advanced/fault_tolerant_training.html
  "You're resuming from a checkpoint that ended before the epoch ended. This can cause unreliable"
@upskyy upskyy self-assigned this May 10, 2022
@upskyy upskyy added the QUESTION Further information is requested label May 10, 2022
@upskyy
Member

upskyy commented May 10, 2022

@rkskekzzz If you have a saved checkpoint, you can resume training with the option below!

https://github.com/openspeech-team/openspeech/blob/main/openspeech/utils.py#L325-L339

@rkskekzzz
Author

Thanks for the reply! I had already tried running with that option.
I had trainer.checkpoint_path set for gpu-resume, and the run was cut off at around 220000 out of 370000 total data samples.
It was epoch 0.

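I loaded the 0_220000.ckpt file and resumed training, but the warning above appeared and
training started again from 0/370000. I'm posting because I'd like to know whether training is proceeding correctly even though the progress is displayed this way!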
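
One way to check what a checkpoint such as 0_220000.ckpt actually recorded is to read its progress counters directly. The sketch below is not part of openspeech. It assumes the file is a standard PyTorch Lightning checkpoint, which is a dictionary containing `epoch` and `global_step` keys, and the helper names `summarize_checkpoint` and `load_and_summarize` are hypothetical. It only shows what was saved; it does not by itself explain why the progress bar restarts at 0/370000.

```python
from typing import Any, Dict, Mapping


def summarize_checkpoint(ckpt: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the training-progress fields stored in a Lightning-style checkpoint dict."""
    return {
        # Epoch index recorded at save time (None if the key is absent).
        "epoch": ckpt.get("epoch"),
        # Number of optimizer steps taken so far (None if the key is absent).
        "global_step": ckpt.get("global_step"),
        # True when optimizer state was saved, so a resume can restore it.
        "has_optimizer_states": bool(ckpt.get("optimizer_states")),
    }


def load_and_summarize(path: str) -> Dict[str, Any]:
    """Load a .ckpt file from disk and summarize it (hypothetical helper)."""
    import torch  # imported lazily so summarize_checkpoint works without PyTorch

    # map_location="cpu" avoids needing a GPU just to inspect the file.
    # Assumption: on recent PyTorch releases you may need weights_only=False
    # here, since Lightning checkpoints hold more than plain tensors.
    ckpt = torch.load(path, map_location="cpu")
    return summarize_checkpoint(ckpt)
```

Calling `load_and_summarize("0_220000.ckpt")` should print a non-zero `global_step` if the mid-epoch state was saved. Comparing that value with the step counter in the training logs after resuming is one way to see whether the run continued from the checkpoint or started over.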
