
RNaD off policy case #1109

Open
spktrm opened this issue Aug 22, 2023 · 4 comments

spktrm (Contributor) commented Aug 22, 2023

In the example for RNaD, the importance sampling correction passed to get_loss_nerd is 1. This is because the example covers the on-policy case, where the policy is updated synchronously between acting and learning.

My question is: what needs to change for this example to be used in an asynchronous, off-policy setting? Is it as simple as replacing the importance sampling correction with a policy ratio term? What would that look like exactly?

How could I construct the importance sampling correction for the off-policy case?
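
To make the question concrete, here is a minimal sketch (in JAX, since the example is JAX-based) of what I mean by a policy ratio term. The function name, array shapes, and clipping threshold are my own assumptions, not anything taken from the existing example:

```python
import jax.numpy as jnp

def policy_ratio_correction(learner_policy, behavior_policy, actions, clip=1.0):
    """Per-step importance weights pi(a|s) / mu(a|s), clipped from above.

    learner_policy, behavior_policy: [T, B, A] action probabilities from the
        learner network and from the behaviour policy that generated the data.
    actions: [T, B] integer actions actually taken by the behaviour policy.
    clip: upper bound on the ratio, to keep gradients bounded off-policy.
    """
    pi_a = jnp.take_along_axis(learner_policy, actions[..., None], axis=-1)[..., 0]
    mu_a = jnp.take_along_axis(behavior_policy, actions[..., None], axis=-1)[..., 0]
    ratio = pi_a / jnp.maximum(mu_a, 1e-8)
    # In the on-policy example pi == mu, so this reduces to the constant 1
    # that the example currently passes as the correction.
    return jnp.minimum(ratio, clip)
```

Would passing something like this in place of the constant 1 be enough, or does more of the loss need to change?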

spktrm (Contributor, Author) commented Sep 26, 2023

@perolat any ideas?

spktrm (Contributor, Author) commented Jan 29, 2024

@lanctot is there a better channel to get in contact with @perolat? I feel as though he may have missed my email.

lanctot (Collaborator) commented Jan 29, 2024

I just chatted with him and will send him the currently open questions later today. Is this the only unresolved one?

spktrm (Contributor, Author) commented Jan 30, 2024

Hi,

Both this issue and this one: #1075

Keen to hear back :)
