RNaD reward transformation #1075

Open
spktrm opened this issue May 25, 2023 · 7 comments
@spktrm
Contributor

spktrm commented May 25, 2023

Based on formulae from the paper, the reward transformation is given by adding the log policy ratio

[image: reward transformation formula from the paper]

However, the code contains an entropy term instead.

https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L422

Which one is it?
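
For readers without the image, the two candidate forms being contrasted can be written side by side. This is only a sketch in assumed notation ($\pi$ for the current policy, $\pi_{\mathrm{reg}}$ for the regularization policy, $\eta$ for the regularization weight). It is not copied from the paper or from rnad.py, and the signs may differ from both.

$$
\underbrace{-\eta \log\frac{\pi(a)}{\pi_{\mathrm{reg}}(a)}}_{\text{per-action log-ratio term}}
\qquad \text{versus} \qquad
\underbrace{-\eta \sum_{a'} \pi(a') \log\frac{\pi(a')}{\pi_{\mathrm{reg}}(a')}}_{\text{policy-weighted (entropy-like) term}}
$$

The first depends on the sampled action $a$, while the second is a single scalar per state.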

@lanctot
Collaborator

lanctot commented Jun 1, 2023

@perolat, @bartdevylder: any ideas?

@bartdevylder
Collaborator

Hi,
Thanks for your question. The merged_log_policy term in the line you posted already contains the log policy ratio. It is defined here, taking into account the interpolation between the two regularization policies: https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L801
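
One way to picture a log policy ratio that interpolates between two regularization policies is the minimal sketch below. It is not the code from rnad.py. The function name, the argument names (`reg_a`, `reg_b`, `alpha`), and the choice to interpolate in log space are all assumptions made for illustration.

```python
import math


def merged_log_policy(policy, reg_a, reg_b, alpha):
    """Per-action log ratio against an interpolated regularization policy.

    Hypothetical helper: the names and the log-space convex combination
    are assumptions, not the library's implementation.
    """
    return [
        math.log(p) - (alpha * math.log(qa) + (1.0 - alpha) * math.log(qb))
        for p, qa, qb in zip(policy, reg_a, reg_b)
    ]


policy = [0.5, 0.3, 0.2]
reg_a = [0.4, 0.4, 0.2]
reg_b = [0.6, 0.2, 0.2]

# alpha = 1 reduces to the plain log ratio against reg_a,
# alpha = 0 to the plain log ratio against reg_b.
ratio_a = merged_log_policy(policy, reg_a, reg_b, alpha=1.0)
ratio_b = merged_log_policy(policy, reg_a, reg_b, alpha=0.0)
ratio_mix = merged_log_policy(policy, reg_a, reg_b, alpha=0.5)
```

Under this assumption, an intermediate `alpha` gives a convex combination of the two plain log ratios, so the regularization target moves smoothly from one policy to the other.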

@spktrm
Contributor Author

spktrm commented Jun 13, 2023

Hi

Thank you for your reply. I understand this already. I want to understand why the merged_log_policy is multiplied by the policy in the code when this is not communicated in the paper.

@bartdevylder
Collaborator

Hi,
OK, now I see your point. The eta_log_policy variable corresponds to the regularisation described in the paper, but the meaning of eta_reg_entropy is less clear. @perolat will look into this and clarify.
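
To make the distinction concrete, here is a small sketch of how a per-action term and a policy-weighted term relate numerically. It borrows the variable names eta_log_policy and eta_reg_entropy from this thread, but the sign convention, the single regularization policy, and the helper `regularization_terms` are assumptions. It shows only the arithmetic relationship and does not explain why the code uses the summed term.

```python
import math


def regularization_terms(policy, log_ratio, eta):
    """Return (per-action term, policy-weighted scalar term).

    Hypothetical helper: signs and scaling are assumed, not taken from rnad.py.
    """
    # One value per action, scaled by -eta.
    eta_log_policy = [-eta * lr for lr in log_ratio]
    # Log ratio multiplied by the policy and summed over actions: one scalar.
    eta_reg_entropy = -eta * sum(p * lr for p, lr in zip(policy, log_ratio))
    return eta_log_policy, eta_reg_entropy


policy = [0.5, 0.3, 0.2]
reg_policy = [0.4, 0.4, 0.2]
eta = 0.2

log_ratio = [math.log(p / q) for p, q in zip(policy, reg_policy)]
eta_log_policy, eta_reg_entropy = regularization_terms(policy, log_ratio, eta)

# The scalar term is the expectation of the per-action term under the policy,
# which equals -eta times the KL divergence KL(policy || reg_policy).
expected_per_action = sum(p * t for p, t in zip(policy, eta_log_policy))
kl = sum(p * math.log(p / q) for p, q in zip(policy, reg_policy))
```

So, under these assumptions, multiplying the log ratio by the policy and summing turns the sampled-action penalty into its expected value, a KL-style quantity that does not depend on which action was taken.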

@spktrm
Contributor Author

spktrm commented Jun 23, 2023

@perolat any updates on this?

@sbl1996

sbl1996 commented Apr 15, 2024

@spktrm Do you know the reason? Thanks.

@spktrm
Contributor Author

spktrm commented Apr 15, 2024

> @spktrm Do you know the reason? Thanks.

Nope, unfortunately. Still waiting for @perolat or someone else involved to clarify.
