Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RNaD: Possible Error in calculation of Neurd Loss #1156

Closed
spktrm opened this issue Dec 12, 2023 · 7 comments
Closed

RNaD: Possible Error in calculation of Neurd Loss #1156

spktrm opened this issue Dec 12, 2023 · 7 comments
Labels
bug Something isn't working fixed This is fixed internally, and will be merged in the next github sync!

Comments

@spktrm
Copy link
Contributor

spktrm commented Dec 12, 2023

In this line of the RNaD algorithm

logits = logit_pi - jnp.mean(

Should the line instead be this? This is so we only subtract the mean calculated from the valid logits.

logits = logit_pi - (jnp.sum(
        logit_pi * legal_actions, axis=-1, keepdims=True) / jnp.sum(legal_actions, axis=-1, keepdims=True))

As a result, should the line below be an average over actions rather than a sum?


i.e.

nerd_loss = jnp.sum(
        legal_actions *
        apply_force_with_threshold(logits, adv_pi, threshold, threshold_center),
        axis=-1) / jnp.sum(legal_actions, axis=-1)

This is particularly relevant in games where there is frequently a number of invalid actions.

@lanctot
Copy link
Collaborator

lanctot commented Dec 13, 2023

@perolat can you take a look?

@lanctot
Copy link
Collaborator

lanctot commented Dec 13, 2023

Hi @spktrm , I spoke to Julien.

He said you're correct about the first one, can you submit a PR?

The second one could go either way: it's just a matter of knowing what works. It is not clear whether one works better than the other and it might end up being similar behavior but require different hyper-parameters. Maybe you can try it and let us know?

@spktrm
Copy link
Contributor Author

spktrm commented Dec 13, 2023

I have submitted a PR regarding the first point here: #1157, thank you for the opportunity to contribute :).

With regards to the second point, I will experiment further with the fix I am suggesting and let you know how it goes.

Meanwhile, is it possible to provide clarity on these other issues? Namely:

@lanctot
Copy link
Collaborator

lanctot commented Dec 13, 2023

Hi @spktrm,

Yeah I will make Julien aware of those (sorry, I thought they were resolved already).

I think it may be useful to also try contacting him directly by email, though... because I'm mostly just relaying messages from here to him and back :)

@spktrm
Copy link
Contributor Author

spktrm commented Dec 14, 2023

Thank you. What is his best email?

@lanctot
Copy link
Collaborator

lanctot commented Dec 14, 2023

Thank you. What is his best email?

Still the same one from the Mastering Stratego paper.

@lanctot lanctot added bug Something isn't working fixed This is fixed internally, and will be merged in the next github sync! labels Dec 15, 2023
@lanctot
Copy link
Collaborator

lanctot commented Jan 4, 2024

Fixed by #1157, which has now been merged into master.

@lanctot lanctot closed this as completed Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixed This is fixed internally, and will be merged in the next github sync!
Projects
None yet
Development

No branches or pull requests

2 participants