PPO and selfplay #1193

Open
drblallo opened this issue Mar 29, 2024 · 1 comment

drblallo commented Mar 29, 2024

I am trying to use PPO (which worked wonderfully out of the box, thank you very much for it) to learn a game in which the same player can take multiple actions in a row; depending on which action is performed, the next action may belong to either player.

It is not clear how to do so, because an agent takes both the number of envs and its player index. If I start 5 envs for a game with two PPO agents and, after an action, the turn in two of the envs passes to a different player, I no longer have enough envs to pass to any agent.

I have looked around the repo but have not found any hint that this problem is already solved by some other mechanism.
From what I understand, the alternatives for solving it are:

  • Have more envs than NumAgents * NumEnvsPerPlayer so that at least one of the agents always has enough environments to run. It is not clear to me whether this would be an issue for PPO, since it would not see the game evolve in order.
  • Have a fake action that both players can always execute that does nothing.
  • Have a single agent that plays both sides, but that is not possible because the agent takes the player ID.
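
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not OpenSpiel code: `group_envs_by_player` and the `current_players` list are hypothetical names used only for illustration. It shows how the number of envs waiting on each player drifts away from a fixed per-agent batch size once turn order diverges.

```python
from collections import defaultdict


def group_envs_by_player(current_players):
    """Map each player ID to the indices of the envs where that player moves next."""
    groups = defaultdict(list)
    for env_index, player in enumerate(current_players):
        groups[player].append(env_index)
    return dict(groups)


# Hypothetical snapshot: five envs, two players, turn order already diverged.
current_players = [0, 0, 1, 0, 1]
groups = group_envs_by_player(current_players)
print(groups)  # {0: [0, 1, 3], 1: [2, 4]}
```

Here player 0 has three envs to act in and player 1 has two, so an agent constructed for a fixed number of envs cannot be fed either group directly.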

Do you have any suggestion about which is the correct way of addressing this issue?

Thank you in advance.

lanctot (Collaborator) commented Mar 30, 2024

Hi @drblallo,

I don't really understand the question, sorry.

But please note that the PPO implementation only supports the single-agent case:

Currently only supports the single-agent case.

It was added for a specific use case and was never extended to the multiagent case.

So it has only been used and tested on single-agent settings like Atari or as a best response oracle.

I suspect this addresses your question: basically this code does not address the situation you describe since it was designed for the single agent setting.
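
As a rough illustration of what a single-agent setting can look like for a two-player game, one common approach is to fix the opponent's policy and fold its moves into the environment, so the learner only ever sees its own turns. The sketch below assumes that approach and uses a made-up toy game; `ToyGame`, `SingleAgentView`, and `opponent_policy` are hypothetical names and are not part of OpenSpiel or its PPO implementation.

```python
import random


class ToyGame:
    """Hypothetical two-player game where a player may move several times in a row."""

    TARGET = 10

    def __init__(self):
        self.total = 0
        self.current_player = 0
        self.winner = None

    def is_terminal(self):
        return self.winner is not None

    def apply_action(self, action):
        # An action adds 1 or 2 to the running total.
        self.total += action
        if self.total >= self.TARGET:
            self.winner = self.current_player
        elif self.total % 2 == 1:
            # An odd total passes the turn; an even total lets the same player move again.
            self.current_player = 1 - self.current_player


class SingleAgentView:
    """Wraps ToyGame so a fixed opponent is played as part of the environment."""

    def __init__(self, learner_id, opponent_policy):
        self.learner_id = learner_id
        self.opponent_policy = opponent_policy
        self.game = None

    def _advance_opponent(self):
        # Play opponent moves until it is the learner's turn or the game ends.
        while not self.game.is_terminal() and self.game.current_player != self.learner_id:
            self.game.apply_action(self.opponent_policy(self.game))

    def reset(self):
        self.game = ToyGame()
        self._advance_opponent()
        return self.game.total

    def step(self, action):
        self.game.apply_action(action)
        self._advance_opponent()
        done = self.game.is_terminal()
        reward = 0.0
        if done:
            reward = 1.0 if self.game.winner == self.learner_id else -1.0
        return self.game.total, reward, done


rng = random.Random(0)
env = SingleAgentView(learner_id=0, opponent_policy=lambda game: rng.choice([1, 2]))
obs = env.reset()
done = False
while not done:
    obs, reward, done = env.step(rng.choice([1, 2]))
print(obs, reward)
```

With this kind of wrapper, consecutive moves by the same player need no special handling: the learner simply receives another observation, and any run of opponent moves happens inside `step`.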

Hope this helps.
