Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert quoridor movement action IDs to relative #1229

Closed
wants to merge 4 commits into from

Conversation

tacertain
Copy link
Contributor

@tacertain tacertain commented May 15, 2024

The original implementation of Quoridor used absolute position numbering for pawn moves, so moving to b3 was always the same action ID, regardless of where the pawn was. This commit changes to using relative action IDs, so moving directly north is always the same action ID, regardless of what square that is moving to.

Fixes #1158

The original implementation of Quoridor used absolute position numbering
for pawn moves, so moving to b3 was always the same action ID, regardless
of where the pawn was. This commit changes to using relative action IDs,
so moving directly north is always the same action ID, regardless of
what square that is moving to.
@tacertain
Copy link
Contributor Author

This commit fails the playthrough testing because the action IDs have changed. Still need to update the test scripts.

After changing the action IDs in quoridor to relative positions, the
playthrough files needed updating. This commit updates them.
@tacertain
Copy link
Contributor Author

I think all the tests should now pass and this is ready to be reviewed.

@tacertain
Copy link
Contributor Author

The build failures don't seem to be because of my changes, but let me know if that's not the case.

Copy link
Collaborator

@lanctot lanctot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say, I'm quite confused. It doesn't help that I am not familiar with the game.

I'll have to ask a round of questions to help me understand. Here is the first:

Can you explain what the OpenSpiel action range is (i.e. how the action ids in 0 to Game::NumDistinctActions() map to the game-specific actions):

  • How many unique actions correspond to movement actions? Are they separated from wall moves? If not, why not?
  • What are the movement action IDs (which integers in the range correspond to movement actions)
  • What are the wall moves? (which integers in the range correspond to wall moves)

* This meant that moving "north" had a different ID for every pawn position.
* Now action IDs are encoded in the same virtual space as absolute board
* positions, but they indicate the pawn's relative move as if it were in
* square (1,1). So when we get those action IDs in, we need to convert them
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean (2,2) here?

Why is this (1,1) when base_for_relative_ is (2,2)?

@@ -261,7 +281,7 @@ void QuoridorState::AddActions(Move cur, Offset offset,
Move forward = cur + offset * 2;
if (GetPlayer(forward) == kPlayerNone) {
// Normal single step in this direction.
moves->push_back(forward.xy);
moves->push_back((base_for_relative_ + offset * 2).xy);
Copy link
Collaborator

@lanctot lanctot May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the * 2 on the offset here?

Is it to keep the coordinates even ? (otherwise they get interpreted as wall moves?)

@lanctot
Copy link
Collaborator

lanctot commented May 18, 2024

The build failures don't seem to be because of my changes, but let me know if that's not the case.

Yes I think you're right but I've rerun them twice and they're still happening. Same error.

You might have to do another push (e.g. add a blank line somewhere, or take one out and then commit + push again... I can always remove it when it gets imported) because it seems to be failing some version check which is likely attached to the specific time you did the last push. I've had to play these games with Github Actions before...

@lanctot
Copy link
Collaborator

lanctot commented May 18, 2024

Let me add another question about the actions: is the range compact? (i.e. are there any integers on the range of 0 .. Game::NumDistinctActions() that are never used in any single valid state?)

E.g. if there were 8 movement actions then I'd expect they'd correspond to action ids 0 - 7 and then all wall actions would be from 8 to Game::NumDistinctActions() - 1. Is that the case here?

@tacertain
Copy link
Contributor Author

tacertain commented May 18, 2024

I have five minutes right now, so I'll write something quick and then more later. The action space is not compact.

If the board is NxN, then the action ID space (before I made any changes) was (2N-1)x(2N-1). Action IDs were mapped into this 2D space (by the obvious mapping) and then interpreted as follows: If both the x and y values are even then it's indicates the pawn position x/2, y/2. If x is odd but y is even, then it indicates a vertical wall whose top half separates squares (x-1)/2,y/2 and (x+1)/2,y/2. If x is even but y is odd it is a horizontal wall, similarly identified. I might have vertical and horizontal flipped.

So x and y both odd are all unused, as is x odd and equal to 2N-3 (because the walls span two squares, they can't start on the last row or column), and similarly for y = 2N-3.

I made the minimal change necessary to do relative movements, which didn't change any of that logic. It wouldn't be that hard to change things so that you had a dense mapping of action IDs to possible moves.

@tacertain
Copy link
Contributor Author

Hey @lanctot. Do you want me to work on making the action IDs compact?

Also, the observation tensor also seems less than ideal. It encodes everything in a 2D tensor, where one dimension (call it rows) is the board, linearized, and the other has two columns for each player plus one for the walls. The first column for each player and the wall column indicate whether that player/wall occupies that space, and the second column for each player is the number of walls that player has left to play. Note that the board squares are encoded as described above, where there are really four locations devoted to each board square, indicating the location where a pawn could be, the location to the right where a vertical wall could be, the location below where a horizontal wall could be, and the location diagonally to the right and below where either a vertical or horizontal wall could be.

I realize that the observation space doesn't need to be compact - obviously the AlphaZero encoding used planes where all values were the same to encode certain pieces of game information. However, the current Quoridor encoding has the property that nearby squares on the field are not nearby in the tensor, and also that completely unrelated information (like how many walls a player has vs whether a pawn is in a location) is spatially close. Given the image convolution core of at least the classic AlphaZero architecture, these attributes seem like they would inhibit learning. In addition, since there are four rows devoted to each board square, many entries are always zero (i.e. the vertical wall columns can only be non-zero on rows that represent vertical wall locations or the cross-road).

Some pictures. Imagine a board square (x,y):

+--------+---+
|        |   |
|        | V |
|        |   |
|  Pawn  | W |
|        | a |
|        | l |
|        | l |
|        |   | 
+--------+---+
| H Wall | * |
+--------+---+

The board in memory is encoded as follows:

  1. Board memory location [x*2, y*2] records which pawn, if any, is in that square.
  2. Board memory location [x*2+1, y*2] records whether there's a vertical wall between (x,y) and (x+1,y).
  3. Board memory location [x*2, y*2+1] records whether there's a horizontal wall between (x,y) and (x,y+1).
  4. Board memory location [x*2+1, y*2+1] records whether there's either kind of wall that blocks the * area marked above: Walls span two slots - i.e. if there's a wall between (x,y) and (x+1,y), then there also needs to be either a wall between (x,y-1) and (x+1,y-1) or between (x,y+1), (x+1,y+1). If it's the latter, then the * area in (x,y) is blocked, meaning you can't put a horizontal wall through there.

The board memory location is then linearlized, so that each square is a row in the tensor :[i,j] maps to i * board_width * 2 +j. So the section of the tensor relating to square (x,y) looks like this:


+-----------+---------+---------+------+----------+----------+
|     -     | P1 Pawn | P2 Pawn | Wall | P1 Walls | P2 Walls |
+-----------+---------+---------+------+----------+----------+
| [i,j]     |       - |       - |    0 | WC       | WC       |
| [i,j+1]   |       0 |       0 |    - | WC       | WC       |
| ...       |         |         |      |          |          |
| [i+1,j]   |       0 |       0 |    - | WC       | WC       |
| [i+1,j+1] |       0 |       0 |    - | WC       | WC       |
+-----------+---------+---------+------+----------+----------+

Where the zeros are always zero, the dashes are 1.0 if the corresponding wall or pawn occupies that location, and WC is the number of remaining walls left to place for that player (i.e. that column is a constant for every row in the tensor).

I'm happy to keep reworking this game as part of my continuing education, though I also am starting to wonder if the investment is worth it.

Thoughts?

@lanctot
Copy link
Collaborator

lanctot commented May 21, 2024

Hi @tacertain, it would be good to have them in a more compact space, yeah.

Agreed on the observation as well. It should be more convnet-friendly, like in Breakthrough / Chess / etc.

I'm happy to keep reworking this game as part of my continuing education, though I also am starting to wonder if the investment is worth it.

Fair enough. I'm also really hammered right now and it's quite difficult to give timely responses at the moment. How about this: I'm happy with the current PR as it fixes the main problem. As long as I understand it (which I likely will once I have time to read the response with the code in front of me), then I'm happy to import it in time for v1.5 release.

Given the other issues (non-compact action space and problems with the observations), I'd be inclined to leave it in the category of games with known issues and then point to this thread. If you want (or anyone else) wants to keep working on it at that point, you/they can open another PR after this one is imported.

wdyt?

@tacertain
Copy link
Contributor Author

Sounds good. I am going to close this PR and open a new one based on the current head of master and only pushing the first two commits (i.e. not the one that removes it from the games with known issues).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quoridor Movement Action IDs keep changing
2 participants