Convert quoridor movement action IDs to relative #1229

tacertain · 2024-05-15T02:35:18Z

The original implementation of Quoridor used absolute position numbering for pawn moves, so moving to b3 was always the same action ID, regardless of where the pawn was. This commit changes to using relative action IDs, so moving directly north is always the same action ID, regardless of what square that is moving to.

Fixes #1158

The original implementation of Quoridor used absolute position numbering for pawn moves, so moving to b3 was always the same action ID, regardless of where the pawn was. This commit changes to using relative action IDs, so moving directly north is always the same action ID, regardless of what square that is moving to.

tacertain · 2024-05-15T02:36:03Z

This commit fails the playthrough testing because the action IDs have changed. Still need to update the test scripts.

After changing the action IDs in quoridor to relative positions, the playthrough files needed updating. This commit updates them.

tacertain · 2024-05-15T05:04:15Z

I think all the tests should now pass and this is ready to be reviewed.

tacertain · 2024-05-16T18:27:51Z

The build failures don't seem to be because of my changes, but let me know if that's not the case.

lanctot

I have to say, I'm quite confused. It doesn't help that I am not familiar with the game.

I'll have to ask a round of questions to help me understand. Here is the first:

Can you explain what the OpenSpiel action range is (i.e. how the action ids in 0 to Game::NumDistinctActions() map to the game-specific actions):

How many unique actions correspond to movement actions? Are they separated from wall moves? If not, why not?
What are the movement action IDs (which integers in the range correspond to movement actions)
What are the wall moves? (which integers in the range correspond to wall moves)

lanctot · 2024-05-18T10:25:29Z

open_spiel/games/quoridor/quoridor.cc

+ * This meant that moving "north" had a different ID for every pawn position.
+ * Now action IDs are encoded in the same virtual space as absolute board 
+ * positions, but they indicate the pawn's relative move as if it were in
+ * square (1,1). So when we get those action IDs in, we need to convert them


Did you mean (2,2) here?

Why is this (1,1) when base_for_relative_ is (2,2)?

lanctot · 2024-05-18T10:26:48Z

open_spiel/games/quoridor/quoridor.cc

@@ -261,7 +281,7 @@ void QuoridorState::AddActions(Move cur, Offset offset,
  Move forward = cur + offset * 2;
  if (GetPlayer(forward) == kPlayerNone) {
    // Normal single step in this direction.
-    moves->push_back(forward.xy);
+    moves->push_back((base_for_relative_ + offset * 2).xy);


why the * 2 on the offset here?

Is it to keep the coordinates even ? (otherwise they get interpreted as wall moves?)

lanctot · 2024-05-18T10:34:38Z

The build failures don't seem to be because of my changes, but let me know if that's not the case.

Yes I think you're right but I've rerun them twice and they're still happening. Same error.

You might have to do another push (e.g. add a blank line somewhere, or take one out and then commit + push again... I can always remove it when it gets imported) because it seems to be failing some version check which is likely attached to the specific time you did the last push. I've had to play these games with Github Actions before...

lanctot · 2024-05-18T10:37:12Z

Let me add another question about the actions: is the range compact? (i.e. are there any integers on the range of 0 .. Game::NumDistinctActions() that are never used in any single valid state?)

E.g. if there were 8 movement actions then I'd expect they'd correspond to action ids 0 - 7 and then all wall actions would be from 8 to Game::NumDistinctActions() - 1. Is that the case here?

tacertain · 2024-05-18T14:21:24Z

I have five minutes right now, so I'll write something quick and then more later. The action space is not compact.

If the board is NxN, then the action ID space (before I made any changes) was (2N-1)x(2N-1). Action IDs were mapped into this 2D space (by the obvious mapping) and then interpreted as follows: If both the x and y values are even then it's indicates the pawn position x/2, y/2. If x is odd but y is even, then it indicates a vertical wall whose top half separates squares (x-1)/2,y/2 and (x+1)/2,y/2. If x is even but y is odd it is a horizontal wall, similarly identified. I might have vertical and horizontal flipped.

So x and y both odd are all unused, as is x odd and equal to 2N-3 (because the walls span two squares, they can't start on the last row or column), and similarly for y = 2N-3.

I made the minimal change necessary to do relative movements, which didn't change any of that logic. It wouldn't be that hard to change things so that you had a dense mapping of action IDs to possible moves.

tacertain · 2024-05-21T16:05:38Z

Hey @lanctot. Do you want me to work on making the action IDs compact?

Also, the observation tensor also seems less than ideal. It encodes everything in a 2D tensor, where one dimension (call it rows) is the board, linearized, and the other has two columns for each player plus one for the walls. The first column for each player and the wall column indicate whether that player/wall occupies that space, and the second column for each player is the number of walls that player has left to play. Note that the board squares are encoded as described above, where there are really four locations devoted to each board square, indicating the location where a pawn could be, the location to the right where a vertical wall could be, the location below where a horizontal wall could be, and the location diagonally to the right and below where either a vertical or horizontal wall could be.

I realize that the observation space doesn't need to be compact - obviously the AlphaZero encoding used planes where all values were the same to encode certain pieces of game information. However, the current Quoridor encoding has the property that nearby squares on the field are not nearby in the tensor, and also that completely unrelated information (like how many walls a player has vs whether a pawn is in a location) is spatially close. Given the image convolution core of at least the classic AlphaZero architecture, these attributes seem like they would inhibit learning. In addition, since there are four rows devoted to each board square, many entries are always zero (i.e. the vertical wall columns can only be non-zero on rows that represent vertical wall locations or the cross-road).

Some pictures. Imagine a board square (x,y):

+--------+---+
|        |   |
|        | V |
|        |   |
|  Pawn  | W |
|        | a |
|        | l |
|        | l |
|        |   | 
+--------+---+
| H Wall | * |
+--------+---+

The board in memory is encoded as follows:

Board memory location [x*2, y*2] records which pawn, if any, is in that square.
Board memory location [x*2+1, y*2] records whether there's a vertical wall between (x,y) and (x+1,y).
Board memory location [x*2, y*2+1] records whether there's a horizontal wall between (x,y) and (x,y+1).
Board memory location [x*2+1, y*2+1] records whether there's either kind of wall that blocks the * area marked above: Walls span two slots - i.e. if there's a wall between (x,y) and (x+1,y), then there also needs to be either a wall between (x,y-1) and (x+1,y-1) or between (x,y+1), (x+1,y+1). If it's the latter, then the * area in (x,y) is blocked, meaning you can't put a horizontal wall through there.

The board memory location is then linearlized, so that each square is a row in the tensor :[i,j] maps to i * board_width * 2 +j. So the section of the tensor relating to square (x,y) looks like this:


+-----------+---------+---------+------+----------+----------+
|     -     | P1 Pawn | P2 Pawn | Wall | P1 Walls | P2 Walls |
+-----------+---------+---------+------+----------+----------+
| [i,j]     |       - |       - |    0 | WC       | WC       |
| [i,j+1]   |       0 |       0 |    - | WC       | WC       |
| ...       |         |         |      |          |          |
| [i+1,j]   |       0 |       0 |    - | WC       | WC       |
| [i+1,j+1] |       0 |       0 |    - | WC       | WC       |
+-----------+---------+---------+------+----------+----------+

Where the zeros are always zero, the dashes are 1.0 if the corresponding wall or pawn occupies that location, and WC is the number of remaining walls left to place for that player (i.e. that column is a constant for every row in the tensor).

I'm happy to keep reworking this game as part of my continuing education, though I also am starting to wonder if the investment is worth it.

Thoughts?

lanctot · 2024-05-21T18:54:12Z

Hi @tacertain, it would be good to have them in a more compact space, yeah.

Agreed on the observation as well. It should be more convnet-friendly, like in Breakthrough / Chess / etc.

I'm happy to keep reworking this game as part of my continuing education, though I also am starting to wonder if the investment is worth it.

Fair enough. I'm also really hammered right now and it's quite difficult to give timely responses at the moment. How about this: I'm happy with the current PR as it fixes the main problem. As long as I understand it (which I likely will once I have time to read the response with the code in front of me), then I'm happy to import it in time for v1.5 release.

Given the other issues (non-compact action space and problems with the observations), I'd be inclined to leave it in the category of games with known issues and then point to this thread. If you want (or anyone else) wants to keep working on it at that point, you/they can open another PR after this one is imported.

wdyt?

tacertain · 2024-05-21T19:37:08Z

Sounds good. I am going to close this PR and open a new one based on the current head of master and only pushing the first two commits (i.e. not the one that removes it from the games with known issues).

tacertain mentioned this pull request May 15, 2024

Quoridor Movement Action IDs keep changing #1158

Closed

tacertain added 2 commits May 14, 2024 22:01

Add updated playthroughs

1e370f7

After changing the action IDs in quoridor to relative positions, the playthrough files needed updating. This commit updates them.

Remove quoridor from the list of games with known issues

9995020

Merge branch 'google-deepmind:master' into tacertain-patch-3

72910c4

lanctot reviewed May 18, 2024

View reviewed changes

tacertain closed this May 21, 2024

tacertain mentioned this pull request May 21, 2024

Convert quoridor movement action IDs to relative #1232

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert quoridor movement action IDs to relative #1229

Convert quoridor movement action IDs to relative #1229

tacertain commented May 15, 2024 •

edited

tacertain commented May 15, 2024

tacertain commented May 15, 2024

tacertain commented May 16, 2024

lanctot left a comment

lanctot May 18, 2024

lanctot May 18, 2024 •

edited

lanctot commented May 18, 2024 •

edited

lanctot commented May 18, 2024 •

edited

tacertain commented May 18, 2024 •

edited

tacertain commented May 21, 2024

lanctot commented May 21, 2024 •

edited

tacertain commented May 21, 2024

Convert quoridor movement action IDs to relative #1229

Convert quoridor movement action IDs to relative #1229

Conversation

tacertain commented May 15, 2024 • edited

tacertain commented May 15, 2024

tacertain commented May 15, 2024

tacertain commented May 16, 2024

lanctot left a comment

Choose a reason for hiding this comment

lanctot May 18, 2024

Choose a reason for hiding this comment

lanctot May 18, 2024 • edited

Choose a reason for hiding this comment

lanctot commented May 18, 2024 • edited

lanctot commented May 18, 2024 • edited

tacertain commented May 18, 2024 • edited

tacertain commented May 21, 2024

lanctot commented May 21, 2024 • edited

tacertain commented May 21, 2024

tacertain commented May 15, 2024 •

edited

lanctot May 18, 2024 •

edited

lanctot commented May 18, 2024 •

edited

lanctot commented May 18, 2024 •

edited

tacertain commented May 18, 2024 •

edited

lanctot commented May 21, 2024 •

edited