
[Puzzle] N-Queens #1222

Open
95A31 opened this issue May 6, 2024 · 7 comments

Comments

@95A31

95A31 commented May 6, 2024

Hello,

I am investigating how effective RL is in the context of combinatorial problems. To this end, I would like to implement the N-Queens puzzle, where n queens have to be placed on an n x n chessboard so that no two queens threaten each other. I am wondering if you could suggest a game to start from.

Thank you in advance!

@tacertain
Contributor

Somebody more experienced might chime in, but I'm not sure there's a great starting choice that will save you a lot of time. Breakthrough is simple and uses a chess board, but the actions are going to be very different. Quoridor (at least until my PR #1221 is merged) uses absolute board positioning for action IDs, which is what you want (but beware that the board has even rows and columns for piece locations and odd rows and columns for wall locations). Chess obviously has the logic for tracing queen moves and seeing if there are other pieces in the way.

@95A31
Author

95A31 commented May 10, 2024

Thank you for your answer! Implementation complexity or time is not a problem :) My concerns are about how to model the puzzle effectively for RL. I have three approaches in mind, but I'm unsure which would be most RL-friendly:

  1. Single agent: The agent decides where to place the next queen. If the chosen position is threatened by another queen, the agent loses.
  2. Single agent with limited choices: The agent selects from positions not threatened by existing queens. If no safe positions remain, the agent loses.
  3. Two-agent approach: One agent places queens, and another removes threatened positions. If no safe positions remain, the first agent loses and the second agent wins.

Any suggestions or insights are very much appreciated.
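For concreteness, approaches 2 and 3 both hinge on computing which squares remain unthreatened after some queens have been placed. A minimal Python sketch of that check (the helper names `attacks` and `safe_squares` are mine, not from any library):

```python
def attacks(q, s):
    """True if a queen on square q threatens square s (squares are (row, col))."""
    (r1, c1), (r2, c2) = q, s
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

def safe_squares(n, queens):
    """Squares on an n x n board neither occupied nor threatened by any queen."""
    return [
        (r, c)
        for r in range(n) for c in range(n)
        if all(not attacks(q, (r, c)) for q in queens)
    ]
```

For example, on a 4x4 board a single queen on (0, 0) removes row 0, column 0, and the main diagonal, leaving 6 of the 16 squares safe.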

@tacertain
Contributor

Caveat: I'm not an RL expert.

I think /2/ is your best approach. There's no reason to declare moves legal that you know will result in a loss, and adding a second agent with no "agency" isn't going to change the learning process for the primary agent.

All that said, I think this puzzle is going to be a "wicked" learning environment (https://psycnet.apa.org/record/2015-47120-008). Almost all paths down the MCTS are going to result in a loss for the agent - it's going to be hard to get much signal on which moves are "better" than others. In addition, a move is only "good" in the context of the entire solution. In other words, the value of placing a queen in a certain location is only high if you're heading toward the few (or one) solution where that space is occupied - if you have a slightly different board position, the value of that placement is zero.

Obviously, the point of research is to learn new things, so I am not telling you that you shouldn't proceed. I think it'll be quick to implement the puzzle and start training - just be aware that if your results are disappointing, it may not be because of any of the particular choices for how to encode things that are making it hard.

@95A31
Author

95A31 commented May 10, 2024

Thank you for the suggestions!

I agree, this environment isn't easy to work with. However, it's precisely this challenge that makes it interesting to explore :)

@lanctot
Collaborator

lanctot commented May 11, 2024

Hi @95A31,

We don't have many single-agent games in OpenSpiel. There are a few (morpion solitaire, some of the gridworld domains, Klondike solitaire), but they're all more complex than you need. I'll give some advice below on how I think you should do it. You should probably start from Tic-Tac-Toe or Breakthrough and just empty out all the methods. You can make N a parameter if you like.

I agree with @tacertain: your proposed Option 2 is the way to go. The legal moves in that case would simply be all possible safe placements for the next queen. If you place a new queen and there are then no legal moves but still queens left to place, the agent loses (return of -1). If the player places all n queens, the player wins (return of +1).

Your CurrentPlayer should always return 1 or kTerminalPlayer (at the end). In the state you should maintain how many queens are left to place and a list of the legal next moves (which starts at all n x n squares: 64 for n = 8). In ApplyMove, you should place a queen on the spot that was chosen, and then recompute the list of legal placements. If there are none left, then the state is terminal: is_terminal should return true when the list of legal moves is empty. Otherwise, the next legal moves are stored in the list and returned by State::LegalActions.
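The state logic described above can be sketched compactly in Python (class and method names are illustrative only, not the actual OpenSpiel C++ API):

```python
class NQueensState:
    """Sketch of the single-agent N-Queens state: track placed queens and
    recompute safe placements after each move. Names are illustrative."""

    def __init__(self, n):
        self.n = n
        self.queens = []
        # Legal moves start at all n x n squares.
        self.legal = [(r, c) for r in range(n) for c in range(n)]

    @staticmethod
    def _attacked(q, s):
        # Same row, same column, or same diagonal.
        return (q[0] == s[0] or q[1] == s[1]
                or abs(q[0] - s[0]) == abs(q[1] - s[1]))

    def apply_move(self, square):
        assert square in self.legal
        self.queens.append(square)
        # Recompute placements not threatened by the newly placed queen.
        self.legal = [s for s in self.legal
                      if s != square and not self._attacked(square, s)]

    def is_terminal(self):
        return not self.legal or len(self.queens) == self.n

    def returns(self):
        # Called on terminal states: +1 for placing all n queens, -1 if stuck.
        return 1 if len(self.queens) == self.n else -1
```

Playing out a full 4-queens solution (e.g. columns 1, 3, 0, 2) ends with a return of +1, while a dead-end placement sequence terminates early with -1.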

@tacertain
Contributor

Since everything is symmetric through rotations of 90°, you might start with the first queen only having legal moves into one quadrant of the board - it'll cut that initial branching by a factor of 4.

You could get even more clever and eliminate rotations and reflections after the first queen, but those are going to be a lot trickier to code.
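The quadrant restriction for the first move is easy to sketch (again in Python; the function name is mine). Rounding the half-width up keeps the centre row and column reachable for odd n:

```python
def first_move_quadrant(n):
    """Opening moves restricted to the top-left quadrant of an n x n board,
    rounding up so odd boards keep their centre row and column."""
    half = (n + 1) // 2
    return [(r, c) for r in range(half) for c in range(half)]
```

For n = 8 this leaves 16 of the 64 opening actions, cutting the initial branching by a factor of 4.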

@95A31
Author

95A31 commented May 13, 2024

@lanctot Thank you for the guidance! I already have a generic C++ "simulator" that can model a lot of "puzzles" from their domain descriptions. I guess I just have to properly bridge it to OpenSpiel. Would it be possible to pass the (path to the file of the) domain description as a parameter?

@tacertain Thank you for the suggestion! It indeed cut a lot of the search space :)
