Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spades Implementation #1214

Open
i-Madsen opened this issue Apr 24, 2024 · 4 comments
Open

Spades Implementation #1214

i-Madsen opened this issue Apr 24, 2024 · 4 comments

Comments

@i-Madsen
Copy link

i-Madsen commented Apr 24, 2024

Hi there, I'm trying to make a new implementation of Spades and am looking for some help along the way as I'm pretty new to RL. Ultimately, I'm hoping to use the framework to train a Spades AI and use it in an iOS/Android app made in Unity (but that's getting a little ahead of myself).

Since the game of Bridge has a similar structure (partnerships, bidding phase, trick taking) and is already in OpenSpiel, I'm using it as a base, but some game concepts are pretty different - particularly how bidding works.

So the first thing I want to make sure I'm doing correctly is dealing with the tensor sizing/representation.

bridge.h starts out with:

inline constexpr int kNumObservationTypes = 4;  // Bid, lead, declare, defend
// Because bids always increase, any individual bid can be made at most once.
// Thus for each bid, we only need to track (a) who bid it (if anyone), (b) who
// doubled it (if anyone), and (c) who redoubled it (if anyone).
// We also report the number of passes before the first bid; we could
// equivalently report which player made the first call.
// This is much more compact than storing the auction call-by-call, which
// requires 318 turns * 38 possible calls per turn = 12084 bits (although
// in practice almost all auctions have fewer than 80 calls).
inline constexpr int kAuctionTensorSize =
    kNumPlayers * (1           // Did this player pass before the opening bid?
                   + kNumBids  // Did this player make each bid?
                   + kNumBids  // Did this player double each bid?
                   + kNumBids  // Did this player redouble each bid?
                   ) +
    kNumCards                                  // Our hand
    + kNumVulnerabilities * kNumPartnerships;  // Vulnerability of each side
inline constexpr int kPublicInfoTensorSize =
    kAuctionTensorSize  // The auction
    - kNumCards         // But not any player's cards
    + kNumPlayers;      // Plus trailing passes
inline constexpr int kMaxAuctionLength =
    kNumBids * (1 + kNumPlayers * 2) + kNumPlayers;

As far as I understand, Bridge bidding has multiple actions you can take and keeps going until all other players pass, culminating in a single contract. Spades, on the other hand, has each player make a single bid and then the bidding phase ends, leaving '4' contracts (partnerships typically work together to meet their combined contract, but a 'Nil' bid (0) is player dependent). Spades also has no concepts of being vulnerable or having a declarer.

So with that in mind, I'm thinking I change it like this:

inline constexpr int kNumObservationTypes = 1;
inline constexpr int kAuctionTensorSize = kNumPlayers * kNumBids
                                          + kNumCards;  // Our hand
inline constexpr int kPublicInfoTensorSize =
    kAuctionTensorSize  // The auction
    - kNumCards;         // But not any player's cards
inline constexpr int kMaxAuctionLength = 4;

Later on, there is a method for getting the play tensor size.

bridge.h

  static int GetPlayTensorSize(int num_tricks) {
    return kNumBidLevels          // What the contract is
           + kNumDenominations    // What trumps are
           + kNumOtherCalls       // Undoubled / doubled / redoubled
           + kNumPlayers          // Who declarer is
           + kNumVulnerabilities  // Vulnerability of the declaring side
           + kNumCards            // Our remaining cards
           + kNumCards            // Dummy's remaining cards
           + num_tricks * kNumPlayers * kNumCards  // Number of played tricks
           + kNumTricks   // Number of tricks we have won
           + kNumTricks;  // Number of tricks they have won
  }

Again, bids in spades are a simple. single bid, one per player, with no concept of modifying calls, a declarer, or vulnerability. Spades will always be trump and there is no 'dummy' hand, so I'm assuming I can just remove half of these and then multiply bids/tricks by kNumPlayers since we'll want to track each player individually.

spades.h

  static int GetPlayTensorSize(int num_tricks) {
    return kNumBids * kNumPlayers                   // What each player's contract is
           + kNumCards                              // Our remaining cards
           + num_tricks * kNumPlayers * kNumCards   // Number of played tricks
           + kNumTricks * kNumPlayers;              // Number of tricks each player has won
  }

Am I misunderstanding anything or does this seem correct?

@lanctot
Copy link
Collaborator

lanctot commented Apr 27, 2024

Hi @i-Madsen

That sounds about right to me, but I will give a bit of advice: those observation tensor functions are the most difficult ones to implement. Leave them until the end. I suggest getting a working implementation of Spades with those functions left blank or returning bogus values first because that's already a nontrivial task. It's best to discuss those functions only after the basic game logic functionality is implemented.

@i-Madsen
Copy link
Author

Gotcha, that makes sense. Thanks, I'll stick with this for now and continue on with getting the game logic all working.

@i-Madsen
Copy link
Author

I've now got the game functionality able to build and run, completing the basic game tests successfully (I'm ignoring/disabling anything to do with double_double and the uncontested_bidding variant). I think I'm about ready to take a stab at implementing the tensor functions, but have a question about the overall approach for this game.

The Bridge implementation runs only a single round as a zero sum game, always starting with the same player. While this makes sense for Bridge, Spades is a little different as both teams can go either positive or negative during a round. Therefore, the overall game state is much more important as a lower-risk bid that results in that partnership winning the overall game is much better than a high-risk bid that results in a higher point difference. (Overall game state also determines if bag penalties are a risk during the round.)

Currently, I've kept the Bridge structure of always starting with player 0 and playing a single round and my thoughts are that I'll just feed in game parameters like this:

/*parameter_specification=*/
                         {
                             // Whether to end the game early if score gets too low
                             {"use_mercy_rule", GameParameter(true)},

                             // If using mercy rule, the threshold of negative points
                             {"mercy_threshold", GameParameter(-350)},

                             // Amount of points needed to win the game
                             {"win_threshold", GameParameter(500)},

                             // Parnership's current scores
                             // (can infer bags from last digit)
                             {"score_partnership_0", GameParameter(0)},
                             {"score_partnership_1", GameParameter(0)},

                             // If true, replace the play phase with a computed
                             // result based on perfect-information play.
                             {"use_double_dummy_result", GameParameter(false)},

                             // Number of played tricks in observation tensor
                             {"num_tricks", GameParameter(2)},
                         }};

The outside training script can track the scores and manage the teams by 'rotating' what positions the players/team scores are in. (Also need to add a big bonus for winning to the returned rewards in the game environment?)

@lanctot Does this seem reasonable? Or do I need to rethink the approach?

@lanctot
Copy link
Collaborator

lanctot commented May 17, 2024

Hi @i-Madsen , yes, that sounds very reasonable to me! And using the Bridge functions as a reference is good. You can also check out Hearts, Skat, and Dou Di Zhu games as well for reference. Good luck!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants