Possible bug in tabular q-learning algorithm #1026

Answered by lanctot
giogix2 asked this question in Q&A
I think there might be a bug in the tabular Q-learning algorithm. In the function RunIteration(), the following line is used to compute next_q_value for the Q-table update:

const double next_q_value = (player != next_state->CurrentPlayer() ? -1 : 1) * GetBestAc…
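Since the quoted line is truncated, here is a minimal sketch of the kind of negamax-style update being discussed, assuming Q-values are stored from the perspective of the player to move at each state. All names here (State, Key, BestActionValue, Update) are hypothetical stand-ins for illustration, not OpenSpiel's actual API.

```cpp
// Illustrative sketch only -- hypothetical names, not OpenSpiel's actual API.
#include <algorithm>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a game state (hypothetical interface).
struct State {
  virtual ~State() = default;
  virtual int CurrentPlayer() const = 0;            // player to move here
  virtual std::vector<int> LegalActions() const = 0;
  virtual bool IsTerminal() const = 0;
  virtual double Reward(int player) const = 0;      // from `player`'s view
  virtual std::string Key() const = 0;              // table key for this state
};

using QTable = std::map<std::pair<std::string, int>, double>;

// Best Q-value at `s`, from the perspective of the player to move at `s`.
double BestActionValue(const QTable& q, const State& s) {
  double best = -std::numeric_limits<double>::infinity();
  for (int a : s.LegalActions()) {
    auto it = q.find({s.Key(), a});
    // Unvisited (state, action) pairs default to 0.
    best = std::max(best, it == q.end() ? 0.0 : it->second);
  }
  return best;
}

// One zero-sum tabular Q-update for `player`, who took `action` in `s`
// and reached `next_s`.
void Update(QTable& q, const State& s, int action, const State& next_s,
            int player, double alpha, double gamma) {
  // Negamax-style sign flip: BestActionValue(next_s) is from the *next
  // mover's* perspective; if that mover is the opponent, negate it to
  // convert the value to `player`'s perspective. This mirrors the ternary
  // in the line quoted above.
  const double sign = (player != next_s.CurrentPlayer()) ? -1.0 : 1.0;
  const double next_q =
      next_s.IsTerminal() ? 0.0 : sign * BestActionValue(q, next_s);
  const double target = next_s.Reward(player) + gamma * next_q;
  double& entry = q[{s.Key(), action}];
  entry += alpha * (target - entry);
}
```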

lanctot answered:

Hi @giogix2,

I'm having a hard time seeing where the bug could be, so I'll follow up with an explanation. Maybe you can construct an example where you think the update rule is wrong?

The first important thing to note is that this is a specialized "two-player zero-sum" variant of Q-learning. Specifically, it is different from two independent Q-learners playing against each other -- that's critical.
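To make the zero-sum convention concrete (illustrative numbers, not from the thread): the bootstrapped target is r + γ·σ·max_{a'} Q(s', a'), with σ = −1 when a different player is to move at s' and σ = +1 otherwise. So if player 0 acts and lands in a state where player 1 is to move with max_{a'} Q(s', a') = 0.8, the zero-sum learner backs up −0.8 for player 0, since a position worth +0.8 to the opponent is worth −0.8 to you. Two independent Q-learners would each bootstrap from their own table with no sign flip, which is exactly the distinction being drawn here.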


Answer selected by giogix2