
Updating Alpha Zero #1085

Open
ramizouari opened this issue Jun 20, 2023 · 10 comments
Labels
contribution welcome It's a nice feature! But we do not have the time to do it ourselves. Contribution welcomed!

Comments

@ramizouari

ramizouari commented Jun 20, 2023

This is a feature proposal.

While basing my work on this version of Alpha Zero (TensorFlow, both in Python and C++), I have addressed several points, including:

  1. The Alpha Zero implementation depends on TensorFlow's TF1 compatibility layer.
  2. Some constants are hard-coded, for example stage_count=7 and wait(0.01). While their meaning is clear from context and from the subsequent logic, it would be nice to parameterise them.
  3. The replay buffer is implemented manually. It would also be nicer to base it on the sibling project dm-reverb.
  4. The services are tightly coupled (by "service" I mean an actor, a learner, or an evaluator).
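For context on point 3, the hand-rolled buffer is essentially a bounded store with uniform sampling; a minimal stdlib sketch (hypothetical names, not the actual OpenSpiel code) of the behaviour dm-reverb would take over:

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded uniform-sampling replay buffer (illustrative sketch only).

    dm-reverb would replace this with a server-backed table that also
    handles rate limiting and multi-machine access over gRPC.
    """

    def __init__(self, max_size: int):
        self._data = deque(maxlen=max_size)  # oldest items evicted first

    def add(self, item):
        self._data.append(item)

    def sample(self, count: int):
        # Uniform sampling without replacement from the current contents.
        return random.sample(list(self._data), count)

    def __len__(self):
        return len(self._data)

buf = ReplayBuffer(max_size=3)
for i in range(5):
    buf.add(i)
print(len(buf))           # 3: the capacity bound evicted the two oldest items
print(sorted(buf._data))  # [2, 3, 4]
```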

My proposed implementation is based on TensorFlow 2 and is almost functionally equivalent to the old one; only the model definition differs (I am working on a specific game, but I can follow the "standard" architectures exactly if needed).
There are also further improvements planned for points 3 and 4, but I would be happy to propose the migration from TF1 logic to TF2 as a starter.

Before anything else, I want to know whether you are open to such an idea.

@lanctot
Collaborator

lanctot commented Jun 22, 2023

Hi @ramizouari,

That's great! I think we will eventually need to replace (or deprecate) all of our TF1 implementations, so I'd be happy to upgrade our TF AlphaZero to be based primarily on TF2. Luckily, on the basic RL side we already have most of the main algorithms in JAX & PyTorch, so this won't be too problematic, but TF AlphaZero is definitely not in that category.

I'm surprised you got the TensorFlow-based C++ AlphaZero version to work. Last I remember, we only had it working internally. Either way, it's not used very much because most people could not get it to compile externally, so we left it there mostly so people could see it. It would be OK if the TF2 version of AlphaZero didn't support the C++ TF API, because it's not nearly as user-friendly as LibTorch, which most of the C++ users have used instead.

Would you be willing to submit a PR that upgrades it? I'd also like some evidence that it works, on, say, Tic-Tac-Toe and/or Connect Four. We should also loop in @tewalds, the original author of that code, to see if he has anything to add.

Let me know what you think!

@ramizouari
Author

Hi @lanctot,

First of all, thank you for your reply. I will be more than happy to submit a PR for this proposal.

To be honest, getting the C++ version to work was not an easy task. The main problem was getting TensorFlow itself to work: I was not able to build TensorFlow from source, nor to build it via TensorFlowCC as mentioned in the documentation. I was, however, able to link against the pip-installed version (it contains the *.so files and, fortunately, the header files as well).

Also, to load the model, I had to use the SavedModelBundle interface. Still, I was not able to implement these scenarios yet:

  • Loading particular checkpoints
  • Calling the fit function

I think they are possible, but they need further inspection of the protobuf file.

For the Python version, everything works with the new API.

For the PR, I will try to do it before the end of July.

For a quick inspection of the code before the PR, you can find it on my fork under the same name open_spiel. It should contain:

  • For Python: The files of the new implementation on the same path with a suffix _v2.
  • For C++: It is under open_spiel/algorithms/alpha_zero_mpg, but this is an implementation for a very specific game. I will rewrite a generalised version from it for the PR.

If you need further clarification, let me know.

@lanctot
Collaborator

lanctot commented Jun 22, 2023

Alright, cool! Take all the time you need. In the meantime, I will point @tewalds to the thread to see if he has any comments.

@tewalds
Member

tewalds commented Jun 27, 2023

It'd be great to have a C++ version of AlphaZero that uses TF2 and Reverb. I initially didn't use Reverb, with the goal of maximizing what is doable on a single machine and ignoring the multi-machine case, but it would be great to support multi-machine as well.

As long as you can get it general enough to learn any similar game, I'm all for inclusion.

@ramizouari
Author

ramizouari commented Jun 30, 2023

Hello @tewalds.

Thank you for your reply.

In the game that I am working on (Mean Payoff Games), I had to deploy it on an HPC cluster for faster trajectory generation.
For my use case, trajectories were sent with Reverb. Broadcasting the model and monitoring were done with an HTTP server on each service.

The HTTP part is generic in the sense that it can be switched to another protocol (or default to the multiprocessing queues, as implemented by default); one only has to change:

  1. Model broadcasting function
  2. The ReplayBuffer implementation (Local via queues / Reverb via gRPC, or a custom one)
  3. The model update part on the actors and evaluators.
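The three swap points above can be captured by small interfaces; a hedged Python sketch (class and method names are illustrative, not from the fork):

```python
from abc import ABC, abstractmethod

class ModelBroadcaster(ABC):
    """Point 1: how new weights reach actors/evaluators (HTTP, queue, ...)."""
    @abstractmethod
    def publish(self, weights) -> None: ...

class ReplayBufferBackend(ABC):
    """Point 2: where trajectories go (local queues, Reverb over gRPC, ...)."""
    @abstractmethod
    def add(self, trajectory) -> None: ...
    @abstractmethod
    def sample(self, count): ...

class QueueBroadcaster(ModelBroadcaster):
    """Default local transport: an in-process list standing in for a queue."""
    def __init__(self):
        self.published = []
    def publish(self, weights) -> None:
        self.published.append(weights)

# Point 3: actors and evaluators poll the broadcaster for the latest weights.
bcast = QueueBroadcaster()
bcast.publish({"step": 1})
latest = bcast.published[-1]
print(latest["step"])  # 1
```

Swapping HTTP for another protocol then only means providing different concrete subclasses; the actor and learner loops stay unchanged.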

I had to switch to Python for that part due to the lack of documentation for Reverb's C++ implementation, but of course it is doable. I will need to contact the Reverb team for more intuition about their C++ code.

Another limitation on the C++ side: I still have not found the correct format to call the fit function from C++; I was only able to do inference. I was also not able to load individual checkpoints, but that can be mitigated by simply loading the whole SavedModel bundle on each update.

On the other hand, assuming the Reverb problem in C++ is resolved, what I can do is implement the learner in Python and the actors and evaluators in C++, and have them communicate using, for example, HTTP + Reverb.
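The model-broadcast half of that split can be sketched with the standard library alone: a learner-side HTTP endpoint serving the latest model metadata, which any actor (Python or C++) can poll. The payload and endpoint name are assumptions for illustration, not the fork's actual protocol:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Latest model metadata the learner wants to broadcast (illustrative payload;
# a real setup would serve weights or a checkpoint path).
LATEST = {"checkpoint": 42}

class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(LATEST).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ModelHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# An actor (possibly a C++ process in the proposed split) polls for updates.
with urlopen(f"http://127.0.0.1:{server.server_port}/model") as resp:
    fetched = json.loads(resp.read())
print(fetched)  # {'checkpoint': 42}
server.shutdown()
```

Trajectories would flow the other way through Reverb's gRPC interface, so the HTTP side only ever carries small, infrequent model updates.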

Now, as that would constitute a big code addition, I think it will be best to split it up, one PR at a time. For that, I will start with the TF2 update.

As a performance measure, can you please tell me which games the new implementation should be able to learn?

@tewalds
Member

tewalds commented Jul 21, 2023

That all sounds great!

I was using tic-tac-toe as my basic test, as it should be learned pretty quickly; then connect four as something a bit harder that it should get great at; and havannah as my real challenge, since that's what I worked on for my masters, so I had a strong opponent I could play against via the GTP connector.

@lanctot lanctot added the contribution welcome It's a nice feature! But we do not have the time to do it ourselves. Contribution welcomed! label Aug 28, 2023
@aadharna

aadharna commented Oct 4, 2023

@ramizouari I was playing around with your Python-based TF2 fork on connect_four and (maybe this is a WSL issue) I'm getting CUDA errors on the learner and evaluator workers during initialization, because TensorFlow doesn't like to share the GPU when using multiprocessing. How did you get the code running quickly? When I was just using the CPU, the workers were really slow, to the point of being unusable. (I would have simply raised this as an issue on your fork, but didn't see that as an option.)
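A common workaround for this class of error (an assumption on my part, not something verified against the fork) is to hide the GPU from worker processes before TensorFlow is imported, so that only the learner process ever initializes CUDA:

```python
import os

# Must run BEFORE "import tensorflow" in each actor/evaluator worker;
# TensorFlow reads CUDA_VISIBLE_DEVICES once, at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # this worker sees no GPU

# The learner process would instead leave the variable unset and optionally
# enable memory growth so it does not grab all VRAM up front:
#   import tensorflow as tf
#   for gpu in tf.config.list_physical_devices("GPU"):
#       tf.config.experimental.set_memory_growth(gpu, True)

print(os.environ["CUDA_VISIBLE_DEVICES"] == "")  # True
```

With multiprocessing this means setting the variable in the child's startup code (or its environment) before any TF import, and using the "spawn" start method so workers do not inherit the parent's CUDA state.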

@Nightbringers

> Hi @ramizouari,
>
> That's great! I think we will eventually need to replace (or deprecate) all of our TF1 implementations, so I'd be happy to upgrade our TF AlphaZero to be based primarily on TF2. Luckily, on the basic RL side we already have most of the main algorithms in JAX & PyTorch, so this won't be too problematic, but TF AlphaZero is definitely not in that category.
>
> I'm surprised you got the TensorFlow-based C++ AlphaZero version to work. Last I remember, we only had it working internally. Either way, it's not used very much because most people could not get it to compile externally, so we left it there mostly so people could see it. It would be OK if the TF2 version of AlphaZero didn't support the C++ TF API, because it's not nearly as user-friendly as LibTorch, which most of the C++ users have used instead.
>
> Would you be willing to submit a PR that upgrades it? I'd also like some evidence that it works, on, say, Tic-Tac-Toe and/or Connect Four. We should also loop in @tewalds, the original author of that code, to see if he has anything to add.
>
> Let me know what you think!

Where are the AlphaZero algorithms in JAX? What about the speed of AlphaZero in JAX? Is there a big difference in speed between the JAX and C++ AlphaZero implementations?

@lanctot
Collaborator

lanctot commented Feb 18, 2024

> Where are the AlphaZero algorithms in JAX? What about the speed of AlphaZero in JAX? Is there a big difference in speed between the JAX and C++ AlphaZero implementations?

There is no JAX AlphaZero in OpenSpiel. It would make a welcome contribution, though!

@lanctot
Collaborator

lanctot commented Apr 6, 2024

Hi @ramizouari ,

I'm doing a bit of spring cleaning and looking into removing the TF-based C++ AlphaZero here: #1201. @tewalds, we never managed to get this working externally, and I'm not sure it's worth maintaining long-term if it's not being used. Wdyt?

If you have worked on a JAX or TF2-based implementation, we'd love to have it in the repos (if it's well-tested on a few small games), so I'd encourage you to submit it as a PR.

Otherwise, users interested in a C++-based AlphaZero should use the one based on LibTorch. There's also the Python TF implementation.
