
python: state.observation_tensor() creates a new state... #1068

Open · too-far-away opened this issue May 11, 2023 · 9 comments

too-far-away commented May 11, 2023

I've been playing with open_spiel's R-NaD algorithm implementation in Python and noticed some strange behavior: each time R-NaD calls state.observation_tensor(), a new state is created, then observer.set_from(new_state) is called, and only then is the original observer.set_from(state) called. It also looks like the new state is not a clone of the original one.
My game implementation is in Python. Here is an excerpt from its configuration:

pyspiel.GameType(
    dynamics = pyspiel.GameType.Dynamics.SEQUENTIAL,
    chance_mode = pyspiel.GameType.ChanceMode.EXPLICIT_STOCHASTIC,
    information = pyspiel.GameType.Information.IMPERFECT_INFORMATION,
    utility = pyspiel.GameType.Utility.ZERO_SUM,
    reward_model = pyspiel.GameType.RewardModel.TERMINAL,
    max_num_players = _NUM_PLAYERS,
    min_num_players = _NUM_PLAYERS,
    provides_information_state_string = False,
    provides_information_state_tensor = False,
    provides_observation_string = False,
    provides_observation_tensor = True,
    provides_factored_observation_string = False,
    parameter_specification = {}
)

There is a good chance I'm doing something wrong; on the other hand, I could not find anything in my code that would explain the described behavior. I also tried to find the actual code of State::ObservationTensor(), but I guess the implementation of the virtual method is in pyspiel.State, which, to my embarrassment, I was not able to find.
Please advise.

lanctot (Collaborator) commented May 12, 2023

> I've been playing with open_spiel's R-NAD algorithm implementation in python and noticed some strange behavior: each time R-NAD calls state.observation_tensor() a new state is created, then there is a call to new_state.set_from() and only then the original state.set_from() is called. It also looks like the new state is not a clone of the original one. My game implementation is in python. Here is an excerpt from its configuration:

Do you have a pointer to the code? I searched for set_from in rnad.py and could not find any occurrences, so I don't know what code you're referring to. I'm also not sure what code would be creating a new state every time state.observation_tensor() is called.

> There is a great chance I'm doing something wrong. On the other hand I could not find any issue related to the described behavior in my code. I also tried to find the actual code of State::ObservationTensor(), but I guess the implementation of the virtual method is in the pyspiel.State which to my embarrassment I was not able to find. Please advise.

The State::ObservationTensor() code is in C++. The pure virtual function is found in spiel.h. Each game overrides this function to provide a vector that corresponds to an observation for that game. There is also a related one called State::InformationStateTensor() for imperfect information games. You will find these in the C++ sources of the games (e.g. leduc.cc). They are exposed to Python via pybind11.
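For the Python-game side of this: a game implemented in Python provides observations through an observer object whose set_from method fills a tensor from a state, which is where the calls mentioned above come from. A minimal sketch, following the pattern of the games in open_spiel/python/games; the class name and the encoding are illustrative only:

import numpy as np

class MyObserver:
  """Minimal observer sketch: set_from encodes a state into self.tensor."""

  def __init__(self, params):
    del params  # this sketch supports no observation parameters
    self.tensor = np.zeros(9, np.float32)      # e.g. one entry per board cell
    self.dict = {"observation": self.tensor}

  def set_from(self, state, player):
    del state, player  # a real game would encode the player's view of the state here
    self.tensor.fill(0.0)

  def string_from(self, state, player):
    del player
    return str(state)

The game returns such an object from make_py_observer(), and state.observation_tensor() is answered by calling its set_from and copying out self.tensor.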

too-far-away (Author) commented May 13, 2023

> Do you have a pointer to the code?

It's the call to state.observation_tensor() in rnad.py:968.
observer.set_from() is not called directly from rnad.py (sorry, I mistakenly wrote state.set_from() earlier). I guess the implementation of state.observation_tensor() at some point calls observer.set_from(), and it does so twice. I imagine it might happen in the Python implementation of the pure virtual function you mentioned (pyspiel.State?). In any case, one call to state.observation_tensor() produces two calls to observer.set_from().
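To make this concrete, here is a small reproduction sketch using the bundled Python tic-tac-toe game. It assumes the game registers itself as "python_tic_tac_toe" and that its observer class is tic_tac_toe.BoardObserver; if the behavior described above is happening, it should report two set_from calls, the first of them for a freshly created initial state:

import pyspiel
from open_spiel.python.games import tic_tac_toe

calls = []
original_set_from = tic_tac_toe.BoardObserver.set_from

def counting_set_from(self, state, player):
  # Record which state each set_from call is for, then defer to the original.
  calls.append(tuple(state.history()))
  return original_set_from(self, state, player)

tic_tac_toe.BoardObserver.set_from = counting_set_from  # patch before loading the game

game = pyspiel.load_game("python_tic_tac_toe")
state = game.new_initial_state()
state.apply_action(4)  # play the centre square so our state differs from a fresh one
state.observation_tensor(0)
print(len(calls), "set_from calls, for states with histories:", calls)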

lanctot (Collaborator) commented May 13, 2023

Ok, I think this question is more about how the Python games work than about R-NaD.

The code for those functions is here: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/pybind11/python_games.cc

Is it possible that there are two observers?

Without seeing the code of your game or a stack trace, it is a bit difficult to help out. But I will tag our resident expert on Python games in case he has any ideas: @elkhrt

too-far-away (Author) commented May 14, 2023

You are right, R-NaD was just the trigger for the issue. Thank you very much for the suggestions on where to look in the code.
I've spent some more time investigating this and here are my findings:
State::ObservationTensor() calls Game::ObservationTensorSize() in spiel.cc:762 (the same is true for the other similar methods), and only then calls the pure virtual State::ObservationTensor().
Game::ObservationTensorSize() calls Game::ObservationTensorShape() (spiel.h:846), which for Python games is implemented in python_games.cc:287. There it creates a new state (line 289) and calls Observer::WriteTensor() for it, meaning that every Python call to state.observation_tensor() or state.information_state_tensor() creates a redundant new state, writes an observation tensor for it, and only then writes an observation tensor for the original state.
Hope this helps.
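Roughly, in Python terms, the chain amounts to the following. This is only a paraphrase of the C++ paths named above, not actual OpenSpiel code, and the helper names are illustrative:

import numpy as np

def observation_tensor_shape(game):
  # Game::ObservationTensorShape() for a Python game (python_games.cc:287):
  # a throw-away initial state is created just to discover the tensor shape.
  probe_state = game.new_initial_state()       # the redundant new state
  observer = game.make_py_observer()
  observer.set_from(probe_state, 0)            # first set_from call
  return list(np.asarray(observer.tensor).shape)

def observation_tensor(game, state, player):
  # State::ObservationTensor() (spiel.cc:762): size the output buffer first...
  size = int(np.prod(observation_tensor_shape(game)))
  values = np.zeros(size, np.float32)
  # ...and only then write the observation for the state actually asked about.
  observer = game.make_py_observer()
  observer.set_from(state, player)             # second set_from call
  values[:] = np.asarray(observer.tensor).flatten()
  return values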

lanctot (Collaborator) commented May 15, 2023

Ok, perfect, thanks for the detail.

I will check with @elkhrt to see whether this was known. It would be nice if we could avoid this.

elkhrt (Member) commented May 15, 2023

Thanks for flagging this and for the great investigation! It is indeed a bit unsatisfactory. Probably the simplest option is to add two fields to PyGame:

// mutable so the const shape accessors below can cache their result here.
mutable std::vector<int> observation_tensor_shape_;
mutable std::vector<int> information_state_tensor_shape_;

And then modify the methods to cache the data there:

std::vector<int> PyGame::InformationStateTensorShape() const {
  if (information_state_tensor_shape_.empty()) {
    TrackingVectorAllocator allocator;
    auto state = NewInitialState();
    info_state_observer().WriteTensor(*state, kDefaultPlayerId, &allocator);
    information_state_tensor_shape_ = TensorShape(allocator);
  }
  return information_state_tensor_shape_;
}

That still results in one superfluous call per game object, but that's pretty cheap.
I'm happy to make this change on our side, or you can submit a PR - as you prefer.
Of course if you have other ideas on how to fix it, we'd be delighted to hear them.

too-far-away (Author) commented

Sorry for the delayed answer. As you guys are more familiar with the code, you may want to make the change.
As a quick and dirty workaround, in my Python game code I've overridden the State.information_state_tensor() method to return a new, already filled-in tensor, and in rnad.py it goes directly into the EnvStep without copying. This gives a ~30% performance gain.
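A minimal sketch of that kind of override, assuming a Python game whose state subclasses pyspiel.State; the size constant and _encode helper are hypothetical stand-ins for the real game's encoding:

import numpy as np
import pyspiel

_TENSOR_SIZE = 16  # placeholder; the real game defines its own size


def _encode(state, player, tensor):
  """Hypothetical helper that writes `state` as seen by `player` into `tensor`."""
  del state, player  # a real game would encode its state here
  tensor.fill(0.0)


class MyGameState(pyspiel.State):
  """Only the override is shown; the rest of the state class is omitted."""

  def information_state_tensor(self, player):
    # A method defined on the Python subclass shadows the pybind11 one, so the
    # call from rnad.py lands here and skips the C++ shape/copy machinery.
    tensor = np.zeros(_TENSOR_SIZE, dtype=np.float32)
    _encode(self, player, tensor)
    return tensor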
I understand that it is too late to change the API, but I think it would be nice to completely decouple the game, state, and observer with regard to observations. The game does not need to know about the observation, its tensor shape, or the way it is created, and the same applies to the state. One could then create several observers, each producing a tensor with a different shape, for experimenting with various algorithms. Such an observer's API would have two methods: one returning the shape (without having to create states or tensors and fill them in), and the other receiving a state and a tensor and filling the tensor from the state.
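Purely as an illustration of that suggestion, something along these lines; none of this exists in OpenSpiel today:

import numpy as np

class DecoupledObserver:
  """Illustration of the suggested two-method observer API; not part of OpenSpiel."""

  def shape(self):
    # Known statically: no state or tensor has to be created to answer this.
    return (3, 3)

  def write(self, state, tensor):
    # Fill `tensor` (allocated by the caller) from `state`.
    del state  # a real observer would encode the state here
    tensor.fill(0.0)

# Usage sketch: the algorithm owns the buffer; the observer only fills it.
observer = DecoupledObserver()
obs_buffer = np.zeros(observer.shape(), np.float32)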
Thank you for hearing me out.

lanctot (Collaborator) commented Aug 28, 2023

@elkhrt did we resolve this?

elkhrt (Member) commented Aug 29, 2023

We didn't make a change to the core functionality; @too-far-away found a workaround for their case.
Returning to this thread, I think I see a cleaner approach, which I'll attempt soon.
