
python: state.observation_tensor() creates a new state... #1068

Open · too-far-away opened this issue May 11, 2023 · 9 comments

too-far-away commented May 11, 2023

I've been playing with open_spiel's R-NaD algorithm implementation in Python and noticed some strange behavior: each time R-NaD calls state.observation_tensor(), a new state is created, then observer.set_from(new_state) is called, and only then is the original observer.set_from(state) called. It also looks like the new state is not a clone of the original one.
My game implementation is in Python. Here is an excerpt from its configuration:

pyspiel.GameType(
    dynamics = pyspiel.GameType.Dynamics.SEQUENTIAL,
    chance_mode = pyspiel.GameType.ChanceMode.EXPLICIT_STOCHASTIC,
    information = pyspiel.GameType.Information.IMPERFECT_INFORMATION,
    utility = pyspiel.GameType.Utility.ZERO_SUM,
    reward_model = pyspiel.GameType.RewardModel.TERMINAL,
    max_num_players = _NUM_PLAYERS,
    min_num_players = _NUM_PLAYERS,
    provides_information_state_string = False,
    provides_information_state_tensor = False,
    provides_observation_string = False,
    provides_observation_tensor = True,
    provides_factored_observation_string = False,
    parameter_specification = {}
)

There is a good chance I'm doing something wrong; on the other hand, I could not find anything in my code that would explain the described behavior. I also tried to find the actual code of State::ObservationTensor(), but I guess the implementation of the virtual method is in pyspiel.State, which, to my embarrassment, I was not able to find.
Please advise.

lanctot (Collaborator) commented May 12, 2023

> I've been playing with open_spiel's R-NAD algorithm implementation in python and noticed some strange behavior: each time R-NAD calls state.observation_tensor() a new state is created, then there is a call to new_state.set_from() and only then the original state.set_from() is called. It also looks like the new state is not a clone of the original one. My game implementation is in python. Here is an excerpt from its configuration:

Do you have a pointer to the code? I searched for set_from in rnad.py and could not find any occurrences, so I don't know what code you're referring to. I'm also not sure what code would be creating a new state every time state.observation_tensor() is called.

> There is a great chance I'm doing something wrong. On the other hand I could not find any issue related to the described behavior in my code. I also tried to find the actual code of State::ObservationTensor(), but I guess the implementation of the virtual method is in the pyspiel.State which to my embarrassment I was not able to find. Please advise.

The State::ObservationTensor() code is in C++. The pure virtual function is found in spiel.h. Each game overrides this function to provide a vector that corresponds to an observation for that game. There is also a related one called State::InformationStateTensor() for imperfect information games. You will find these in the C++ sources of the games (e.g. leduc.cc). They are exposed to Python via pybind11.
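For the Python-game side of this: a game implemented in Python provides observations through an observer object whose set_from method fills a tensor from a state, which is where the calls mentioned above come from. A minimal sketch, following the pattern of the games in open_spiel/python/games; the class name and the encoding are illustrative only:

import numpy as np

class MyObserver:
  """Minimal observer sketch: set_from encodes a state into self.tensor."""

  def __init__(self, params):
    del params  # this sketch supports no observation parameters
    self.tensor = np.zeros(9, np.float32)      # e.g. one entry per board cell
    self.dict = {"observation": self.tensor}

  def set_from(self, state, player):
    del state, player  # a real game would encode the player's view of the state here
    self.tensor.fill(0.0)

  def string_from(self, state, player):
    del player
    return str(state)

The game returns such an object from make_py_observer(), and state.observation_tensor() is answered by calling its set_from and copying out self.tensor.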

too-far-away (Author) commented May 13, 2023

> Do you have a pointer to the code?

It's the call to state.observation_tensor() in rnad.py:968.
observer.set_from() is not called directly from rnad.py (sorry, I mistakenly wrote state.set_from() earlier). I guess the implementation of state.observation_tensor() at some point calls observer.set_from(), and it does so twice. I imagine it might happen in the Python implementation of the pure virtual function you mentioned (pyspiel.State?). In any case, one call to state.observation_tensor() produces two calls to observer.set_from().
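To make this concrete, here is a small reproduction sketch using the bundled Python tic-tac-toe game. It assumes the game registers itself as "python_tic_tac_toe" and that its observer class is tic_tac_toe.BoardObserver; if the behavior described above is happening, it should report two set_from calls, the first of them for a freshly created initial state:

import pyspiel
from open_spiel.python.games import tic_tac_toe

calls = []
original_set_from = tic_tac_toe.BoardObserver.set_from

def counting_set_from(self, state, player):
  # Record which state each set_from call is for, then defer to the original.
  calls.append(tuple(state.history()))
  return original_set_from(self, state, player)

tic_tac_toe.BoardObserver.set_from = counting_set_from  # patch before loading the game

game = pyspiel.load_game("python_tic_tac_toe")
state = game.new_initial_state()
state.apply_action(4)  # play the centre square so our state differs from a fresh one
state.observation_tensor(0)
print(len(calls), "set_from calls, for states with histories:", calls)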

lanctot (Collaborator) commented May 13, 2023

Ok, I think this question is more about how the Python games work than about R-NaD.

The code for those functions is here: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/pybind11/python_games.cc

Is it possible that there are two observers?

Without seeing the code of your game or a stack trace, it is a bit difficult to help out. But I will tag our resident expert on Python games in case he has any ideas: @elkhrt

too-far-away (Author) commented May 14, 2023

You are right, R-NaD was just the trigger for the issue. Thank you very much for the suggestions on where to look in the code.
I've spent some more time investigating this and here are my findings:
State::ObservationTensor() calls Game::ObservationTensorSize() in spiel.cc:762 (the same is true for the other similar methods), and only then calls the pure virtual State::ObservationTensor().
Game::ObservationTensorSize() calls Game::ObservationTensorShape() (spiel.h:846), which for Python games is implemented in python_games.cc:287. There it creates a new state (line 289) and calls Observer::WriteTensor() for it, meaning that every Python call to state.observation_tensor() or state.information_state_tensor() creates a redundant new state, writes an observation tensor for it, and only then writes an observation tensor for the original state.
Hope this helps.
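Roughly, in Python terms, the chain amounts to the following. This is only a paraphrase of the C++ paths named above, not actual OpenSpiel code, and the helper names are illustrative:

import numpy as np

def observation_tensor_shape(game):
  # Game::ObservationTensorShape() for a Python game (python_games.cc:287):
  # a throw-away initial state is created just to discover the tensor shape.
  probe_state = game.new_initial_state()       # the redundant new state
  observer = game.make_py_observer()
  observer.set_from(probe_state, 0)            # first set_from call
  return list(np.asarray(observer.tensor).shape)

def observation_tensor(game, state, player):
  # State::ObservationTensor() (spiel.cc:762): size the output buffer first...
  size = int(np.prod(observation_tensor_shape(game)))
  values = np.zeros(size, np.float32)
  # ...and only then write the observation for the state actually asked about.
  observer = game.make_py_observer()
  observer.set_from(state, player)             # second set_from call
  values[:] = np.asarray(observer.tensor).flatten()
  return values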

lanctot (Collaborator) commented May 15, 2023

Ok, perfect, thanks for the detail.

I will check with @elkhrt to see whether this was known. It would be nice if we could avoid this.

elkhrt (Member) commented May 15, 2023

Thanks for flagging this and for the great investigation! It is indeed a bit unsatisfactory. Probably the simplest option is to add two fields to PyGame:

// mutable so the const shape accessors below can cache their result here.
mutable std::vector<int> observation_tensor_shape_;
mutable std::vector<int> information_state_tensor_shape_;

And then modify the methods to cache the data there:

std::vector<int> PyGame::InformationStateTensorShape() const {
  if (information_state_tensor_shape_.empty()) {
    TrackingVectorAllocator allocator;
    auto state = NewInitialState();
    info_state_observer().WriteTensor(*state, kDefaultPlayerId, &allocator);
    information_state_tensor_shape_ = TensorShape(allocator);
  }
  return information_state_tensor_shape_;
}

That still results in one superfluous call per game object, but that's pretty cheap.
I'm happy to make this change on our side, or you can submit a PR - as you prefer.
Of course if you have other ideas on how to fix it, we'd be delighted to hear them.

too-far-away (Author) commented

Sorry for the delayed answer. As you guys are more familiar with the code, you may want to make the change.
As a quick and dirty workaround, in my Python game code I've overridden the State.information_state_tensor() method to return a new, already filled-in tensor, and in rnad.py it goes directly into the EnvStep without copying. This gives a ~30% performance gain.
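A minimal sketch of that kind of override, assuming a Python game whose state subclasses pyspiel.State; the size constant and _encode helper are hypothetical stand-ins for the real game's encoding:

import numpy as np
import pyspiel

_TENSOR_SIZE = 16  # placeholder; the real game defines its own size


def _encode(state, player, tensor):
  """Hypothetical helper that writes `state` as seen by `player` into `tensor`."""
  del state, player  # a real game would encode its state here
  tensor.fill(0.0)


class MyGameState(pyspiel.State):
  """Only the override is shown; the rest of the state class is omitted."""

  def information_state_tensor(self, player):
    # A method defined on the Python subclass shadows the pybind11 one, so the
    # call from rnad.py lands here and skips the C++ shape/copy machinery.
    tensor = np.zeros(_TENSOR_SIZE, dtype=np.float32)
    _encode(self, player, tensor)
    return tensor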
I understand that it is too late to change the API, but I think it would be nice to completely decouple the game, state, and observer with regard to observations. The game does not need to know about the observation, its tensor shape, or the way it is created, and the same applies to the state. One could then create several observers, each producing a tensor with a different shape, for experimenting with various algorithms. Such an observer's API would have two methods: one returning the shape (without having to create states or tensors and fill them in), and the other receiving a state and a tensor and filling the tensor from the state.
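Purely as an illustration of that suggestion, something along these lines; none of this exists in OpenSpiel today:

import numpy as np

class DecoupledObserver:
  """Illustration of the suggested two-method observer API; not part of OpenSpiel."""

  def shape(self):
    # Known statically: no state or tensor has to be created to answer this.
    return (3, 3)

  def write(self, state, tensor):
    # Fill `tensor` (allocated by the caller) from `state`.
    del state  # a real observer would encode the state here
    tensor.fill(0.0)

# Usage sketch: the algorithm owns the buffer; the observer only fills it.
observer = DecoupledObserver()
obs_buffer = np.zeros(observer.shape(), np.float32)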
Thank you for hearing me out.

lanctot (Collaborator) commented Aug 28, 2023

@elkhrt did we resolve this?

elkhrt (Member) commented Aug 29, 2023

We didn't make a change to the core functionality; @too-far-away found a workaround for their case.
Returning to this thread, I think I see a cleaner approach, which I'll attempt soon.
