Google Research Football Competition - liveinparis team

The exact code used by the team "liveinparis" in the Kaggle football competition
Implementation of self-play RL from scratch with distributed actors
The final version of the agent ranked 6th/1141 (gold prize)
You can find all the training details here

Dependencies

google-research football
PyTorch
tensorboardX
kaggle_environments

Usage

python3 train.py 
# You can find the arguments and hyperparameters in "arg_dict" in train.py.
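
As a rough illustration of what such a dictionary might look like, here is a minimal sketch. The key names are hypothetical and not copied from train.py; only the values (30 actors, rollouts of 30 transitions, mini-batches of 32 rollouts) come from the "learning system" section of this README.

```python
# Hypothetical excerpt: key names are illustrative assumptions, not the real arg_dict.
arg_dict = {
    "num_processes": 30,  # one actor per CPU core (value from this README)
    "rollout_len": 30,    # transitions per rollout (value from this README)
    "batch_size": 32,     # rollouts per mini-batch (value from this README)
}

# Number of state transitions consumed by a single mini-batch update.
transitions_per_batch = arg_dict["rollout_len"] * arg_dict["batch_size"]
print(transitions_per_batch)  # 960
```

Check the actual "arg_dict" in train.py for the real key names before changing anything.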

Training curves (vs. rule-based AI)

(x-axis: number of episodes)

Orange curve: vs. easy-level AI
Blue curve: vs. medium-level AI

Learning system

Each actor runs simulations and sends rollouts (sequences of transition tuples with a horizon length of 30) to the central learner. The learner updates the agent with the rollouts it receives. Because we chose an on-policy update algorithm, we used a trick to ensure strict on-policyness (the behavior policy and the learning policy are identical): each actor pauses its simulation while the learner is updating the policy and resumes once it receives the newest model from the learner.

We used one actor per CPU core. The final version of our agent was trained with 30 CPU cores and 1 GPU for 370 hours (CPU: AMD Ryzen Threadripper 2950X, GPU: RTX 2080). This is equivalent to 450,000 episodes and 133M mini-batch updates (a single mini-batch consists of 32 rollouts, and each rollout consists of 30 state transitions).
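
The pause-and-resume scheme described above can be sketched roughly as follows. This is a simplified illustration, not the code in actor.py or learner.py. It assumes threads as a stand-in for actor processes, an integer version as a stand-in for model weights, and a fixed rollout quota per round; the function and parameter names are hypothetical.

```python
import queue
import threading

ROLLOUT_LEN = 30  # transitions per rollout, as stated in this README


def actor(actor_id, model_queue, rollout_queue, rollouts_per_round):
    """Simulate with the latest policy, then pause until a new one arrives."""
    while True:
        version = model_queue.get()  # blocks while the learner is updating
        if version is None:          # shutdown signal
            return
        for _ in range(rollouts_per_round):
            # Stand-in for environment steps: tag each transition with the policy version.
            rollout = [(actor_id, version, t) for t in range(ROLLOUT_LEN)]
            rollout_queue.put(rollout)


def learner(n_actors=4, rollouts_per_round=8, n_updates=3):
    """Run synchronous rounds; return the set of policy versions seen in each batch."""
    rollout_queue = queue.Queue()
    model_queues = [queue.Queue() for _ in range(n_actors)]
    threads = [
        threading.Thread(
            target=actor,
            args=(i, model_queues[i], rollout_queue, rollouts_per_round),
        )
        for i in range(n_actors)
    ]
    for t in threads:
        t.start()

    version, versions_per_batch = 0, []
    for _ in range(n_updates):
        for q in model_queues:
            q.put(version)  # broadcast the newest policy; actors resume
        batch = [rollout_queue.get() for _ in range(n_actors * rollouts_per_round)]
        # All actors are idle again here, so no rollout from an older policy can appear.
        versions_per_batch.append({tr[1] for rollout in batch for tr in rollout})
        version += 1        # stand-in for the actual gradient update
    for q in model_queues:
        q.put(None)
    for t in threads:
        t.join()
    return versions_per_batch


if __name__ == "__main__":
    print(learner())
```

With the default arguments, each batch holds 4 x 8 = 32 rollouts, and every batch contains transitions from exactly one policy version, which is the on-policy property the trick is meant to guarantee. The cost is that actors sit idle during each update.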

seungeunrho/football-paris
