Can I only run on one CPU? #5

Open
Gezx opened this issue Sep 27, 2020 · 4 comments

Gezx commented Sep 27, 2020

Sorry to trouble you. When I set --nprocesses to more than 1, the program doesn't run, but --nprocesses 1 works fine. Also, are the results similar for different --nprocesses values?

apsdehal (Contributor) commented

What exactly is the error you are seeing?

ReinholdM commented Oct 30, 2020

@apsdehal Hi! I have the same problem with the default num_process=16. It only works with num_process=0. With num_process=16, I get the following error:
[screenshot of the error output]

acse-yl421 commented Feb 24, 2023

Hi,

I am wondering whether anyone has managed to fix this issue. When I used --nprocesses 16, execution hung at the line s = trainer.train_batch(ep) in run(num_epochs) in main.py. As a result, no output was generated even after waiting 24 hours. Could someone please advise how I can run the programme with multiple processes?

Thank you very much.

Rza-A commented Mar 30, 2023

Hi,

I'm facing the same problem. With --nprocesses 1, it runs as expected. However, with --nprocesses 16, it freezes on the following screen:
[screenshot of the terminal output where execution freezes]

As @acse-yl421 said, it seems to freeze at s = trainer.train_batch(ep). Visdom also doesn't show any progress or output.

The command I run:

python main.py --env_name predator_prey --nagents 3 --nprocesses 16 --num_epochs 100 --epoch_size 15 --hid_size 128 --detach_gap 10 --lrate 0.001 --dim 3 --max_steps 3 --ic3net --vision 0 --recurrent --plot

The environment used:

Conda 4.12.0
Python 3.6.13

----------
Packages:
----------
certifi==2021.5.30
cffi @ file:///tmp/build/80754af9/cffi_1625814693874/work
charset-normalizer==2.0.12
cloudpickle==2.2.1
dataclasses==0.8
gym==0.9.6
gym-notices==0.0.8
-e git+https://github.com/IC3Net/IC3Net.git@69b7e0ce51a79def593abfef1a976f43e5e13f75#egg=ic3net_envs&subdirectory=ic3net-envs
idna==3.4
importlib-metadata==6.1.0
mkl-fft==1.0.6
mkl-random==1.0.1
numpy==1.13.3
Pillow==6.2.0
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pyglet==2.0.5
pyzmq==25.0.2
requests==2.27.1
six==1.16.0
TBB==0.2
torch==0.4.0
torchfile==0.1.0
tornado==6.1
typing-extensions==3.7.4.1
urllib3==1.26.15
visdom==0.1.4
zipp==3.15.0

Hardware:
11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz

I know it is an old repo, but I'd appreciate any help.
