
[Question] How to run Mixtral inference in multi-node? #5544

Open
leachee99 opened this issue May 17, 2024 · 0 comments
Labels
bug, inference

Comments

@leachee99

Describe the bug
The program is killed by the watchdog timeout when I run DeepSpeed on multiple nodes.

To Reproduce
Steps to reproduce the behavior:
my code

  1. Simple inference script to reproduce; launch command:
   deepspeed \
    --hostfile=./hostfile \
    --include="node0:2,3@node1:0,1" \
    mixtralDs.py \
    --deepspeed_config ./ds_config.json
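
     The hostfile itself is not shown here; in DeepSpeed's standard hostfile format, an entry for these two nodes would look like the following (slot counts are assumed and should be set to each node's actual GPU count):

      node0 slots=4
      node1 slots=4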
  2. mixtralDs.py:
import os

import deepspeed
import torch
from transformers import MixtralConfig, MixtralModel


def run_mixtral_ds():
    # The deepspeed launcher sets LOCAL_RANK for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    device = torch.device("cuda", local_rank)
    # Half-sized Mixtral config to keep the reproduction small.
    configuration = MixtralConfig(vocab_size=32000,
            hidden_size=4096 // 2,
            intermediate_size=14336 // 2,
            num_hidden_layers=32 // 2,
            num_attention_heads=32 // 2,
            num_key_value_heads=8 // 2,
            hidden_act="silu",
            max_position_embeddings=4096 * 32,
            initializer_range=0.02,
            rms_norm_eps=1e-5,
            use_cache=True,
            pad_token_id=None,
            bos_token_id=1,
            eos_token_id=2,
            tie_word_embeddings=False,
            rope_theta=1e6,
            sliding_window=None,
            attention_dropout=0.0,
            num_experts_per_tok=2,
            num_local_experts=8,
            output_router_logits=False,
            router_aux_loss_coef=0.001)

    mixtralmodel = MixtralModel(config=configuration).to(device)
    # Random token ids: a batch of 4 sequences, 30 tokens each.
    inputs_ids = torch.randint(
        low=0, high=configuration.vocab_size, size=(4, 30)
    ).to(device)
    # Shard the model across 4 GPUs (2 per node) for inference.
    ds_model = deepspeed.init_inference(mixtralmodel, mp_size=4, dtype=torch.float16)
    res = ds_model(inputs_ids)
    print(res)


if __name__ == '__main__':
    run_mixtral_ds()
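
If the hang is only the NCCL watchdog firing while ranks wait on a slow first collective, one workaround would be to initialize the process group explicitly with a longer timeout before init_inference; a minimal sketch, not verified on my setup:

    from datetime import timedelta

    import deepspeed

    # Set up the process group with a longer collective timeout than the
    # default, before any DeepSpeed call that communicates across ranks.
    deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(hours=2))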

Expected behavior
The program runs across both nodes and prints a result.

ds_report output

Please run ds_report to give us details about your setup.

DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('deepspeed==0.14.3+0fc19b6a')
[2024-05-16 21:46:15,312] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-16 21:46:15,444] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/archlab/zyl/copy/jlq/anaconda3/lib/python3.11/site-packages/torch']
torch version .................... 2.3.0+cu121
deepspeed install path ........... ['/home/archlab/zyl/copy/jlq/project/DeepSpeed/deepspeed']
deepspeed info ................... 0.14.3+0fc19b6a, 0fc19b6, master
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 125.89 GB

Screenshots
[screenshot: watchdog timeout error traceback]

System info (please complete the following information):

  • OS: Ubuntu 18.04
  • GPU count and types: two machines with 2x V100s each
  • (if applicable) what DeepSpeed-MII version are you using:
    deepspeed==0.14.3+0fc19b6a
  • (if applicable) Hugging Face Transformers/Accelerate/etc. versions:
    transformers==4.40.2
    accelerate==0.30.0
  • Python version: 3.11
  • Any other relevant info about your setup

Docker context
Are you using a specific docker image that you can share?
No, I am not using Docker.

Additional context
There is also an error when I run the DeepSpeed example in the same environment, specifically DeepSpeedExamples/inference/huggingface/text-generation/run-generation-script/test-gpt.sh.

The error is:
[screenshot: first error traceback]
Then I changed the parameters in deepspeed.init_inference:

model = deepspeed.init_inference(model,
                                 mp_size=1,
                                 dtype=(torch.half if args.fp16 else torch.float),
                                 replace_with_kernel_inject=True)

Then the error is:
[screenshot: second error traceback]
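
As an aside, recent DeepSpeed releases deprecate the mp_size argument in favor of tensor_parallel; a sketch of the equivalent call, which I have not confirmed fixes the error:

model = deepspeed.init_inference(model,
                                 tensor_parallel={"tp_size": 1},  # replaces deprecated mp_size
                                 dtype=(torch.half if args.fp16 else torch.float),
                                 replace_with_kernel_inject=True)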

What should I do if I want to run DeepSpeed inference across multiple nodes?

Thanks

leachee99 added the bug and inference labels on May 17, 2024