Feature request

I would like to request support for running Meta LLaMA 3 with ORTModelForCausalLM for faster inference. This integration would let ONNX Runtime (ORT) optimize and accelerate inference for Meta LLaMA 3 models.
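For reference, a minimal sketch of the requested usage, modeled on the pattern Optimum already exposes for other decoder-only models (the checkpoint name is the public Meta repo; whether `export=True` handles LLaMA 3 correctly is exactly what this issue asks about):

```python
# Sketch of the requested usage, following the existing Optimum API for
# other decoder-only models. The checkpoint is the public Meta repo and
# requires accepting the license on the Hub first.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```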
Motivation
Currently, there is no documented support for loading Meta LLaMA 3 into ORTModelForCausalLM on Hugging Face. As a result, inference is slower than it could be, which is a significant bottleneck for applications that need real-time or near-real-time responses. Supporting this integration would improve the performance and usability of Meta LLaMA 3 models, particularly in production environments where inference speed is critical.
Your contribution
While I may not have the expertise to implement this feature myself, I am willing to assist with testing and providing feedback on the integration process. Additionally, I can help with documentation and usage examples once the feature is implemented.
Hi! Are you sure llama3 doesn't work? It has the same architecture/model_type as llama2, so it should work out of the box.
I'm running a script locally to export it and check (the export is going smoothly with meta-llama/Meta-Llama-3-8B).
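For anyone who wants to reproduce that local export test, a rough equivalent along these lines (the output directory name is illustrative, not a convention from this thread):

```python
# Rough equivalent of the local export test described above;
# "llama3-8b-onnx" is an illustrative output directory. Saving the
# exported graph avoids re-exporting on every load.
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", export=True
)
model.save_pretrained("llama3-8b-onnx")

# Later, reload the exported ONNX model directly with ONNX Runtime:
model = ORTModelForCausalLM.from_pretrained("llama3-8b-onnx")
```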