Get up and running with Llama 2 and other large language models locally

InterwebAlchemy/ollama

 
 

Ollama

Run, create, and share large language models (LLMs).

Note: Ollama is in early preview. Please report any issues you find.

Download

Quickstart

To run and chat with Llama 2, the new model by Meta:

ollama run llama2

Model library

Ollama supports a list of open-source models, available at ollama.ai/library.

Here are some example open-source models that can be downloaded:

| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama2 | 7B | 3.8GB | ollama pull llama2 |
| Llama2 13B | 13B | 7.3GB | ollama pull llama2:13b |
| Llama2 70B | 70B | 39GB | ollama pull llama2:70b |
| Llama2 Uncensored | 7B | 3.8GB | ollama pull llama2-uncensored |
| Code Llama | 7B | 3.8GB | ollama pull codellama |
| Orca Mini | 3B | 1.9GB | ollama pull orca-mini |
| Vicuna | 7B | 3.8GB | ollama pull vicuna |
| Nous-Hermes | 7B | 3.8GB | ollama pull nous-hermes |
| Nous-Hermes 13B | 13B | 7.3GB | ollama pull nous-hermes:13b |
| Wizard Vicuna Uncensored | 13B | 7.3GB | ollama pull wizard-vicuna |

Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.
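The RAM guidance in the note can be written down as a small lookup. This is a minimal Python sketch, not part of Ollama: the names `MIN_RAM_GB`, `min_ram_gb`, and `can_run` are hypothetical, and the table only covers the three model sizes the note mentions.

```python
# Minimum RAM guidance from the note above, keyed by parameter count.
# Only the sizes the note mentions are included; all names are illustrative.
MIN_RAM_GB = {"3B": 8, "7B": 16, "13B": 32}


def min_ram_gb(parameters: str) -> int:
    """Return the suggested minimum RAM in GB for a model size such as '7B'."""
    key = parameters.strip().upper()
    if key not in MIN_RAM_GB:
        raise ValueError(f"no RAM guidance for {parameters!r}")
    return MIN_RAM_GB[key]


def can_run(parameters: str, available_gb: float) -> bool:
    """Check whether a machine with `available_gb` of RAM meets the guidance."""
    return available_gb >= min_ram_gb(parameters)
```

For example, `can_run("13B", 16)` is false under this guidance, while `can_run("7B", 16)` is true.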

Examples

Pull a public model

ollama pull llama2

This command can also be used to update a local model. Only the changes since the last pull are downloaded.

Run a model interactively

ollama run llama2
>>> hi
Hello! How can I help you today?

For multiline input, you can wrap text with """:

>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

Run a model non-interactively

$ ollama run llama2 'tell me a joke'
 Sure! Here's a quick one:
 Why did the scarecrow win an award? Because he was outstanding in his field!
$ cat <<EOF >prompts.txt
tell me a joke about llamas
tell me another one
EOF
$ ollama run llama2 <prompts.txt
>>> tell me a joke about llamas
 Why did the llama refuse to play hide-and-seek?
 nobody likes to be hided!

>>> tell me another one
 Sure, here's another one:

Why did the llama go to the bar?
To have a hay-often good time!

Run a model on contents of a text file

$ ollama run llama2 "summarize this file:" "$(cat README.md)"
 Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
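The same non-interactive call can be driven from a script. The sketch below assumes the `ollama` binary is on `PATH` and that the Ollama server is reachable; the helper names `build_run_command` and `summarize_file` are hypothetical. It passes the instruction and the file contents as two separate arguments, mirroring the shell example above.

```python
import subprocess
from pathlib import Path


def build_run_command(model: str, *prompt_parts: str) -> list[str]:
    """Build the argv for a non-interactive `ollama run` call."""
    return ["ollama", "run", model, *prompt_parts]


def summarize_file(path: str, model: str = "llama2") -> str:
    """Ask the model to summarize a text file and return its reply."""
    text = Path(path).read_text(encoding="utf-8")
    cmd = build_run_command(model, "summarize this file:", text)
    # Assumption: the ollama binary is installed and the server is running.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the command as a list avoids shell quoting issues when the file contains quotes or newlines. Calling `summarize_file("README.md")` would then correspond to the shell command shown above.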

Customize a model

Pull a base model:

ollama pull llama2

Create a Modelfile:

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Next, create and run the model:

ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.

For more examples, see the examples directory. For more information on creating a Modelfile, see the Modelfile documentation.
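When several customized models share the same shape, the Modelfile can be generated rather than written by hand. This is a minimal sketch that uses only the three instructions shown in the Mario example (`FROM`, `PARAMETER temperature`, and `SYSTEM`); the function name `render_modelfile` is hypothetical and not part of Ollama.

```python
def render_modelfile(base: str, temperature: float, system_prompt: str) -> str:
    """Render a Modelfile with the FROM, PARAMETER, and SYSTEM instructions.

    The system prompt must not itself contain a triple double quote,
    since that sequence delimits the SYSTEM block.
    """
    if '"""' in system_prompt:
        raise ValueError('system prompt must not contain """')
    return (
        f"FROM {base}\n"
        "\n"
        f"PARAMETER temperature {temperature:g}\n"
        "\n"
        'SYSTEM """\n'
        f"{system_prompt.strip()}\n"
        '"""\n'
    )
```

Writing the returned text to `./Modelfile` and then running `ollama create mario -f ./Modelfile` follows the same steps as the example above.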

Listing local models

ollama list

Removing local models

ollama rm llama2

Model packages

Overview

Ollama bundles model weights, configurations, and data into a single package, defined by a Modelfile.

Building

Install cmake and go:

brew install cmake
brew install go

Then generate dependencies and build:

go generate ./...
go build .

Next, start the server:

./ollama serve

Finally, in a separate shell, run a model:

./ollama run llama2

REST API

See the API documentation for all endpoints.

Ollama has an API for running and managing models. For example, to generate text from a model:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
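The curl call above can also be made from Python with only the standard library. This is a sketch under one assumption that should be checked against the API documentation: the endpoint is taken to stream newline-delimited JSON objects, each carrying a `response` text fragment, with a `done` flag on the final object. The helper names (`build_payload`, `collect_response`, `generate`) are hypothetical.

```python
import json
import urllib.request

# Local endpoint taken from the curl example above.
GENERATE_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> bytes:
    """Encode the same JSON body the curl example sends."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")


def collect_response(lines) -> str:
    """Join the `response` fragments from a stream of JSON lines.

    Assumption: each non-empty line is a JSON object that may carry a
    `response` string, and the last object has `done` set to true.
    """
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model: str, prompt: str, url: str = GENERATE_URL) -> str:
    """POST a prompt to a running Ollama server and return the full reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Iterating over the HTTP response yields one line of bytes at a time.
    with urllib.request.urlopen(request) as reply:
        return collect_response(reply)
```

With the server running (`ollama serve`), `generate("llama2", "Why is the sky blue?")` would send the same request as the curl example and return the concatenated text.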

Community Projects using Ollama

| Project | Description |
| --- | --- |
| LangChain and LangChain.js | Also, there is a question-answering example. |
| Continue | Embeds Ollama inside Visual Studio Code. The extension lets you highlight code to add to the prompt, ask questions in the sidebar, and generate code inline. |
| LiteLLM | Lightweight Python package to simplify LLM API calls. |
| Discord AI Bot | Interact with Ollama as a chatbot on Discord. |
| Raycast Ollama | Raycast extension to use Ollama for local llama inference on Raycast. |
| Simple HTML UI | Also, there is a Chrome extension. |
| Emacs client | |
