## Nvidia Instructions

To enable your Nvidia GPU in Docker:

- Install the NVIDIA Container Toolkit.
- Configure Docker to use the Nvidia driver:

```shell
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
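To confirm that containers can actually see the GPU, a quick smoke test (the CUDA image tag here is only an example; any recent `nvidia/cuda` base image works):

```shell
# Run nvidia-smi inside a throwaway container; your GPU should appear in its table
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```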
## Usage
⚠️ This app runs on port 11435. Take this into account when configuring tools connecting to the app.
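For example, the `ollama` CLI reads the `OLLAMA_HOST` environment variable, so pointing it at the non-default port looks like this (a sketch, assuming the CLI is installed on the host):

```shell
# Tell the ollama client where the server listens
export OLLAMA_HOST=127.0.0.1:11435
ollama list
```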
### Use with a frontend
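Any frontend that speaks the Ollama API should work, as long as it is configured for port 11435. As an illustration only (Open WebUI is an assumption here, not something this image ships), its documented `OLLAMA_BASE_URL` setting would be pointed at this app:

```shell
# Example frontend: Open WebUI reaching the host's port 11435 from inside its own container
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11435 \
  ghcr.io/open-webui/open-webui:main
```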
### Try the REST API
Ollama has a REST API for running and managing models.
#### Generate a response

```shell
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```
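By default this endpoint streams the answer as a series of JSON objects. Setting `"stream": false` returns a single JSON object once generation finishes:

```shell
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```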
#### Chat with a model

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
```
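The `messages` array carries the whole conversation, so a follow-up turn includes the earlier exchange (the assistant content below is a placeholder, not a real model reply):

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" },
    { "role": "assistant", "content": "Because of Rayleigh scattering." },
    { "role": "user", "content": "how is that different at sunset?" }
  ]
}'
```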
## Compatible GPUs
Ollama supports Nvidia GPUs with compute capability 5.0+.
Check your compute compatibility to see if your card is supported: https://developer.nvidia.com/cuda-gpus
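On reasonably recent drivers, `nvidia-smi` can report the compute capability directly (the `compute_cap` query field may be missing on older driver versions):

```shell
# Prints one name/compute-capability pair per installed GPU, e.g. "NVIDIA GeForce RTX 3060, 8.6"
nvidia-smi --query-gpu=name,compute_cap --format=csv
```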
| Compute Capability | Family | Cards |
| --- | --- | --- |
| 9.0 | NVIDIA | `H100` |
| 8.9 | GeForce RTX 40xx | `RTX 4090` `RTX 4080` `RTX 4070 Ti` `RTX 4060 Ti` |
| | NVIDIA Professional | `L4` `L40` `RTX 6000` |
| 8.6 | GeForce RTX 30xx | `RTX 3090 Ti` `RTX 3090` `RTX 3080 Ti` `RTX 3080` `RTX 3070 Ti` `RTX 3070` `RTX 3060 Ti` `RTX 3060` |
| | NVIDIA Professional | `A40` `RTX A6000` `RTX A5000` `RTX A4000` `RTX A3000` `RTX A2000` `A10` `A16` `A2` |
| 8.0 | NVIDIA | `A100` `A30` |
| 7.5 | GeForce GTX/RTX | `GTX 1650 Ti` `TITAN RTX` `RTX 2080 Ti` `RTX 2080` `RTX 2070` `RTX 2060` |
| | NVIDIA Professional | `T4` `RTX 5000` `RTX 4000` `RTX 3000` `T2000` `T1200` `T1000` `T600` `T500` |
| | Quadro | `RTX 8000` `RTX 6000` `RTX 5000` `RTX 4000` |
| 7.0 | NVIDIA | `TITAN V` `V100` `Quadro GV100` |
| 6.1 | NVIDIA TITAN | `TITAN Xp` `TITAN X` |
| | GeForce GTX | `GTX 1080 Ti` `GTX 1080` `GTX 1070 Ti` `GTX 1070` `GTX 1060` `GTX 1050` |
| | Quadro | `P6000` `P5200` `P4200` `P3200` `P5000` `P4000` `P3000` `P2200` `P2000` `P1000` `P620` `P600` `P500` `P520` |
| | Tesla | `P40` `P4` |
| 6.0 | NVIDIA | `Tesla P100` `Quadro GP100` |
| 5.2 | GeForce GTX | `GTX TITAN X` `GTX 980 Ti` `GTX 980` `GTX 970` `GTX 960` `GTX 950` |
| | Quadro | `M6000 24GB` `M6000` `M5000` `M5500M` `M4000` `M2200` `M2000` `M620` |
| | Tesla | `M60` `M40` |
| 5.0 | GeForce GTX | `GTX 750 Ti` `GTX 750` `NVS 810` |
| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |
## Model library
Ollama supports a list of models available on ollama.com/library
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3 | 8B | 4.7GB | `ollama run llama3` |
| Llama 3 | 70B | 40GB | `ollama run llama3:70b` |
| Phi-3 | 3.8B | 2.3GB | `ollama run phi3` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Gemma | 2B | 1.4GB | `ollama run gemma:2b` |
| Gemma | 7B | 4.8GB | `ollama run gemma:7b` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
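Models pulled with any of these commands can also be listed over the REST API; given the port note above, that looks like:

```shell
# Lists locally available models as JSON
curl http://localhost:11435/api/tags
```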