## Nvidia Instructions

To enable your Nvidia GPU in Docker:

```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
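
To verify that Docker can now see the GPU, you can run a throwaway CUDA container. A minimal sketch; the image tag here is just an example, and any recent `nvidia/cuda` base image should work:

```bash
# Should print your GPU name and driver version if the runtime is configured correctly
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```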

## Usage

> ⚠️ This app runs on port 11435 instead of Ollama's default 11434. Take this into account when configuring tools that connect to the app.
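
If you use the `ollama` CLI against this app, point it at the non-default port through the `OLLAMA_HOST` environment variable. A quick sketch, assuming the CLI runs on the same machine as the app:

```bash
# List the models served by this app rather than by a default instance on 11434
OLLAMA_HOST=127.0.0.1:11435 ollama list
```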

### Use with a frontend

Any Ollama-compatible frontend (for example, Open WebUI) can connect to this app; just point it at port 11435 instead of the default 11434.
### Try the REST API

Ollama has a REST API for running and managing models.

#### Generate a response

```bash
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```

#### Chat with a model

```bash
curl http://localhost:11435/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
```
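
Both endpoints stream the answer back as a series of JSON objects by default. If you would rather receive a single JSON response, Ollama's API accepts a `stream` flag; a small variation on the chat example above:

```bash
curl http://localhost:11435/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'
```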

## Compatible GPUs

Ollama supports Nvidia GPUs with compute capability 5.0+.

Check your compute capability to see if your card is supported: https://developer.nvidia.com/cuda-gpus

| Compute Capability | Family | Cards |
| --- | --- | --- |
| 9.0 | NVIDIA | H100 |
| 8.9 | GeForce RTX 40xx | RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4060 Ti |
| | NVIDIA Professional | L4, L40, RTX 6000 |
| 8.6 | GeForce RTX 30xx | RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, RTX 3060 Ti, RTX 3060 |
| | NVIDIA Professional | A40, RTX A6000, RTX A5000, RTX A4000, RTX A3000, RTX A2000, A10, A16, A2 |
| 8.0 | NVIDIA | A100, A30 |
| 7.5 | GeForce GTX/RTX | GTX 1650 Ti, TITAN RTX, RTX 2080 Ti, RTX 2080, RTX 2070, RTX 2060 |
| | NVIDIA Professional | T4, RTX 5000, RTX 4000, RTX 3000, T2000, T1200, T1000, T600, T500 |
| | Quadro | RTX 8000, RTX 6000, RTX 5000, RTX 4000 |
| 7.0 | NVIDIA | TITAN V, V100, Quadro GV100 |
| 6.1 | NVIDIA TITAN | TITAN Xp, TITAN X |
| | GeForce GTX | GTX 1080 Ti, GTX 1080, GTX 1070 Ti, GTX 1070, GTX 1060, GTX 1050 |
| | Quadro | P6000, P5200, P4200, P3200, P5000, P4000, P3000, P2200, P2000, P1000, P620, P600, P500, P520 |
| | Tesla | P40, P4 |
| 6.0 | NVIDIA | Tesla P100, Quadro GP100 |
| 5.2 | GeForce GTX | GTX TITAN X, GTX 980 Ti, GTX 980, GTX 970, GTX 960, GTX 950 |
| | Quadro | M6000 24GB, M6000, M5000, M5500M, M4000, M2200, M2000, M620 |
| | Tesla | M60, M40 |
| 5.0 | GeForce GTX | GTX 750 Ti, GTX 750, NVS 810 |
| | Quadro | K2200, K1200, K620, M1200, M520, M5000M, M4000M, M3000M, M2000M, M1000M, K620M, M600M, M500M |
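
If you are unsure what your card reports, recent Nvidia drivers can query the compute capability directly. Note that the `compute_cap` query field requires a reasonably new driver; older `nvidia-smi` builds may not support it:

```bash
# Prints e.g. "NVIDIA GeForce RTX 3080, 8.6"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```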

## Model library

Ollama supports a list of models available at [ollama.com/library](https://ollama.com/library).

Here are some example models that can be downloaded:

| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3 | 8B | 4.7GB | `ollama run llama3` |
| Llama 3 | 70B | 40GB | `ollama run llama3:70b` |
| Phi-3 | 3.8B | 2.3GB | `ollama run phi3` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Gemma | 2B | 1.4GB | `ollama run gemma:2b` |
| Gemma | 7B | 4.8GB | `ollama run gemma:7b` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |

> **Note:** You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
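
Models can also be downloaded through this app's REST API instead of the CLI, which is handy for scripting. A sketch using the pull endpoint; at the time of writing it expected a `name` field, but check the Ollama API docs for your version:

```bash
curl http://localhost:11435/api/pull -d '{
  "name": "llama3"
}'
```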