## Nvidia Instructions

To enable your Nvidia GPU in Docker:

- Install the NVIDIA Container Toolkit.
- Configure Docker to use the Nvidia driver:

```shell
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
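To confirm that containers can actually see the GPU, a quick smoke test (the CUDA image tag here is only an example; any recent `nvidia/cuda` base image works):

```shell
# Run nvidia-smi inside a throwaway container; your GPU should appear in its table
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```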
## Usage
⚠️ This app runs on port 11435. Take this into account when configuring tools connecting to the app.
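For example, the `ollama` CLI reads the `OLLAMA_HOST` environment variable, so pointing it at the non-default port looks like this (a sketch, assuming the CLI is installed on the host):

```shell
# Tell the ollama client where the server listens
export OLLAMA_HOST=127.0.0.1:11435
ollama list
```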
### Use with a frontend
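Any frontend that speaks the Ollama API should work, as long as it is configured for port 11435. As an illustration only (Open WebUI is an assumption here, not something this image ships), its documented `OLLAMA_BASE_URL` setting would be pointed at this app:

```shell
# Example frontend: Open WebUI reaching the host's port 11435 from inside its own container
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11435 \
  ghcr.io/open-webui/open-webui:main
```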
### Try the REST API
Ollama has a REST API for running and managing models.
#### Generate a response

```shell
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```
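By default this endpoint streams the answer as a series of JSON objects. Setting `"stream": false` returns a single JSON object once generation finishes:

```shell
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```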
#### Chat with a model

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
```
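The `messages` array carries the whole conversation, so a follow-up turn includes the earlier exchange (the assistant content below is a placeholder, not a real model reply):

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" },
    { "role": "assistant", "content": "Because of Rayleigh scattering." },
    { "role": "user", "content": "how is that different at sunset?" }
  ]
}'
```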
## Compatible GPUs
Ollama supports Nvidia GPUs with compute capability 5.0+.
Check your compute compatibility to see if your card is supported: https://developer.nvidia.com/cuda-gpus
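On reasonably recent drivers, `nvidia-smi` can report the compute capability directly (the `compute_cap` query field may be missing on older driver versions):

```shell
# Prints one name/compute-capability pair per installed GPU, e.g. "NVIDIA GeForce RTX 3060, 8.6"
nvidia-smi --query-gpu=name,compute_cap --format=csv
```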
| Compute Capability | Family | Cards |
| --- | --- | --- |
| 9.0 | NVIDIA | `H100` |
| 8.9 | GeForce RTX 40xx | `RTX 4090` `RTX 4080` `RTX 4070 Ti` `RTX 4060 Ti` |
| | NVIDIA Professional | `L4` `L40` `RTX 6000` |
| 8.6 | GeForce RTX 30xx | `RTX 3090 Ti` `RTX 3090` `RTX 3080 Ti` `RTX 3080` `RTX 3070 Ti` `RTX 3070` `RTX 3060 Ti` `RTX 3060` |
| | NVIDIA Professional | `A40` `RTX A6000` `RTX A5000` `RTX A4000` `RTX A3000` `RTX A2000` `A10` `A16` `A2` |
| 8.0 | NVIDIA | `A100` `A30` |
| 7.5 | GeForce GTX/RTX | `GTX 1650 Ti` `TITAN RTX` `RTX 2080 Ti` `RTX 2080` `RTX 2070` `RTX 2060` |
| | NVIDIA Professional | `T4` `RTX 5000` `RTX 4000` `RTX 3000` `T2000` `T1200` `T1000` `T600` `T500` |
| | Quadro | `RTX 8000` `RTX 6000` `RTX 5000` `RTX 4000` |
| 7.0 | NVIDIA | `TITAN V` `V100` `Quadro GV100` |
| 6.1 | NVIDIA TITAN | `TITAN Xp` `TITAN X` |
| | GeForce GTX | `GTX 1080 Ti` `GTX 1080` `GTX 1070 Ti` `GTX 1070` `GTX 1060` `GTX 1050` |
| | Quadro | `P6000` `P5200` `P4200` `P3200` `P5000` `P4000` `P3000` `P2200` `P2000` `P1000` `P620` `P600` `P500` `P520` |
| | Tesla | `P40` `P4` |
| 6.0 | NVIDIA | `Tesla P100` `Quadro GP100` |
| 5.2 | GeForce GTX | `GTX TITAN X` `GTX 980 Ti` `GTX 980` `GTX 970` `GTX 960` `GTX 950` |
| | Quadro | `M6000 24GB` `M6000` `M5000` `M5500M` `M4000` `M2200` `M2000` `M620` |
| | Tesla | `M60` `M40` |
| 5.0 | GeForce GTX | `GTX 750 Ti` `GTX 750` `NVS 810` |
| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |
## Model library
Ollama supports a list of models available on ollama.com/library
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3 | 8B | 4.7GB | `ollama run llama3` |
| Llama 3 | 70B | 40GB | `ollama run llama3:70b` |
| Phi-3 | 3.8B | 2.3GB | `ollama run phi3` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Gemma | 2B | 1.4GB | `ollama run gemma:2b` |
| Gemma | 7B | 4.8GB | `ollama run gemma:7b` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
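Models pulled with any of these commands can also be listed over the REST API; given the port note above, that looks like:

```shell
# Lists locally available models as JSON
curl http://localhost:11435/api/tags
```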