Every model on this page can be running on your machine in minutes. Here's the path most people take:
Models are engines. These tools are the steering wheel. Each serves a different workflow — most people end up using 2–3 of them.
Every VRAM estimate on this page uses Q4_K_M as the baseline — the community default. Here's how the same 70B model looks at each compression level:
| Level | Bits | Size on Disk | VRAM | Quality | Like JPEG... |
|---|---|---|---|---|---|
| FP16 | 16-bit | ~140 GB | ~142 GB | Perfect — baseline | 100% — lossless |
| Q8_0 | 8-bit | ~70 GB | ~74 GB | Essentially identical | 95% — can't tell |
| Q6_K | 6.6-bit | ~58 GB | ~62 GB | Extremely close | 90% — pixel peeping only |
| Q4_K_M ★ | 4.8-bit | ~42 GB | ~46 GB | 99.5% preserved | 80% — the sweet spot |
| Q3_K_M | 3.9-bit | ~34 GB | ~38 GB | Subtle degradation | 60% — trained eyes notice |
| Q2_K | 2.6-bit | ~24 GB | ~28 GB | Noticeable loss | 30% — artifacts visible |
| IQ2_XS | 2.3-bit | ~21 GB | ~25 GB | Significant loss | 15% — it runs, barely |
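The sizes in the table are simple arithmetic: parameters × bits per weight ÷ 8, plus a few GB of runtime overhead for the KV cache and buffers. A quick sketch for estimating any model at any quant level (the flat ~4 GB overhead pad is an assumption that roughly matches the table; real overhead grows with context length):

```python
def disk_gb(params_b: float, bits_per_weight: float) -> float:
    """Weights on disk: parameters (in billions) x bits per weight / 8 bits per byte."""
    return params_b * bits_per_weight / 8

def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 4.0) -> float:
    """VRAM estimate: weights plus a rough pad for KV cache and runtime buffers."""
    return disk_gb(params_b, bits_per_weight) + overhead_gb

# 70B at Q4_K_M (~4.8 bits per weight):
print(f"{disk_gb(70, 4.8):.0f} GB disk, {vram_gb(70, 4.8):.0f} GB VRAM")  # 42 GB disk, 46 GB VRAM
```

Run against the other rows, this lands within a couple of GB of the table; the small mismatches come from file-format overhead and per-quant details the sketch ignores.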
Not all tasks degrade equally. Here's what you lose first as you compress harder — from most sensitive to most resilient:
Theory is nice, but here's what the tradeoffs actually look like with models from this page:
You have two strong options with 24 GB:
Two paths to a capable coding assistant:
The extreme case — a 109B model on consumer hardware:
The main quantized-model formats, and which runtime each serves:

- **GGUF**: the llama.cpp/Ollama format. Best for CPU+GPU split; most models are available in it.
- **GPTQ**: GPU-only; fast batch inference.
- **AWQ**: activation-aware quantization; better quality at the same bit width.
- **EXL2**: variable bits per layer; optimal quality.

Rule of thumb: GGUF for Ollama, AWQ or GPTQ for vLLM.
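The guidance above condenses to a small lookup; a plain-Python sketch (the mapping is this page's recommendation, not any library's API):

```python
# Which quant format to download for a given runtime, per the guidance above.
PREFERRED_FORMAT = {
    "ollama": "GGUF",     # GGUF is also what llama.cpp loads; supports CPU+GPU split
    "llama.cpp": "GGUF",
    "vllm": "AWQ",        # GPTQ also works; both are GPU-only, fast batch inference
    "exllamav2": "EXL2",  # variable bits per layer
}

def pick_format(runtime: str) -> str:
    return PREFERRED_FORMAT[runtime.lower()]

print(pick_format("Ollama"))  # GGUF
```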
`ollama pull llama3.3` gives you Q4_K_M by default. For a specific quant, name it in the tag: `ollama pull llama3.3:70b-instruct-q8_0` or `ollama pull llama3.3:70b-instruct-q2_K`. Check the available tags on ollama.com/library.
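The tags follow a name:size-variant-quant pattern. A tiny helper to build a pull target (the pattern is inferred from the library's published tags, not a documented API):

```python
def ollama_tag(model: str, size: str = "", variant: str = "", quant: str = "") -> str:
    """Build an Ollama pull target like llama3.3:70b-instruct-q8_0."""
    suffix = "-".join(part for part in (size, variant, quant) if part)
    return f"{model}:{suffix}" if suffix else model

print(ollama_tag("llama3.3"))                             # llama3.3 (Q4_K_M default)
print(ollama_tag("llama3.3", "70b", "instruct", "q8_0"))  # llama3.3:70b-instruct-q8_0
```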
Stop thinking about individual models — here's what a complete local AI stack looks like at each hardware tier. Every model listed below is on this page with full details. All LLM VRAM at Q4_K_M unless noted.
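One planning note before the lists: a stack "fits" a tier when its largest single model fits, because Ollama loads models on demand and unloads idle ones rather than holding everything resident at once. A sketch with rough, assumed VRAM figures (illustrative numbers, not measurements):

```python
def stack_fits(vram_budget_gb: float, model_vram_gb: dict) -> bool:
    """One model resident at a time, so only the largest member has to fit."""
    return max(model_vram_gb.values()) <= vram_budget_gb

# Rough Q4_K_M estimates for an assumed 24 GB tier:
stack_24gb = {"qwen3:32b": 22, "gemma3:27b": 19, "nomic-embed-text": 1}
print(stack_fits(24, stack_24gb))  # True
```

If you need two models loaded simultaneously (say, an LLM plus an embedder for RAG), budget for their sum instead of the max.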
ollama pull gemma3:1b
ollama pull gemma3:4b
ollama pull qwen3:0.6b
ollama pull qwen3:4b
ollama pull phi4-mini
ollama pull qwen3:8b
ollama pull mistral
ollama pull mistral-nemo
ollama pull deepseek-r1:8b
ollama pull smollm2:1.7b
ollama pull phi4:14b
ollama pull qwen3:14b
ollama pull deepseek-r1:14b
ollama pull nemotron-nano:12b
ollama pull mistral-small
ollama pull granite4
ollama pull granite4:small-h
ollama pull gpt-oss:20b
ollama pull devstral-small
ollama pull gemma3:27b
ollama pull glm4:latest
llama.cpp recommended; run with the `--jinja` flag
ollama pull nemotron-nano
ollama pull qwen3:32b
ollama pull deepseek-r1:32b
ollama pull qwq
ollama pull olmo-3.1
ollama pull olmo-3.1:7b
ollama pull qwen3:72b
ollama pull deepseek-r1:70b
ollama pull gpt-oss:120b
ollama pull qwen3:235b
ollama pull glm4.7
ollama pull deepseek-v3.2-exp
ollama pull glm5
ollama pull mistral-large
ollama pull qwen3-coder
ollama pull deepseek-coder-v2
ollama pull kimi-k2.5
ollama pull command-r-plus
ollama pull llava
ollama pull llama3.2-vision
ollama pull nomic-embed-text
ollama pull bge-m3
pip install openai-whisper
or `pip install faster-whisper` for the CTranslate2-based reimplementation
pip install transformers accelerate
pip install nemo_toolkit[asr]
pip install nemo_toolkit[asr]
pip install moonshine
pip install orpheus-tts
pip install kokoro
pip install fish-speech
pip install TTS
pip install piper-tts
pip install git+https://github.com/suno-ai/bark.git
pip install melotts
ollama pull nomic-embed-text
ollama pull bge-m3
pip install sentence-transformers
ollama pull snowflake-arctic-embed
pip install FlagEmbedding
pip install colbert-ai
pip install diffusers transformers accelerate
SDXL 1.0 · SDXL Turbo · SDXL Lightning
pip install diffusers transformers
git clone https://github.com/comfyanonymous/ComfyUI.git
pip install diffusers transformers
pip install diffusers
pip install diffusers
pip install diffusers
pip install realesrgan
pip install diffusers
ComfyUI recommended
pip install diffusers
pip install diffusers
pip install cogkit
pip install diffusers
Missing a model? Found incorrect info? Have a feature request? Help make this reference better for everyone.