Model Targeting System
// select mission parameters > lock hardware target > acquire optimal model
// BRIEFING
This tool recommends the right local AI model for your setup. Two constraints drive the pick: how much VRAM you have (which caps model size) and what you're doing — chat, coding, reasoning, embeddings, or vision each favor different models. Run a mission below, or read how to choose a model.
PHASE 01 Select Mission Type
PHASE 02 Lock Hardware Target
PHASE 03 Model Acquisition
BRIEFING How to Choose a Local Model
Two questions decide which local model is right for you: what will it fit and what is it for. The targeting system above weighs both — here's the logic it uses, so the picks make sense.
// 1 — match model size to your VRAM (at 4-bit)
| Your VRAM | Target size | Example families |
|---|---|---|
| 8 GB | 7–8B | Llama 8B · Mistral 7B · Qwen 7B |
| 12–16 GB | 13–14B | Qwen 14B · Phi medium |
| 24 GB | 32B (tight) | Qwen 32B · Gemma 27B |
| 48 GB | 70B (tight) | Llama 70B · Qwen 72B |
Same VRAM math as the Stack Builder: weights ≈ params × 0.5 bytes at 4-bit, plus headroom. Model names are examples — the tool above picks current specifics from the live catalog.
// 2 — match the model to the mission
- Chat / general instruction-tuned all-rounders — Llama, Qwen, Mistral, Gemma.
- Coding code-specialized models — Qwen-Coder, Codestral, DeepSeek-Coder families.
- Reasoning / math reasoning-tuned models — DeepSeek-R1 family, QwQ.
- Embeddings / search embedding models, not chat models — BGE, Arctic-Embed, Nomic.
- Vision vision-language models (VLMs) — Qwen-VL, Llava, Pixtral.
▶ Common questions
What's the single best local model right now?
There isn't one — “best” depends on your VRAM and mission. A 7B that runs fast on your card beats a 70B that won't load. Use the targeting tool above for a pick matched to your exact setup.
Bigger model or higher quantization?
Generally a larger model at 4-bit beats a smaller model at 8-bit for the same VRAM — 4-bit costs little quality, and parameter count buys more capability. Drop below 4-bit only when nothing else fits.
Can I run a model the tool says won't fit?
Sometimes — with a smaller quant, shorter context, or CPU offload (much slower). The “won't fit” verdict assumes a usable configuration; treat it as the safe default, not an absolute wall.