Model Targeting System — Local AI Models

// BRIEFING

This tool recommends the right local AI model for your setup. Two constraints drive the pick: how much VRAM you have (which caps model size) and what you're doing — chat, coding, reasoning, embeddings, or vision each favor different models. Run a mission below, or read how to choose a model.

Phase 1: Mission

Phase 2: Hardware

Phase 3: Acquire

PHASE 01 Select Mission Type

PHASE 02 Lock Hardware Target

Available VRAM

or pick a specific GPU

or enter custom VRAM

PHASE 03 Model Acquisition

BRIEFING How to Choose a Local Model

Two questions decide which local model is right for you: what will it fit and what is it for. The targeting system above weighs both — here's the logic it uses, so the picks make sense.

// 1 — match model size to your VRAM (at 4-bit)

Your VRAM	Target size	Example families
8 GB	7–8B	Llama 8B · Mistral 7B · Qwen 7B
12–16 GB	13–14B	Qwen 14B · Phi medium
24 GB	32B (tight)	Qwen 32B · Gemma 27B
48 GB	70B (tight)	Llama 70B · Qwen 72B

Same VRAM math as the Stack Builder: weights ≈ params × 0.5 bytes at 4-bit, plus headroom. Model names are examples — the tool above picks current specifics from the live catalog.

// 2 — match the model to the mission

Chat / general instruction-tuned all-rounders — Llama, Qwen, Mistral, Gemma.
Coding code-specialized models — Qwen-Coder, Codestral, DeepSeek-Coder families.
Reasoning / math reasoning-tuned models — DeepSeek-R1 family, QwQ.
Embeddings / search embedding models, not chat models — BGE, Arctic-Embed, Nomic.
Vision vision-language models (VLMs) — Qwen-VL, Llava, Pixtral.

▶ Common questions

What's the single best local model right now?

There isn't one — “best” depends on your VRAM and mission. A 7B that runs fast on your card beats a 70B that won't load. Use the targeting tool above for a pick matched to your exact setup.

Bigger model or higher quantization?

Generally a larger model at 4-bit beats a smaller model at 8-bit for the same VRAM — 4-bit costs little quality, and parameter count buys more capability. Drop below 4-bit only when nothing else fits.

Can I run a model the tool says won't fit?

Sometimes — with a smaller quant, shorter context, or CPU offload (much slower). The “won't fit” verdict assumes a usable configuration; treat it as the safe default, not an absolute wall.