Model Targeting System

// select mission parameters > lock hardware target > acquire optimal model

// BRIEFING

This tool recommends the right local AI model for your setup. Two constraints drive the pick: how much VRAM you have (which caps model size) and what you're doing. Chat, coding, reasoning, embeddings, and vision each favor different models. Run a mission below, or read how to choose a model.

PHASE 01 Select Mission Type

PHASE 02 Lock Hardware Target

Available VRAM (or pick a specific GPU, or enter custom VRAM)

PHASE 03 Model Acquisition

BRIEFING How to Choose a Local Model

Two questions decide which local model is right for you: what will fit, and what is it for. The targeting system above weighs both. Here's the logic it uses, so the picks make sense.

// 1 — match model size to your VRAM (at 4-bit)

Your VRAM    Target size    Example families
8 GB         7–8B           Llama 8B · Mistral 7B · Qwen 7B
12–16 GB     13–14B         Qwen 14B · Phi medium
24 GB        32B (tight)    Qwen 32B · Gemma 27B
48 GB        70B (tight)    Llama 70B · Qwen 72B

Same VRAM math as the Stack Builder: weights ≈ params × 0.5 bytes at 4-bit, plus headroom. Model names are examples — the tool above picks current specifics from the live catalog.
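
To make the rule of thumb concrete, here is a minimal Python sketch of the arithmetic. It is not the tool's actual code. The 0.5 bytes per parameter comes from the rule above; the function names, the decimal-GB convention, and the 2 GB default headroom are assumptions chosen for illustration.

```python
BYTES_PER_PARAM_4BIT = 0.5  # from the rule above: 4 bits = half a byte per weight
GB = 1e9                    # decimal gigabytes (assumption, keeps the numbers round)


def weights_gb(params_billions: float,
               bytes_per_param: float = BYTES_PER_PARAM_4BIT) -> float:
    """Approximate size of the model weights in GB."""
    return params_billions * 1e9 * bytes_per_param / GB


def fits(params_billions: float, vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if weights plus headroom fit in VRAM.

    headroom_gb is a hypothetical default, not a value taken from the tool.
    """
    return weights_gb(params_billions) + headroom_gb <= vram_gb


for size in (8, 14, 32, 70):
    print(f"{size}B -> {weights_gb(size):.1f} GB of weights at 4-bit")
```

Under these assumptions an 8B model needs about 4 GB for weights and a 70B model about 35 GB, which is why the table pairs them with 8 GB and 48 GB cards. Real 4-bit files usually run somewhat larger than this estimate, one reason the bigger rows are marked tight.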

// 2 — match the model to the mission

Common questions

What's the single best local model right now?

There isn't one — “best” depends on your VRAM and mission. A 7B that runs fast on your card beats a 70B that won't load. Use the targeting tool above for a pick matched to your exact setup.

Bigger model or higher-precision quantization?

Generally a larger model at 4-bit beats a smaller model at 8-bit for the same VRAM — 4-bit costs little quality, and parameter count buys more capability. Drop below 4-bit only when nothing else fits.
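
The trade-off is easy to see as arithmetic. This is a rough sketch, assuming decimal gigabytes and a hypothetical 2 GB of headroom; the function name is illustrative and not part of the tool.

```python
def max_params_billions(vram_gb: float, bits_per_weight: int,
                        headroom_gb: float = 2.0) -> float:
    """Largest parameter count (in billions) whose weights fit in the budget.

    headroom_gb is an assumed placeholder for everything that is not weights.
    """
    usable_bytes = (vram_gb - headroom_gb) * 1e9
    bytes_per_param = bits_per_weight / 8  # 4-bit -> 0.5 bytes, 8-bit -> 1 byte
    return usable_bytes / bytes_per_param / 1e9


vram = 24
for bits in (4, 8):
    print(f"{vram} GB at {bits}-bit: up to ~{max_params_billions(vram, bits):.0f}B params")
```

For the same card, 4-bit leaves room for twice as many parameters as 8-bit. The answer above is a bet that the extra parameters are worth more than the extra precision.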

Can I run a model the tool says won't fit?

Sometimes — with a smaller quant, shorter context, or CPU offload (much slower). The “won't fit” verdict assumes a usable configuration; treat it as the safe default, not an absolute wall.
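
As a rough illustration of two of those escape hatches, the sketch below estimates how small a quant would need to be, or how much of a 4-bit model would have to live outside VRAM. It assumes decimal gigabytes and a hypothetical 2 GB headroom standing in for context and runtime overhead; the function names are illustrative, and none of this is the tool's own logic.

```python
def bits_needed_to_fit(params_billions: float, vram_gb: float,
                       headroom_gb: float = 2.0) -> float:
    """Average bits per weight at which the weights would just fit in VRAM."""
    usable_bits = (vram_gb - headroom_gb) * 1e9 * 8
    return usable_bits / (params_billions * 1e9)


def offload_fraction(params_billions: float, vram_gb: float,
                     bits_per_weight: float = 4.0,
                     headroom_gb: float = 2.0) -> float:
    """Share of the weights that would not fit in VRAM at the given precision."""
    weights_bytes = params_billions * 1e9 * bits_per_weight / 8
    usable_bytes = max(vram_gb - headroom_gb, 0.0) * 1e9
    return max(0.0, 1.0 - usable_bytes / weights_bytes)


# Example: a 70B model on a 24 GB card
print(f"bits per weight needed: {bits_needed_to_fit(70, 24):.2f}")
print(f"offloaded at 4-bit:     {offload_fraction(70, 24):.0%}")
```

With these assumptions, a 70B model on a 24 GB card would need roughly 2.5 bits per weight to fit entirely in VRAM, or about 37% of its 4-bit weights offloaded. Both are possible in principle, and both show why the tool treats "won't fit" as the safe default.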