Recommended Stacks — Complete Local AI Setups by Hardware

Picking individual models is fun. Picking a complete stack is the actual job. These six are vetted combinations — VRAM-checked, license-checked, and battle-tested. Filter by hardware, copy the stack, ship.

RTX 4090 · 24 GB · Power User

Full-Stack Agent Workstation

LLM · Qwen3 32B @ Q4_K_M · ~22 GB

RAG · Nomic Embed Text V2 · ~0.5 GB

STT · Whisper V3 Turbo · ~3 GB

TTS · Kokoro 82M · ~0.3 GB

Peak (LLM + RAG concurrent) · ~22.5 GB

STT and TTS load on demand, not alongside the LLM. For broader knowledge, swap the LLM to Llama 3.3 70B @ Q4 (~40 GB, so it needs CPU offload). For image generation, add FLUX.1 Schnell, which runs separately at ~8 GB.
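The peak figure counts only what is resident at the same time, not every model in the stack. A minimal Python sketch of that bookkeeping, assuming two hypothetical phases (chat with LLM + RAG, then voice I/O with the LLM unloaded) and the card's approximate VRAM figures:

```python
# Approximate VRAM figures (GB) copied from the Full-Stack Agent card.
COMPONENTS_GB = {
    "llm": 22.0,  # Qwen3 32B @ Q4_K_M
    "rag": 0.5,   # Nomic Embed Text V2
    "stt": 3.0,   # Whisper V3 Turbo
    "tts": 0.3,   # Kokoro 82M
}

# Hypothetical phases: each set lists the models resident at the same time.
# Assumption: STT and TTS only load after the LLM has been unloaded.
PHASES = [
    {"llm", "rag"},         # chat / retrieval
    {"stt", "rag", "tts"},  # voice I/O with the LLM unloaded
]


def phase_usage(phase: set[str]) -> float:
    """Sum the VRAM of every model resident during one phase."""
    return sum(COMPONENTS_GB[name] for name in phase)


def peak_usage(phases: list[set[str]]) -> float:
    """Peak VRAM is the heaviest single phase, not the sum of all models."""
    return max(phase_usage(p) for p in phases)


if __name__ == "__main__":
    card_gb = 24.0
    naive = sum(COMPONENTS_GB.values())
    peak = peak_usage(PHASES)
    print(f"naive sum: {naive:.1f} GB, phased peak: {peak:.1f} GB")
    print("fits" if peak <= card_gb else "over budget")
```

The naive sum (~25.8 GB) would not fit a 24 GB card; the phased peak (~22.5 GB) does, which is why STT and TTS stay out of the LLM phase.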

RTX 4070 Ti Super · 16 GB · Mid Range

Coding & RAG Assistant

LLM · Phi-4 14B @ Q8 · ~16 GB

RAG · Snowflake Arctic Embed L (335M) · ~0.3 GB

STT · Distil-Whisper · ~3 GB

TTS · Piper (VITS) · ~0.1 GB

Peak (LLM + RAG) · ~16.3 GB

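Tight fit: the LLM fills the card, and the ~16.3 GB peak slightly exceeds 16 GB, so run the embedder on CPU or drop Phi-4 to Q6_K (~12 GB). STT/TTS run after unloading the LLM, or on CPU. Alt LLM for coding: GLM-4.7-Flash 30B MoE @ Q4 (needs offload, but MoE activates only a subset of its weights per token, so it stays fast).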
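A rough way to sanity-check quant choices is weights ≈ parameters × bits per weight ÷ 8. The sketch below assumes approximate average bits-per-weight values for llama.cpp GGUF quant types and Phi-4's ~14.7B parameters; it estimates weights only, so whatever remains on the card must hold the KV cache and runtime overhead.

```python
# Assumption: approximate average bits per weight for llama.cpp GGUF quants.
# Real files differ slightly (embedding layers, metadata).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated weight size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def headroom_gb(card_gb: float, *resident_gb: float) -> float:
    """VRAM left after the listed models load; this must cover KV cache and overhead."""
    return card_gb - sum(resident_gb)


if __name__ == "__main__":
    PHI4_PARAMS_B = 14.7  # assumption: Phi-4 parameter count in billions
    EMBEDDER_GB = 0.3     # Snowflake Arctic embedder figure from the card
    for quant in ("Q8_0", "Q6_K"):
        w = weights_gb(PHI4_PARAMS_B, quant)
        left = headroom_gb(16.0, w, EMBEDDER_GB)
        print(f"Phi-4 {quant}: ~{w:.1f} GB weights, ~{left:.1f} GB left for KV cache")
```

At Q8_0 almost nothing remains for the KV cache on a 16 GB card, while Q6_K leaves a few GB.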

RTX 3060/4060 · 12 GB · Budget

Capable Local Chat

LLM · Gemma 3 12B @ Q4 · ~8 GB

RAG · Nomic Embed Text V2 · ~0.5 GB

STT · Moonshine Base · ~0.2 GB

TTS · Piper (VITS) · ~0.1 GB

Peak (all concurrent) · ~8.8 GB

Room to breathe: the full stack uses ~8.8 GB, leaving ~3 GB free. Gemma 3 12B is surprisingly capable and also accepts image input. For image generation, SDXL Turbo fits at ~8 GB once the LLM is unloaded.
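Deciding what to unload before loading something like SDXL Turbo can be sketched as a greedy, largest-first plan. The model names below are hypothetical labels and the sizes come from the card; real runtimes have their own unload logic.

```python
# Approximate VRAM figures (GB) from the Capable Local Chat card.
CARD_GB = 12.0
RESIDENT = {
    "gemma3-12b": 8.0,      # hypothetical label for Gemma 3 12B @ Q4
    "nomic-embed-v2": 0.5,
    "moonshine-base": 0.2,
    "piper": 0.1,
}


def plan_load(resident: dict[str, float], new_gb: float, card_gb: float) -> list[str]:
    """Return the models to unload, largest first, so a new model fits.

    Largest-first minimizes the number of unloads, though not necessarily
    the models you would pick by hand.
    """
    used = sum(resident.values())
    to_unload: list[str] = []
    for name, size in sorted(resident.items(), key=lambda kv: kv[1], reverse=True):
        if used + new_gb <= card_gb:
            break
        to_unload.append(name)
        used -= size
    if used + new_gb > card_gb:
        raise MemoryError("model does not fit even on an empty card")
    return to_unload


if __name__ == "__main__":
    free = CARD_GB - sum(RESIDENT.values())
    print(f"free with full stack loaded: {free:.1f} GB")
    print("unload for SDXL Turbo:", plan_load(RESIDENT, 8.0, CARD_GB))
```

For an ~8 GB image model, the plan drops only the LLM; the small embedder and speech models can stay resident.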

Mac M-Series · 32 GB Unified · Apple Silicon

Unified Memory Advantage

LLM · Llama 3.3 70B @ Q4 · ~40 GB

RAG · BGE-M3 · ~0.5 GB

STT · Whisper V3 Turbo · ~3 GB

TTS · Kokoro 82M · ~0.3 GB

Total memory footprint · ~44 GB

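As listed (~44 GB), this stack needs a 64 GB M-series Mac, where 70B runs easily. It does not fit in 32 GB: macOS reserves part of unified memory for the system, so on a 32 GB machine switch to Qwen3 32B @ Q4_K_M (~22 GB) and keep context short. MLX framework recommended. Generation for 70B @ Q4 on an M3 Max stays under ~10 t/s, because decoding is bound by memory bandwidth (~400 GB/s ÷ ~40 GB).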
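The speed ceiling follows from a simple bound: a dense model reads all of its weights once per generated token, so tokens/s ≤ memory bandwidth ÷ weight size. A sketch, assuming a 400 GB/s M3 Max (the 14-core CPU variant is 300 GB/s) and the memory figures from these cards; real throughput lands below the bound because KV-cache reads and compute add overhead.

```python
def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound for dense-model decoding: every token streams all weights once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Assumption: 16-core CPU / 40-core GPU M3 Max with ~400 GB/s memory bandwidth.
    M3_MAX_GB_S = 400.0
    for name, size_gb in [("Llama 3.3 70B @ Q4", 40.0), ("Qwen3 32B @ Q4_K_M", 22.0)]:
        bound = max_tokens_per_s(M3_MAX_GB_S, size_gb)
        print(f"{name}: at most ~{bound:.0f} tok/s")
```

Halving the model size roughly doubles the ceiling, which is why the 32B model is the faster choice on the same machine.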

Air-Gapped · 24 GB · Compliance

Zero Cloud, Fully Licensed

LLM · Qwen3 32B @ Q4_K_M (Apache 2.0) · ~22 GB

RAG · Nomic Embed V2 (Apache 2.0) · ~0.5 GB

STT · Whisper V3 Turbo (MIT) · ~3 GB

TTS · Piper VITS (MIT) · ~0.1 GB

Rerank · BGE Reranker v2-M3 (Apache 2.0) · ~0.5 GB

Peak (LLM + RAG + Rerank) · ~23 GB

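Every component is Apache 2.0 or MIT, so licensing is audit-ready for regulated environments. Licenses alone don't make a deployment HIPAA- or FedRAMP-compliant, but a fully offline stack simplifies those reviews. Qwen3 supports native tool calling. No internet required post-deployment.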
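A license audit like this is easy to automate. A minimal sketch, assuming you record each component's SPDX license identifier by hand from its model card; it only checks the labels you enter, not the weight or voice files themselves, which can carry their own terms.

```python
# SPDX identifiers as listed on the Air-Gapped card (entered by hand).
STACK = {
    "Qwen3 32B": "Apache-2.0",
    "Nomic Embed V2": "Apache-2.0",
    "Whisper V3 Turbo": "MIT",
    "Piper VITS": "MIT",
    "BGE Reranker v2-M3": "Apache-2.0",
}
ALLOWED = {"Apache-2.0", "MIT"}


def disallowed(stack: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Return every component whose license is not on the allowlist."""
    return {name: lic for name, lic in stack.items() if lic not in allowed}


if __name__ == "__main__":
    bad = disallowed(STACK, ALLOWED)
    print("all components allowed" if not bad else f"review: {bad}")
```

Running the check in CI keeps a later model swap from quietly introducing a non-permissive license.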

RTX 4090 · 24 GB · Creative

Image + Video + Voice Production

Image · FLUX.1 Schnell (Apache 2.0) · ~8 GB

Video · Wan 2.1 14B (Apache 2.0) · ~14 GB

Upscale · Real-ESRGAN 4x · ~1 GB

TTS · Orpheus 3B · ~4 GB

STT · Whisper V3 Turbo (MIT) · ~3 GB

Peak (one model at a time) · ~14 GB

Run these sequentially, not concurrently: load Wan for video, unload it, then load FLUX for images. ComfyUI manages model swapping automatically. Add Qwen3 8B (~6 GB) as a prompt enhancer between generations.
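The load, unload, load pattern can be modeled with a context manager that frees each model before the next stage starts. This is a hypothetical bookkeeping sketch: it does not call ComfyUI or any real loader, and the sizes come from the card and its note.

```python
from contextlib import contextmanager

# Approximate VRAM figures (GB) from the Creative card; Qwen3 8B from its note.
MODELS_GB = {
    "wan2.1-14b": 14.0,
    "flux1-schnell": 8.0,
    "real-esrgan-4x": 1.0,
    "orpheus-3b": 4.0,
    "whisper-v3-turbo": 3.0,
    "qwen3-8b": 6.0,
}


class VramTracker:
    """Hypothetical tracker that records what is resident and the peak usage."""

    def __init__(self, card_gb: float):
        self.card_gb = card_gb
        self.resident: dict[str, float] = {}
        self.peak = 0.0

    @contextmanager
    def loaded(self, name: str):
        size = MODELS_GB[name]
        if sum(self.resident.values()) + size > self.card_gb:
            raise MemoryError(f"{name} would exceed {self.card_gb} GB")
        self.resident[name] = size
        self.peak = max(self.peak, sum(self.resident.values()))
        try:
            yield
        finally:
            del self.resident[name]  # unload before the next stage begins


def run_pipeline(tracker: VramTracker, stages: list[str]) -> None:
    """Load each stage's model, run it, and unload it before the next one."""
    for name in stages:
        with tracker.loaded(name):
            pass  # generation for this stage would run here


if __name__ == "__main__":
    t = VramTracker(card_gb=24.0)
    run_pipeline(t, ["qwen3-8b", "wan2.1-14b", "qwen3-8b", "flux1-schnell", "real-esrgan-4x"])
    print(f"peak: {t.peak:.1f} GB")
```

Because each `with` block exits before the next begins, the recorded peak is the largest single model (~14 GB for Wan 2.1), matching the card.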
