Picking individual models is fun. Picking a complete stack is the actual job. These six are vetted combinations — VRAM-checked, license-checked, and battle-tested. Filter by hardware, copy the stack, ship.
RTX 4090 · 24 GB · Power User
Full-Stack Agent Workstation
LLM · Qwen3 32B @ Q4_K_M · ~22 GB
RAG · Nomic Embed Text V2 · ~0.5 GB
STT · Whisper V3 Turbo · ~3 GB
TTS · Kokoro 82M · ~0.3 GB
Peak (LLM + RAG concurrent) · ~22.5 GB
STT and TTS load on demand rather than running concurrently with the LLM. For broader knowledge, swap the LLM for Llama 3.3 70B @ Q4 (needs CPU offload). Add FLUX.1 Schnell for image generation; it runs separately at ~8 GB.
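The peak figure above is plain addition over the components that stay loaded together. A minimal sketch of that budget check, using the sizes from this card; the `Component` class and both helper functions are hypothetical names introduced for illustration, not part of any tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    role: str
    name: str
    vram_gb: float
    concurrent: bool  # True if it stays loaded alongside the LLM


# Sizes copied from the RTX 4090 "Full-Stack Agent Workstation" card.
STACK = [
    Component("LLM", "Qwen3 32B @ Q4_K_M", 22.0, True),
    Component("RAG", "Nomic Embed Text V2", 0.5, True),
    Component("STT", "Whisper V3 Turbo", 3.0, False),  # loaded on demand
    Component("TTS", "Kokoro 82M", 0.3, False),        # loaded on demand
]


def concurrent_peak_gb(stack):
    """Sum the VRAM of every component that is resident at the same time."""
    return sum(c.vram_gb for c in stack if c.concurrent)


def headroom_gb(stack, card_gb):
    """VRAM left on the card once the concurrent components are loaded."""
    return card_gb - concurrent_peak_gb(stack)


if __name__ == "__main__":
    print(f"peak: {concurrent_peak_gb(STACK):.1f} GB")
    print(f"headroom on a 24 GB card: {headroom_gb(STACK, 24.0):.1f} GB")
```

The remaining 1.5 GB is smaller than the ~3 GB STT model, which is why speech components cannot sit next to the LLM on this card.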
RTX 4070 Ti Super · 16 GB · Mid Range
Coding & RAG Assistant
LLM · Phi-4 14B @ Q8 · ~16 GB
RAG · Snowflake Arctic 335M · ~0.3 GB
STT · Distil-Whisper · ~3 GB
TTS · Piper (VITS) · ~0.1 GB
Peak (LLM + RAG) · ~16.3 GB
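Tight fit: the LLM alone fills the card, and the ~16.3 GB peak slightly exceeds 16 GB, so expect some CPU offload or a shorter context. Run STT/TTS after unloading the LLM, or on CPU. Alternative LLM: GLM-4.7-Flash 30B MoE @ Q4 for coding (needs offload, but the MoE design keeps it fast).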
RTX 3060/4060 · 12 GB · Budget
Capable Local Chat
LLM · Gemma 3 12B @ Q4 · ~8 GB
RAG · Nomic Embed Text V2 · ~0.5 GB
STT · Moonshine Base · ~0.2 GB
TTS · Piper (VITS) · ~0.1 GB
Peak (all concurrent) · ~8.8 GB
Room to breathe: about 3 GB stays free with everything loaded. Gemma 3 12B also handles vision input and is surprisingly capable. For image generation, SDXL Turbo fits at ~8 GB once the LLM is unloaded.
Mac M-Series · 32 GB Unified · Apple Silicon
Unified Memory Advantage
LLM · Llama 3.3 70B @ Q4 · ~40 GB
RAG · BGE-M3 · ~0.5 GB
STT · Whisper V3 Turbo · ~3 GB
TTS · Kokoro 82M · ~0.3 GB
Total memory footprint · ~44 GB
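A 64 GB M-series machine runs 70B easily. On 32 GB the ~44 GB footprint exceeds physical memory and only runs under heavy swap pressure, so reduce context or switch to Qwen3 32B @ Q6 for comfort. The MLX framework is recommended. Generation runs at ~15–25 t/s on an M3 Max.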
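The ~40 GB figure for a 70B model can be approximated from parameter count and quantization width. The sketch below assumes roughly 4.5 bits per weight for a Q4-class quant and a flat 4 GB allowance for the other components and context; both numbers are illustrative assumptions, since real quant formats and context sizes vary:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal GB: parameters * bits / 8."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed overhead fit in memory_gb.

    overhead_gb is a hypothetical allowance for embeddings, STT, TTS and
    the KV cache; it is not a measured value.
    """
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


if __name__ == "__main__":
    size = weights_gb(70, 4.5)  # assumed ~4.5 bits per weight for Q4
    print(f"70B @ ~4.5 bpw: {size:.1f} GB")
    for memory in (32, 64):
        print(f"{memory} GB unified memory: fits = {fits_in_memory(70, 4.5, memory)}")
```

Under these assumptions the 70B stack fits a 64 GB machine and does not fit in 32 GB without swapping.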
Air-Gapped · 24 GB · Compliance
Zero Cloud, Fully Licensed
LLM · Qwen3 32B @ Q4_K_M (Apache 2.0) · ~22 GB
RAG · Nomic Embed V2 (Apache 2.0) · ~0.5 GB
STT · Whisper V3 Turbo (MIT) · ~3 GB
TTS · Piper VITS (MIT) · ~0.1 GB
Rerank · BGE Reranker v2-M3 (Apache 2.0) · ~0.5 GB
Peak (LLM + RAG + Rerank) · ~23 GB
Every component is licensed under Apache 2.0 or MIT, which keeps license review simple for HIPAA, FedRAMP, and other regulated environments. Qwen3 supports native tool calling. No internet connection is required after deployment.
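A license check like this is easy to automate before a stack goes into an audit. The following is a minimal sketch with the license strings copied from this card; the allowlist, the dictionary layout, and the `license_violations` helper are hypothetical, and a real review would read the license files shipped with each model rather than a hand-typed table:

```python
# Licenses this deployment accepts (assumption: a permissive-only policy).
ALLOWED_LICENSES = {"Apache 2.0", "MIT"}

# Component -> license, as listed on the Air-Gapped card.
STACK_LICENSES = {
    "Qwen3 32B @ Q4_K_M": "Apache 2.0",
    "Nomic Embed V2": "Apache 2.0",
    "Whisper V3 Turbo": "MIT",
    "Piper VITS": "MIT",
    "BGE Reranker v2-M3": "Apache 2.0",
}


def license_violations(stack, allowed=ALLOWED_LICENSES):
    """Return the sorted names of components whose license is not allowed."""
    return sorted(name for name, lic in stack.items() if lic not in allowed)


if __name__ == "__main__":
    bad = license_violations(STACK_LICENSES)
    print("violations:", bad if bad else "none")
```

Adding a component with a non-commercial or custom license to the dictionary would surface it in the returned list.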
RTX 4090 · 24 GB · Creative
Image + Video + Voice Production
Image · FLUX.1 Schnell (Apache 2.0) · ~8 GB
Video · Wan 2.1 14B (Apache 2.0) · ~14 GB
Upscale · Real-ESRGAN 4x · ~1 GB
TTS · Orpheus 3B · ~4 GB
STT · Whisper V3 Turbo (MIT) · ~3 GB
Peak (one model at a time) · ~14 GB
Run the models sequentially, not concurrently: load Wan for video, unload it, then load FLUX for images. ComfyUI manages model swapping automatically. Add Qwen3 8B (~6 GB) as a prompt enhancer between generations.
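The one-model-at-a-time pattern can be sketched as a slot that refuses to hold two models at once. This is a plain-Python simulation for illustration only: `GpuSlot` is a hypothetical class, no real model is loaded, and it does not reflect how ComfyUI implements swapping internally.

```python
from contextlib import contextmanager


class GpuSlot:
    """Tracks a single resident model and the peak VRAM seen so far."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.loaded = None
        self.peak_gb = 0.0
        self.log = []

    @contextmanager
    def use(self, name: str, vram_gb: float):
        if self.loaded is not None:
            raise RuntimeError(f"{self.loaded} is still loaded; unload it first")
        if vram_gb > self.capacity_gb:
            raise MemoryError(f"{name} needs {vram_gb} GB, card has {self.capacity_gb} GB")
        self.loaded = name
        self.peak_gb = max(self.peak_gb, vram_gb)
        self.log.append(f"load {name}")
        try:
            yield name  # real generation work would happen inside the with-block
        finally:
            # Always unload, even if generation raised an exception.
            self.loaded = None
            self.log.append(f"unload {name}")


slot = GpuSlot(capacity_gb=24.0)
with slot.use("Wan 2.1 14B", 14.0):
    pass  # generate video here
with slot.use("FLUX.1 Schnell", 8.0):
    pass  # generate images here

print(slot.log)
print(f"peak: {slot.peak_gb} GB")
```

Because each model is released before the next one loads, the peak equals the largest single model (~14 GB for Wan) rather than the sum of all five.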