Local AI is simpler than it looks. Four steps below get you from nothing to chatting with a real LLM on your own hardware. No accounts or API keys are required for the local setup itself.
The Path
Four steps. Five minutes.
Click any command to copy it. Each step builds on the previous one — start at #1 and stop whenever you have what you need.
Security note: terminal commands and install scripts run on your own machine. Verify the source, inspect scripts before executing them, and use backups or an isolated environment when appropriate.
1
Install Ollama
One installer, all platforms. Handles model downloading, GPU detection, quantization, and an OpenAI-compatible API on localhost:11434. The single dependency for the rest of this guide.
curl -fsSL https://ollama.com/install.sh | sh
Or download from ollama.com for Windows/Mac.
2
Pull your first model
One command downloads and runs it. Ollama auto-selects the best quantization for your hardware. Llama 3.2 3B is a good starter — small, fast, and still very capable.
ollama run llama3.2:3b
~2 GB download. You're chatting in under a minute.
3
Add a web UI
Ollama's CLI works, but a web interface makes it actually enjoyable. Open WebUI is the community standard — ChatGPT-like with multi-user, document upload, RAG, web search, and plugins.
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main
Open localhost:3000. ChatGPT-like interface, your machine.
4
Scale up
Ready for more? Pull bigger models, add RAG with your documents, or set up image/video generation. Browse the catalog to find what fits your GPU.
ollama pull qwen3:32b
For 24GB GPUs. Or hit the model index to compare options.
Frontend Tools
Your steering wheel
Models are engines. These tools are how you actually drive them. Most people end up using 2–3.