# Running AI Locally
One of ClawDesk's best features is running AI models directly on your computer. No internet needed, no API keys, no monthly bills. Completely free.
## Why Run AI Locally?
| Benefit | Description |
|---|---|
| Free | No API costs, ever |
| Private | Your data never leaves your computer |
| Offline | Works without internet |
| Fast | No network latency, so responses start right away |
| No limits | No rate limits, no usage caps |
## Do I Have the Right Computer?
Local AI models run best with decent hardware. Here's what you need:
### Minimum Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 5 GB free | 20 GB free |
| GPU | Not required (CPU works) | Any modern GPU with 6+ GB VRAM |
### What Can My Computer Run?
| Your RAM | What you can run | Quality |
|---|---|---|
| 8 GB | Small models (1-3B parameters) | Good for simple tasks |
| 16 GB | Medium models (7-8B parameters) | Great for most tasks |
| 32 GB | Large models (13-14B parameters) | Excellent quality |
| 64 GB+ | Any model | Best quality |
ClawDesk's Local Models page automatically detects your hardware and recommends models that will run well on your system. It shows a "fit score" for each model:
- Perfect — Will run great
- Good — Will run well
- Marginal — Will run but may be slow
- Too tight — Won't fit in your memory
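ClawDesk's exact scoring formula is internal, but you can get a rough sense of what "fits" from a back-of-envelope estimate. The sketch below assumes a 4-bit-quantized GGUF model (roughly 0.6 bytes per parameter) plus a fixed overhead for the KV cache and the OS; the thresholds are illustrative, not ClawDesk's actual cutoffs.

```python
def fit_score(model_params_b: float, ram_gb: float) -> str:
    """Rough fit estimate for a 4-bit-quantized GGUF model.

    Assumes ~0.6 bytes per parameter plus ~1.5 GB of overhead for the
    KV cache and the OS -- a heuristic, not ClawDesk's actual formula.
    """
    needed_gb = model_params_b * 0.6 + 1.5
    headroom = ram_gb - needed_gb
    if headroom > ram_gb * 0.5:
        return "Perfect"
    if headroom > ram_gb * 0.25:
        return "Good"
    if headroom > 0:
        return "Marginal"
    return "Too tight"

print(fit_score(8, 16))   # an 8B model on a 16 GB machine
print(fit_score(70, 16))  # a 70B model will not fit in 16 GB
```

By this estimate an 8B model needs about 6.3 GB, which matches the "16 GB RAM recommended" guidance in the tables below.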
## Method 1: ClawDesk Built-in (Easiest)
ClawDesk has a built-in model manager that handles everything for you.
### Step 1: Open Local Models
Click Local Models in the left sidebar (under the "System" group).
### Step 2: Browse Available Models
You'll see a list of recommended models with:
- Model name and description
- Size — How much disk space and RAM it needs
- Fit score — How well it matches your hardware
### Step 3: Download a Model
- Find a model marked as "Perfect" or "Good" fit
- Click Download
- Wait for the download to finish (you'll see a progress bar)
The model is saved to `~/.clawdesk/models/` as a GGUF file.
### Step 4: Start the Model
- Once downloaded, click Start next to the model
- ClawDesk starts a local inference server automatically
- The model appears as "Local (Built-in)" in your provider list
### Step 5: Chat!
- Go to the Chat page
- Select the local model from the model picker
- Start chatting — everything runs on your machine!
## Method 2: Using Ollama
Ollama is a popular free tool for running local models. ClawDesk integrates with it seamlessly.
### Step 1: Install Ollama
Download Ollama for your platform from ollama.com:

- macOS: Download the app and drag it to Applications.
- Linux: Run `curl -fsSL https://ollama.com/install.sh | sh` in your terminal.
- Windows: Download the Windows installer and run it.
### Step 2: Download a Model
Open your terminal and run:

```bash
ollama pull llama3.2
```

This downloads the Llama 3.2 model (~2 GB). Wait for it to finish.
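Ollama also exposes a local REST API: `GET http://localhost:11434/api/tags` returns the models you have pulled as JSON. A small parser for that response shape (the sample payload below is illustrative):

```python
import json

def installed_models(tags_json: str) -> list[tuple[str, float]]:
    """Parse the JSON body of Ollama's GET /api/tags response into
    (model name, size in GB) pairs."""
    payload = json.loads(tags_json)
    return [(m["name"], round(m["size"] / 1e9, 1))
            for m in payload.get("models", [])]

# Illustrative response body; real sizes vary by model and quantization.
sample = '{"models": [{"name": "llama3.2:latest", "size": 2019393189}]}'
print(installed_models(sample))  # [('llama3.2:latest', 2.0)]
```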
### Step 3: Connect to ClawDesk
- Go to Settings → Providers
- Find Ollama in the list
- Make sure the URL is `http://localhost:11434`
- Click Save
ClawDesk automatically detects your Ollama models.
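If detection fails, you can verify that Ollama is listening yourself. A minimal standard-library check (the port is Ollama's default, and `/api/tags` is its model-listing endpoint):

```python
from urllib.request import urlopen
from urllib.error import URLError

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urlopen(base_url + "/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused or timed out: Ollama isn't running there.
        return False
```

If this returns False, start Ollama (or run `ollama serve`) and try again.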
### Step 4: Chat!
Go to Chat, select your Ollama model from the picker, and start chatting.
## Recommended Models
Here are the best models to start with, ordered by size:
### Small (runs on any computer with 8 GB RAM)
| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.2 (3B) | ~2 GB | Quick chat, simple tasks | `ollama pull llama3.2` |
| Gemma 2 (2B) | ~1.5 GB | Fast responses | `ollama pull gemma2:2b` |
| Phi-3 Mini | ~2.3 GB | Reasoning, coding | `ollama pull phi3` |
### Medium (16 GB RAM recommended)
| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.1 (8B) | ~4.7 GB | Great all-rounder | `ollama pull llama3.1:8b` |
| Mistral (7B) | ~4.1 GB | Fast and capable | `ollama pull mistral` |
| CodeLlama (7B) | ~3.8 GB | Programming | `ollama pull codellama` |
### Large (32 GB+ RAM recommended)
| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.1 (70B) | ~40 GB | Highest quality | `ollama pull llama3.1:70b` |
| Mixtral 8x7B | ~26 GB | Expert-level tasks | `ollama pull mixtral` |
| DeepSeek Coder V2 | ~8.9 GB | Best coding | `ollama pull deepseek-coder-v2` |
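The three tables above boil down to a simple rule of thumb. A sketch (the model names match the Ollama commands above, and the RAM cutoffs follow the earlier hardware table; the specific picks are one reasonable choice per tier, not the only ones):

```python
def suggest_model(ram_gb: int) -> str:
    """Map available RAM to a reasonable starter model from the tables above."""
    if ram_gb >= 64:
        return "llama3.1:70b"  # large tier, ~40 GB on disk
    if ram_gb >= 32:
        return "mixtral"       # large tier, ~26 GB
    if ram_gb >= 16:
        return "mistral"       # medium tier, ~4.1 GB
    return "llama3.2"          # small tier, ~2 GB

print(suggest_model(16))  # mistral
```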
## Managing Running Models
### Starting and Stopping
On the Local Models page:
- Click Start to begin running a model
- Click Stop to shut it down and free up memory
- Only run models you're actively using — they consume RAM
### TTL (Time to Live)
You can set a TTL for each model — it automatically stops after being idle for a set period. This saves memory when you forget to stop it.
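Conceptually, a TTL works like this toy sketch (not ClawDesk's actual implementation): each request refreshes an idle timer, and a periodic check stops any model whose timer has expired.

```python
import time

class ModelHandle:
    """Toy sketch of TTL-based auto-stop for a running local model."""

    def __init__(self, name: str, ttl_seconds: float):
        self.name = name
        self.ttl = ttl_seconds
        self.running = True
        self.last_used = time.monotonic()

    def touch(self) -> None:
        """Record a request so the idle timer restarts."""
        self.last_used = time.monotonic()

    def reap_if_idle(self) -> bool:
        """Stop the model if idle longer than its TTL; return running state."""
        if self.running and time.monotonic() - self.last_used > self.ttl:
            self.running = False  # real code would shut down the server process
        return self.running
```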
### Monitoring
The Local Models page shows:
- Which models are running
- How much memory each uses
- Response speed metrics
## Local vs. Cloud: A Comparison
| | Local Models | Cloud AI (Claude, GPT-4, etc.) |
|---|---|---|
| Cost | Free | Pay per use |
| Privacy | 100% private | Data sent to servers |
| Internet | Not needed | Required |
| Quality | Good to great | Best available |
| Speed | Depends on hardware | Consistent |
| Setup | Download once | Just add API key |
Our recommendation: Use local models for everyday tasks and privacy-sensitive work. Use cloud AI (Claude, GPT-4) when you need the highest quality output.
## Troubleshooting
| Problem | Solution |
|---|---|
| Model runs slowly | Try a smaller model. Close other apps to free up RAM. |
| "Out of memory" | The model is too large for your computer. Try a smaller variant. |
| Ollama not detected | Make sure Ollama is running (`ollama serve` in a terminal). Check the URL in settings. |
| Download stuck | Check your internet connection. For Ollama, run `ollama pull model-name` again to retry. |
| "llama-server not found" | For built-in models, ClawDesk needs `llama-server`. Install llama.cpp or use Ollama instead. |
## Related Guides
- Setting Up Providers → — Configure local and cloud providers
- Chatting with AI → — Use local models in conversation
- Privacy & Security → — Maximum privacy with local models