# Running AI Locally
One of ClawDesk's best features is running AI models directly on your computer. No internet needed, no API keys, no monthly bills. Completely free.
## Why Run AI Locally?
| Benefit | Description |
|---|---|
| Free | No API costs, ever |
| Private | Your data never leaves your computer |
| Offline | Works without internet |
| Fast | No network latency, so responses start right away |
| No limits | No rate limits, no usage caps |
## Do I Have the Right Computer?
Local AI models run best with decent hardware. Here's what you need:
### Minimum Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 5 GB free | 20 GB free |
| GPU | Not required (CPU works) | Any modern GPU with 6+ GB VRAM |
### What Can My Computer Run?
| Your RAM | What you can run | Quality |
|---|---|---|
| 8 GB | Small models (1-3B parameters) | Good for simple tasks |
| 16 GB | Medium models (7-8B parameters) | Great for most tasks |
| 32 GB | Large models (13-14B parameters) | Excellent quality |
| 64 GB+ | Any model | Best quality |
ClawDesk's Local Models page automatically detects your hardware and recommends models that will run well on your system. It shows a "fit score" for each model:
- Perfect — Will run great
- Good — Will run well
- Marginal — Will run but may be slow
- Too tight — Won't fit in your memory
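ClawDesk's exact scoring formula is internal, but you can get a rough sense of what "fits" from a back-of-envelope estimate. The sketch below assumes a 4-bit-quantized GGUF model (roughly 0.6 bytes per parameter) plus a fixed overhead for the KV cache and the OS; the thresholds are illustrative, not ClawDesk's actual cutoffs.

```python
def fit_score(model_params_b: float, ram_gb: float) -> str:
    """Rough fit estimate for a 4-bit-quantized GGUF model.

    Assumes ~0.6 bytes per parameter plus ~1.5 GB of overhead for the
    KV cache and the OS -- a heuristic, not ClawDesk's actual formula.
    """
    needed_gb = model_params_b * 0.6 + 1.5
    headroom = ram_gb - needed_gb
    if headroom > ram_gb * 0.5:
        return "Perfect"
    if headroom > ram_gb * 0.25:
        return "Good"
    if headroom > 0:
        return "Marginal"
    return "Too tight"

print(fit_score(8, 16))   # an 8B model on a 16 GB machine
print(fit_score(70, 16))  # a 70B model will not fit in 16 GB
```

By this estimate an 8B model needs about 6.3 GB, which matches the "16 GB RAM recommended" guidance in the tables below.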
## Method 1: ClawDesk Built-in (Easiest)
ClawDesk has a built-in model manager that handles everything for you.
### Step 1: Open Local Models
Click Local Models in the left sidebar (under the "System" group).
### Step 2: Browse Available Models
You'll see a list of recommended models with:
- Model name and description
- Size — How much disk space and RAM it needs
- Fit score — How well it matches your hardware
### Step 3: Download a Model
- Find a model marked as "Perfect" or "Good" fit
- Click Download
- Wait for the download to finish (you'll see a progress bar)
The model is saved to `~/.clawdesk/models/` as a GGUF file.
### Step 4: Start the Model
- Once downloaded, click Start next to the model
- ClawDesk starts a local inference server automatically
- The model appears as "Local (Built-in)" in your provider list
### Step 5: Chat!
- Go to the Chat page
- Select the local model from the model picker
- Start chatting — everything runs on your machine!
## Method 2: Using Ollama
Ollama is a popular free tool for running local models. ClawDesk integrates with it seamlessly.
### Step 1: Install Ollama
Download Ollama for your platform from ollama.com:

- macOS: Download the app and drag it to Applications.
- Linux: Run `curl -fsSL https://ollama.com/install.sh | sh` in your terminal.
- Windows: Download the Windows installer and run it.
### Step 2: Download a Model
Open your terminal and run:

```bash
ollama pull llama3.2
```

This downloads the Llama 3.2 model (~2 GB). Wait for it to finish.
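Ollama also exposes a local REST API: `GET http://localhost:11434/api/tags` returns the models you have pulled as JSON. A small parser for that response shape (the sample payload below is illustrative):

```python
import json

def installed_models(tags_json: str) -> list[tuple[str, float]]:
    """Parse the JSON body of Ollama's GET /api/tags response into
    (model name, size in GB) pairs."""
    payload = json.loads(tags_json)
    return [(m["name"], round(m["size"] / 1e9, 1))
            for m in payload.get("models", [])]

# Illustrative response body; real sizes vary by model and quantization.
sample = '{"models": [{"name": "llama3.2:latest", "size": 2019393189}]}'
print(installed_models(sample))  # [('llama3.2:latest', 2.0)]
```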
### Step 3: Connect to ClawDesk
- Go to Settings → Providers
- Find Ollama in the list
- Make sure the URL is `http://localhost:11434`
- Click Save
ClawDesk automatically detects your Ollama models.
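If detection fails, you can verify that Ollama is listening yourself. A minimal standard-library check (the port is Ollama's default, and `/api/tags` is its model-listing endpoint):

```python
from urllib.request import urlopen
from urllib.error import URLError

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urlopen(base_url + "/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused or timed out: Ollama isn't running there.
        return False
```

If this returns False, start Ollama (or run `ollama serve`) and try again.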
### Step 4: Chat!
Go to Chat, select your Ollama model from the picker, and start chatting.
## Recommended Models
Here are the best models to start with, ordered by size:
### Small (runs on any computer with 8 GB RAM)
| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.2 (3B) | ~2 GB | Quick chat, simple tasks | `ollama pull llama3.2` |
| Gemma 2 (2B) | ~1.5 GB | Fast responses | `ollama pull gemma2:2b` |
| Phi-3 Mini | ~2.3 GB | Reasoning, coding | `ollama pull phi3` |
### Medium (16 GB RAM recommended)
| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.1 (8B) | ~4.7 GB | Great all-rounder | `ollama pull llama3.1:8b` |
| Mistral (7B) | ~4.1 GB | Fast and capable | `ollama pull mistral` |
| CodeLlama (7B) | ~3.8 GB | Programming | `ollama pull codellama` |
### Large (32 GB+ RAM recommended)
| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.1 (70B) | ~40 GB | Highest quality | `ollama pull llama3.1:70b` |
| Mixtral 8x7B | ~26 GB | Expert-level tasks | `ollama pull mixtral` |
| DeepSeek Coder V2 | ~8.9 GB | Best coding | `ollama pull deepseek-coder-v2` |
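The three tables above boil down to a simple rule of thumb. A sketch (the model names match the Ollama commands above, and the RAM cutoffs follow the earlier hardware table; the specific picks are one reasonable choice per tier, not the only ones):

```python
def suggest_model(ram_gb: int) -> str:
    """Map available RAM to a reasonable starter model from the tables above."""
    if ram_gb >= 64:
        return "llama3.1:70b"  # large tier, ~40 GB on disk
    if ram_gb >= 32:
        return "mixtral"       # large tier, ~26 GB
    if ram_gb >= 16:
        return "mistral"       # medium tier, ~4.1 GB
    return "llama3.2"          # small tier, ~2 GB

print(suggest_model(16))  # mistral
```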
## Managing Running Models
### Starting and Stopping
On the Local Models page:
- Click Start to begin running a model
- Click Stop to shut it down and free up memory
- Only run models you're actively using — they consume RAM
### TTL (Time to Live)
You can set a TTL for each model — it automatically stops after being idle for a set period. This saves memory when you forget to stop it.
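Conceptually, a TTL works like this toy sketch (not ClawDesk's actual implementation): each request refreshes an idle timer, and a periodic check stops any model whose timer has expired.

```python
import time

class ModelHandle:
    """Toy sketch of TTL-based auto-stop for a running local model."""

    def __init__(self, name: str, ttl_seconds: float):
        self.name = name
        self.ttl = ttl_seconds
        self.running = True
        self.last_used = time.monotonic()

    def touch(self) -> None:
        """Record a request so the idle timer restarts."""
        self.last_used = time.monotonic()

    def reap_if_idle(self) -> bool:
        """Stop the model if idle longer than its TTL; return running state."""
        if self.running and time.monotonic() - self.last_used > self.ttl:
            self.running = False  # real code would shut down the server process
        return self.running
```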
### Monitoring
The Local Models page shows:
- Which models are running
- How much memory each uses
- Response speed metrics
## Local vs. Cloud: A Comparison
| | Local Models | Cloud AI (Claude, GPT-4, etc.) |
|---|---|---|
| Cost | Free | Pay per use |
| Privacy | 100% private | Data sent to servers |
| Internet | Not needed | Required |
| Quality | Good to great | Best available |
| Speed | Depends on hardware | Consistent |
| Setup | Download once | Just add API key |
Our recommendation: Use local models for everyday tasks and privacy-sensitive work. Use cloud AI (Claude, GPT-4) when you need the highest quality output.
## Troubleshooting
| Problem | Solution |
|---|---|
| Model runs slowly | Try a smaller model. Close other apps to free up RAM. |
| "Out of memory" | The model is too large for your computer. Try a smaller variant. |
| Ollama not detected | Make sure Ollama is running (`ollama serve` in a terminal). Check the URL in settings. |
| Download stuck | Check your internet connection. For Ollama, run `ollama pull model-name` again to retry. |
| "llama-server not found" | For built-in models, ClawDesk needs `llama-server`. Install llama.cpp or use Ollama instead. |
## Related Guides
- Setting Up Providers → — Configure local and cloud providers
- Chatting with AI → — Use local models in conversation
- Privacy & Security → — Maximum privacy with local models