Running AI Locally

One of ClawDesk's best features is running AI models directly on your computer. No internet needed, no API keys, no monthly bills. Completely free.


Why Run AI Locally?

| Benefit | Description |
|---|---|
| Free | No API costs, ever |
| Private | Your data never leaves your computer |
| Offline | Works without internet |
| Fast | No network latency — responses start instantly |
| No limits | No rate limits, no usage caps |

Do I Have the Right Computer?

Local AI models run best with decent hardware. Here's what you need:

Minimum Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 5 GB free | 20 GB free |
| GPU | Not required (CPU works) | Any modern GPU with 6+ GB VRAM |

What Can My Computer Run?

| Your RAM | What you can run | Quality |
|---|---|---|
| 8 GB | Small models (1-3B parameters) | Good for simple tasks |
| 16 GB | Medium models (7-8B parameters) | Great for most tasks |
| 32 GB | Large models (13-14B parameters) | Excellent quality |
| 64 GB+ | Any model | Best quality |
Tip: ClawDesk's Local Models page automatically detects your hardware and recommends models that will run well on your system. It shows a "fit score" for each model:

  • Perfect — Will run great
  • Good — Will run well
  • Marginal — Will run but may be slow
  • Too tight — Won't fit in your memory
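To make the categories concrete, here is a rough sketch of how a fit score could be derived from a model's memory need versus your installed RAM. The `fit_score` helper and its thresholds are illustrative assumptions, not ClawDesk's actual (internal) scoring logic:

```shell
# Hypothetical fit score: compare a model's RAM need (GB) with system RAM (GB).
# fit_score NEED_GB HAVE_GB  ->  Perfect / Good / Marginal / "Too tight"
fit_score() {
  need=$1; have=$2
  if   [ "$have" -ge $((need * 2)) ];     then echo "Perfect"   # lots of headroom
  elif [ "$have" -ge $((need * 3 / 2)) ]; then echo "Good"      # comfortable
  elif [ "$have" -ge "$need" ];           then echo "Marginal"  # fits, little headroom
  else                                         echo "Too tight" # won't fit
  fi
}

fit_score 5 16   # a ~5 GB model on a 16 GB machine prints "Perfect"
```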

Method 1: ClawDesk Built-in (Easiest)

ClawDesk has a built-in model manager that handles everything for you.

Step 1: Open Local Models

Click Local Models in the left sidebar (under the "System" group).

Step 2: Browse Available Models

You'll see a list of recommended models with:

  • Model name and description
  • Size — How much disk space and RAM it needs
  • Fit score — How well it matches your hardware

Step 3: Download a Model

  1. Find a model marked as "Perfect" or "Good" fit
  2. Click Download
  3. Wait for the download to finish (you'll see a progress bar)

The model is saved to ~/.clawdesk/models/ as a GGUF file.
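If you prefer the terminal, you can inspect the downloaded files at that path directly:

```shell
# List any downloaded GGUF model files and their sizes.
# The directory only exists after your first download.
ls -lh ~/.clawdesk/models/*.gguf 2>/dev/null || echo "no models downloaded yet"
```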

Step 4: Start the Model

  1. Once downloaded, click Start next to the model
  2. ClawDesk starts a local inference server automatically
  3. The model appears as "Local (Built-in)" in your provider list

Step 5: Chat!

  1. Go to the Chat page
  2. Select the local model from the model picker
  3. Start chatting — everything runs on your machine!

Method 2: Using Ollama

Ollama is a popular free tool for running local models. ClawDesk integrates with it seamlessly.

Step 1: Install Ollama

Download from ollama.com and install it.

  • macOS: Download the macOS app from ollama.com and drag it to Applications.
  • Linux: Run curl -fsSL https://ollama.com/install.sh | sh in a terminal.
  • Windows: Download the Windows installer from ollama.com and run it.

Step 2: Download a Model

Open your terminal and run:

ollama pull llama3.2

This downloads the Llama 3.2 model (~2 GB). Wait for it to finish.
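Once the pull completes, you can confirm the model is installed with `ollama list`. A small guard (the `if` check is just defensive scripting, not part of Ollama) keeps the snippet from erroring when the CLI is missing:

```shell
# Confirm the model shows up in Ollama's installed-model list.
if command -v ollama >/dev/null 2>&1; then
  ollama list | grep llama3.2 || echo "llama3.2 not installed yet"
else
  echo "ollama is not installed -- see ollama.com"
fi
```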

Step 3: Connect to ClawDesk

  1. Go to Settings → Providers
  2. Find Ollama in the list
  3. Make sure URL is http://localhost:11434
  4. Click Save

ClawDesk automatically detects your Ollama models.
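If detection fails, you can test the same URL from a terminal. Ollama's `/api/tags` endpoint returns your installed models as JSON, so a quick request tells you whether the server is reachable (`check_ollama` is just a local helper for this sketch):

```shell
# Probe the Ollama API at the URL ClawDesk is configured to use.
check_ollama() {
  if curl -fsS --max-time 2 "$1/api/tags" >/dev/null 2>&1; then
    echo "Ollama is reachable"
  else
    echo "Ollama is not reachable"
  fi
}

check_ollama "http://localhost:11434"
```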

Step 4: Chat!

Go to Chat, select your Ollama model from the picker, and start chatting.


Recommended Models

Here are the best models to start with, ordered by size:

Small (runs on any computer with 8 GB RAM)

| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.2 (3B) | ~2 GB | Quick chat, simple tasks | ollama pull llama3.2 |
| Gemma 2 (2B) | ~1.5 GB | Fast responses | ollama pull gemma2:2b |
| Phi-3 Mini | ~2.3 GB | Reasoning, coding | ollama pull phi3 |

Medium (needs 16 GB RAM)

| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.1 (8B) | ~4.7 GB | Great all-rounder | ollama pull llama3.1 |
| Mistral (7B) | ~4.1 GB | Fast and capable | ollama pull mistral |
| CodeLlama (7B) | ~3.8 GB | Programming | ollama pull codellama |

Large (needs 32 GB RAM or more)

| Model | Size | Good for | Command |
|---|---|---|---|
| Llama 3.1 (70B) | ~40 GB | Highest quality | ollama pull llama3.1:70b |
| Mixtral 8x7B | ~26 GB | Expert-level tasks | ollama pull mixtral |
| DeepSeek Coder V2 | ~8.9 GB | Best coding | ollama pull deepseek-coder-v2 |
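Large models also need matching disk space. Before pulling one, it can help to check how much is free; this sketch reads the available space on your home partition with standard `df`/`awk` (the `free_gb` helper and the 40 GB figure for Llama 3.1 70B are from the table above):

```shell
# Print free disk space (in whole GB) on the partition holding $HOME.
free_gb() { df -Pk "$HOME" | awk 'NR==2 {printf "%d", $4/1024/1024}'; }

need_gb=40   # e.g. Llama 3.1 70B is roughly a 40 GB download
if [ "$(free_gb)" -ge "$need_gb" ]; then
  echo "enough space for a ${need_gb} GB model"
else
  echo "not enough space -- pick a smaller model"
fi
```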

Managing Running Models

Starting and Stopping

On the Local Models page:

  • Click Start to begin running a model
  • Click Stop to shut it down and free up memory
  • Only run models you're actively using — they consume RAM

TTL (Time to Live)

You can set a TTL for each model — it automatically stops after being idle for a set period. This saves memory when you forget to stop it.
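If you run models through Ollama instead, it has a similar idle timeout: the `OLLAMA_KEEP_ALIVE` environment variable controls how long a model stays loaded after its last request.

```shell
# Ollama's analogue of a TTL: unload idle models after 5 minutes.
export OLLAMA_KEEP_ALIVE=5m
# ...then restart the Ollama server (for example: ollama serve)
```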

Monitoring

The Local Models page shows:

  • Which models are running
  • How much memory each uses
  • Response speed metrics

Local vs. Cloud: A Comparison

| | Local Models | Cloud AI (Claude, GPT-4, etc.) |
|---|---|---|
| Cost | Free | Pay per use |
| Privacy | 100% private | Data sent to servers |
| Internet | Not needed | Required |
| Quality | Good to great | Best available |
| Speed | Depends on hardware | Consistent |
| Setup | Download once | Just add API key |

Our recommendation: Use local models for everyday tasks and privacy-sensitive work. Use cloud AI (Claude, GPT-4) when you need the highest quality output.


Troubleshooting

| Problem | Solution |
|---|---|
| Model runs slowly | Try a smaller model. Close other apps to free up RAM. |
| "Out of memory" | The model is too large for your computer. Try a smaller variant. |
| Ollama not detected | Make sure Ollama is running (ollama serve in a terminal). Check the URL in settings. |
| Download stuck | Check your internet connection. For Ollama, run ollama pull model-name again to retry. |
| "llama-server not found" | For built-in models, ClawDesk needs llama-server. Install llama.cpp or use Ollama instead. |
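The checks above can be bundled into one quick health check. This sketch looks for the two binaries mentioned in the table and probes the default Ollama URL (the exact commands your setup needs may differ):

```shell
# Minimal local-AI health check covering the common failure modes above.
for cmd in ollama llama-server; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: installed"
  else
    echo "$cmd: not found"
  fi
done

curl -fsS --max-time 2 http://localhost:11434/api/tags >/dev/null 2>&1 \
  && echo "Ollama server: running" \
  || echo "Ollama server: not responding (try: ollama serve)"
```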