Running AI Locally

One of ClawDesk's most powerful features is the ability to run AI models directly on your computer. This means:

  • No internet needed — Works on an airplane, in a cabin, anywhere
  • Completely free — No API keys, no subscriptions, no per-message costs
  • Completely private — Your conversations never leave your machine
  • No rate limits — Chat as much as you want

What Does "Running AI Locally" Mean?

When you use ChatGPT or Claude in a browser, your messages travel over the internet to powerful computers owned by those companies. They process your message and send the response back.

Running AI locally means downloading the AI brain (called a "model") onto your own computer. Your computer does all the thinking, and nothing ever goes to the internet.


Do I Have the Right Computer?

Local AI needs some computing power. Here's a simple guide:

Minimum Requirements

| Component | Minimum | Recommended | Best |
| --- | --- | --- | --- |
| RAM | 8 GB | 16 GB | 32+ GB |
| Storage | 10 GB free | 50 GB free | 100+ GB free |
| CPU | Any modern CPU | Apple M1+ / Recent Intel/AMD | Apple M2+ / Latest CPUs |
| GPU | Not required | NVIDIA GPU (6+ GB VRAM) | NVIDIA RTX (12+ GB VRAM) |

What Can Your Computer Run?

| Your Setup | Models You Can Run | Quality |
| --- | --- | --- |
| 8 GB RAM, no GPU | Small models (1-3B parameters) | Basic conversations, simple tasks |
| 16 GB RAM, no GPU | Medium models (7-8B parameters) | Good conversations, coding help |
| 16 GB RAM + GPU | Medium-large models (7-13B) | Very good quality |
| 32+ GB RAM or GPU with 12+ GB VRAM | Large models (34-70B) | Excellent, near cloud quality |
| Apple M1/M2/M3 Mac | 7B models run well even with 8 GB (unified memory) | Good to excellent |
Tip: Apple Silicon Macs (M1/M2/M3/M4) are especially good at running local AI because they share memory between the CPU and GPU. A Mac with 16 GB of RAM can run models that would need a dedicated GPU on other systems.


Method 1: ClawDesk Built-in Local Models

The easiest way — no extra software needed.

Step 1: Open the Local Models Page

In ClawDesk, click "Local Models" in the sidebar.

Step 2: Check Your System

ClawDesk automatically detects your hardware:

  • CPU type and speed
  • Amount of RAM
  • GPU (if any)
  • Available storage space
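If you want to double-check the numbers ClawDesk will see, here is a rough manual version of the same check (a sketch, assuming macOS or Linux; the 8 GB and 10 GB thresholds come from the requirements table above):

```shell
# Rough manual hardware check: RAM and free disk in GB.
if [ "$(uname)" = "Darwin" ]; then
  ram_gb=$(( $(sysctl -n hw.memsize) / 1073741824 ))          # bytes -> GB
else
  ram_gb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1048576 ))  # kB -> GB
fi
free_gb=$(( $(df -Pk . | tail -1 | awk '{print $4}') / 1048576 ))       # kB -> GB
echo "RAM: ${ram_gb} GB   Free disk: ${free_gb} GB"
[ "$ram_gb" -ge 8 ]   || echo "Below the 8 GB minimum: stick to 1-3B models"
[ "$free_gb" -ge 10 ] || echo "Under 10 GB free: clear space before downloading"
```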

Step 3: Pick a Recommended Model

Based on your hardware, ClawDesk recommends models that will run well on your machine:

| Model | Size | What It's Good At | Min RAM |
| --- | --- | --- | --- |
| Llama 3.1 8B | ~4.5 GB | General chat, writing | 8 GB |
| Mistral 7B | ~4 GB | Fast conversations | 8 GB |
| Code Llama 7B | ~3.8 GB | Programming help | 8 GB |
| Phi-3 Mini | ~2.3 GB | Quick, lightweight tasks | 4 GB |
| Llama 3.1 70B | ~40 GB | Near cloud quality | 64 GB |

Step 4: Download a Model

  1. Find a model you want to try
  2. Click the Download button
  3. Wait for the download (this can take a few minutes depending on model size)
  4. Models are stored in ~/.clawdesk/models/
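To see what is already on disk, you can list that storage folder directly (a sketch; the path is the one mentioned in step 4):

```shell
# List downloaded models and how much space each one uses.
MODELS_DIR="$HOME/.clawdesk/models"
if [ -d "$MODELS_DIR" ]; then
  du -sh "$MODELS_DIR"/*
else
  echo "No models downloaded yet ($MODELS_DIR not found)"
fi
```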

Step 5: Start the Model

  1. After downloading, click Start next to the model
  2. ClawDesk launches a local inference server
  3. A green status indicator shows the model is running

Step 6: Use It in Chat

  1. Go to the Chat page
  2. In the model dropdown, select "Local (Built-in)"
  3. Choose your running model
  4. Start chatting — everything happens on your computer!

Method 2: Using Ollama

Ollama is a popular tool for running local AI. ClawDesk integrates with it seamlessly.

Step 1: Install Ollama

macOS:

brew install ollama

Windows / Linux: Download from ollama.com/download

Step 2: Download a Model

Open your terminal (or command prompt) and run:

ollama pull llama3.1

This downloads the Llama 3.1 model. You can replace llama3.1 with any model from ollama.com/library.
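Once the pull finishes, you can confirm the model is installed with Ollama's `list` subcommand:

```shell
# Show the models Ollama has installed locally, with their sizes.
if command -v ollama >/dev/null 2>&1; then
  ollama list || echo "Ollama is installed but its server is not running (try: ollama serve)"
else
  echo "ollama is not on PATH; revisit Step 1"
fi
```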

Step 3: Start Ollama

Ollama usually starts automatically. If not:

ollama serve
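You can verify the server is up by probing its HTTP API; `/api/tags` lists installed models and lives at the same base URL you will give ClawDesk in the next step (assumed here to be the default, http://localhost:11434):

```shell
# Probe Ollama's HTTP API at the default address.
if curl -s --max-time 2 http://localhost:11434/api/tags >/dev/null; then
  echo "Ollama is running on port 11434"
else
  echo "No response; start it with: ollama serve"
fi
```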

Step 4: Connect ClawDesk

  1. In ClawDesk, go to Settings → Providers
  2. Find Ollama and click Configure
  3. Base URL: http://localhost:11434 (this is the default)
  4. Click Save
  5. Your Ollama models now appear in the chat model dropdown

Popular models to try:

# General purpose — great all-rounder
ollama pull llama3.1

# For coding
ollama pull codellama

# Very fast, lightweight
ollama pull mistral

# Google's model
ollama pull gemma2

# Highest quality, good for creative writing (needs ~64 GB RAM)
ollama pull llama3.1:70b

Tips for Best Performance

1. Close Other Apps

Local AI uses a lot of memory. Close browser tabs and other heavy apps while using local models.

2. Start with Small Models

If you're not sure about your hardware, start with a small model like Phi-3 or Mistral 7B. If it runs smoothly, try a larger one.

3. Watch Your Temperature

On laptops, local AI can make your computer warm. Make sure your laptop has good ventilation. If it gets too hot, switch to a smaller model.

4. GPU Acceleration

If you have an NVIDIA GPU, make sure you have the latest drivers installed. ClawDesk and Ollama will automatically use your GPU for faster responses.
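To confirm the GPU is actually visible to the system, `nvidia-smi` (installed alongside the NVIDIA driver) reports the card and its total VRAM:

```shell
# Report NVIDIA GPU name and total VRAM, if a driver is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found: no NVIDIA driver, so inference runs on the CPU (or Apple GPU)"
fi
```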

5. Model Quality vs. Speed

Larger models give better answers but respond more slowly, especially without a GPU. If a model feels sluggish, step down a size; if its answers feel shallow, step up a size or switch to cloud AI for that task.

Comparison: Local vs. Cloud

| Aspect | Local AI | Cloud AI (Claude, GPT) |
| --- | --- | --- |
| Cost | Free | Pay per use |
| Privacy | 100% private | Data goes to servers |
| Internet | Not needed | Required |
| Speed | Depends on your hardware | Usually very fast |
| Quality | Good (can be excellent with big models) | Excellent |
| Setup | Download model (one time) | Get API key (instant) |
| Context length | Limited by RAM | Very large (128K-1M+ tokens) |

When Should You Use Each?

Use Local AI when:

  • You're chatting casually or brainstorming
  • You're offline or have slow internet
  • You care deeply about privacy
  • You don't want any costs
  • You're processing sensitive documents

Use Cloud AI when:

  • You need the best possible quality
  • You're working on complex coding tasks
  • You're processing very long documents
  • You need the latest AI capabilities
  • Speed is critical and you have a slower computer

Troubleshooting

| Problem | Solution |
| --- | --- |
| Model is very slow | Try a smaller model, close other apps, check if the GPU is being used |
| "Not enough memory" | Choose a smaller model or add more RAM |
| Ollama won't start | Run ollama serve in a terminal; check that port 11434 is free |
| Model gives poor answers | Try a larger model, or use cloud AI for complex tasks |
| Download stuck | Check your internet connection, try again, check disk space |
| Computer fan is loud | Normal with local AI — ensure good ventilation |
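For the "Ollama won't start" case, a quick way to see whether something is already holding port 11434 (a sketch; lsof ships with macOS and most Linux distros):

```shell
# Check whether anything is listening on Ollama's default port.
if ! command -v lsof >/dev/null 2>&1; then
  echo "lsof not installed; try: ss -ltn | grep 11434"
elif lsof -iTCP:11434 -sTCP:LISTEN >/dev/null 2>&1; then
  echo "Port 11434 is already in use (Ollama may already be running)"
else
  echo "Port 11434 is free"
fi
```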

Next Steps