Memory & Skills
ClawDesk features a built-in memory system for long-term knowledge retention and a composable skill system for extending agent capabilities with reusable prompt fragments and tool bindings.
Memory System
The memory system gives agents persistent knowledge beyond the conversation context window. It combines BM25 keyword search, vector embeddings, and hybrid retrieval to find relevant information from past conversations and ingested documents.
Architecture
Key Components
| Component | Module | Description |
|---|---|---|
| MemoryManager | pipeline | Top-level orchestrator for memory operations |
| HybridSearcher | hybrid | Combines BM25 and vector search with RRF |
| BatchPipeline | pipeline | Batch ingestion with chunking and embedding |
| Bm25Index | bm25 | BM25 keyword index (Okapi BM25F) |
| EmbeddingProvider | embedding | Abstraction over embedding backends |
| Reranker | reranker | Cross-encoder reranking for precision |
Configuration
[memory]
enabled = true
data_dir = "${CLAWDESK_DATA_DIR}/memory"
# Chunking settings
[memory.chunking]
strategy = "semantic" # "fixed" | "sentence" | "semantic"
chunk_size = 512 # target tokens per chunk
chunk_overlap = 64 # overlap between chunks
max_chunk_size = 1024 # hard limit
# Embedding configuration
[memory.embedding]
provider = "ollama" # "ollama" | "openai" | "local"
model = "nomic-embed-text" # embedding model
dimensions = 768 # embedding dimensions
batch_size = 32 # chunks per embedding batch
# BM25 configuration
[memory.bm25]
k1 = 1.2 # term frequency saturation
b = 0.75 # document length normalization
avg_doc_length = 256 # estimated average document length
# Hybrid search
[memory.search]
strategy = "hybrid" # "bm25" | "vector" | "hybrid"
bm25_weight = 0.3 # weight for BM25 in RRF
vector_weight = 0.7 # weight for vector search in RRF
top_k = 10 # results to retrieve
rerank = true # enable cross-encoder reranking
rerank_model = "cross-encoder/ms-marco-MiniLM-L-6-v2"
# Auto-ingestion
[memory.auto_ingest]
enabled = true
conversations = true # ingest conversation turns
min_message_length = 50 # skip short messages
exclude_channels = [] # channel IDs to exclude
BM25 Search
The BM25 module provides fast keyword-based retrieval using the Okapi BM25F algorithm:
pub struct Bm25Index {
documents: Vec<IndexedDocument>,
inverted_index: HashMap<String, Vec<(DocId, f32)>>,
config: Bm25Config,
}
impl Bm25Index {
/// Add a document to the index
pub fn index(&mut self, doc: Document) -> DocId;
/// Search with a text query
pub fn search(&self, query: &str, top_k: usize) -> Vec<SearchResult>;
/// Remove a document
pub fn remove(&mut self, doc_id: &DocId);
/// Persist the index to disk
pub fn save(&self, path: &Path) -> Result<()>;
/// Load from disk
pub fn load(path: &Path) -> Result<Self>;
}
BM25 excels at finding exact keyword matches and is complementary to vector search for factual queries:
# BM25 is great for:
# - "What is the API key for Anthropic?" (exact term: "API key", "Anthropic")
# - "Error code 429" (exact match: "429")
# - "John's phone number" (exact name: "John")
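The scoring behind these matches can be sketched as a simplified single-field Okapi BM25 term score, using the k1 and b values from the configuration above. The function name and flat-argument signature here are illustrative, not the actual Bm25Index internals:

```rust
/// Simplified single-field Okapi BM25 score for one term in one document.
/// tf: term frequency in the document; df: documents containing the term;
/// n_docs: total documents; dl / avg_dl: this document's length vs. the average.
fn bm25_term_score(tf: f32, df: f32, n_docs: f32, dl: f32, avg_dl: f32, k1: f32, b: f32) -> f32 {
    // IDF with +1 inside the log so the result stays non-negative
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    // Length normalization: b = 0 ignores length, b = 1 fully normalizes
    let norm = 1.0 - b + b * (dl / avg_dl);
    idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)
}
```

A document's score for a query is the sum of this term score over the query's terms: k1 controls how quickly repeated occurrences saturate, and b controls how strongly long documents are penalized.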
Embedding Provider
The embedding module supports multiple backends:
pub enum EmbeddingProvider {
Ollama(OllamaEmbedder),
OpenAI(OpenAIEmbedder),
Local(LocalEmbedder),
}
impl EmbeddingProvider {
/// Embed a single text
pub async fn embed(&self, text: &str) -> Result<Vec<f32>>;
/// Embed a batch of texts
pub async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
/// Get the embedding dimensions
pub fn dimensions(&self) -> usize;
}
Supported embedding models:
| Provider | Model | Dimensions | Speed |
|---|---|---|---|
| Ollama | nomic-embed-text | 768 | Fast |
| Ollama | mxbai-embed-large | 1024 | Medium |
| OpenAI | text-embedding-3-small | 1536 | Fast |
| OpenAI | text-embedding-3-large | 3072 | Medium |
| Local | all-MiniLM-L6-v2 | 384 | Fastest |
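On the vector side, relevance is a similarity between query and chunk embeddings. A minimal sketch assuming cosine similarity (the source does not specify the VectorStore distance metric):

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns a value in [-1, 1]; 0.0 for a zero vector to avoid division by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

Some models emit unit-length vectors, in which case the dot product alone yields the same ranking.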
Hybrid Search (RRF)
The HybridSearcher combines BM25 and vector search using Reciprocal Rank Fusion:
pub struct HybridSearcher {
    bm25: Arc<Bm25Index>,
    vector_store: Arc<VectorStore>,
    embedder: Arc<EmbeddingProvider>,
    reranker: Arc<Reranker>,
    config: HybridConfig,
}
impl HybridSearcher {
pub async fn search(&self, query: &str, top_k: usize) -> Result<Vec<SearchResult>> {
// 1. Run BM25 search
let bm25_results = self.bm25.search(query, top_k * 2);
// 2. Run vector search
let embedding = self.embedder.embed(query).await?;
let vector_results = self.vector_store
.search(&embedding, top_k * 2)
.await?;
// 3. Fuse with RRF
let fused = reciprocal_rank_fusion(
&bm25_results,
&vector_results,
self.config.bm25_weight,
self.config.vector_weight,
60, // RRF constant k
);
// 4. Rerank (optional)
if self.config.rerank {
self.reranker.rerank(query, &fused, top_k).await
} else {
Ok(fused.into_iter().take(top_k).collect())
}
}
}
The RRF formula:
$$ \text{RRF}(d) = \sum_{r \in R} \frac{w_r}{k + \text{rank}_r(d)} $$
where $R$ is the set of result lists, $w_r$ is the weight for list $r$, $k$ is a constant (default 60), and $\text{rank}_r(d)$ is the rank of document $d$ in list $r$.
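A minimal sketch of this fusion over plain document-ID lists (the real reciprocal_rank_fusion above operates on SearchResults; ranks here are 1-based, matching the formula):

```rust
use std::collections::HashMap;

/// Weighted Reciprocal Rank Fusion: each list contributes w / (k + rank)
/// for every document it contains, and per-document contributions are summed.
fn rrf(lists: &[(&[&str], f32)], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for (list, weight) in lists {
        for (i, doc) in list.iter().enumerate() {
            // rank is 1-based: the top result contributes w / (k + 1)
            *scores.entry(doc.to_string()).or_insert(0.0) += weight / (k + (i as f32) + 1.0);
        }
    }
    let mut fused: Vec<(String, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}
```

With the defaults (weights 0.3/0.7, k = 60), a document near the top of the vector list tends to dominate unless BM25 strongly disagrees.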
Hybrid search typically outperforms either BM25 or vector search alone. The default weights (0.3 BM25 / 0.7 vector) work well for most workloads; increase the BM25 weight for keyword-heavy, factual queries.
Reranker
The cross-encoder reranker provides a second pass of scoring for improved precision:
pub struct Reranker {
model: CrossEncoderModel,
}
impl Reranker {
pub async fn rerank(
&self,
query: &str,
candidates: &[SearchResult],
top_k: usize,
) -> Result<Vec<SearchResult>> {
let mut scored: Vec<(f32, &SearchResult)> = Vec::new();
for candidate in candidates {
let score = self.model.score(query, &candidate.text).await?;
scored.push((score, candidate));
}
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
Ok(scored.into_iter().take(top_k).map(|(_, r)| r.clone()).collect())
}
}
Ingestion Pipeline
pub struct BatchPipeline {
chunker: Box<dyn Chunker>,
embedder: Arc<EmbeddingProvider>,
bm25: Arc<RwLock<Bm25Index>>,
vector_store: Arc<VectorStore>,
}
impl BatchPipeline {
/// Ingest a document
pub async fn ingest(&self, document: Document) -> Result<IngestResult> {
// 1. Chunk the document
let chunks = self.chunker.chunk(&document)?;
// 2. Generate embeddings (batched)
let texts: Vec<&str> = chunks.iter().map(|c| c.text.as_str()).collect();
let embeddings = self.embedder.embed_batch(&texts).await?;
// 3. Index in BM25
{
let mut bm25 = self.bm25.write().await;
for chunk in &chunks {
bm25.index(chunk.into());
}
}
// 4. Store embeddings
self.vector_store.insert_batch(&chunks, &embeddings).await?;
Ok(IngestResult {
chunks_created: chunks.len(),
document_id: document.id,
})
}
}
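Step 1 above depends on the configured chunking strategy. The "fixed" strategy can be sketched as a sliding window with overlap; whitespace tokens stand in for the real tokenizer here:

```rust
/// Fixed-size chunking with overlap, using whitespace "tokens" for illustration.
/// chunk_size and overlap correspond to the [memory.chunking] settings.
fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size);
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let mut chunks = Vec::new();
    let step = chunk_size - overlap; // advance by size minus overlap each window
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() { break; }
        start += step;
    }
    chunks
}
```

The semantic strategy would instead cut at topic boundaries, but the window/overlap bookkeeping is the same.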
CLI Operations
# Ingest a file
clawdesk memory ingest ./docs/manual.pdf
# Ingest a directory
clawdesk memory ingest ./knowledge-base/ --recursive
# Search memory
clawdesk memory search "How do I configure Telegram?"
# Show memory stats
clawdesk memory stats
# Output:
# Documents: 142
# Chunks: 3,847
# BM25 index size: 2.1 MB
# Vector store size: 48.3 MB
# Embedding model: nomic-embed-text (768 dims)
# Clear memory
clawdesk memory clear --confirm
# Export memory
clawdesk memory export --format jsonl > backup.jsonl
# Import memory
clawdesk memory import backup.jsonl
Skills System
Skills are composable units of agent capability. Each skill packages a prompt fragment, tool bindings, parameters, and dependencies for reuse and hot-reloading.
Skill Structure
Defining a Skill
Skills are defined in TOML files:
# skills/customer-support.toml
[skill]
name = "customer_support"
version = "1.0.0"
description = "Customer support agent with ticket management"
author = "Acme Corp"
[skill.prompt]
fragment = """
You are a customer support agent for Acme Corp.
Always be polite and professional.
When a customer reports an issue:
1. Acknowledge the problem
2. Search the knowledge base for solutions
3. If no solution found, create a support ticket
4. Provide the ticket number to the customer
"""
position = "prepend" # "prepend" | "append" | "replace"
[skill.tools]
required = ["knowledge_base", "ticket_create", "ticket_status"]
optional = ["email_send"]
[skill.params]
company_name = { type = "string", default = "Acme Corp" }
escalation_email = { type = "string", required = true }
max_ticket_priority = { type = "integer", default = 3 }
auto_assign = { type = "boolean", default = true }
[skill.deps]
knowledge_base = ">=1.0.0"
Skill Selection: Weighted Knapsack
When multiple skills are available, ClawDesk uses a greedy weighted-knapsack heuristic to select a high-value subset that fits within the token budget:
The selection algorithm runs in $O(k \log k)$ where $k$ is the number of candidate skills:
pub struct SkillSelector {
token_budget: usize,
}
impl SkillSelector {
/// Select skills that fit within the token budget
pub fn select(
&self,
candidates: &[ScoredSkill],
budget: usize,
) -> Vec<&ScoredSkill> {
// Sort by score/token_cost ratio (descending)
let mut ranked: Vec<_> = candidates.iter()
.map(|s| (s, s.relevance_score / s.token_cost as f32))
.collect();
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
// Greedy knapsack
let mut selected = Vec::new();
let mut remaining_budget = budget;
for (skill, _ratio) in ranked {
if skill.token_cost <= remaining_budget {
selected.push(skill);
remaining_budget -= skill.token_cost;
}
}
selected
}
}
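A worked example of the greedy pass, with a trimmed-down stand-in for ScoredSkill (names and numbers are illustrative). With a 500-token budget, the selector takes math and research first on relevance-per-token, then skips customer_support because it no longer fits, even though its absolute score is higher than research's:

```rust
/// Illustrative stand-in for ScoredSkill with only the fields selection uses.
struct Skill { name: &'static str, relevance_score: f32, token_cost: usize }

/// Greedy knapsack: rank by score/token ratio, take whatever still fits.
fn select(candidates: &[Skill], budget: usize) -> Vec<&'static str> {
    let mut ranked: Vec<_> = candidates.iter()
        .map(|s| (s, s.relevance_score / s.token_cost as f32))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut selected = Vec::new();
    let mut remaining = budget;
    for (skill, _ratio) in ranked {
        if skill.token_cost <= remaining {
            selected.push(skill.name);
            remaining -= skill.token_cost;
        }
    }
    selected
}
```

Like any greedy knapsack, this can miss the true optimum, but it is fast and predictable, which matters when selection runs on every turn.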
Configuration
[skills]
enabled = true
directory = "${CLAWDESK_DATA_DIR}/skills"
hot_reload = true
reload_interval_secs = 5
# Token budget for skills (from the total context window)
token_budget = 2048
# Skill-specific overrides
[skills.overrides.customer_support]
enabled = true
params.company_name = "My Company"
params.escalation_email = "support@mycompany.com"
[skills.overrides.code_review]
enabled = false # disable this skill
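The overrides above layer on top of the defaults declared in each skill's [skill.params] table. A minimal sketch of that merge, with plain string values for illustration (real parameters are typed and validated):

```rust
use std::collections::HashMap;

/// Merge declared parameter defaults with user overrides; overrides win.
fn effective_params(
    defaults: &HashMap<&str, &str>,
    overrides: &HashMap<&str, &str>,
) -> HashMap<String, String> {
    // Start from the skill's defaults...
    let mut merged: HashMap<String, String> = defaults.iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    // ...then replace any key the operator overrode in [skills.overrides.*]
    for (k, v) in overrides {
        merged.insert(k.to_string(), v.to_string());
    }
    merged
}
```

Parameters marked required = true with no default would be validated at load time under this scheme; that check is omitted here.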
Hot Reloading
Skills support hot-reloading. When a skill file changes on disk, ClawDesk automatically picks up the changes:
# Watch for skill changes (in gateway mode, this happens automatically)
$ clawdesk gateway
[INFO] Watching skills directory: /home/user/.clawdesk/data/skills
[INFO] Loaded 5 skills: customer_support, code_review, writing, research, math
# ... edit a skill file ...
[INFO] Skill reloaded: customer_support (v1.0.0 → v1.1.0)
Hot-reload only applies to TOML/YAML skill definitions. If a skill has compiled Rust tool bindings, those require a restart or plugin hot-reload.
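One way to implement the polling loop implied by reload_interval_secs is a modification-time diff against the last poll. A sketch of just the comparison step (the actual watcher implementation is not shown here; it may well use OS file notifications instead):

```rust
use std::collections::HashMap;
use std::time::SystemTime;

/// Given last-seen mtimes and freshly polled mtimes, return paths that
/// changed or appeared since the previous poll, in sorted order.
fn changed_paths(
    last: &HashMap<String, SystemTime>,
    current: &HashMap<String, SystemTime>,
) -> Vec<String> {
    let mut changed: Vec<String> = current.iter()
        // a path is "changed" if it is new or its mtime differs
        .filter(|(path, mtime)| last.get(*path).map_or(true, |prev| prev != *mtime))
        .map(|(path, _)| path.clone())
        .collect();
    changed.sort();
    changed
}
```

Each changed path would then be re-parsed and swapped into the skill registry, producing the reload log line shown above.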
Creating a Skill
# Scaffold a new skill
clawdesk skills create my-skill
# This creates:
# skills/my-skill/
# ├── skill.toml # Skill definition
# ├── prompt.md # Prompt fragment (optional, can inline in TOML)
# └── README.md # Documentation
Built-in Skills
| Skill | Description | Token Cost |
|---|---|---|
| general_assistant | General-purpose helpful assistant | ~200 |
| code_review | Code review with best practices | ~350 |
| writing | Writing assistance with style guides | ~300 |
| research | Web research and summarization | ~250 |
| math | Mathematical reasoning and computation | ~150 |
| customer_support | Customer support with ticket management | ~400 |
| data_analysis | Data analysis and visualization | ~300 |
Skill Dependencies
Skills can depend on other skills, creating a dependency graph:
# skills/advanced-support.toml
[skill]
name = "advanced_support"
[skill.deps]
customer_support = ">=1.0.0"
knowledge_base = ">=1.0.0"
[skill.prompt]
fragment = """
In addition to standard support procedures, you can:
- Escalate to engineering with detailed technical reports
- Access the internal documentation system
- Review customer subscription and billing information
"""
When a skill with dependencies is selected, all its dependencies are automatically included (and their token costs counted against the budget).
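Pulling in those dependencies amounts to a transitive closure over the dependency graph. A minimal sketch using names only (version-constraint checking and cycle reporting omitted):

```rust
use std::collections::{HashMap, HashSet};

/// Collect a skill and all of its transitive dependencies.
/// `deps` maps a skill name to the names it directly depends on.
fn resolve(root: &str, deps: &HashMap<&str, Vec<&str>>) -> HashSet<String> {
    let mut included = HashSet::new();
    let mut stack = vec![root.to_string()];
    while let Some(name) = stack.pop() {
        // insert() returns false if already visited, which also breaks cycles
        if included.insert(name.clone()) {
            if let Some(children) = deps.get(name.as_str()) {
                for child in children {
                    stack.push(child.to_string());
                }
            }
        }
    }
    included
}
```

Because dependencies count against the budget, a cheap skill with an expensive transitive dependency can still be rejected by the selector.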
Integration: Memory + Skills
Memory and skills work together to create powerful agent experiences:
# A skill that leverages memory
[skill]
name = "contextual_helper"
[skill.prompt]
fragment = """
Before answering, always search memory for relevant context.
Use the knowledge_base tool to find previous conversations
and documentation that might be relevant.
Cite your sources when using information from memory.
"""
[skill.tools]
required = ["knowledge_base"]
[skill.params]
search_top_k = { type = "integer", default = 5 }
include_sources = { type = "boolean", default = true }
For best results, pair the hybrid search memory system with skills that instruct the agent on how and when to query memory. This gives the agent both the capability (tools) and the knowledge (prompt) to effectively use long-term memory.