Deep Dive: Context Assembly
How ContextQueryBuilder in SochDB assembles context for LLM requests under a token budget, using prioritized sections, the TOON output format, and mathematical analysis of the I/O reduction achieved.
The Context Problem
LLMs have a finite context window. A conversation might involve:
| Context Source | Typical Tokens | Priority |
|---|---|---|
| System prompt | 200–800 | Critical |
| Active skills prompt fragments | 100–500 each | High |
| Recent conversation history | 500–50,000 | High |
| Retrieved knowledge (RAG) | 200–5,000 | Medium |
| User profile / preferences | 50–200 | Medium |
| Tool schemas | 100–300 each | Conditional |
| Previous tool results | 50–2,000 | Low |
Total potential context: 60,000+ tokens. Typical model limit: 8,000–128,000 tokens. We need to fit the most important context within the budget.
This is formally a 0/1 knapsack problem.
Token Budgeting as Knapsack
Formal Definition
Given:
- A set of context sections $S = {s_1, s_2, \ldots, s_m}$
- Each section $s_i$ has a token cost $c_i$ and a value (priority) $v_i$
- A total token budget $B$
- A reserved allocation for the LLM's response: $R$
$$ \max \sum_{i=1}^{m} v_i \cdot x_i \quad \text{subject to} \quad \sum_{i=1}^{m} c_i \cdot x_i \leq B - R $$
Where $x_i \in {0, 1}$ indicates whether section $s_i$ is included.
ClawDesk's Approach
The full 0/1 knapsack is NP-hard, but ClawDesk uses a priority-tier greedy approach that runs in $O(m \log m)$:
ContextQueryBuilder API
SochDB's ContextQueryBuilder provides a fluent API for assembling context:
/// In clawdesk-sochdb/src/context.rs
pub struct ContextQueryBuilder {
budget: TokenBudget,
sections: Vec<ContextSection>,
output_format: OutputFormat,
}
impl ContextQueryBuilder {
/// Create a new builder with the given token budget.
pub fn new(budget: usize) -> Self {
Self {
budget: TokenBudget::new(budget),
sections: Vec::new(),
output_format: OutputFormat::Toon,
}
}
/// Reserve tokens for the LLM's response.
pub fn reserve_response(mut self, tokens: usize) -> Self {
self.budget.reserve(tokens);
self
}
/// Add a critical section (always included, panics if over budget).
pub fn system_prompt(mut self, prompt: &str) -> Self {
let section = ContextSection {
content: prompt.to_string(),
priority: Priority::Critical,
token_cost: estimate_tokens(prompt),
shrinkable: false,
};
self.budget.allocate_critical(section.token_cost);
self.sections.push(section);
self
}
/// Add skill prompt fragments.
pub fn skills(mut self, skills: &[SelectedSkill]) -> Self {
for skill in skills {
self.sections.push(ContextSection {
content: skill.prompt_fragment.clone(),
priority: Priority::High,
token_cost: skill.token_cost,
shrinkable: false,
});
}
self
}
/// Add conversation history (shrinkable — can be trimmed from oldest).
pub fn history(mut self, entries: &[HistoryEntry]) -> Self {
self.sections.push(ContextSection {
content: format_history(entries),
priority: Priority::High,
token_cost: estimate_tokens_history(entries),
shrinkable: true,
});
self
}
/// Add retrieved knowledge chunks.
pub fn knowledge(mut self, chunks: &[KnowledgeChunk]) -> Self {
for chunk in chunks {
self.sections.push(ContextSection {
content: chunk.content.clone(),
priority: Priority::Medium,
token_cost: chunk.token_count,
shrinkable: false,
});
}
self
}
/// Build the final context, fitting within the budget.
pub fn build(self) -> Result<AssembledContext, ContextError> {
let available = self.budget.available();
let mut included = Vec::new();
let mut remaining = available;
// Sort by priority (descending), then by value density
let mut sections = self.sections;
sections.sort_by(|a, b| {
b.priority.cmp(&a.priority)
.then_with(|| {
let density_a = a.priority.weight() as f64 / a.token_cost as f64;
let density_b = b.priority.weight() as f64 / b.token_cost as f64;
density_b.partial_cmp(&density_a).unwrap()
})
});
for section in sections {
if section.token_cost <= remaining {
remaining -= section.token_cost;
included.push(section);
} else if section.shrinkable && remaining > 0 {
// Trim shrinkable sections to fit
let trimmed = section.trim_to(remaining);
remaining -= trimmed.token_cost;
included.push(trimmed);
}
// else: skip
}
let assembled = match self.output_format {
OutputFormat::Toon => format_toon(&included),
OutputFormat::Raw => format_raw(&included),
};
Ok(AssembledContext {
content: assembled,
tokens_used: available - remaining,
tokens_available: available,
sections_included: included.len(),
sections_skipped: self.sections.len() - included.len(),
})
}
}
Usage Example
let context = ContextQueryBuilder::new(8192)
.reserve_response(2048)
.system_prompt("You are a helpful AI assistant.")
.skills(&selected_skills)
.history(&conversation_history)
.knowledge(&rag_results)
.build()?;
// context.content is now a token-budget-optimized string
// ready to be sent to the LLM
TOON Output Format
TOON (Token-Optimized Output Notation) is ClawDesk's compact serialization format for context sections. It achieves 58–67% fewer tokens compared to verbose JSON or Markdown.
Comparison
Standard JSON format:
{
"conversation_history": [
{
"role": "user",
"content": "What's the weather in London?",
"timestamp": "2026-02-17T10:30:00Z"
},
{
"role": "assistant",
"content": "The current weather in London is 12°C with partly cloudy skies.",
"timestamp": "2026-02-17T10:30:05Z"
}
]
}
Token count: ~85 tokens
TOON format:
[H]
U|10:30|What's the weather in London?
A|10:30|The current weather in London is 12°C with partly cloudy skies.
Token count: ~32 tokens
Savings: $\frac{85 - 32}{85} = 62%$
TOON Encoding Rules
| Element | TOON Syntax | Example |
|---|---|---|
| Section header | [X] | [H] = History, [S] = System, [K] = Knowledge |
| User message | U|time|text | U|10:30|Hello |
| Assistant message | A|time|text | A|10:30|Hi there |
| Tool call | T|name|args | T|get_weather|{"city":"London"} |
| Tool result | R|name|result | R|get_weather|12°C cloudy |
| Knowledge chunk | K|source|text | K|wiki|London is the capital... |
| Metadata separator | --- | Between sections |
Implementation
/// In clawdesk-sochdb/src/toon.rs
pub fn format_toon(sections: &[ContextSection]) -> String {
let mut output = String::new();
for (i, section) in sections.iter().enumerate() {
if i > 0 {
output.push_str("---\n");
}
match section.priority {
Priority::Critical => {
output.push_str("[S]\n");
output.push_str(§ion.content);
output.push('\n');
}
Priority::High if section.is_history() => {
output.push_str("[H]\n");
for entry in section.as_history() {
let role = match entry.role {
Role::User => "U",
Role::Assistant => "A",
Role::System => "S",
};
let time = entry.timestamp.format("%H:%M");
writeln!(output, "{}|{}|{}", role, time, entry.content).ok();
}
}
Priority::Medium if section.is_knowledge() => {
output.push_str("[K]\n");
output.push_str(§ion.content);
output.push('\n');
}
_ => {
output.push_str(§ion.content);
output.push('\n');
}
}
}
output
}
I/O Reduction Analysis
Single Session Lookup
Consider a conversation with 100 messages stored in SochDB. With a traditional approach:
Naive approach:
- Query database for all 100 messages → 100 rows
- Serialize each to JSON → ~100 × 200 = 20,000 tokens
- Send all 20,000 tokens to LLM
- LLM processes 20,000 context tokens + generates response
ContextQueryBuilder approach:
- Query SochDB with
conversation_idindex → 1 indexed lookup (vs. full scan) - Priority filter: keep only last 20 messages → 4,000 tokens raw
- TOON format: compress to ~1,500 tokens
- LLM processes 1,500 context tokens + generates response
I/O reduction:
$$ \text{Token reduction} = \frac{20{,}000 - 1{,}500}{20{,}000} = 92.5% $$
Database I/O reduction:
For traditional key-value stores, fetching 100 messages requires 100 random reads. SochDB uses a single B-tree range scan:
$$ \text{I/O reduction} = \frac{100 \text{ random reads}}{1 \text{ range scan}} \approx 10{,}000\times $$
(One range scan touches ~10 pages vs. 100 random reads at 1 page each, but with cache misses the random approach is far worse.)
Token Cost Savings Over Time
For an active conversation with $N$ total messages and a budget of $B$ tokens:
| Approach | Tokens Sent to LLM | Cost at $$3/\text{M tokens}$ (1000 messages) |
|---|---|---|
| Send everything | $O(N)$ → 200,000 | $0.60 |
| Sliding window (last 20) | $O(\min(N, 20))$ → 4,000 | $0.012 |
| ContextQueryBuilder + TOON | $O(\min(N, B))$ → 1,500 | $0.0045 |
Over 1,000 conversations/day:
$$ \text{Daily savings} = 1000 \times ($0.60 - $0.0045) = $595.50/\text{day} $$
Priority-Based Assembly Algorithm
The complete algorithm:
Complexity Analysis
| Step | Time Complexity |
|---|---|
| Token estimation | $O(n)$ per section (character counting) |
| Priority sorting | $O(m \log m)$ where $m$ = section count |
| Greedy packing | $O(m)$ |
| TOON formatting | $O(T)$ where $T$ = total tokens |
| Total | $O(m \log m + T)$ |
For typical values ($m \leq 20$ sections, $T \leq 8192$ tokens), this completes in microseconds.
Comparison to Alternatives
Hand-Rolled Compaction
Many LLM applications use ad-hoc string truncation:
# ❌ Typical hand-rolled approach
context = system_prompt
for msg in history[-20:]: # arbitrary cutoff
if len(context) + len(msg) < MAX_CHARS: # character-based, not token-based
context += msg
Problems:
- Character-based, not token-based (1 character ≠ 1 token)
- No priority weighting — recent history always wins
- No shrinkability — all or nothing per section
- No format optimization — raw text wastes tokens
LangChain's ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=20)
Better, but still:
- Fixed window size, not budget-aware
- No priority among different context sources
- No output format optimization
ClawDesk's ContextQueryBuilder
| Feature | Hand-rolled | LangChain | ContextQueryBuilder |
|---|---|---|---|
| Token-based budgeting | ❌ | ⚠️ | ✅ |
| Priority tiers | ❌ | ❌ | ✅ |
| Shrinkable sections | ❌ | ❌ | ✅ |
| Output optimization (TOON) | ❌ | ❌ | ✅ |
| Type-safe builder API | ❌ | ❌ | ✅ |
| Formal knapsack model | ❌ | ❌ | ✅ |
Key Takeaways
- Context assembly is a knapsack problem — formalize it, don't ad-hoc it
- Priority tiers beat flat ordering — system prompts must never be cut
- TOON format saves 58–67% tokens by eliminating JSON boilerplate
- Shrinkable sections (history) gracefully degrade under budget pressure
- SochDB indexed lookups reduce database I/O by orders of magnitude
Further Reading
- Structured Concurrency — how context assembly runs concurrently with other pipeline stages
- Build a Skill — skills interact with the context budget
- Storage Layer Architecture — SochDB internals