Deep Dive: Context Assembly

What you'll learn

How ContextQueryBuilder in SochDB assembles context for LLM requests under a token budget, using prioritized sections, the TOON output format, and mathematical analysis of the I/O reduction achieved.

The Context Problem

LLMs have a finite context window. A conversation might involve:

| Context Source | Typical Tokens | Priority |
|---|---|---|
| System prompt | 200–800 | Critical |
| Active skills prompt fragments | 100–500 each | High |
| Recent conversation history | 500–50,000 | High |
| Retrieved knowledge (RAG) | 200–5,000 | Medium |
| User profile / preferences | 50–200 | Medium |
| Tool schemas | 100–300 each | Conditional |
| Previous tool results | 50–2,000 | Low |

Total potential context: 60,000+ tokens. Typical model limit: 8,000–128,000 tokens. We need to fit the most important context within the budget.

This is formally a 0/1 knapsack problem.


Token Budgeting as Knapsack

Formal Definition

Given:

  • A set of context sections $S = \{s_1, s_2, \ldots, s_m\}$
  • Each section $s_i$ has a token cost $c_i$ and a value (priority) $v_i$
  • A total token budget $B$
  • A reserved allocation for the LLM's response: $R$

$$ \max \sum_{i=1}^{m} v_i \cdot x_i \quad \text{subject to} \quad \sum_{i=1}^{m} c_i \cdot x_i \leq B - R $$

Where $x_i \in \{0, 1\}$ indicates whether section $s_i$ is included.

ClawDesk's Approach

The full 0/1 knapsack is NP-hard, but ClawDesk uses a priority-tier greedy approach that runs in $O(m \log m)$.
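A condensed sketch of the tier-then-density greedy, using simplified illustrative types (not the real SochDB API):

```rust
// Simplified model of the greedy packer: sort by (priority tier, value
// density), then take sections while they fit the budget. Types here are
// illustrative assumptions, not the real ContextQueryBuilder internals.
#[derive(Clone)]
pub struct Section {
    pub tier: u8,      // higher = more important
    pub weight: f64,   // priority weight used for value density
    pub cost: usize,   // estimated token cost
}

pub fn pack(mut sections: Vec<Section>, budget: usize) -> Vec<Section> {
    // O(m log m): priority tier descending, ties broken by value density
    sections.sort_by(|a, b| {
        b.tier.cmp(&a.tier).then_with(|| {
            let da = a.weight / a.cost as f64;
            let db = b.weight / b.cost as f64;
            db.partial_cmp(&da).unwrap()
        })
    });

    // O(m): greedy packing
    let mut remaining = budget;
    let mut included = Vec::new();
    for s in sections {
        if s.cost <= remaining {
            remaining -= s.cost;
            included.push(s);
        } // sections that don't fit are skipped
    }
    included
}
```

The production `build()` shown later adds one refinement on top of this: shrinkable sections are trimmed to the remaining budget instead of being skipped outright.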


ContextQueryBuilder API

SochDB's ContextQueryBuilder provides a fluent API for assembling context:

```rust
/// In clawdesk-sochdb/src/context.rs
pub struct ContextQueryBuilder {
    budget: TokenBudget,
    sections: Vec<ContextSection>,
    output_format: OutputFormat,
}

impl ContextQueryBuilder {
    /// Create a new builder with the given token budget.
    pub fn new(budget: usize) -> Self {
        Self {
            budget: TokenBudget::new(budget),
            sections: Vec::new(),
            output_format: OutputFormat::Toon,
        }
    }

    /// Reserve tokens for the LLM's response.
    pub fn reserve_response(mut self, tokens: usize) -> Self {
        self.budget.reserve(tokens);
        self
    }

    /// Add a critical section (always included, panics if over budget).
    pub fn system_prompt(mut self, prompt: &str) -> Self {
        let section = ContextSection {
            content: prompt.to_string(),
            priority: Priority::Critical,
            token_cost: estimate_tokens(prompt),
            shrinkable: false,
        };
        self.budget.allocate_critical(section.token_cost);
        self.sections.push(section);
        self
    }

    /// Add skill prompt fragments.
    pub fn skills(mut self, skills: &[SelectedSkill]) -> Self {
        for skill in skills {
            self.sections.push(ContextSection {
                content: skill.prompt_fragment.clone(),
                priority: Priority::High,
                token_cost: skill.token_cost,
                shrinkable: false,
            });
        }
        self
    }

    /// Add conversation history (shrinkable — can be trimmed from oldest).
    pub fn history(mut self, entries: &[HistoryEntry]) -> Self {
        self.sections.push(ContextSection {
            content: format_history(entries),
            priority: Priority::High,
            token_cost: estimate_tokens_history(entries),
            shrinkable: true,
        });
        self
    }

    /// Add retrieved knowledge chunks.
    pub fn knowledge(mut self, chunks: &[KnowledgeChunk]) -> Self {
        for chunk in chunks {
            self.sections.push(ContextSection {
                content: chunk.content.clone(),
                priority: Priority::Medium,
                token_cost: chunk.token_count,
                shrinkable: false,
            });
        }
        self
    }

    /// Build the final context, fitting within the budget.
    pub fn build(self) -> Result<AssembledContext, ContextError> {
        let available = self.budget.available();
        // Count sections before `self.sections` is moved below.
        let total_sections = self.sections.len();
        let mut included = Vec::new();
        let mut remaining = available;

        // Sort by priority (descending), then by value density
        let mut sections = self.sections;
        sections.sort_by(|a, b| {
            b.priority.cmp(&a.priority).then_with(|| {
                let density_a = a.priority.weight() as f64 / a.token_cost as f64;
                let density_b = b.priority.weight() as f64 / b.token_cost as f64;
                density_b.partial_cmp(&density_a).unwrap()
            })
        });

        for section in sections {
            if section.token_cost <= remaining {
                remaining -= section.token_cost;
                included.push(section);
            } else if section.shrinkable && remaining > 0 {
                // Trim shrinkable sections to fit
                let trimmed = section.trim_to(remaining);
                remaining -= trimmed.token_cost;
                included.push(trimmed);
            }
            // else: skip non-shrinkable sections that don't fit
        }

        let assembled = match self.output_format {
            OutputFormat::Toon => format_toon(&included),
            OutputFormat::Raw => format_raw(&included),
        };

        Ok(AssembledContext {
            content: assembled,
            tokens_used: available - remaining,
            tokens_available: available,
            sections_included: included.len(),
            sections_skipped: total_sections - included.len(),
        })
    }
}
```

Usage Example

```rust
let context = ContextQueryBuilder::new(8192)
    .reserve_response(2048)
    .system_prompt("You are a helpful AI assistant.")
    .skills(&selected_skills)
    .history(&conversation_history)
    .knowledge(&rag_results)
    .build()?;

// context.content is now a token-budget-optimized string
// ready to be sent to the LLM
```

TOON Output Format

TOON (Token-Optimized Output Notation) is ClawDesk's compact serialization format for context sections. It uses 58–67% fewer tokens than verbose JSON or Markdown.

Comparison

Standard JSON format:

```json
{
  "conversation_history": [
    {
      "role": "user",
      "content": "What's the weather in London?",
      "timestamp": "2026-02-17T10:30:00Z"
    },
    {
      "role": "assistant",
      "content": "The current weather in London is 12°C with partly cloudy skies.",
      "timestamp": "2026-02-17T10:30:05Z"
    }
  ]
}
```

Token count: ~85 tokens

TOON format:

```text
[H]
U|10:30|What's the weather in London?
A|10:30|The current weather in London is 12°C with partly cloudy skies.
```

Token count: ~32 tokens

Savings: $\frac{85 - 32}{85} \approx 62\%$

TOON Encoding Rules

| Element | TOON Syntax | Example |
|---|---|---|
| Section header | `[X]` | `[H]` = History, `[S]` = System, `[K]` = Knowledge |
| User message | `U\|time\|text` | `U\|10:30\|Hello` |
| Assistant message | `A\|time\|text` | `A\|10:30\|Hi there` |
| Tool call | `T\|name\|args` | `T\|get_weather\|{"city":"London"}` |
| Tool result | `R\|name\|result` | `R\|get_weather\|12°C cloudy` |
| Knowledge chunk | `K\|source\|text` | `K\|wiki\|London is the capital...` |
| Metadata separator | `---` | Between sections |

Implementation

```rust
/// In clawdesk-sochdb/src/toon.rs
use std::fmt::Write; // needed for writeln! into a String

pub fn format_toon(sections: &[ContextSection]) -> String {
    let mut output = String::new();

    for (i, section) in sections.iter().enumerate() {
        if i > 0 {
            output.push_str("---\n");
        }

        match section.priority {
            Priority::Critical => {
                output.push_str("[S]\n");
                output.push_str(&section.content);
                output.push('\n');
            }
            Priority::High if section.is_history() => {
                output.push_str("[H]\n");
                for entry in section.as_history() {
                    let role = match entry.role {
                        Role::User => "U",
                        Role::Assistant => "A",
                        Role::System => "S",
                    };
                    let time = entry.timestamp.format("%H:%M");
                    writeln!(output, "{}|{}|{}", role, time, entry.content).ok();
                }
            }
            Priority::Medium if section.is_knowledge() => {
                output.push_str("[K]\n");
                output.push_str(&section.content);
                output.push('\n');
            }
            _ => {
                output.push_str(&section.content);
                output.push('\n');
            }
        }
    }

    output
}
```

I/O Reduction Analysis

Single Session Lookup

Consider a conversation with 100 messages stored in SochDB.

Naive approach:

  1. Query database for all 100 messages → 100 rows
  2. Serialize each to JSON → ~100 × 200 = 20,000 tokens
  3. Send all 20,000 tokens to LLM
  4. LLM processes 20,000 context tokens + generates response

ContextQueryBuilder approach:

  1. Query SochDB with conversation_id index → 1 indexed lookup (vs. full scan)
  2. Priority filter: keep only last 20 messages → 4,000 tokens raw
  3. TOON format: compress to ~1,500 tokens
  4. LLM processes 1,500 context tokens + generates response

I/O reduction:

$$ \text{Token reduction} = \frac{20{,}000 - 1{,}500}{20{,}000} = 92.5\% $$

Database I/O reduction:

For traditional key-value stores, fetching 100 messages requires 100 random reads. SochDB uses a single B-tree range scan:

$$ \text{I/O reduction} = \frac{100 \text{ random reads}}{1 \text{ range scan}} \approx 100\times \text{ fewer read operations} $$

(In page terms the gap is smaller: one range scan touches ~10 contiguous pages versus ~100 pages for the random reads, roughly 10×, and the random-read pattern also suffers far more cache misses.)
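A back-of-envelope page model makes the page counts above concrete. The numbers are assumptions from the text (one page per random point read, ~10 rows per B-tree leaf page), not SochDB measurements:

```rust
// Assumed: each random point read lands on a different page in the worst
// case; a range scan touches contiguous leaf pages holding ~10 rows each.
pub fn pages_random_reads(rows: u32) -> u32 {
    rows
}

pub fn pages_range_scan(rows: u32, rows_per_page: u32) -> u32 {
    rows.div_ceil(rows_per_page) // contiguous leaf pages touched by one scan
}
```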

Token Cost Savings Over Time

For an active conversation with $N$ total messages and a budget of $B$ tokens:

| Approach | Tokens Sent to LLM | Cost at \$3/M tokens (1,000 messages) |
|---|---|---|
| Send everything | $O(N)$ → 200,000 | \$0.60 |
| Sliding window (last 20) | $O(\min(N, 20))$ → 4,000 | \$0.012 |
| ContextQueryBuilder + TOON | $O(\min(N, B))$ → 1,500 | \$0.0045 |

Over 1,000 conversations/day:

$$ \text{Daily savings} = 1000 \times (\$0.60 - \$0.0045) = \$595.50/\text{day} $$
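The arithmetic can be sanity-checked in a few lines; the token counts and the US$3-per-million-tokens rate are the table's assumptions:

```rust
// Cost at an assumed $3 per million input tokens.
pub fn cost_usd(tokens: u64) -> f64 {
    tokens as f64 * 3.0 / 1_000_000.0
}

// Per-day savings of ContextQueryBuilder + TOON over sending everything,
// using the per-conversation token counts from the table above.
pub fn daily_savings(conversations_per_day: u64) -> f64 {
    let naive = cost_usd(200_000); // send everything
    let toon = cost_usd(1_500);    // ContextQueryBuilder + TOON
    conversations_per_day as f64 * (naive - toon)
}
```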


Priority-Based Assembly Algorithm

The complete algorithm: estimate each section's token cost, sort sections by priority tier (breaking ties by value density), pack greedily, and trim shrinkable sections to fill whatever budget remains.

Complexity Analysis

| Step | Time Complexity |
|---|---|
| Token estimation | $O(n)$ per section (character counting) |
| Priority sorting | $O(m \log m)$ where $m$ = section count |
| Greedy packing | $O(m)$ |
| TOON formatting | $O(T)$ where $T$ = total tokens |
| Total | $O(m \log m + T)$ |

For typical values ($m \leq 20$ sections, $T \leq 8192$ tokens), this completes in microseconds.


Comparison to Alternatives

Hand-Rolled Compaction

Many LLM applications use ad-hoc string truncation:

```python
# ❌ Typical hand-rolled approach
context = system_prompt
for msg in history[-20:]:  # arbitrary cutoff
    if len(context) + len(msg) < MAX_CHARS:  # character-based, not token-based
        context += msg
```

Problems:

  1. Character-based, not token-based (1 character ≠ 1 token)
  2. No priority weighting — recent history always wins
  3. No shrinkability — all or nothing per section
  4. No format optimization — raw text wastes tokens

LangChain's ConversationBufferWindowMemory

```python
memory = ConversationBufferWindowMemory(k=20)
```

Better, but still:

  1. Fixed window size, not budget-aware
  2. No priority among different context sources
  3. No output format optimization

ClawDesk's ContextQueryBuilder

| Feature | Hand-rolled | LangChain | ContextQueryBuilder |
|---|---|---|---|
| Token-based budgeting | ❌ | ⚠️ | ✅ |
| Priority tiers | ❌ | ❌ | ✅ |
| Shrinkable sections | ❌ | ❌ | ✅ |
| Output optimization (TOON) | ❌ | ❌ | ✅ |
| Type-safe builder API | ❌ | ❌ | ✅ |
| Formal knapsack model | ❌ | ❌ | ✅ |

Key Takeaways

  1. Context assembly is a knapsack problem — formalize it, don't ad-hoc it
  2. Priority tiers beat flat ordering — system prompts must never be cut
  3. TOON format saves 58–67% tokens by eliminating JSON boilerplate
  4. Shrinkable sections (history) gracefully degrade under budget pressure
  5. SochDB indexed lookups reduce database I/O by orders of magnitude

Further Reading