Deep Dive: Type Algebra

What you'll learn

Why InboundMessage is a 13-variant enum (sum type) instead of a 60-field struct (product type), the information-theoretic argument, exhaustive matching guarantees, and how NormalizedMessage serves as the canonical form.

The Core Problem

ClawDesk supports 13 messaging platforms. Each platform sends different data:

Platform	Unique Fields	Examples
Telegram	8	`chat_id`, `reply_to_message_id`, `media_group_id`
Discord	7	`guild_id`, `channel_id`, `thread_id`, `embeds`
Slack	6	`team_id`, `channel`, `thread_ts`, `blocks`
WhatsApp	5	`phone_number`, `wa_id`, `media_url`
Signal	4	`source_uuid`, `group_id`, `attachments`
iMessage	3	`handle_id`, `service`, `tapback`
Matrix	5	`room_id`, `event_id`, `formatted_body`
WebChat	3	`session_id`, `browser_info`
Email	6	`from`, `to`, `cc`, `subject`, `html_body`
SMS	3	`phone_number`, `carrier`
CLI	2	`command`, `args`
IRC	4	`nick`, `channel`, `server`
Custom	4	`channel_name`, `metadata`
Total	60

How do we represent "a message from any of these 13 platforms" in a single type?

Approach A: Product Type (Struct with Options)

The naïve approach — a single struct with Option<T> for every field:

// ❌ The product-type approach
pub struct InboundMessage {
    // Common
    pub text: Option<String>,
    pub sender_id: Option<String>,

    // Telegram
    pub telegram_chat_id: Option<i64>,
    pub telegram_reply_to: Option<i64>,
    pub telegram_media_group: Option<String>,

    // Discord
    pub discord_guild_id: Option<u64>,
    pub discord_channel_id: Option<u64>,
    pub discord_thread_id: Option<u64>,

    // Slack
    pub slack_team_id: Option<String>,
    pub slack_thread_ts: Option<String>,

    // ... 50 more Option<_> fields ...
}

Information Theory Analysis

A product type with $n$ optional fields has at least $2^n$ possible states, counting only which fields are set and which are empty:

$$ |\text{StateSpace}_{\text{product}}| = 2^{60} \approx 1.15 \times 10^{18} $$

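But each message comes from exactly one platform, so only 13 top-level shapes are meaningful. Treating each platform as a single valid pattern (a simplification that ignores optional fields within a platform), the remaining $2^{60} - 13$ patterns are nonsensical: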

$$ \text{Invalid states} = 2^{60} - 13 \approx 2^{60} $$

The ratio of valid to total states:

$$ \frac{|\text{Valid}|}{|\text{Total}|} = \frac{13}{2^{60}} \approx 1.13 \times 10^{-17} $$

In information-theoretic terms, the product type wastes:

$$ \text{Wasted entropy} = \log_2(2^{60}) - \log_2(13) \approx 60 - 3.7 = 56.3 \text{ bits} $$

That's 56.3 bits of "lie" — states the type says are possible but never occur at runtime.
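The arithmetic above can be checked with a few lines of Rust. This is only a sketch: the helper names `presence_patterns` and `wasted_bits` are illustrative and are not part of ClawDesk.

```rust
/// Number of set/unset patterns for `n` independent `Option` fields (2^n).
/// Valid for n < 64, which covers the 60-field case discussed here.
pub fn presence_patterns(n: u32) -> u64 {
    1u64 << n
}

/// Bits the product type can encode beyond what `valid` distinct shapes need:
/// log2(2^option_fields) - log2(valid).
pub fn wasted_bits(option_fields: u32, valid: u32) -> f64 {
    f64::from(option_fields) - f64::from(valid).log2()
}
```

Calling `presence_patterns(60)` gives the $2^{60}$ figure, and `wasted_bits(60, 13)` gives the roughly 56.3 bits quoted above.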

Runtime Bugs from Product Types

With this approach, every consumer must defensively check which fields are populated:

// 😰 What channel is this? Have to guess from populated fields
fn process(msg: &InboundMessage) {
    if let Some(chat_id) = msg.telegram_chat_id {
        // Probably Telegram... but what if discord_guild_id is also set?
        // Nothing prevents both being Some simultaneously.
    }
}
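To make the problem concrete, here is a minimal sketch using a trimmed-down copy of the struct with only two platform fields. The helpers `platforms_claimed` and `contradictory_message` are hypothetical and exist only for this illustration.

```rust
/// Trimmed-down copy of the product-type struct, keeping two platform fields.
#[derive(Default)]
pub struct InboundMessage {
    pub text: Option<String>,
    pub telegram_chat_id: Option<i64>,
    pub discord_guild_id: Option<u64>,
}

/// Hypothetical helper: counts how many platforms "claim" this message.
pub fn platforms_claimed(msg: &InboundMessage) -> usize {
    usize::from(msg.telegram_chat_id.is_some())
        + usize::from(msg.discord_guild_id.is_some())
}

/// Builds a message that is from Telegram and Discord at the same time.
/// The compiler accepts it because nothing in the type rules it out.
pub fn contradictory_message() -> InboundMessage {
    InboundMessage {
        text: Some("hello".to_string()),
        telegram_chat_id: Some(42),
        discord_guild_id: Some(7),
    }
}
```

Here `platforms_claimed(&contradictory_message())` returns 2, and `platforms_claimed(&InboundMessage::default())` returns 0: a message from two platforms and a message from none are both perfectly legal values of the struct.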

Approach B: Sum Type (Enum)

ClawDesk's actual approach – a sum type (tagged union / enum):

// ✅ The sum-type approach
pub enum InboundMessage {
    Telegram(TelegramInbound),
    Discord(DiscordInbound),
    Slack(SlackInbound),
    WhatsApp(WhatsAppInbound),
    Signal(SignalInbound),
    IMessage(IMessageInbound),
    Matrix(MatrixInbound),
    WebChat(WebChatInbound),
    Email(EmailInbound),
    Sms(SmsInbound),
    Cli(CliInbound),
    Irc(IrcInbound),
    Custom(CustomInbound),
}

// Each variant carries ONLY its platform's fields
pub struct TelegramInbound {
    pub chat_id: i64,           // not Option — always present
    pub from: TelegramUser,     // not Option — always present
    pub text: Option<String>,   // optional: could be media-only
    pub reply_to_message_id: Option<i64>,
    pub media: Option<TelegramMedia>,
    pub media_group_id: Option<String>,
    pub forward_from: Option<TelegramUser>,
    pub entities: Vec<MessageEntity>,
}
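A small sketch of what consuming such an enum can look like. It uses only two variants, and the `DiscordInbound` fields and the `routing_key` function are assumptions made for the example rather than ClawDesk's real definitions.

```rust
/// Reduced Telegram payload: `chat_id` is required, `text` is optional.
pub struct TelegramInbound {
    pub chat_id: i64,
    pub text: Option<String>,
}

/// Assumed Discord payload for this sketch; the real struct may differ.
pub struct DiscordInbound {
    pub guild_id: u64,
    pub channel_id: u64,
}

pub enum InboundMessage {
    Telegram(TelegramInbound),
    Discord(DiscordInbound),
}

/// Hypothetical helper: inside each arm the platform is known, so required
/// fields are read directly with no `Option` unwrapping and no guessing.
pub fn routing_key(msg: &InboundMessage) -> String {
    match msg {
        InboundMessage::Telegram(t) => format!("telegram:{}", t.chat_id),
        InboundMessage::Discord(d) => {
            format!("discord:{}:{}", d.guild_id, d.channel_id)
        }
    }
}
```

A value can only be constructed as one variant, so the "both Telegram and Discord" state from the product-type version cannot be written down at all.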

Information Theory Analysis

A sum type with $k$ variants has exactly $k$ top-level states:

$$ |\text{StateSpace}_{\text{sum}}| = 13 $$

This requires:

$$ \lceil \log_2(13) \rceil = 4 \text{ bits} $$

No top-level state is wasted: every variant the type can represent corresponds to a real platform.

Comparison

Property	Product Type (`struct`)	Sum Type (`enum`)
Top-level states	$2^{60} \approx 10^{18}$	$13$
Bits needed	60	4
Invalid states	$2^{60} - 13$	0
Compile-time exhaustiveness	❌	✅
Field access safety	Runtime checks	Pattern matching

Exhaustive Matching

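The compiler requires every match on InboundMessage to cover all 13 variants. Written without a wildcard (_) arm, as below, the match stops compiling as soon as one variant is missing: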

// If you forget a variant, the compiler refuses to build
fn message_source(msg: &InboundMessage) -> &'static str {
    match msg {
        InboundMessage::Telegram(_) => "telegram",
        InboundMessage::Discord(_) => "discord",
        InboundMessage::Slack(_) => "slack",
        InboundMessage::WhatsApp(_) => "whatsapp",
        InboundMessage::Signal(_) => "signal",
        InboundMessage::IMessage(_) => "imessage",
        InboundMessage::Matrix(_) => "matrix",
        InboundMessage::WebChat(_) => "webchat",
        InboundMessage::Email(_) => "email",
        InboundMessage::Sms(_) => "sms",
        InboundMessage::Cli(_) => "cli",
        InboundMessage::Irc(_) => "irc",
        InboundMessage::Custom(_) => "custom",
        // ← Remove any arm and the compiler emits an error
    }
}

Adding a 14th Channel

When you add InboundMessage::Teams(TeamsInbound), the compiler emits an error at every match expression that lists the variants explicitly and has no wildcard arm. As long as handlers avoid catch-all arms, no handler can silently ignore the new channel.

In the product-type approach, adding a new channel means adding more Option<T> fields — and no compiler warning tells you which functions need updating.
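One caveat is worth sketching: the guarantee holds only for matches without a wildcard arm. The three-variant `Channel` enum and both functions below are hypothetical, chosen to keep the example short.

```rust
/// Stand-in enum; `Teams` plays the role of the newly added channel.
pub enum Channel {
    Telegram,
    Discord,
    Teams,
}

/// Exhaustive match: deleting the `Teams` arm is a compile error
/// (E0004, non-exhaustive patterns), so new variants cannot be missed.
pub fn label_strict(c: &Channel) -> &'static str {
    match c {
        Channel::Telegram => "telegram",
        Channel::Discord => "discord",
        Channel::Teams => "teams",
    }
}

/// Wildcard match: written before `Teams` existed, it still compiles
/// afterwards and silently routes the new variant to the fallback.
pub fn label_lenient(c: &Channel) -> &'static str {
    match c {
        Channel::Telegram => "telegram",
        Channel::Discord => "discord",
        _ => "unknown",
    }
}
```

With these definitions `label_lenient(&Channel::Teams)` returns "unknown" without any compiler diagnostic, which is exactly the silent gap that explicit arms avoid.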

The NormalizedMessage: Canonical Form

After the sum-type InboundMessage captures platform-specific data faithfully, the normalize() function maps it to a unified canonical form:

pub struct NormalizedMessage {
    pub channel_id: ChannelId,
    pub conversation_id: ConversationId,
    pub sender: Option<Sender>,
    pub content: Content,
    pub thread_id: Option<ThreadId>,
    pub timestamp: DateTime<Utc>,
    pub metadata: Metadata,
}

This is the funnel: 13 shapes in, 1 shape out. Everything downstream works with NormalizedMessage only.
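The funnel can be sketched as a single match. Everything here is an assumption made for illustration: `NormalizedLite` is a reduced stand-in for NormalizedMessage with plain string fields, only two variants are shown, and the field mappings are guesses rather than ClawDesk's actual normalize() logic.

```rust
pub struct TelegramInbound {
    pub chat_id: i64,
    pub text: Option<String>,
}

/// Assumed Slack payload for this sketch; the real struct may differ.
pub struct SlackInbound {
    pub team_id: String,
    pub channel: String,
    pub thread_ts: Option<String>,
    pub text: String,
}

pub enum InboundMessage {
    Telegram(TelegramInbound),
    Slack(SlackInbound),
}

/// Reduced stand-in for NormalizedMessage (hypothetical field types).
#[derive(Debug, PartialEq)]
pub struct NormalizedLite {
    pub channel_id: &'static str,
    pub conversation_id: String,
    pub content: String,
    pub thread_id: Option<String>,
}

/// Maps each platform-specific shape onto the one shared shape.
pub fn normalize(msg: InboundMessage) -> NormalizedLite {
    match msg {
        InboundMessage::Telegram(t) => NormalizedLite {
            channel_id: "telegram",
            conversation_id: t.chat_id.to_string(),
            // Media-only messages have no text; use an empty string.
            content: t.text.unwrap_or_default(),
            thread_id: None,
        },
        InboundMessage::Slack(s) => NormalizedLite {
            channel_id: "slack",
            conversation_id: format!("{}/{}", s.team_id, s.channel),
            content: s.text,
            thread_id: s.thread_ts,
        },
    }
}
```

Because the match is exhaustive, adding a variant to the enum forces a decision about how it maps into the canonical form.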

Why Not Skip InboundMessage?

"Why not normalize directly from JSON?"

Because the intermediate InboundMessage step:

Validates at the boundary — each variant struct enforces that platform-specific required fields (like Telegram's chat_id) are present
Documents the contract — the struct definition is living documentation of what each platform provides
Enables platform-specific logic — before normalization, you can apply platform-specific preprocessing (e.g., Telegram entity parsing, Discord embed extraction)
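The first point, validating at the boundary, can be sketched without any JSON library. The raw payload is modeled as a string map, and `ParseError` and `parse_telegram` are hypothetical names; a real implementation would more likely deserialize JSON directly into the variant struct.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct TelegramInbound {
    pub chat_id: i64,
    pub text: Option<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingField(&'static str),
    InvalidField(&'static str),
}

/// Rejects payloads lacking the required `chat_id` before any other code
/// sees them; `text` stays optional because media-only messages omit it.
pub fn parse_telegram(
    raw: &HashMap<String, String>,
) -> Result<TelegramInbound, ParseError> {
    let chat_id = raw
        .get("chat_id")
        .ok_or(ParseError::MissingField("chat_id"))?
        .parse::<i64>()
        .map_err(|_| ParseError::InvalidField("chat_id"))?;

    Ok(TelegramInbound {
        chat_id,
        text: raw.get("text").cloned(),
    })
}
```

Once a `TelegramInbound` exists, downstream code can rely on `chat_id` being present and never has to re-check it.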

Error Algebra

ClawDesk applies the same principle to errors. Instead of a single Error type with string messages, errors form a closed union hierarchy:

/// Top-level error — a sum type over subsystem errors
#[derive(Debug, thiserror::Error)]
pub enum ClawDeskError {
    #[error("Channel error: {0}")]
    Channel(#[from] ChannelError),

    #[error("Pipeline error: {0}")]
    Pipeline(#[from] PipelineError),

    #[error("Provider error: {0}")]
    Provider(#[from] ProviderError),

    #[error("Storage error: {0}")]
    Storage(#[from] StorageError),

    #[error("Security error: {0}")]
    Security(#[from] SecurityError),
}

/// Each subsystem error is itself a sum type
#[derive(Debug, thiserror::Error)]
pub enum ChannelError {
    #[error("Failed to start channel: {0}")]
    Start(String),

    #[error("Failed to send message: {0}")]
    Send(String),

    #[error("Stream error: {0}")]
    Stream(String),

    #[error("Configuration error: {0}")]
    Config(String),
}

Why Closed Unions?

Property	String errors	Closed union
Exhaustive handling	❌	✅
Programmatic branching	❌ (parse strings)	✅ (match arms)
Refactoring safety	❌	✅
Documentation	❌	✅ (enum variants are self-documenting)

The #[from] attribute from thiserror generates a From implementation for each wrapped error type, so the ? operator converts a subsystem error into ClawDeskError automatically, maintaining the hierarchy without boilerplate.
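To show the mechanism without depending on the thiserror crate, the sketch below writes the From implementation by hand; this is the kind of impl that #[from] derives. The enums are reduced to a couple of variants, and `send`, `handle`, and `is_retryable` are hypothetical functions.

```rust
#[derive(Debug, PartialEq)]
pub enum ChannelError {
    Send(String),
    Config(String),
}

#[derive(Debug, PartialEq)]
pub enum ClawDeskError {
    Channel(ChannelError),
}

// Hand-written stand-in for the impl that `#[from]` would generate.
impl From<ChannelError> for ClawDeskError {
    fn from(e: ChannelError) -> Self {
        ClawDeskError::Channel(e)
    }
}

/// Hypothetical low-level operation that fails with a subsystem error.
fn send(ok: bool) -> Result<(), ChannelError> {
    if ok {
        Ok(())
    } else {
        Err(ChannelError::Send("socket closed".to_string()))
    }
}

/// `?` converts ChannelError into ClawDeskError through `From::from`.
pub fn handle(ok: bool) -> Result<&'static str, ClawDeskError> {
    send(ok)?;
    Ok("delivered")
}

/// Programmatic branching on the error: match arms, no string parsing.
pub fn is_retryable(err: &ClawDeskError) -> bool {
    match err {
        ClawDeskError::Channel(ChannelError::Send(_)) => true,
        ClawDeskError::Channel(ChannelError::Config(_)) => false,
    }
}
```

The retry decision in `is_retryable` is made on the variant, so rewording an error message cannot change program behavior.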

Mathematical Summary

Let $C$ be the set of channels, $F_c$ be the fields unique to channel $c$, and $F = \bigcup_{c \in C} F_c$ be all fields.

Product type state space:

$$ |\mathcal{S}_{\text{prod}}| = \prod_{f \in F} (|\text{dom}(f)| + 1) \geq 2^{|F|} $$

Sum type state space:

$$ |\mathcal{S}_{\text{sum}}| = \sum_{c \in C} \prod_{f \in F_c} |\text{dom}(f)| $$

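For ClawDesk with $|C| = 13$ and $|F| = 60$, count only presence patterns (each field either set or unset). The product type has at least $2^{60}$ of them, while the sum type has at most $13 \cdot 2^{8}$, since no variant has more than 8 unique fields:

$$ \frac{|\mathcal{S}_{\text{prod}}|}{|\mathcal{S}_{\text{sum}}|} \geq \frac{2^{60}}{13 \cdot 2^{8}} \approx 3.5 \times 10^{14} $$

Under this count, the product type allows at least $3.5 \times 10^{14}$ times more states than necessary, and every one of those extra states is a potential bug.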
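The bound can be tightened by using the per-platform field counts from the table in The Core Problem instead of the worst case of 8. The sketch below assumes every unique field is optional within its variant, which overcounts the sum type; the function names are illustrative.

```rust
/// Unique-field counts per platform, in the order the table lists them.
const FIELD_COUNTS: [u32; 13] = [8, 7, 6, 5, 4, 3, 5, 3, 6, 3, 2, 4, 4];

/// |F|: total number of platform-specific fields.
pub fn total_fields() -> u32 {
    FIELD_COUNTS.iter().sum()
}

/// Presence patterns of the sum type: one 2^|F_c| term per variant.
pub fn sum_patterns() -> u64 {
    FIELD_COUNTS.iter().map(|&n| 1u64 << n).sum()
}

/// Presence patterns of the product type: 2^|F|.
pub fn product_patterns() -> u64 {
    1u64 << total_fields()
}

/// How many times larger the product type's pattern count is.
pub fn ratio() -> f64 {
    product_patterns() as f64 / sum_patterns() as f64
}
```

With these counts the sum type has 652 presence patterns, comfortably below the $13 \cdot 2^{8} = 3328$ upper bound, so the ratio is larger still than the conservative estimate.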

Key Takeaways

Design Principle

Make illegal states unrepresentable. If a state shouldn't exist at runtime, the type system should prevent it from being constructed at compile time.

Sum types (enums) encode "one of N" — use them when data comes in distinct shapes
Product types (structs) encode "all of N" — use them when all fields co-occur
Exhaustive matching catches missing cases at compile time
Canonical forms (like NormalizedMessage) reduce downstream complexity
Error hierarchies with thiserror give you programmatic error handling without string parsing
