
Deep Dive: Type Algebra

What you'll learn

Why InboundMessage is a 13-variant enum (sum type) rather than a 60-field struct (product type): the information-theoretic argument, the exhaustive-matching guarantee, and how NormalizedMessage serves as the canonical form.

The Core Problem

ClawDesk supports 13 messaging platforms. Each platform sends different data:

| Platform | Unique Fields | Examples |
|----------|---------------|----------|
| Telegram | 8 | chat_id, reply_to_message_id, media_group_id |
| Discord | 7 | guild_id, channel_id, thread_id, embeds |
| Slack | 6 | team_id, channel, thread_ts, blocks |
| WhatsApp | 5 | phone_number, wa_id, media_url |
| Signal | 4 | source_uuid, group_id, attachments |
| iMessage | 3 | handle_id, service, tapback |
| Matrix | 5 | room_id, event_id, formatted_body |
| WebChat | 3 | session_id, browser_info |
| Email | 6 | from, to, cc, subject, html_body |
| SMS | 3 | phone_number, carrier |
| CLI | 2 | command, args |
| IRC | 4 | nick, channel, server |
| Custom | 4 | channel_name, metadata |
| Total | ~60 | |

How do we represent "a message from any of these 13 platforms" in a single type?


Approach A: Product Type (Struct with Options)

The naïve approach — a single struct with Option<T> for every field:

```rust
// ❌ The product-type approach
pub struct InboundMessage {
    // Common
    pub text: Option<String>,
    pub sender_id: Option<String>,

    // Telegram
    pub telegram_chat_id: Option<i64>,
    pub telegram_reply_to: Option<i64>,
    pub telegram_media_group: Option<String>,

    // Discord
    pub discord_guild_id: Option<u64>,
    pub discord_channel_id: Option<u64>,
    pub discord_thread_id: Option<u64>,

    // Slack
    pub slack_team_id: Option<String>,
    pub slack_thread_ts: Option<String>,

    // ... 50 more Option<_> fields ...
}
```

Information Theory Analysis

A product type with $n$ optional fields has $2^n$ possible states, counting only whether each field is Some or None and ignoring the values inside:

$$ |\text{StateSpace}_{\text{product}}| = 2^{60} \approx 1.15 \times 10^{18} $$

But only 13 states are valid — each message comes from exactly one platform. The remaining $2^{60} - 13$ states are nonsensical:

$$ \text{Invalid states} = 2^{60} - 13 \approx 2^{60} $$

The ratio of valid to total states:

$$ \frac{|\text{Valid}|}{|\text{Total}|} = \frac{13}{2^{60}} \approx 1.13 \times 10^{-17} $$

In information-theoretic terms, the product type wastes:

$$ \text{Wasted entropy} = \log_2(2^{60}) - \log_2(13) = 60 - 3.7 = 56.3 \text{ bits} $$

That's 56.3 bits of "lie" — states the type says are possible but never occur at runtime.
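The figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function names `product_presence_states` and `wasted_bits` are made up for illustration and are not part of ClawDesk.

```rust
/// Number of states of a struct made of `n` independent `Option` fields,
/// counting only "is this field Some or None?" (not the payload values).
fn product_presence_states(n: u32) -> u128 {
    1u128 << n
}

/// Bits of representable-but-invalid state: log2(total) - log2(valid).
fn wasted_bits(total_states: u128, valid_states: u128) -> f64 {
    (total_states as f64).log2() - (valid_states as f64).log2()
}

fn main() {
    let total = product_presence_states(60);
    let valid = 13u128; // one valid shape per platform, as in the text

    println!("total states  = {total}");
    println!("valid / total = {:e}", valid as f64 / total as f64);
    println!("wasted bits   = {:.1}", wasted_bits(total, valid));
}
```

Changing `60` or `13` shows how quickly the gap grows: every extra optional field doubles the total, while every extra platform adds just one valid state.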

Runtime Bugs from Product Types

With this approach, every consumer must defensively check which fields are populated:

```rust
// 😰 What channel is this? Have to guess from populated fields
fn process(msg: &InboundMessage) {
    if let Some(chat_id) = msg.telegram_chat_id {
        // Probably Telegram... but what if discord_guild_id is also set?
        // Nothing prevents both being Some simultaneously.
    }
}
```
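To make the hazard concrete, here is a small sketch using a cut-down, hypothetical `ProductMessage` with only two platform fields (the helper `guess_platform` is likewise invented for this example). Both a contradictory message and an empty one compile without complaint.

```rust
// Trimmed-down stand-in for the product-type struct above (hypothetical:
// only two platform fields are kept so the example stays short).
#[derive(Default)]
struct ProductMessage {
    telegram_chat_id: Option<i64>,
    discord_guild_id: Option<u64>,
}

/// Guess the platform from whichever field happens to be populated.
/// The first check wins, so the answer depends on check order.
fn guess_platform(msg: &ProductMessage) -> &'static str {
    if msg.telegram_chat_id.is_some() {
        "telegram"
    } else if msg.discord_guild_id.is_some() {
        "discord"
    } else {
        "unknown"
    }
}

fn main() {
    // Compiles fine, yet describes a message from two platforms at once.
    let contradictory = ProductMessage {
        telegram_chat_id: Some(42),
        discord_guild_id: Some(7),
    };
    // Also compiles: a message from no platform at all.
    let empty = ProductMessage::default();

    println!("{}", guess_platform(&contradictory));
    println!("{}", guess_platform(&empty));
}
```

Reordering the `if` branches changes the answer for the contradictory message, which is exactly the kind of bug the type cannot rule out.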

Approach B: Sum Type (Enum)

ClawDesk's actual approach – a sum type (tagged union / enum):

```rust
// ✅ The sum-type approach
pub enum InboundMessage {
    Telegram(TelegramInbound),
    Discord(DiscordInbound),
    Slack(SlackInbound),
    WhatsApp(WhatsAppInbound),
    Signal(SignalInbound),
    IMessage(IMessageInbound),
    Matrix(MatrixInbound),
    WebChat(WebChatInbound),
    Email(EmailInbound),
    Sms(SmsInbound),
    Cli(CliInbound),
    Irc(IrcInbound),
    Custom(CustomInbound),
}

// Each variant carries ONLY its platform's fields
pub struct TelegramInbound {
    pub chat_id: i64,                      // not Option: always present
    pub from: TelegramUser,                // not Option: always present
    pub text: Option<String>,              // optional: could be media-only
    pub reply_to_message_id: Option<i64>,
    pub media: Option<TelegramMedia>,
    pub media_group_id: Option<String>,
    pub forward_from: Option<TelegramUser>,
    pub entities: Vec<MessageEntity>,
}
```

Information Theory Analysis

A sum type with $k$ variants has exactly $k$ top-level states:

$$ |\text{StateSpace}_{\text{sum}}| = 13 $$

This requires:

$$ \lceil \log_2(13) \rceil = 4 \text{ bits} $$

Zero wasted entropy at the top level: every representable variant corresponds to a real platform.

Comparison

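| Property | Product Type (struct) | Sum Type (enum) |
|----------|-----------------------|-----------------|
| Top-level states | $2^{60} \approx 10^{18}$ | $13$ |
| Bits needed | 60 | 4 |
| Invalid states | $2^{60} - 13$ | 0 |
| Compile-time exhaustiveness | ❌ | ✅ |
| Field access safety | Runtime checks | Pattern matching |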
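The "field access safety" row can be illustrated with a two-variant miniature of the enum. The payload structs are cut down to one or two fields, and `routing_key` is a hypothetical helper, not a ClawDesk function.

```rust
// Hypothetical two-variant miniature of InboundMessage.
struct TelegramInbound {
    chat_id: i64,
}

struct DiscordInbound {
    guild_id: u64,
    channel_id: u64,
}

enum InboundMessage {
    Telegram(TelegramInbound),
    Discord(DiscordInbound),
}

/// Build a routing key. Inside each arm only that platform's fields exist,
/// so there is nothing to unwrap and no way to read another platform's data.
fn routing_key(msg: &InboundMessage) -> String {
    match msg {
        InboundMessage::Telegram(t) => format!("telegram:{}", t.chat_id),
        InboundMessage::Discord(d) => format!("discord:{}:{}", d.guild_id, d.channel_id),
    }
}

fn main() {
    let messages = [
        InboundMessage::Telegram(TelegramInbound { chat_id: 42 }),
        InboundMessage::Discord(DiscordInbound { guild_id: 1, channel_id: 2 }),
    ];
    for msg in &messages {
        println!("{}", routing_key(msg));
    }
}
```

Writing `t.guild_id` in the Telegram arm would be a compile error rather than a `None` discovered at runtime.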

Exhaustive Matching

The compiler enforces that every match handles all 13 variants:

```rust
// If you forget a variant, the compiler refuses to build
fn message_source(msg: &InboundMessage) -> &'static str {
    match msg {
        InboundMessage::Telegram(_) => "telegram",
        InboundMessage::Discord(_) => "discord",
        InboundMessage::Slack(_) => "slack",
        InboundMessage::WhatsApp(_) => "whatsapp",
        InboundMessage::Signal(_) => "signal",
        InboundMessage::IMessage(_) => "imessage",
        InboundMessage::Matrix(_) => "matrix",
        InboundMessage::WebChat(_) => "webchat",
        InboundMessage::Email(_) => "email",
        InboundMessage::Sms(_) => "sms",
        InboundMessage::Cli(_) => "cli",
        InboundMessage::Irc(_) => "irc",
        InboundMessage::Custom(_) => "custom",
        // ← Remove any arm and the compiler emits an error
    }
}
```

Adding a 14th Channel

When you add InboundMessage::Teams(TeamsInbound), the compiler emits an error at every match expression that lists the variants explicitly and now lacks a Teams arm. As long as handlers avoid wildcard (_) arms, no handler can silently ignore the new channel.

In the product-type approach, adding a new channel means adding more Option<T> fields — and no compiler warning tells you which functions need updating.
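The guarantee holds only for matches that name every variant. The sketch below uses a hypothetical three-variant `Channel` enum, where `Teams` plays the role of the newly added channel, to contrast an explicit match with one that hides behind a wildcard arm.

```rust
// Hypothetical miniature enum; `Teams` is the newly added channel.
#[derive(Clone, Copy)]
enum Channel {
    Telegram,
    Discord,
    Teams,
}

/// Every variant is listed. Had `Teams` been left out, rustc would reject
/// this function with error E0004 (non-exhaustive patterns).
fn source_exhaustive(c: Channel) -> &'static str {
    match c {
        Channel::Telegram => "telegram",
        Channel::Discord => "discord",
        Channel::Teams => "teams",
    }
}

/// A wildcard arm opts out of that check: this compiled before `Teams`
/// existed and still compiles afterwards, silently absorbing the new variant.
fn source_with_wildcard(c: Channel) -> &'static str {
    match c {
        Channel::Telegram => "telegram",
        Channel::Discord => "discord",
        _ => "unknown",
    }
}

fn main() {
    for c in [Channel::Telegram, Channel::Discord, Channel::Teams] {
        println!("{} / {}", source_exhaustive(c), source_with_wildcard(c));
    }
}
```

A reasonable convention, under these assumptions, is to reserve wildcard arms for cases where "everything else" really is the intended behaviour.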


The NormalizedMessage: Canonical Form

After the sum-type InboundMessage captures platform-specific data faithfully, the normalize() function maps it to a unified canonical form:

```rust
use chrono::{DateTime, Utc}; // assumption: the timestamp type comes from the chrono crate

pub struct NormalizedMessage {
    pub channel_id: ChannelId,
    pub conversation_id: ConversationId,
    pub sender: Option<Sender>,
    pub content: Content,
    pub thread_id: Option<ThreadId>,
    pub timestamp: DateTime<Utc>,
    pub metadata: Metadata,
}
```

This is the funnel: 13 shapes in, 1 shape out. Everything downstream works with NormalizedMessage only.
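A minimal sketch of the funnel, assuming heavily simplified types: the real NormalizedMessage uses dedicated types such as ChannelId and Content that are not shown here, so plain strings stand in for them, and the `text` field on the Slack payload is an assumption. The body of `normalize` is illustrative only, not ClawDesk's implementation.

```rust
// Hypothetical, trimmed-down stand-ins for the real types.
struct TelegramInbound {
    chat_id: i64,
    text: Option<String>,
}

struct SlackInbound {
    channel: String,
    thread_ts: Option<String>,
    text: String, // assumed field, not listed in the source
}

enum InboundMessage {
    Telegram(TelegramInbound),
    Slack(SlackInbound),
}

#[derive(Debug, PartialEq)]
struct NormalizedMessage {
    channel_id: &'static str,
    conversation_id: String,
    thread_id: Option<String>,
    text: Option<String>,
}

/// Map each platform shape onto the one canonical shape.
/// Takes the message by value, so payload strings move without cloning.
fn normalize(msg: InboundMessage) -> NormalizedMessage {
    match msg {
        InboundMessage::Telegram(t) => NormalizedMessage {
            channel_id: "telegram",
            conversation_id: t.chat_id.to_string(),
            thread_id: None, // assumption: this sketch ignores Telegram threading
            text: t.text,
        },
        InboundMessage::Slack(s) => NormalizedMessage {
            channel_id: "slack",
            conversation_id: s.channel,
            thread_id: s.thread_ts,
            text: Some(s.text),
        },
    }
}

fn main() {
    let inbound = vec![
        InboundMessage::Telegram(TelegramInbound { chat_id: 42, text: None }),
        InboundMessage::Slack(SlackInbound {
            channel: "C123".to_string(),
            thread_ts: Some("1700000000.000100".to_string()),
            text: "hello".to_string(),
        }),
    ];
    for msg in inbound {
        println!("{:?}", normalize(msg));
    }
}
```

Because `normalize` is itself an exhaustive match, adding a platform forces a decision about how it maps onto the canonical form.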

Why Not Skip InboundMessage?

"Why not normalize directly from JSON?"

Because the intermediate InboundMessage step:

  1. Validates at the boundary — each variant struct enforces that platform-specific required fields (like Telegram's chat_id) are present
  2. Documents the contract — the struct definition is living documentation of what each platform provides
  3. Enables platform-specific logic — before normalization, you can apply platform-specific preprocessing (e.g., Telegram entity parsing, Discord embed extraction)
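Point 1 can be sketched with a fallible conversion. The example below assumes the raw payload has already been flattened into a string map (a stand-in for parsed JSON) and uses a hypothetical `ParseError` type; how ClawDesk actually deserializes payloads is not shown in this document.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct TelegramInbound {
    chat_id: i64,         // required
    text: Option<String>, // optional
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingField(&'static str),
    InvalidField(&'static str),
}

impl TryFrom<&HashMap<String, String>> for TelegramInbound {
    type Error = ParseError;

    /// Reject the payload at the boundary if a required field is absent
    /// or malformed, so later code never sees a half-filled message.
    fn try_from(raw: &HashMap<String, String>) -> Result<Self, Self::Error> {
        let chat_id = raw
            .get("chat_id")
            .ok_or(ParseError::MissingField("chat_id"))?
            .parse::<i64>()
            .map_err(|_| ParseError::InvalidField("chat_id"))?;

        Ok(TelegramInbound {
            chat_id,
            text: raw.get("text").cloned(),
        })
    }
}

fn main() {
    let mut raw = HashMap::new();
    raw.insert("text".to_string(), "hi".to_string());
    println!("{:?}", TelegramInbound::try_from(&raw)); // chat_id is missing

    raw.insert("chat_id".to_string(), "42".to_string());
    println!("{:?}", TelegramInbound::try_from(&raw));
}
```

Once a `TelegramInbound` exists, `chat_id` is known to be present and well-formed; no downstream code needs to re-check it.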

Error Algebra

ClawDesk applies the same principle to errors. Instead of a single Error type with string messages, errors form a closed union hierarchy:

```rust
/// Top-level error: a sum type over subsystem errors
#[derive(Debug, thiserror::Error)]
pub enum ClawDeskError {
    #[error("Channel error: {0}")]
    Channel(#[from] ChannelError),

    #[error("Pipeline error: {0}")]
    Pipeline(#[from] PipelineError),

    #[error("Provider error: {0}")]
    Provider(#[from] ProviderError),

    #[error("Storage error: {0}")]
    Storage(#[from] StorageError),

    #[error("Security error: {0}")]
    Security(#[from] SecurityError),
}

/// Each subsystem error is itself a sum type
#[derive(Debug, thiserror::Error)]
pub enum ChannelError {
    #[error("Failed to start channel: {0}")]
    Start(String),

    #[error("Failed to send message: {0}")]
    Send(String),

    #[error("Stream error: {0}")]
    Stream(String),

    #[error("Configuration error: {0}")]
    Config(String),
}
```

Why Closed Unions?

| Property | String errors | Closed union |
|----------|---------------|--------------|
| Exhaustive handling | ❌ | ✅ |
| Programmatic branching | ❌ (parse strings) | ✅ (match arms) |
| Refactoring safety | ❌ | ✅ |
| Documentation | ❌ | ✅ (enum variants are self-documenting) |

The #[from] attribute makes thiserror generate a From impl for each wrapped error type, so the ? operator converts a subsystem error into ClawDeskError automatically and the hierarchy is maintained without boilerplate.
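To show what that conversion amounts to, here is a std-only sketch that writes by hand roughly what the derive provides, reduced to one variant per enum. The functions `send_message` and `handle` are hypothetical, and the sketch omits the `source()` chaining that thiserror also sets up.

```rust
use std::fmt;

#[derive(Debug)]
enum ChannelError {
    Send(String),
}

#[derive(Debug)]
enum ClawDeskError {
    Channel(ChannelError),
}

impl fmt::Display for ChannelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ChannelError::Send(reason) => write!(f, "Failed to send message: {reason}"),
        }
    }
}

impl fmt::Display for ClawDeskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClawDeskError::Channel(inner) => write!(f, "Channel error: {inner}"),
        }
    }
}

impl std::error::Error for ChannelError {}
impl std::error::Error for ClawDeskError {}

// The conversion that `?` calls implicitly.
impl From<ChannelError> for ClawDeskError {
    fn from(e: ChannelError) -> Self {
        ClawDeskError::Channel(e)
    }
}

// Hypothetical subsystem function returning the narrow error type.
fn send_message(text: &str) -> Result<(), ChannelError> {
    if text.is_empty() {
        Err(ChannelError::Send("empty message".to_string()))
    } else {
        Ok(())
    }
}

// Hypothetical top-level function returning the wide error type.
fn handle(text: &str) -> Result<(), ClawDeskError> {
    send_message(text)?; // ChannelError becomes ClawDeskError via From
    Ok(())
}

fn main() {
    // Branch on the error's structure instead of parsing its message.
    match handle("") {
        Ok(()) => println!("sent"),
        Err(ClawDeskError::Channel(ChannelError::Send(reason))) => {
            println!("send failed: {reason}")
        }
    }
}
```

The `match` in `main` is the "programmatic branching" from the table: callers inspect variants, and the message strings exist only for display.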


Mathematical Summary

Let $C$ be the set of channels, $F_c$ be the fields unique to channel $c$, and $F = \bigcup_{c \in C} F_c$ be all fields.

Product type state space:

$$ |\mathcal{S}_{\text{prod}}| = \prod_{f \in F} (|\text{dom}(f)| + 1) \geq 2^{|F|} $$

Sum type state space:

$$ |\mathcal{S}_{\text{sum}}| = \sum_{c \in C} \prod_{f \in F_c} |\text{dom}(f)| $$

For ClawDesk with $|C| = 13$ and $|F| = 60$:

$$ \frac{|\mathcal{S}_{\text{prod}}|}{|\mathcal{S}_{\text{sum}}|} \geq \frac{2^{60}}{13 \cdot 2^{8}} \approx 3.5 \times 10^{14} $$

The product type allows roughly $3.5 \times 10^{14}$ times as many states as the sum type, and every one of those extra states is a potential bug.
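Taking the denominator $13 \cdot 2^{8}$ as given, the arithmetic behind that figure works out as:

$$ \frac{2^{60}}{13 \cdot 2^{8}} = \frac{2^{52}}{13} \approx \frac{4.50 \times 10^{15}}{13} \approx 3.46 \times 10^{14} $$

which rounds to the $3.5 \times 10^{14}$ quoted above.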


Key Takeaways

Design Principle

Make illegal states unrepresentable. If a state shouldn't exist at runtime, the type system should prevent it from being constructed at compile time.

  1. Sum types (enums) encode "one of N" — use them when data comes in distinct shapes
  2. Product types (structs) encode "all of N" — use them when all fields co-occur
  3. Exhaustive matching catches missing cases at compile time
  4. Canonical forms (like NormalizedMessage) reduce downstream complexity
  5. Error hierarchies with thiserror give you programmatic error handling without string parsing

Further Reading