Deep Dive: Type Algebra
Why InboundMessage is a 13-variant enum (sum type) instead of a 60-field struct (product type), the information-theoretic argument, exhaustive matching guarantees, and how NormalizedMessage serves as the canonical form.
The Core Problem
ClawDesk supports 13 messaging platforms. Each platform sends different data:
| Platform | Unique Fields | Examples |
|---|---|---|
| Telegram | 8 | chat_id, reply_to_message_id, media_group_id |
| Discord | 7 | guild_id, channel_id, thread_id, embeds |
| Slack | 6 | team_id, channel, thread_ts, blocks |
| WhatsApp | 5 | phone_number, wa_id, media_url |
| Signal | 4 | source_uuid, group_id, attachments |
| iMessage | 3 | handle_id, service, tapback |
| Matrix | 5 | room_id, event_id, formatted_body |
| WebChat | 3 | session_id, browser_info |
| Email | 6 | from, to, cc, subject, html_body |
| SMS | 3 | phone_number, carrier |
| CLI | 2 | command, args |
| IRC | 4 | nick, channel, server |
| Custom | 4 | channel_name, metadata |
| Total | ~60 | |
How do we represent "a message from any of these 13 platforms" in a single type?
Approach A: Product Type (Struct with Options)
The naïve approach — a single struct with Option<T> for every field:
// ❌ The product-type approach
pub struct InboundMessage {
    // Common
    pub text: Option<String>,
    pub sender_id: Option<String>,
    // Telegram
    pub telegram_chat_id: Option<i64>,
    pub telegram_reply_to: Option<i64>,
    pub telegram_media_group: Option<String>,
    // Discord
    pub discord_guild_id: Option<u64>,
    pub discord_channel_id: Option<u64>,
    pub discord_thread_id: Option<u64>,
    // Slack
    pub slack_team_id: Option<String>,
    pub slack_thread_ts: Option<String>,
    // ... 50 more Option<_> fields ...
}
Information Theory Analysis
A product type with $n$ optional fields has $2^n$ possible states, counting only which fields are present and which are absent:
$$ |\text{StateSpace}_{\text{product}}| = 2^{60} \approx 1.15 \times 10^{18} $$
But only 13 of these states are valid, one per platform, since each message comes from exactly one platform. The remaining $2^{60} - 13$ states are nonsensical:
$$ \text{Invalid states} = 2^{60} - 13 \approx 2^{60} $$
The ratio of valid to total states:
$$ \frac{|\text{Valid}|}{|\text{Total}|} = \frac{13}{2^{60}} \approx 1.13 \times 10^{-17} $$
In information-theoretic terms, the product type wastes:
$$ \text{Wasted entropy} = \log_2(2^{60}) - \log_2(13) = 60 - 3.7 = 56.3 \text{ bits} $$
That's 56.3 bits of "lie" — states the type says are possible but never occur at runtime.
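The figures in this section can be reproduced with a few lines of Rust. This is only a sketch of the arithmetic; the function names are illustrative and not part of ClawDesk.

```rust
/// Number of presence/absence patterns for `n` independent Option fields.
/// Assumes n < 128 so the shift fits in a u128.
pub fn presence_patterns(n: u32) -> u128 {
    1u128 << n
}

/// Fraction of presence patterns that correspond to a valid message.
pub fn valid_ratio(n: u32, valid: u32) -> f64 {
    f64::from(valid) / (presence_patterns(n) as f64)
}

/// Bits the struct can encode beyond what `valid` distinct states require.
pub fn wasted_bits(n: u32, valid: u32) -> f64 {
    f64::from(n) - f64::from(valid).log2()
}
```

Calling presence_patterns(60), valid_ratio(60, 13), and wasted_bits(60, 13) gives the state count, the valid-to-total ratio, and the wasted entropy quoted above, up to rounding.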
Runtime Bugs from Product Types
With this approach, every consumer must defensively check which fields are populated:
// 😰 What channel is this? Have to guess from populated fields
fn process(msg: &InboundMessage) {
    if let Some(chat_id) = msg.telegram_chat_id {
        // Probably Telegram... but what if discord_guild_id is also set?
        // Nothing prevents both being Some simultaneously.
    }
}
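To make the problem concrete, the sketch below uses a trimmed-down stand-in for the product-type struct (three fields instead of sixty) and shows that a contradictory message compiles without complaint. The helpers platforms_claimed and contradictory are hypothetical and exist only for illustration.

```rust
/// Trimmed-down stand-in for the product-type struct (three fields only).
#[derive(Debug, Default)]
pub struct InboundMessage {
    pub text: Option<String>,
    pub telegram_chat_id: Option<i64>,
    pub discord_guild_id: Option<u64>,
}

/// Hypothetical helper: counts how many platform markers are populated.
/// Exactly 1 is the only sensible answer, but the type also permits 0 and 2.
pub fn platforms_claimed(msg: &InboundMessage) -> usize {
    usize::from(msg.telegram_chat_id.is_some()) + usize::from(msg.discord_guild_id.is_some())
}

/// Builds a message that claims to come from Telegram and Discord at once.
/// The compiler accepts it because every field is independently optional.
pub fn contradictory() -> InboundMessage {
    InboundMessage {
        text: Some("hello".to_string()),
        telegram_chat_id: Some(42),
        discord_guild_id: Some(7),
    }
}
```

Both the two-platform message and the all-None default are well-typed values, so every consumer has to decide what to do with them at runtime.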
Approach B: Sum Type (Enum)
ClawDesk's actual approach – a sum type (tagged union / enum):
// ✅ The sum-type approach
pub enum InboundMessage {
    Telegram(TelegramInbound),
    Discord(DiscordInbound),
    Slack(SlackInbound),
    WhatsApp(WhatsAppInbound),
    Signal(SignalInbound),
    IMessage(IMessageInbound),
    Matrix(MatrixInbound),
    WebChat(WebChatInbound),
    Email(EmailInbound),
    Sms(SmsInbound),
    Cli(CliInbound),
    Irc(IrcInbound),
    Custom(CustomInbound),
}

// Each variant carries ONLY its platform's fields
pub struct TelegramInbound {
    pub chat_id: i64,             // not Option: always present
    pub from: TelegramUser,       // not Option: always present
    pub text: Option<String>,     // optional: could be media-only
    pub reply_to_message_id: Option<i64>,
    pub media: Option<TelegramMedia>,
    pub media_group_id: Option<String>,
    pub forward_from: Option<TelegramUser>,
    pub entities: Vec<MessageEntity>,
}
Information Theory Analysis
A sum type with $k$ variants has exactly $k$ top-level states:
$$ |\text{StateSpace}_{\text{sum}}| = 13 $$
This requires:
$$ \lceil \log_2(13) \rceil = 4 \text{ bits} $$
Zero wasted entropy. Every representable state is a valid state.
Comparison
| Property | Product Type (struct) | Sum Type (enum) |
|---|---|---|
| Top-level states | $2^{60} \approx 10^{18}$ | $13$ |
| Bits needed | 60 | 4 |
| Invalid states | $2^{60} - 13$ | 0 |
| Compile-time exhaustiveness | ❌ | ✅ |
| Field access safety | Runtime checks | Pattern matching |
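The "Field access safety" row can be illustrated with a two-variant cut-down of the enum. This is a sketch, not ClawDesk's code: the structs keep only the fields needed here, and routing_key is a hypothetical helper.

```rust
pub struct TelegramInbound {
    pub chat_id: i64,
}

pub struct DiscordInbound {
    pub guild_id: u64,
    pub channel_id: u64,
}

/// Two-variant cut-down of the 13-variant enum.
pub enum InboundMessage {
    Telegram(TelegramInbound),
    Discord(DiscordInbound),
}

/// Hypothetical helper: builds a routing key from whichever IDs the platform provides.
pub fn routing_key(msg: &InboundMessage) -> String {
    match msg {
        // Inside this arm only Telegram fields exist; there is no guild_id to misuse.
        InboundMessage::Telegram(t) => format!("telegram:{}", t.chat_id),
        // Inside this arm only Discord fields exist.
        InboundMessage::Discord(d) => format!("discord:{}:{}", d.guild_id, d.channel_id),
    }
}
```

No arm needs an is_some() check or an unwrap(): once the variant is matched, its required fields are simply there.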
Exhaustive Matching
The compiler requires every match on InboundMessage to handle all 13 variants, unless the match includes a wildcard (_) arm:
// If you forget a variant, the compiler refuses to build
fn message_source(msg: &InboundMessage) -> &'static str {
    match msg {
        InboundMessage::Telegram(_) => "telegram",
        InboundMessage::Discord(_) => "discord",
        InboundMessage::Slack(_) => "slack",
        InboundMessage::WhatsApp(_) => "whatsapp",
        InboundMessage::Signal(_) => "signal",
        InboundMessage::IMessage(_) => "imessage",
        InboundMessage::Matrix(_) => "matrix",
        InboundMessage::WebChat(_) => "webchat",
        InboundMessage::Email(_) => "email",
        InboundMessage::Sms(_) => "sms",
        InboundMessage::Cli(_) => "cli",
        InboundMessage::Irc(_) => "irc",
        InboundMessage::Custom(_) => "custom",
        // ← Remove any arm and the compiler emits an error
    }
}
Adding a 14th Channel
When you add InboundMessage::Teams(TeamsInbound), the compiler emits an error at every match expression on InboundMessage that does not already cover the new variant. As long as handlers avoid wildcard (_) arms, this guarantees no handler silently ignores the new channel.
In the product-type approach, adding a new channel means adding more Option<T> fields — and no compiler warning tells you which functions need updating.
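One caveat is worth showing: the guarantee only holds for matches without a catch-all arm. The sketch below uses a hypothetical three-variant Channel enum (not ClawDesk's InboundMessage) to contrast the two styles after a Teams variant has been added.

```rust
pub enum Channel {
    Telegram,
    Discord,
    Teams, // the newly added variant
}

/// Exhaustive: deleting the Teams arm makes this function fail to compile.
pub fn label_exhaustive(c: &Channel) -> &'static str {
    match c {
        Channel::Telegram => "telegram",
        Channel::Discord => "discord",
        Channel::Teams => "teams",
    }
}

/// Wildcard: this compiled before Teams existed and still compiles afterwards,
/// so the new channel silently lands in the fallback arm.
pub fn label_with_wildcard(c: &Channel) -> &'static str {
    match c {
        Channel::Telegram => "telegram",
        Channel::Discord => "discord",
        _ => "unknown",
    }
}
```

Here label_with_wildcard(&Channel::Teams) returns "unknown" with no compiler diagnostic, which is exactly the silent omission the exhaustive style prevents.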
The NormalizedMessage: Canonical Form
After the sum-type InboundMessage captures platform-specific data faithfully, the normalize() function maps it to a unified canonical form:
pub struct NormalizedMessage {
    pub channel_id: ChannelId,
    pub conversation_id: ConversationId,
    pub sender: Option<Sender>,
    pub content: Content,
    pub thread_id: Option<ThreadId>,
    pub timestamp: DateTime<Utc>,
    pub metadata: Metadata,
}
This is the funnel: 13 shapes in, 1 shape out. Everything downstream works with NormalizedMessage only.
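The body of normalize() is not shown in this section, so the following is a minimal sketch of the funnel rather than ClawDesk's implementation. It covers two variants only, and the Normalized struct is a simplified stand-in in which plain strings replace ChannelId, ConversationId, Content, and ThreadId. The field mappings are assumptions.

```rust
pub struct TelegramInbound {
    pub chat_id: i64,
    pub text: Option<String>,
    pub reply_to_message_id: Option<i64>,
}

pub struct SlackInbound {
    pub channel: String,
    pub text: String,
    pub thread_ts: Option<String>,
}

pub enum InboundMessage {
    Telegram(TelegramInbound),
    Slack(SlackInbound),
}

/// Simplified stand-in for NormalizedMessage (strings instead of newtypes).
#[derive(Debug, PartialEq)]
pub struct Normalized {
    pub channel_id: &'static str,
    pub conversation_id: String,
    pub text: Option<String>,
    pub thread_id: Option<String>,
}

/// Maps each platform-specific shape onto the single canonical shape.
pub fn normalize(msg: InboundMessage) -> Normalized {
    match msg {
        InboundMessage::Telegram(t) => Normalized {
            channel_id: "telegram",
            conversation_id: t.chat_id.to_string(),
            text: t.text,
            // Assumption: the reply target doubles as the thread identifier.
            thread_id: t.reply_to_message_id.map(|id| id.to_string()),
        },
        InboundMessage::Slack(s) => Normalized {
            channel_id: "slack",
            conversation_id: s.channel,
            text: Some(s.text),
            thread_id: s.thread_ts,
        },
    }
}
```

Because normalize() is itself an exhaustive match, adding a platform forces a decision about how its fields map to the canonical form.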
Why Not Skip InboundMessage?
"Why not normalize directly from JSON?"
Because the intermediate InboundMessage step:
- Validates at the boundary: each variant struct enforces that platform-specific required fields (like Telegram's chat_id) are present
- Documents the contract: the struct definition is living documentation of what each platform provides
- Enables platform-specific logic: before normalization, you can apply platform-specific preprocessing (e.g., Telegram entity parsing, Discord embed extraction)
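The first point, validation at the boundary, can be sketched with the standard library alone. The example assumes the payload has already been decoded into a string map; parse_telegram and ParseError are hypothetical names, and a real adapter would more likely deserialize JSON directly into the variant struct.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct TelegramInbound {
    pub chat_id: i64,          // required
    pub text: Option<String>,  // optional
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingField(&'static str),
    InvalidField(&'static str),
}

/// Hypothetical boundary parser: a TelegramInbound can only be built
/// if the required chat_id is present and numeric.
pub fn parse_telegram(raw: &HashMap<String, String>) -> Result<TelegramInbound, ParseError> {
    let chat_id = raw
        .get("chat_id")
        .ok_or(ParseError::MissingField("chat_id"))?
        .parse::<i64>()
        .map_err(|_| ParseError::InvalidField("chat_id"))?;
    Ok(TelegramInbound {
        chat_id,
        text: raw.get("text").cloned(),
    })
}
```

Everything past this function can rely on chat_id being there, because a value of type TelegramInbound cannot exist without it.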
Error Algebra
ClawDesk applies the same principle to errors. Instead of a single Error type with string messages, errors form a closed union hierarchy:
/// Top-level error: a sum type over subsystem errors
#[derive(Debug, thiserror::Error)]
pub enum ClawDeskError {
    #[error("Channel error: {0}")]
    Channel(#[from] ChannelError),
    #[error("Pipeline error: {0}")]
    Pipeline(#[from] PipelineError),
    #[error("Provider error: {0}")]
    Provider(#[from] ProviderError),
    #[error("Storage error: {0}")]
    Storage(#[from] StorageError),
    #[error("Security error: {0}")]
    Security(#[from] SecurityError),
}

/// Each subsystem error is itself a sum type
#[derive(Debug, thiserror::Error)]
pub enum ChannelError {
    #[error("Failed to start channel: {0}")]
    Start(String),
    #[error("Failed to send message: {0}")]
    Send(String),
    #[error("Stream error: {0}")]
    Stream(String),
    #[error("Configuration error: {0}")]
    Config(String),
}
Why Closed Unions?
| Property | String errors | Closed union |
|---|---|---|
| Exhaustive handling | ❌ | ✅ |
| Programmatic branching | ❌ (parse strings) | ✅ (match arms) |
| Refactoring safety | ❌ | ✅ |
| Documentation | ❌ | ✅ (enum variants are self-documenting) |
The #[from] attribute tells thiserror to generate a From implementation for each wrapped error type, so the ? operator converts a subsystem error into ClawDeskError automatically, maintaining the hierarchy without boilerplate.
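To show what this conversion amounts to without pulling in the thiserror crate, the sketch below writes the From implementation by hand for a reduced version of the two enums. The functions send, handle, and is_retryable are hypothetical, and the retry policy is invented for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum ChannelError {
    Send(String),
    Config(String),
}

#[derive(Debug, PartialEq)]
pub enum ClawDeskError {
    Channel(ChannelError),
}

/// Hand-written equivalent of the conversion that #[from] requests.
impl From<ChannelError> for ClawDeskError {
    fn from(e: ChannelError) -> Self {
        ClawDeskError::Channel(e)
    }
}

/// Hypothetical channel-level operation that fails on empty input.
fn send(text: &str) -> Result<(), ChannelError> {
    if text.is_empty() {
        return Err(ChannelError::Send("empty message".to_string()));
    }
    Ok(())
}

/// The ? operator calls From::from on the error, lifting
/// ChannelError into ClawDeskError with no explicit map_err.
pub fn handle(text: &str) -> Result<(), ClawDeskError> {
    send(text)?;
    Ok(())
}

/// Callers branch on variants instead of parsing message strings.
/// The policy here (retry sends, not config errors) is an assumption.
pub fn is_retryable(err: &ClawDeskError) -> bool {
    match err {
        ClawDeskError::Channel(ChannelError::Send(_)) => true,
        ClawDeskError::Channel(ChannelError::Config(_)) => false,
    }
}
```

The match in is_retryable is exhaustive over both levels of the hierarchy, so adding an error variant forces this function to be revisited as well.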
Mathematical Summary
Let $C$ be the set of channels, $F_c$ be the fields unique to channel $c$, and $F = \bigcup_{c \in C} F_c$ be all fields.
Product type state space:
$$ |\mathcal{S}_{\text{prod}}| = \prod_{f \in F} (|\text{dom}(f)| + 1) \geq 2^{|F|} $$
Sum type state space:
$$ |\mathcal{S}_{\text{sum}}| = \sum_{c \in C} \prod_{f \in F_c} |\text{dom}(f)| $$
For ClawDesk with $|C| = 13$ and $|F| = 60$:
$$ \frac{|\mathcal{S}_{\text{prod}}|}{|\mathcal{S}_{\text{sum}}|} \geq \frac{2^{60}}{13 \cdot 2^{8}} \approx 3.5 \times 10^{14} $$
The product type allows roughly $3.5 \times 10^{14}$ times as many states as necessary, and every one of those extra states is a potential bug.
Key Takeaways
Make illegal states unrepresentable. If a state shouldn't exist at runtime, the type system should prevent it from being constructed at compile time.
- Sum types (enums) encode "one of N" — use them when data comes in distinct shapes
- Product types (structs) encode "all of N" — use them when all fields co-occur
- Exhaustive matching catches missing cases at compile time
- Canonical forms (like NormalizedMessage) reduce downstream complexity
- Error hierarchies with thiserror give you programmatic error handling without string parsing
Further Reading
- Fallback FSM Deep Dive — another application of state machines in ClawDesk
- Architecture: Type System — full type system reference
- Message Flow Tutorial — see these types in action