SKEIN Mesh — Emoji Address Encoding
SKEIN addresses are text-canonical (alias::myproject::sha256::<digest>). On top of that text form sits an optional, fully reversible emoji encoding — a compact, pasteable representation of an address that still resolves to exactly the same content. It is a layer over the canonical text, not a separate addressing scheme.
What it encodes, and why it resolves
A five-emoji folio identity carries 50 bits (a 1024-emoji alphabet gives log2(1024) = 10 bits per emoji, and 5 × 10 = 50), while a full content digest is 256 bits, so the emoji form cannot carry the whole digest. Instead it encodes the station completely and the folio as a 50-bit short-hash prefix.
Resolution works by:
- Decoding the station exactly (the station is always fully encoded).
- Asking that station to resolve the 50-bit prefix to a full folio.
- Recovering the full 256-bit digest from the resolved folio — never from the emoji itself.
Because the station is always encoded in full, resolution always reaches the right station; only the folio prefix is expanded there.
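The resolution steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Station` class, the helper names, and the choice of taking the prefix from the leading 50 bits of the digest (big-endian, most significant emoji first) are hypothetical and are not fixed by the text.

```python
import hashlib

IDENTITY_BITS = 50      # five emoji at 10 bits each
BITS_PER_EMOJI = 10


def short_prefix(digest: bytes) -> int:
    """Leading 50 bits of a 32-byte digest (assumed prefix convention)."""
    return int.from_bytes(digest, "big") >> (256 - IDENTITY_BITS)


def to_symbols(prefix: int) -> list[int]:
    """Split a 50-bit prefix into five 10-bit values, most significant first."""
    return [(prefix >> (BITS_PER_EMOJI * i)) & 0x3FF for i in reversed(range(5))]


def from_symbols(symbols: list[int]) -> int:
    """Inverse of to_symbols: rebuild the 50-bit prefix."""
    value = 0
    for s in symbols:
        value = (value << BITS_PER_EMOJI) | s
    return value


class Station:
    """Hypothetical in-memory station that indexes folios by short prefix."""

    def __init__(self) -> None:
        self._by_prefix: dict[int, set[bytes]] = {}

    def add(self, content: bytes) -> bytes:
        digest = hashlib.sha256(content).digest()
        self._by_prefix.setdefault(short_prefix(digest), set()).add(digest)
        return digest

    def resolve(self, prefix: int) -> bytes:
        """Expand a 50-bit prefix to the full digest, or fail if ambiguous."""
        matches = self._by_prefix.get(prefix, set())
        if len(matches) != 1:
            raise LookupError("prefix is unknown or ambiguous; use the text address")
        return next(iter(matches))
```

The full digest comes back from the station's own index, never from the five 10-bit values, which mirrors the third step of the list.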
Alphabet
The alphabet is 1024 single-codepoint emoji, drawn from Unicode 9.0 or earlier, with no zero-width joiners, modifiers, or variation selectors — visually distinct and culturally neutral. 1024 is near the practical ceiling for clean single-codepoint emoji: larger alphabets would require joiner sequences that break codepoint parsing and degrade on some platforms, and would not help anyway, since station encoding is bottlenecked at two characters per emoji regardless.
Code ranges
The 1024 code values are partitioned into disjoint ranges, and each value is decoded according to the range it falls in:
- 0–783: two-character letter clusters over the 28-symbol alphabet a–z, ".", "-" (28² = 784).
- 784–811: 28 singletons.
- 812–821: the ten digits.
- 822–838: control codes (brand, route subset, type subset).
- 839–1023: reserved.
Canonical encoder
Greedy left-to-right two-character pairing: a singleton is emitted only on a forced boundary or a final odd character, and a digit terminates the current cluster and is emitted as its own digit emoji.
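A minimal sketch of the greedy pairing, working on code values rather than actual emoji since the alphabet is not yet final. The symbol order inside the cluster alphabet, the pair formula 28 × first + second, and the function name `encode_station` are assumptions made for illustration only.

```python
# Assumed symbol order for the 28-character cluster alphabet.
CLUSTER_ALPHABET = "abcdefghijklmnopqrstuvwxyz.-"
DIGITS = "0123456789"
SINGLETON_BASE = 784  # 784-811: single characters
DIGIT_BASE = 812      # 812-821: digits


def encode_station(text: str) -> list[int]:
    """Greedy left-to-right pairing of station text into code values."""
    codes = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in DIGITS:
            # A digit always stands alone and ends any pending cluster.
            codes.append(DIGIT_BASE + DIGITS.index(ch))
            i += 1
        elif ch in CLUSTER_ALPHABET:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt and nxt in CLUSTER_ALPHABET:
                # Two cluster characters in a row: emit one pair code (0-783).
                pair = 28 * CLUSTER_ALPHABET.index(ch) + CLUSTER_ALPHABET.index(nxt)
                codes.append(pair)
                i += 2
            else:
                # Forced boundary (digit next) or final odd character.
                codes.append(SINGLETON_BASE + CLUSTER_ALPHABET.index(ch))
                i += 1
        else:
            raise ValueError(f"character {ch!r} is outside the station alphabet")
    return codes
```

Under these assumptions, `encode_station("abc")` pairs "ab" and emits "c" as a trailing singleton, while `encode_station("a1b")` emits three separate codes because the digit forces a boundary on both sides.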
The decoder rejects non-canonical input
The decoder maps each emoji to its fixed fragment by range, then re-encodes canonically and rejects the input if the result differs. This is what makes the emoji → text → emoji round-trip hold: a non-canonical stream is invalid input, never something silently accepted.
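The decode-then-re-encode check might look like the sketch below. It again operates on code values, and the symbol order, pair formula, and function names are illustrative assumptions; a compact encoder is included only so the example runs on its own.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz.-"  # assumed symbol order


def fragment(code: int) -> str:
    """Map one code value to its fixed text fragment, by range."""
    if 0 <= code <= 783:
        first, second = divmod(code, 28)
        return ALPHABET[first] + ALPHABET[second]
    if 784 <= code <= 811:
        return ALPHABET[code - 784]
    if 812 <= code <= 821:
        return str(code - 812)
    raise ValueError(f"code {code} is not a station code")


def encode_canonical(text: str) -> list[int]:
    """Greedy pairing encoder used as the canonical reference."""
    codes, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in "0123456789":
            codes.append(812 + int(ch))
            i += 1
        elif ch in ALPHABET:
            if i + 1 < len(text) and text[i + 1] in ALPHABET:
                codes.append(28 * ALPHABET.index(ch) + ALPHABET.index(text[i + 1]))
                i += 2
            else:
                codes.append(784 + ALPHABET.index(ch))
                i += 1
        else:
            raise ValueError(f"character {ch!r} is outside the station alphabet")
    return codes


def decode_station(codes: list[int]) -> str:
    """Decode, then reject any stream the canonical encoder would not produce."""
    text = "".join(fragment(c) for c in codes)
    if encode_canonical(text) != list(codes):
        raise ValueError("non-canonical station encoding")
    return text
```

For example, two singletons "a" and "b" decode to "ab", but the canonical encoder would have produced a single pair code for "ab", so that stream is rejected rather than accepted as an alternate spelling.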
Stream layout
The stream is [brand][route?][station…][type][identity×5], decoded right-anchored:
- the last five emoji are the identity,
- the one before them is the type,
- a variable run before that is the station,
- an optional single emoji is the route (present only when a control-range emoji sits in that position),
- and the first emoji is the brand.
The identity is fixed at exactly five emoji, which is what makes the right-anchored decode unambiguous.
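The right-anchored split can be sketched as below, on code values instead of emoji. The names `Stream` and `split_stream` are hypothetical, and the sketch assumes that the station has at least one code and that brand and type both fall in the control range 822–838, which the text implies but does not state outright.

```python
from typing import NamedTuple, Optional

CONTROL_CODES = range(822, 839)  # control range from the code table
IDENTITY_LEN = 5


class Stream(NamedTuple):
    brand: int
    route: Optional[int]
    station: list
    type_code: int
    identity: list


def split_stream(codes: list) -> Stream:
    """Split [brand][route?][station...][type][identity x5] from the right."""
    # Minimum: brand + one station code + type + five identity codes.
    if len(codes) < 3 + IDENTITY_LEN:
        raise ValueError("stream too short")
    identity = codes[-IDENTITY_LEN:]
    type_code = codes[-IDENTITY_LEN - 1]
    brand = codes[0]
    middle = codes[1:-IDENTITY_LEN - 1]
    # Assumption: brand and type are always control-range codes.
    if brand not in CONTROL_CODES or type_code not in CONTROL_CODES:
        raise ValueError("brand and type must be control codes")
    route = None
    if middle[0] in CONTROL_CODES:
        # A control code right after the brand is the optional route.
        route, middle = middle[0], middle[1:]
    if not middle or any(not 0 <= c < 822 for c in middle):
        raise ValueError("station must be a non-empty run of non-control codes")
    return Stream(brand, route, middle, type_code, identity)
```

Because the identity length is fixed and station codes never fall in the control range, the split needs no length prefix or delimiter.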
Fixed-length trade-off
Because the emoji identity cannot grow, two folios in one station that collide on 50 bits share an identical emoji address; such addresses fall back to the full-hash text form, which always works. Collisions are improbable in practice: the 50% birthday point is roughly 40 million folios in a single station (about 1.18 × 2^25), and the fallback is graceful.
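The birthday estimate can be checked with the standard approximation p ≈ 1 − exp(−n(n−1) / 2N), where N = 2^50 is the number of possible identities. The function names here are illustrative, and the model assumes the 50-bit prefixes are uniformly distributed.

```python
import math


def collision_probability(folios: int, bits: int = 50) -> float:
    """Approximate chance that at least two folios share a prefix."""
    space = 2 ** bits
    return -math.expm1(-folios * (folios - 1) / (2 * space))


def folios_for_probability(p: float, bits: int = 50) -> float:
    """Approximate folio count at which the collision chance reaches p."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


if __name__ == "__main__":
    print(f"50% point: {folios_for_probability(0.5):,.0f} folios")
    print(f"1M folios: {collision_probability(1_000_000):.2e}")
```

Under this model the 50% point lands a little under 40 million folios per station, and a station with one million folios has a collision chance well below one in a thousand.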
Status
The encoding's constraints are fixed; the remaining work is curating the final 1024-emoji alphabet, locking the encoding specification, and defining the rendering fallback.