A Technical Analysis: Character Encoding, Typesetting Challenges, and the Beauty of Unicode
H. Dumpty Journal of Typography & Encoding
As someone who spent the first decade of their career wrestling with EBCDIC character sets on IBM System/360 mainframes and the second decade trying to convince people that UTF-8 was the future, I approached “Alice in Wonderland: Emoji Edition” with a very specific question: how did they actually build this thing? The literary merits can be debated by others; what interests me is the sheer technical achievement of reliably rendering a complex document that mixes traditional Latin text, emoji glyphs, mathematical symbols, and card suit characters across multiple font families while maintaining typographical consistency.
Reader, this is a gorgeous piece of technical work.
The Unicode Triumph
Let’s start with the obvious: this book is a love letter to Unicode, and specifically to the UTF-8 encoding standard. Every emoji in this text—and there are hundreds—represents a successful navigation of one of computing’s most complex character encoding challenges.
For those who weren’t around in the dark days, let me paint a picture: in the 1960s and 70s, we had ASCII (7-bit, 128 characters) and various incompatible extensions. IBM mainframes used EBCDIC (Extended Binary Coded Decimal Interchange Code), which encoded the same Latin alphabet but with completely different numeric values. Moving text between systems was a nightmare. The idea that you could reliably encode a 🐰, 🕳️, or ☕️ and have it display correctly on any device would have seemed like science fiction.
Unicode changed everything. The emoji in this book—each one—has a canonical code point. U+1F430 for 🐰 (rabbit face), U+1F573 for 🕳️ (hole), U+2615 for ☕️ (hot beverage). The fact that I can type those code points in this review, that the book can render them in its text, that they’ll display on your phone or laptop or e-reader—this is the promise of Unicode fulfilled.
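Those code-point-to-byte mappings are easy to verify in any Unicode-aware language. Here is a quick Python check of my own (an illustration, not anything from the book's toolchain):

```python
# Each emoji has a canonical Unicode code point; UTF-8 maps that
# code point to a deterministic byte sequence.
rabbit = chr(0x1F430)   # 🐰 RABBIT FACE
hole   = chr(0x1F573)   # 🕳 HOLE
coffee = chr(0x2615)    # ☕ HOT BEVERAGE

for ch in (rabbit, hole, coffee):
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")
# U+1F430 -> f0 9f 90 b0
# U+1F573 -> f0 9f 95 b3
# U+2615 -> e2 98 95
```

Note that the rabbit and the hole land in the four-byte range: they live on Unicode's supplementary planes, beyond what 16 bits can address.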
The Typography Challenge
Speaking of fonts: this is where things get really interesting. Looking at the PDF output, I can see the book uses LuaLaTeX for compilation (evident from the font handling and Unicode support). Smart choice. Traditional pdfLaTeX can’t handle Unicode properly, and XeLaTeX has emoji rendering issues. LuaLaTeX gives you the Lua scripting engine embedded in TeX, allowing for sophisticated font fallback mechanisms.
The font stack appears to be:
- Libertinus family for body text (Libertinus Serif for main text, Libertinus Sans for headers)
- OpenMoji for emoji rendering
- STIX Two Math for mathematical symbols
- Fallback handling for missing glyphs
This is exactly the right approach. Traditional Western fonts don’t include emoji glyphs—why would they? They were designed decades before emoji existed. So you need a separate emoji font. But you can’t just switch to an emoji font and back; you need seamless integration where the rendering engine automatically detects “this character isn’t in Libertinus Serif, let me check OpenMoji” without breaking line spacing, baseline alignment, or kerning.
The LuaLaTeX fontspec package with \newfontfamily commands makes this possible, but it’s still tricky to get right. Someone clearly spent time tuning the font size ratios so emoji don’t disrupt line spacing. Look at how “the White 🐰 ran right past her” maintains consistent baseline alignment—the 🐰 doesn’t cause the line to expand vertically. That’s careful typographical engineering.
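A minimal preamble in this spirit might look like the following. To be clear, this is my sketch of how such a fallback chain is typically wired up with fontspec and luaotfload; the font names and options are assumptions, not the book's actual source:

```latex
% Compile with LuaLaTeX. Sketch only: the fallback setup below is my
% guess at the approach, not the book's actual configuration.
\documentclass{article}
\usepackage{fontspec}
% luaotfload can chain fallback fonts: any glyph missing from the
% main font is looked up in OpenMoji automatically, so emoji can sit
% inline without manual font switches.
\directlua{luaotfload.add_fallback("emojifb",
  { "OpenMoji Color:mode=harf;" })}
\setmainfont{Libertinus Serif}[RawFeature={fallback=emojifb}]
\setsansfont{Libertinus Sans}
\begin{document}
the White 🐰 ran right past her
\end{document}
```

The appeal of a fallback chain over explicit `\newfontfamily` switches is exactly the seamlessness described above: the source text stays plain prose, and the engine decides per glyph which font supplies it.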
The Character Encoding Journey
Let me get philosophical for a moment. This book represents the culmination of a 60-year journey in character encoding:
1960s: ASCII defines 128 characters. Non-Latin scripts? Tough luck.
1970s: EBCDIC on mainframes. Incompatible with ASCII. Moving text between systems requires conversion tables.
1980s: Code pages proliferate. Windows-1252, ISO-8859-1, Mac Roman—dozens of incompatible 8-bit encodings. Display a file encoded in one code page using another and you get garbage.
1990s: Unicode emerges. Brilliant idea: assign every character in every writing system a unique number. Initial implementation (UCS-2) is 16-bit, which seems like plenty. Spoiler: it wasn’t.
2000s: UTF-8 wins. Variable-length encoding: ASCII characters stay 1 byte (backward compatible!), everything else uses 2-4 bytes. Emoji push Unicode beyond the 16-bit range into supplementary planes.
2010s: Emoji standardization. The Unicode Consortium starts encoding emoji systematically. Device manufacturers implement them. Suddenly ☕️ and 🐰 are as standard as ‘A’ and ‘Z’.
2020s: This book. You can write “Alice saw a 🐰 jump into a 🕳️” and it renders correctly on billions of devices worldwide.
That’s progress. That’s the promise of open standards fulfilled.
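The variable-length trick in that 2000s entry is easy to see empirically. A small Python check (my illustration, not from the book):

```python
# UTF-8 spends 1 byte on ASCII and up to 4 bytes on supplementary-plane
# characters like emoji, which is why old ASCII files are valid UTF-8.
for ch in "Aé☕🐰":
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r}: {len(encoded)} byte(s)")
# U+0041 'A': 1 byte(s)
# U+00E9 'é': 2 byte(s)
# U+2615 '☕': 3 byte(s)
# U+1F430 '🐰': 4 byte(s)
```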
The Emoji as Data Type
From a computer science perspective, emoji are fascinating data types. They’re not just characters—some are sequences:
- Simple emoji: 🐰 is a single code point (U+1F430)
- Emoji with skin tone: 🫲🏻 is TWO code points—base hand + skin tone modifier
- Emoji sequences: 👨‍👩‍👧‍👦 (family) is SEVEN code points joined by zero-width joiners
- Emoji with variation selectors: ♥️ vs ♥ (emoji presentation vs text presentation)
The book uses skin tone modifiers, which means the character encoding is encoding metadata about rendering. The same base character with different modifiers produces different visual output. This is character encoding as programming—you’re composing visual elements from primitives.
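That composition is directly observable: Python's `len` counts code points, so the sequences above decompose exactly as described (a quick check of my own, not from the book):

```python
# Emoji "characters" can be multi-code-point sequences under the hood.
ZWJ = "\u200d"  # ZERO WIDTH JOINER glues person emoji into one family
family = "👨" + ZWJ + "👩" + ZWJ + "👧" + ZWJ + "👦"

print(len("🐰"))       # 1: single code point, U+1F430
print(len("🫲🏻"))       # 2: base hand + skin tone modifier U+1F3FB
print(len(family))     # 7: four people joined by three ZWJs
print(len("♥\ufe0f"))  # 2: heart + variation selector VS16 (emoji style)
print(len("♥"))        # 1: same heart, text presentation
```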
As someone who spent years fighting character encoding battles, who remembers when displaying accented characters was a challenge, who lived through the transition from code pages to Unicode—I have deep appreciation for work like this. It takes the hard-won victories of computing history (Unicode, OpenType, programmable typesetting, font fallback) and combines them into something that Just Works™.
The literary critics can argue about whether we should put emoji in Alice. From where I sit, I’m just impressed that we can—and that someone pulled it off so well.
Dr. Robert “Bob” Kernighan III is Senior Research Fellow at the H. Dumpty Institute of Technology and maintainer of several retrocomputing projects. He previously worked on character encoding systems at IBM (1979-1989) and served on Multicode Consortium technical committees (1995-2008). His basement contains a working IBM System/360 Model 30 that he definitely doesn’t power up too often because his electric bill is already ridiculous.