Emoji Driven Development in Ruby

Tom Lord

Published in

Carwow Product, Design & Engineering

5 min readAug 22, 2019

As we all know, emojis are the pinnacle of human communication.

In contrast to traditional written communication, emojis transcend languages, reduce verbosity and help reveal intent 😉

Let’s take a look at how Emojis can not only brighten up your text messages, but also your Ruby code.

In order to understand how (and why) emojis can be used in code, it’s important to know: What is an emoji?

From 🅰️ to 💤

Computers store all data in binary; a sequence of 0s and 1s. (Apologies if I’m teaching a 👵🏻 to suck 🥚!)

Due to their strictly mathematical nature, we have a few universally agreed standards for storing numbers in binary format. There are some subtle variations, such as big/little-endianness, and a few different approaches to storing negative/fractional values (along with some complications), but generally speaking these standardised formats have all been agreed on.

Storing language in binary, however, is a whole different ⚾ 🏟️. One of the earliest, and best-known, conversions from numbers → letters is the following:

This “ASCII” table (now preferably referred to as US-ASCII), created in 1963, was an early form of character encoding. It was designed to suit the needs of it original users; in particular, it was designed for:

The English language…
…with data entry via typewriters…
…for use in telegraphs.

As such, it includes 33 non-printing control codes; most of which are now obsolete.

Today, it’s little more than an odd quirk of history that some of these characters — including “bell” (\a), “vertical tab” (\v) and “carriage return” (\r)— are still defined within this standard character set. (And this history is why Windows operating systems still use two characters, \r\n, to define new lines; mimicking the physical action of typewriters!)

But alas, this table is insufficient for the wider 🌏. What about languages that…

Use accented letters (á, ë, î, etc.)?
Use entirely different alphabets (Greek, Russian, etc.)?
Write from right-to-left (Arabic, Hebrew, etc.), or even from top-to-bottom?
Don’t use spaces, or instead use varying-width spaces?

Over the years, many new character encodings were created to solve these problems; typically by creating “multi-byte encodings” — i.e. using two 8-bit numbers to represent each character, rather than one, since 2⁸ simply isn’t big enough to store all the necessary characters for most languages.

It was a mess. Eventually, someone decided to f̶o̶r̶g̶e̶ ̶a̶ ̶r̶i̶n̶g̶ ̶t̶o̶ ̶b̶i̶n̶d̶ ̶t̶h̶e̶m̶ ̶a̶l̶l̶ create one encoding standard to unify them. Enter: Unicode!

Unicode (more specifically, UTF-8) is a superset of US-ASCII. Without going into too much detail, there are 2²¹ = 2,097,152 “available slots” for characters to be defined.

The latest Unicode standard — at the time of writing — is version 12.1 (released May 2019; supported in ruby version 2.6.3), and defines 137,994 characters.

The current draft for version 13.0 (planned for release in March 2020) expands this to 143,859.

These characters include everything from Hebrew, to Thai, to Japanese, to — you guessed it — Emoji!

Did you know?
Since Ruby version 2.4.0, you can access the current Ruby’s Unicode version via:
RbConfig::CONFIG['UNICODE_VERSION']

Emojis are just characters

Well, some emojis are actually represented by multiple characters: An emoji base, and emoji modifiers. Typically, a “modifier” is either a skin-tone (👍🏼) or gender ( 💁‍♂️); but in theory it could be anything.

The way that each emoji is displayed on your screen is effectively just a font. This is why the appearance of each emoji may look slightly different between devices. Lest we forget the infamous hamburger emoji debacle of 2017!

Migrating to Emoji 🛳

Ruby’s historical support for UTF-8 (and therefore Emojis) can be summarised as follows:

Ruby 1.8 and below: No concept of string encodings at all. Strings were more or less byte arrays.
Ruby 1.9: Default string encoding is US-ASCII everywhere. Alternate encodings (e.g. UTF-8) can be specified with the magic comment:
# encoding: utf-8
Ruby 2.0 and above: Default string encoding is UTF-8.

In Ruby, constants — such as class names — must begin with an upper-case character.

Up until Ruby 2.5, this is restricted to US-ASCII characters only (A-Z); but Ruby 2.6+ has relaxed this constraint, which means an additional 1,827 UTF-8 characters can be used.

…But other than this slight caveat, since emojis are really no different to other characters, this makes it possible to freely use them in your code:

I’m not sure this is a good idea…

…Without tests! So, naturally, let’s use the minitest-emoji gem to ensure our 🐄 is 🆗.

Here I have defined a carwow theme for MiniTest — which replaces the traditional ., F, E and S characters with some more exciting custom emojis.

This method is a little archaic, as the gem has maintained Ruby 1.8 compatibility; therefore Array#unpack('U*') is used to convert hex codes into Emoji. For example, the “pedestrian” emoji (🚶) is represented by the code point: U+1F6B6.

Here is the output from running these tests:

Run options: --theme carwow --seed 52069# Running:🚶 🏎 🚔 🚐Finished in 0.001669s, 2396.6448 runs/s, 1198.3224 assertions/s.
1) Failure:Cow::🔪#test_0002_returns a pig [edd.rb:27]:Expected: "🐖"Actual: "\u{1F969}"
2) Error:Cow::🌧#test_0001_does not respond to rain:NoMethodError: undefined method `🌧' for #<Cow:0x00007f919103a438>edd.rb:44:in `block (3 levels) in <main>'4 runs, 2 assertions, 1 failures, 1 errors, 1 skips

Much clearer! 🙄

Emoji Driven Development in Ruby

From 🅰️ to 💤

Emojis are just characters

Migrating to Emoji 🛳

Further reading 📖

Written by Tom Lord