Fusion Powered Strings!

Feb 24

In the last few articles, we've gone through the different String types in Haskell, and I've made a few mentions of "efficiency". But what does this mean? It should be a bit more clear with Bytestrings. By using a more compact representation of the data, and operating on a lower level type like Word8, our strings will take up less space in memory, and they'll have more continuity. This makes operations on them more efficient. But what is it, exactly, about Text that makes it more efficient than using String?

One part of the answer to that is the concept of fusion. Throughout the documentation on Data.Text, you'll see the phrase "subject" to fusion. The short explanation is that GHC knows how to "fuse" Text operations together so that they happen more quickly and take less memory. Let's see a quick example of this.

Suppose we want to write a string manipulation function. This will drop the first 6 letters, add the letter 'a' on the beginning, append the word "goodbye" to the end, and then capitalize the whole thing. Here's what that function might look like using String:

transformString :: String -> String
transformString s = map toUpper $ 'a' : (drop 6 s) ++ ", goodbye."

...

>> transformString "Hello Friend"
"AFRIEND, GOODBYE."

In the process of making our final string ("AFRIEND, GOODBYE."), our program will actually make 4 different strings for each of the operations we run.

First, it will drop the 6 letters, and allocate space for the string "Friend".
Then, it will add the "a" to the front, giving us a new string "aFriend".
Next, it will append "goodbye", and we'll have "aFriend, goodbye.".
Finally, it will uppercase everything, giving "AFRIEND, GOODBYE.".

Allocating all this memory seems a bit wasteful. After all, the first three strings were just intermediate computations! But Haskell can't modify these values in place, because values are immutable in Haskell!

However, the Text type is specifically designed to work around this. It is written so that when compiled with optimizations, fusion functions can all be combined into a single step. So suppose we write the equivalent function with Text:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

transformString :: T.Text -> T.Text
transformString s = T.map toUpper $ T.append (T.cons 'a' (T.drop 6 s)) ", goodbye."

...

>> transformString (T.pack "Hello Friend")
"AFRIEND, GOODBYE."

If we compile this code with sufficient optimization, it will only allocate memory for the final Text value. There will be no intermediate allocations, because the whole function is "fused"!

Fusion is a tricky subject, but if you're just starting out with Haskell, you don't need to worry too much about it. You should focus on the basics, like getting conversions right between these types. For more tips and tricks that will help you on your Haskell journey, download our free Beginners Checklist. It'll give you some more help and guide you to other resources that will help you learn!

James Bowen

Fusion Powered Strings!

Treating Strings like Lists

Loading Different Strings