Using Binary Mode in Haskell

So far in our IO adventures, we've only been dealing with plain text files. But a lot of data isn't meant to be read as string data. Some of the most interesting and important problems in computing today are about reading image data and processing it so our programs can understand what's going on. Executable program files are also in a binary format, rather than human readable. So today, we're going to explore how IO works with binary files.

First, it's important to understand that handles have encodings, which we can retrieve using hGetEncoding. On most systems your handles will default to UTF-8, though the actual default comes from your system locale.

hGetEncoding :: Handle -> IO (Maybe TextEncoding)

import System.IO

main :: IO ()
main = do
  hGetEncoding stdin >>= print
  hGetEncoding stdout >>= print
  h <- openFile "testfile.txt" ReadMode
  hGetEncoding h >>= print

...

Just UTF-8
Just UTF-8
Just UTF-8

There are other encodings of course, like char8, latin1, and utf16. These are different ways of turning text into bytes, and each TextEncoding value refers to one of these. If you know you have a file written in UTF-16, you can change the handle's encoding using hSetEncoding:

hSetEncoding :: Handle -> TextEncoding -> IO ()

import System.IO

main :: IO ()
main = do
  h <- openFile "myutf16file.txt" ReadMode
  hSetEncoding h utf16
  myString <- hGetLine h
  ...
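To see hSetEncoding working in both directions, here is a minimal sketch that writes a line as UTF-16 and reads it back with the same encoding. The helper name roundTripUtf16 and the scratch file "utf16_demo.txt" are made up for this illustration; they are not part of System.IO.

```haskell
import System.IO

-- Hypothetical helper: write one line as UTF-16, then read it back.
-- The handle's encoding must match on both sides for the text to survive.
roundTripUtf16 :: FilePath -> String -> IO String
roundTripUtf16 path text = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf16
    hPutStrLn h text
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf16
    hGetLine h

main :: IO ()
main = do
  -- "utf16_demo.txt" is a scratch file name assumed for this sketch.
  result <- roundTripUtf16 "utf16_demo.txt" "caf\233"
  -- print escapes non-ASCII characters, so this works on any terminal.
  print result
```

If you read the same file without calling hSetEncoding first, you would likely get garbled characters or a decoding error, since the bytes on disk are not valid text in the default encoding.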

But notice that hGetEncoding returns a Maybe value. For binary files, there is no encoding! We are only allowed to read raw data. You can switch an open handle h to binary mode by calling hSetBinaryMode h True, or you can open the file with openBinaryFile in the first place.

hSetBinaryMode :: Handle -> Bool -> IO ()

openBinaryFile :: FilePath -> IOMode -> IO Handle

import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  ...
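As a quick check of the two approaches, the sketch below opens the same file both ways and asks each handle for its encoding. The helper encodingName and the scratch file "binary_demo.dat" are assumptions for this example. The binary-mode handles should report Nothing, while the text-mode handle should report whatever your locale default is.

```haskell
import System.IO

-- Hypothetical helper: the encoding's name, or Nothing in binary mode.
encodingName :: Handle -> IO (Maybe String)
encodingName h = fmap show <$> hGetEncoding h

main :: IO ()
main = do
  -- "binary_demo.dat" is a scratch file name assumed for this sketch.
  writeFile "binary_demo.dat" "abc"
  -- Opened directly in binary mode.
  withBinaryFile "binary_demo.dat" ReadMode $ \h ->
    encodingName h >>= print
  -- Opened in text mode, then switched to binary mode.
  withFile "binary_demo.dat" ReadMode $ \h -> do
    encodingName h >>= print
    hSetBinaryMode h True
    encodingName h >>= print
```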

When it comes to processing binary data, it is best to read your input into a ByteString rather than a String. Using the unpack function will then allow you to operate on the raw list of bytes:

import qualified Data.ByteString as B
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  -- Read the whole file strictly, then unpack it into a [Word8]
  inputBytes <- B.unpack <$> B.hGetContents h
  print $ length inputBytes

In this example, I've opened up an image file and converted its data into a list of bytes (using the Word8 type).

Further processing of the image will require some knowledge of the image format. As a basic example, I made a 24-bit bitmap with horizontal stripes throughout. The size was 16 pixels by 16 pixels. With 3 bytes (24 bits) per pixel, the total size of the pixel data would be 16 x 16 x 3 = 768 bytes. So upon seeing that my program above printed "822", I could figure out that the first 822 - 768 = 54 bytes were just header data.
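The size arithmetic above can be sketched in code. The names pixelBytes and splitHeader are hypothetical helpers, and the sketch assumes an uncompressed 24-bit image whose rows need no padding, with everything before the pixel data treated as header. A block of 822 zero bytes stands in for the real file.

```haskell
import qualified Data.ByteString as B

-- Pixel-data size in bytes for an uncompressed 24-bit image,
-- assuming rows need no padding (3 bytes per pixel).
pixelBytes :: Int -> Int -> Int
pixelBytes width height = width * height * 3

-- Split a file's bytes into (header, pixel data), assuming the pixel
-- data is the final block of the file.
splitHeader :: Int -> Int -> B.ByteString -> (B.ByteString, B.ByteString)
splitHeader width height bytes =
  B.splitAt (B.length bytes - pixelBytes width height) bytes

main :: IO ()
main = do
  -- Stand-in for the 822-byte file from the example.
  let fakeFile = B.replicate 822 0
      (header, pixels) = splitHeader 16 16 fakeFile
  print (B.length header, B.length pixels) -- prints (54,768)
```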

I could then separate my data into "lines" (48-byte chunks) and I successfully observed that each of these chunks followed a specific pattern. Many lines were all white (the only value was 255), and other lines had three repeating values.

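import Control.Monad (forM_)
import qualified Data.ByteString as B
import Data.List.Split (chunksOf) -- from the "split" package
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  inputBytes <- B.unpack <$> B.hGetContents h
  -- Skip the 54-byte header, then group into 48-byte rows (16 pixels x 3 bytes).
  -- Named pixelRows to avoid shadowing the Prelude's lines function.
  let pixelRows = chunksOf 48 (drop 54 inputBytes)
  forM_ pixelRows print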

...

[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]

Now that the data is broken into simple numbers, we could run many kinds of mathematical algorithms on it, if there were some interesting data to process!
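As a small taste of that kind of processing, here is a sketch that groups a row's bytes into three-byte pixels and computes a rough average brightness. The Pixel type and the toPixels and brightness helpers are hypothetical names for this illustration, and the two sample rows are built by hand from the values printed above rather than read from the file.

```haskell
import Data.List (nub)
import Data.Word (Word8)

-- Three bytes per pixel, in whatever channel order the file stores them.
type Pixel = (Word8, Word8, Word8)

-- Group a flat row of bytes into pixels, dropping any incomplete tail.
toPixels :: [Word8] -> [Pixel]
toPixels (a : b : c : rest) = (a, b, c) : toPixels rest
toPixels _ = []

-- Mean of every byte in the row, as a rough brightness measure.
brightness :: [Word8] -> Double
brightness [] = 0
brightness row =
  fromIntegral (sum (map fromIntegral row :: [Int])) / fromIntegral (length row)

main :: IO ()
main = do
  let whiteRow = replicate 48 255 :: [Word8]
      stripeRow = concat (replicate 16 [36, 28, 237]) :: [Word8]
  print (nub (toPixels stripeRow)) -- the distinct pixels in a striped row
  print (brightness whiteRow)
```

Converting to Int before summing matters here, because Word8 arithmetic wraps around at 256.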

In our last couple of IO articles, we'll keep looking at some issues with binary data. If you want monthly summaries of what we're writing here at Monday Morning Haskell, make sure to subscribe to our monthly newsletter! This will also give you access to our subscriber resources!
