Ex-Hack: a Haskell Example-based Documentation

Ex-Hack logo: a low-fi portrait of Jamy Gourmaud

Abstract

Ex-Hack is example-based documentation automatically generated from the packages published on Stackage. There’s a live demo here. We’ve just released the alpha version; you can have a look at the code here.

We are actively looking for new contributors.

After briefly introducing the project’s motivation and explaining how this software works, we discuss the current roadmap and what we need to do before releasing V1.0.0.

An Effort to Fill the Documentation Gap

According to the 2017 Haskell survey results, the lack of good documentation seems to be a major issue in the Haskell community.

How can we fix this? The obvious solution is to write more documentation. However, writing more documentation does not necessarily mean writing better documentation. The best documentation writers I’ve met are not necessarily the best programmers I’ve met. Being able to write clear, concise and useful documentation is a very specific skill that takes time to acquire.

However, there’s one thing you can add to your documentation that requires very little writing skill yet is extremely useful: examples. As humans, we often learn by mimicking others’ behaviour. In my experience, adding examples to documentation is a local maximum: it’s not really hard to do, yet it dramatically increases your documentation’s quality.

This is where I started thinking about generating these examples automatically. The Haskell community is fortunate enough to have a centralized code repository: Hackage. The packages stored in this repository usually get their dependencies from this very same repository. Maybe we could use all this code to extract some real-world examples.

It would be great if, on top of the current type documentation, we could add an automatically generated example section to Haddock.

This was the starting point of the whole ex-hack project: generating example-based documentation that we could later use in Haddock.

Let’s get this straight right away: we still have a lot of work to do before merging this project into Haddock, but we have made some serious progress in the last 5 months!

So, How Does It Work?

What do we need to generate such a database?

Well, basically, for each Stackage/Hackage package, we need to:

  1. Download it.
  2. Build it.
  3. Extract the symbols exported by its main library.
  4. Index the symbols it uses that come from other packages.
  5. Generate static HTML documentation displaying the previously extracted information.
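To make steps 3 and 4 a bit more concrete, here is a toy Haskell model of them. It is only a sketch under loud assumptions, not ex-hack’s actual code: symbols are plain qualified-name strings, the export sets are hand-written sample data, and the names `sampleExports` and `indexUsages` are hypothetical. A real indexer would get this information from the compiler rather than from hand-written lists.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type PackageName = String

-- Hypothetical: a fully qualified name such as "Data.Map.insert".
-- A real index would also need types, versions and source locations.
type Symbol = String

-- Step 3 output (toy data): symbols exported by each package's main library.
sampleExports :: Map.Map PackageName (Set.Set Symbol)
sampleExports = Map.fromList
  [ ("containers", Set.fromList ["Data.Map.insert", "Data.Map.lookup"])
  , ("text",       Set.fromList ["Data.Text.pack"])
  ]

-- Step 4: for each symbol used by `user`, find the other package(s) exporting it.
-- Symbols that no known package exports (e.g. local definitions) are dropped.
indexUsages
  :: Map.Map PackageName (Set.Set Symbol)
  -> PackageName
  -> [Symbol]
  -> [(Symbol, PackageName)]
indexUsages exps user used =
  [ (s, p)
  | s <- used
  , (p, syms) <- Map.toList exps
  , p /= user
  , s `Set.member` syms
  ]

main :: IO ()
main = print (indexUsages sampleExports "my-app"
                ["Data.Map.insert", "Data.Text.pack", "foo"])
```

Running it prints `[("Data.Map.insert","containers"),("Data.Text.pack","text")]`; `foo`, which no known package exports, is simply dropped.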

The whole project is centered around the processing step abstraction. Each processing step depends on the previous one and is a dependency of the next one.
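One way to picture this abstraction in Haskell is to treat a step as a function from the previous step’s output to either an error or the next step’s input, plus a combinator that chains steps. The sketch below is hypothetical: `Step`, `andThen`, `download`, `build` and the `needs-libgl` package are made up for illustration, and both steps are stubs that do no real IO. It only shows how a failure at one step short-circuits the rest of the chain.

```haskell
-- A hypothetical processing step: it consumes the previous step's output,
-- may perform IO (downloading, building, ...) and may fail with a message.
type Step a b = a -> IO (Either String b)

-- Chain two steps: the second one only runs if the first one succeeded.
andThen :: Step a b -> Step b c -> Step a c
andThen f g x = do
  r <- f x
  case r of
    Left err -> pure (Left err)
    Right y  -> g y

newtype Tarball  = Tarball String  deriving (Show, Eq)
newtype BuildDir = BuildDir String deriving (Show, Eq)

-- Stub for step 1: a real implementation would fetch the package tarball.
download :: Step String Tarball
download name = pure (Right (Tarball (name ++ ".tar.gz")))

-- Stub for step 2: pretend one package fails for lack of a system library.
build :: Step Tarball BuildDir
build (Tarball t)
  | t == "needs-libgl.tar.gz" =
      pure (Left ("cannot build " ++ t ++ ": missing system dependency"))
  | otherwise = pure (Right (BuildDir ("dist/" ++ t)))

-- Steps 3 to 5 would be appended the same way with `andThen`.
pipeline :: Step String BuildDir
pipeline = download `andThen` build

main :: IO ()
main = mapM pipeline ["aeson", "needs-libgl"] >>= mapM_ print
```

`andThen` is just Kleisli composition (`>=>`) in the `ExceptT String IO` monad; it is written out by hand here to keep the example free of extra dependencies.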

This means that, ideally, we could run this software in a map/reduce configuration and distribute the load over several nodes. This comes in handy because, as stated previously, we’ll need to build the whole of Stackage/Hackage. Needless to say, this is a seriously time-consuming process. The more we can distribute the load, the faster we’ll generate the documentation.
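Here is a rough sketch of what the “reduce” half of such a setup could look like. It assumes, hypothetically, that the “map” phase emits, for each package, a partial index mapping every foreign symbol the package uses to that package’s name. Because merging with `Map.unionsWith (++)` is associative, partial indices produced on different nodes can be combined in any grouping. `mapPhase`, `reducePhase` and the sample data are illustrative, not ex-hack’s API.

```haskell
import qualified Data.Map.Strict as Map

type PackageName = String
type Symbol = String

-- Output of the "map" phase for one package (hypothetical shape):
-- every foreign symbol it uses, pointing back to the using package.
type PartialIndex = Map.Map Symbol [PackageName]

-- Runs independently for each package, so it can happen on any node.
mapPhase :: PackageName -> [Symbol] -> PartialIndex
mapPhase pkg used = Map.fromList [ (s, [pkg]) | s <- used ]

-- "Reduce" phase: unionsWith (++) is associative, so partial indices
-- can be merged on each node first, then across nodes.
reducePhase :: [PartialIndex] -> PartialIndex
reducePhase = Map.unionsWith (++)

main :: IO ()
main = print (reducePhase
  [ mapPhase "app-a" ["Data.Map.insert", "Data.Text.pack"]
  , mapPhase "app-b" ["Data.Map.insert"]
  ])
```

This prints `fromList [("Data.Map.insert",["app-a","app-b"]),("Data.Text.pack",["app-a"])]`.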

So far, we’ve been able to generate the documentation for Stackage (~2,400 packages). The last run (which populated the current demo) took ~25 hours to complete on my 7-year-old desktop. If we want to scale this project to the whole Hackage repository (~13,000 packages), we’ll probably need to distribute this generation over several nodes.
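For a rough sense of scale, here is a naive extrapolation from the figures above. It assumes, as a strong simplification, that the total time grows linearly with the number of packages on the same hardware, and that the load spreads perfectly over N nodes:

```latex
\begin{aligned}
t_{\text{package}} &\approx \frac{25\,\text{h} \times 3600\,\text{s/h}}{2400\ \text{packages}} = 37.5\ \text{s/package} \\
T_{\text{Hackage}} &\approx 13\,000 \times 37.5\,\text{s} = 487\,500\,\text{s} \approx 135\,\text{h} \approx 5.6\ \text{days} \\
T_{\text{Hackage}}(N) &\approx \frac{135\,\text{h}}{N}
\end{aligned}
```

The last line is an idealized bound: it ignores any coordination overhead between nodes.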

Project’s Roadmap

Which brings us to the million-dollar question: what’s next?

Before entering the beta phase, we first need to address several issues that keep the ex-hack documentation from being fully usable:

  • First, we need to rewrite the symbol occurrence indexing system. The current approach clearly does not work.
  • We need to set up a proper environment to run the whole database generation. Some packages require system dependencies (such as libgl, zlib, etc.) to build, and we are currently unable to build them. Hence, we are indexing 2,082 packages instead of the ~2,400 contained in Stackage.

Then, it’ll be time to address some usability issues before releasing the first stable version:

  • Writing a second UI targeted at package maintainers, showing them which parts of their API are the most used. It could come in handy when performing a major API refactoring ;)
  • Writing a Haskell IDE Engine plugin.
  • Partially rewriting the current HTML documentation to make it more pleasant to use.
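For the maintainer UI, the core query could be as simple as counting how often each of a package’s exported symbols shows up in the indexed code, then sorting by that count. A minimal sketch, assuming (hypothetically) that the indexer hands us one list entry per usage site; `mostUsed` is an illustrative name, not an existing ex-hack function:

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import qualified Data.Map.Strict as Map

type Symbol = String

-- Rank symbols by how many usage sites mention them, most used first.
-- Hypothetical input: one entry per usage site found by the indexer.
mostUsed :: [Symbol] -> [(Symbol, Int)]
mostUsed occurrences = sortOn (Down . snd) (Map.toList counts)
  where
    counts = Map.fromListWith (+) [ (s, 1) | s <- occurrences ]

main :: IO ()
main = print (mostUsed ["insert", "lookup", "insert", "empty", "insert", "lookup"])
```

This prints `[("insert",3),("lookup",2),("empty",1)]`. Ties keep alphabetical order, because `sortOn` is stable and `Map.toList` returns keys in ascending order.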

Ultimate Goal

Once ex-hack reaches its final form, I would love to see it merged into Haddock. I think having usage examples next to the type and the author’s comments for each library symbol would make for great documentation.

However, fully merging this into Haddock is going to be challenging. First, we would need a new Haddock component in charge of building the ex-hack documentation for every new package/Stackage release.

We would also need a way to retrieve an already populated ex-hack database before generating the Haddock documentation for a library: you can’t expect to build the whole of Hackage/Stackage on a developer’s machine!

Anyway, it looks challenging, but you always need a long-term dream, right? :D

How to Join Us?

So far, I’ve mostly been working on this in isolation, which was arguably not my best move. The more people work on this, the more progress we’ll make.

Most of the annoying early exploration has already been done; we now mostly know where we are heading. What we need now is to “industrialize” the project. You don’t need to be a Haskell guru to contribute; as a matter of fact, I’m far from being one.

For now, the GitHub repository will act as the main community hub. Pick an issue — some are labeled as newcomer-friendly — and just start hacking on it :)

If you have some questions/suggestions or just want to say hi, there’s also a more informal communication channel: the #ex-hack room on the Freenode IRC network.