The importance and methods of code documentation

I’ve had the chance to work on a lot of large-scale codebases over the last few years, from very good to rip-your-hair-out, what-the-hell-were-they-thinking bad. A recurring theme in any codebase is how the code is documented. I’m not sure it’s something that’s explicitly taught when people learn to code, so I’ll quickly bullet point what I mean here and go in depth on each one later.

  • ‘Self-documenting’ code
  • Comments within the code
  • Documentation outside of the codebase
  • Testing

Is code documentation important? Where does each of these fail and succeed? And what sort of enforcement should there be for each? Let’s start with the one that’s thrown around a lot, I suspect because it doesn’t actually mean anything: ‘self-documenting’ code.

Self-documenting code

When people first threw this phrase around, I thought they had a point. Now I ask the inverse: what would code that isn’t self-documenting even be? The phrase is surely just referring to good code. If you put a PR up and people can’t understand what you’ve done, the code needs to change, so you change it until people can understand it by reading it. Tada: self-documenting code. It’s just saying to prefer readable code like:

fun getEmail(id: String): Email? { }

over this nonsense:

fun sriguhe(sriguhg: String): Eoihgrf? { }

Self-documenting code seems like a term people use to claim ‘free’ documentation from just writing understandable code. I personally don’t think it’s something to congratulate yourself for; it’s the bare minimum you should be trying to achieve.

Basically: don’t try to be clever with your code, and instead make it readable, even if that means using an extra line. Name your variables, functions, classes and objects appropriately.
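As a rough illustration of ‘readable over clever’, here are two versions of the same check. The `User` type, both function names and the rule itself are invented for this sketch; they aren’t from any real codebase.

```kotlin
// Hypothetical type, invented for this example.
data class User(val name: String, val age: Int, val isActive: Boolean)

// "Clever": one expression, but the reader has to unpick the negations
// and guess what `chk` is supposed to mean.
fun chk(u: User) = !(u.age < 18 || !u.isActive)

// Readable: an extra line, with names that say what is being checked.
fun canReceiveEmail(user: User): Boolean {
    val isAdult = user.age >= 18
    return isAdult && user.isActive
}

fun main() {
    val user = User(name = "Sam", age = 30, isActive = true)
    // Both versions compute the same thing; only one of them says so.
    println(chk(user) == canReceiveEmail(user))
}
```

Both functions behave identically, so the only difference a reviewer sees is how long it takes to work out what the code is for.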

Comments within the code

Now that the bare minimum has been achieved and your code is something that people can read, let’s look at going slightly further. Code comments are lines that aren’t compiled and exist only to give additional context or detail about the code.

/**
 * Retrieves a specific email by its id
 *
 * @param id The unique identifier of the email
 * @return An [Email] object if one with the provided [id] exists, else null
 */
fun getEmail(id: String): Email? { ... }

These sorts of comments should explain what the function/object/class does, and describe each parameter. Now yes, in this example the comment doesn’t provide any detail that the function signature doesn’t already give you. But for functions that are harder to understand, having a quick explanation of what the function does without having to read the implementation can be very useful. Without the comment you could assume the [id] parameter was the id of the email, but what if it was the id of the user account? Unlikely, but the comment confirms that it is, in fact, the id of the email.

There can be a few issues with this sort of code documentation. Firstly, who updates the comment? If the implementation of the function changes and the comment is now out of date, will anyone update it? Everything will still compile just fine, and you’ll get no warnings in your IDE or linter that the comment is out of date. This is a common source of confusion when reading through older codebases: the comment doesn’t match what’s actually happening. Secondly, when should you add a comment like this? For every single function? For public functions only? Only for functions that you think are difficult to understand?
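To make the ‘out of date’ problem concrete, here is a small sketch. The `Email` fields, the `emails` map and `EmailNotFoundException` are all hypothetical, invented for this example. The doc comment still promises null, but the implementation was later changed to throw, and the compiler is perfectly happy with that.

```kotlin
// Hypothetical types and storage, invented for this example.
data class Email(val id: String, val subject: String)

class EmailNotFoundException(id: String) : Exception("No email with id $id")

val emails = mapOf("7.0.42.0" to Email("7.0.42.0", "Hello"))

/**
 * Retrieves a specific email by its id
 *
 * @param id The unique identifier of the email
 * @return An [Email] object if one with the provided [id] exists, else null
 */
// The doc comment above is now wrong: the function throws instead of
// returning null, and nothing in the build noticed the mismatch.
fun getEmail(id: String): Email =
    emails[id] ?: throw EmailNotFoundException(id)

fun main() {
    println(getEmail("7.0.42.0").subject)
    // A caller trusting the comment would expect null here, not an exception.
    val result = runCatching { getEmail("missing") }
    println(result.isFailure)
}
```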

A rule I follow for these sorts of comments is this: if I’m writing a library, every public-facing interface/class/object/function gets documented like this. If it’s not a library and just internal code, I’ll try to document the parts I think are a bit tricky to understand. Then, in code review, the people reviewing my code might comment that a function is difficult to understand, and that is the perfect time to figure out whether it needs documenting. After all, reviewers are exactly the audience you want to be testing against: people who’ve never read the code you just wrote.

The second kind of code comment is the comment within the function itself. These should always be encouraged, by you or within code review. If a line of code is difficult to understand, it’s always useful to add a little comment above it about what it’s doing:

fun getEmail(id: String): Email? {
    ...
    // Regex to ensure that the id of the email matches something like 7.0.42.0.
    // If the id doesn't match, assume the id is invalid and return null
    val idRegex = Regex("""^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$""")
    ...
}

Here we have some regex that everyone loves, and just adding a quick comment as to what the hell this weird-looking syntax does helps greatly. Adding a small example of a matching value helps too.
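For anyone who wants to poke at the pattern, here is a minimal runnable sketch. The helper name `isValidEmailId` is invented for this example, and anchoring the pattern with `^` and `$` and requiring at least one digit in every group are my assumptions about what the check is meant to do.

```kotlin
// Four dot-separated groups of digits, e.g. 7.0.42.0.
// A raw string ("""...""") avoids having to double-escape the backslashes.
val idRegex = Regex("""^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$""")

// Hypothetical helper name, invented for this example.
fun isValidEmailId(id: String): Boolean = idRegex.matches(id)

fun main() {
    println(isValidEmailId("7.0.42.0"))  // true
    println(isValidEmailId("7.0.42"))    // false: only three groups
    println(isValidEmailId("7.0.42.x"))  // false: last group is not digits
}
```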

Like the comments outside the function that we looked at earlier, comments inside the function should always be encouraged, especially during code review. And don’t be afraid to ask for comments when you’re reviewing someone else’s PR: if you don’t understand it, it needs to be documented for the next person who comes across that bit of code, too.

Documentation outside the codebase

While code comments can certainly cover a lot of areas that need documentation, sometimes they’re not enough. As a result, tools like Confluence exist as a place to put long-form documentation. The problem is that most people aren’t going to read it. If I have to go to an external site and find a specific document that tells me how the code works, I’m probably just going to try and figure it out myself.

That’s not to say external documentation is useless, just that external documentation files should try to explain processes more than what the code is doing. So for example, the process of releasing the app. Or how to add a new sensitive secret to the codebase, or a new user to the repository, or how to sign an application. These are processes that just can’t sit as code comments, since they’re explaining processes rather than the code itself.

These files can certainly live inside the repository, but the main problem there is the file format. An application like Confluence gives you a lot of tools that .md files just can’t, so most people will put these process-oriented documents in a tool like Confluence.

Testing

The final piece of documentation for this post is tests. These are less about human-readable messages and more about putting roadblocks in place so that if the implementation changes, the tests fail. This is in contrast to code comments, which can easily and silently go out of date. Tests (assuming your CI pipeline is set up to run them and fail at the appropriate time) will block your PR from being merged if you’ve changed the behaviour they cover, and so by nature stay up to date. Tests (unit tests, UI tests, integration tests, etc.) are an entirely separate topic that exceeds the scope here, and there are a lot of dos and don’ts when writing them.

However, assuming the tests are written in a non-brittle way, use the actual implementations of dependencies, and cover all paths through conditional branches and not just lines, they can be a good way of seeing which inputs produce which outputs. The test suite also gives future developers a small playground where they can try different inputs and check what comes out of the function. Tests certainly aren’t a replacement for human-readable comments or external documents, but they are, at least, a way of forcing you to update the ‘documentation’ of the functions you test.
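Here is a small sketch of what ‘tests as documentation’ can look like. The `EmailStore` class and the `Email` fields are hypothetical, and I’m using plain `check` calls instead of a real test framework purely to keep the example self-contained; in a real project these would be JUnit or kotlin.test tests run by CI.

```kotlin
// Hypothetical types and in-memory storage, invented for this example.
data class Email(val id: String, val subject: String)

class EmailStore(private val emails: Map<String, Email>) {
    fun getEmail(id: String): Email? = emails[id]
}

// Each test name states a behaviour; the body shows an input and its output.
fun `getEmail returns the email when the id exists`() {
    val store = EmailStore(mapOf("7.0.42.0" to Email("7.0.42.0", "Hello")))
    check(store.getEmail("7.0.42.0")?.subject == "Hello")
}

fun `getEmail returns null when the id is unknown`() {
    val store = EmailStore(emptyMap())
    check(store.getEmail("missing") == null)
}

fun main() {
    `getEmail returns the email when the id exists`()
    `getEmail returns null when the id is unknown`()
    println("all checks passed")
}
```

If someone later changes `getEmail` to throw for unknown ids, the second test stops passing, which is exactly the nudge a doc comment can’t give you.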

Conclusion

Basically, write comments, and definitely write tests if you can afford the time. If you can’t afford the time, try to talk to your manager and explain the value of writing tests. Self-documenting code is a useless, bare-minimum phrase. When writing code, try to use all four of the approaches I’ve outlined here where you see fit, and remember that code review is a great time to ask someone else to write more documentation if you’re unsure of what’s going on. But be careful with code comments, especially comments at the top of a function/class/object. These can easily become out of date, so perhaps it’s not in your best interest to write them for everything unless you think the code is difficult to understand or you were asked to during code review. All four are very useful, but they’re just tools and it’s up to you to decide when and how to use them. But use them at least!
