Cleanse your Code



This blog is only for programmers who want to get better at their craft, because the ability to cleanse code can make your work far more productive.

Have you ever waded through bad code?
Well, we all have. We slog through a morass of tangled code. We struggle to find our way, hoping for some hint, some clue, of what is going on, but all we see is more and more senseless code.

Have you ever considered the cost of owning such a mess?
As the mess builds, the productivity of the team continues to decrease, asymptotically approaching zero.

In this blog, I will talk about the following aspects of code cleansing:

Cleansing of naming skills
Cleansing of functions
Cleansing of comments
Cleansing of formatting skills
Cleansing of objects and data structures
Cleansing of error handling skills
Cleansing of classes
Cleansing of unit tests

What actually is “Clean Code”?

Clean code clearly exposes the tensions in the problem to be solved. It builds those tensions to a climax and then gives the reader an obvious solution.

Clean code is focused. Each function, each class, each module exposes a single-minded attitude that remains entirely undistracted and unpolluted by the surrounding details.

Clean code is obvious, simple, and compelling. Each module will set the stage for the next. Each will tell you how the next will be written.

A programmer writing code should think of themselves as an author, writing for readers who will judge the effort. There’s no way to write code without reading it, so making it easy to read actually makes it easier to write.

The key to clean code is “code sense”. If you are a smart programmer, you should understand that clarity is king.

Cleanse your Naming Skills

Names are everywhere in software. Because we do so much naming, we had better do it well.

Names should reveal intent. The name of a variable, function, or class, should answer all the big questions. It should tell you why it exists, what it does, and how it is used.

Distinguish names in a meaningful way, so that the reader knows what the difference between them is. It is not sufficient to add number series or noise words, even though the compiler is satisfied. If names are different, then they should also mean something different.
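
As a rough illustration of names that reveal intent, here is a small Python sketch. The game board, the meaning of the first item in each cell, and the value 4 are all made up for this example.

```python
# Before: the reader has to guess what the list, the index 0 and the 4 mean.
def get_them(the_list):
    list1 = []
    for x in the_list:
        if x[0] == 4:
            list1.append(x)
    return list1


# After: same behaviour, but every name says why it exists.
STATUS_INDEX = 0  # hypothetical layout: the first item of a cell is its status
FLAGGED = 4       # hypothetical status value


def get_flagged_cells(game_board):
    return [cell for cell in game_board if cell[STATUS_INDEX] == FLAGGED]
```

Nothing about the logic changed. Only the names did, and the second version no longer needs an explanation.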

Avoid including keywords in names. Make names pronounceable and searchable. Single-letter names and numeric constants share a particular problem: they are not easy to locate across a body of text. Avoid encoded names, as they add the overhead of deciphering. Encoded names are seldom pronounceable and are easy to mistype.
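
A minimal sketch of pronounceable, searchable names follows. The record class and its fields are invented for illustration; the encoded names in the comments are the kind of thing you want to avoid.

```python
from dataclasses import dataclass
from datetime import datetime

# Searchable: you can grep for this name, unlike a bare 86400 in an expression.
SECONDS_PER_DAY = 86_400


@dataclass
class CustomerRecord:                  # rather than something like DtaRcrd102
    generation_timestamp: datetime     # rather than genymdhms
    modification_timestamp: datetime   # rather than modymdhms


def age_in_days(record, now):
    elapsed = now - record.generation_timestamp
    return elapsed.total_seconds() / SECONDS_PER_DAY
```

You can say "generation timestamp" out loud in a code review, which is a surprisingly useful test of a name.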

Classes and objects should have noun or noun-phrase names, not verbs. Methods should have verb or verb-phrase names.

Shorter names are generally better than longer ones, so long as they are clear. Add no more context to a name than is necessary.

Cleanse your Functions

The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that. They should hardly ever be 20 lines long.

Functions should not be large enough to hold nested structures. The indent level of a function should not be greater than one or two. The blocks within if statements, else statements, while statements, and so on should be one line long. Probably that line should be a function call.
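
Here is one way this can look in Python. The discount rule and its numbers are assumptions made up for the example; the point is that the block inside the if statement is a single line, and that line is a function call.

```python
DISCOUNT_THRESHOLD_CENTS = 10_000  # hypothetical business rule
DISCOUNT_PERCENT = 10              # hypothetical business rule


def qualifies_for_discount(order_total_cents, is_member):
    return is_member and order_total_cents >= DISCOUNT_THRESHOLD_CENTS


def discounted(order_total_cents):
    return order_total_cents * (100 - DISCOUNT_PERCENT) // 100


def final_price(order_total_cents, is_member):
    # One level of indentation, one line per block.
    if qualifies_for_discount(order_total_cents, is_member):
        return discounted(order_total_cents)
    return order_total_cents
```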

Functions should do only one thing and they should do it well. Functions should either do something or answer something, but not both. Either your function should change the state of an object, or it should return some information about that object. Doing both often leads to confusion.
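
A small sketch of keeping "do something" and "answer something" apart. The Account class and its attribute store are hypothetical.

```python
class Account:
    def __init__(self):
        self._attributes = {}

    def has_attribute(self, name):
        # Query: answers a question, changes nothing.
        return name in self._attributes

    def set_attribute(self, name, value):
        # Command: changes state, returns nothing.
        self._attributes[name] = value


account = Account()
if not account.has_attribute("username"):
    account.set_attribute("username", "unclebob")
```

Compare this with a single set method that returns True or False: at the call site you could not tell whether the result means "it was set" or "it already existed".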

The name of a function should explain its intent, as well as the order and intent of its arguments.

The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided wherever possible. More than three (polyadic) requires very special justification, and even then is best avoided. Arguments are hard because they take a lot of conceptual power, and writing test cases to ensure that all the various combinations of arguments work properly is difficult. You can reduce the number of arguments by creating objects out of them, that is, by wrapping them into a class of their own.
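
Wrapping arguments into a class of their own might look like the sketch below. Point and Circle are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Circle:
    center: Point
    radius: float


# Triadic: the caller must remember the order of three bare numbers.
def make_circle_from_coordinates(x, y, radius):
    return Circle(Point(x, y), radius)


# Dyadic: x and y travel together as one named concept.
def make_circle(center, radius):
    return Circle(center, radius)
```

The second form is not cheating. The x and y were always part of one concept that deserved a name.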

Functions should not have side effects. Side effects are unexpected changes that a function makes to the variables of its own class, to the parameters passed into it, or to system globals. In each case they are devious and damaging mistruths that often result in strange temporal couplings and order dependencies: a function with side effects can be called only at those times when it is safe to perform those side effects. A function with side effects also violates the single responsibility rule.

So make your functions short, well named, and nicely organized.
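
To make the idea of a hidden side effect concrete, here is a sketch with a hypothetical Session class. It compares plain strings only to keep the example short; it is not how real password checks should be done.

```python
class Session:
    def __init__(self):
        self.initialized = False

    def initialize(self):
        self.initialized = True


# The name promises a check, but the function also resets the session.
# Callers can only use it when it is safe to initialize the session.
def check_password_with_side_effect(session, stored, attempt):
    if stored == attempt:
        session.initialize()  # hidden side effect
        return True
    return False


# Side-effect free: it only answers the question it is named after.
def password_matches(stored, attempt):
    return stored == attempt


session = Session()
if password_matches("secret", "secret"):
    session.initialize()  # the state change is now visible at the call site
```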

Cleanse your Comments

Nothing can be quite so helpful as a well-placed comment. Nothing can be quite so damaging as an old crufty comment that propagates lies and misinformation.

If your code is expressive enough to convey its intent, then you will not need comments very much, perhaps not at all. The proper use of comments is to compensate for our failure to express ourselves in code.
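
For instance, a comment that explains a condition can often be replaced by a well-named method. The Employee class and the retirement age below are assumptions for the sake of the example.

```python
from dataclasses import dataclass

RETIREMENT_AGE = 65  # hypothetical threshold


@dataclass
class Employee:
    is_hourly: bool
    age: int

    def is_eligible_for_full_benefits(self):
        return self.is_hourly and self.age > RETIREMENT_AGE


employee = Employee(is_hourly=True, age=70)

# With a comment:
# check to see if the employee is eligible for full benefits
if employee.is_hourly and employee.age > RETIREMENT_AGE:
    pass

# Without a comment, because the code says the same thing:
if employee.is_eligible_for_full_benefits():
    pass
```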

Code changes and evolves. Chunks of it move from here to there. Unfortunately the comments can’t always follow them. And all too often the comments get separated from the code they describe and become orphaned blurbs of ever-decreasing accuracy.

It is possible to make the point that programmers should be disciplined enough to keep the comments in a high state of repair, relevance, and accuracy. But that energy is better spent on making the code so clear and expressive that it does not need the comments in the first place.

Some comments are necessary or beneficial. Sometimes our corporate coding standards force us to write certain comments for legal reasons like copyright or authorship statements. Sometimes it is just helpful to translate the meaning of some obscure argument or return value in the form of comments. Sometimes it is useful to warn other programmers about certain consequences using comments. Sometimes it is reasonable to leave “To do” notes in the form of comments. Sometimes a comment may be used to amplify the importance of something that may otherwise seem inconsequential.

Cleanse your Formatting Skills

Code formatting is important. When people look into your code, you want them to be impressed with the neatness, consistency, orderliness and attention to detail that they perceive. You should take care that your code is nicely formatted. You should choose a set of simple rules that govern the format of your code, and then you should consistently apply those rules.

Talking about vertical formatting, it appears to be possible to build significant systems with files that are typically 200 lines long, with an upper limit of 500. Smaller files are usually easier to understand than larger files. A source file should have a simple, explanatory name. The name, by itself, should be sufficient to tell us whether we are in the right module or not. The topmost parts of the source file should provide the high-level concepts and algorithms. Details should increase as we move downward, until at the end we find the lowest level functions and details in the source file.

Talking about vertical openness, nearly all code is read left to right and top to bottom. Each line represents an expression or a clause, and each group of lines represents a complete thought. Those thoughts should be separated from each other with blank lines.

Talking about vertical density, if vertical openness separates concepts, then vertical density implies close association. So lines of code that are tightly related should appear vertically dense.

Talking about vertical distance, concepts that are closely related should belong in the same source file. Variables should be declared as close to their usage as possible. If one function calls another, they should be vertically close.

Talking about vertical ordering, function call dependencies should point in the downward direction. A function that is called should be below a function that does the calling.
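
A tiny sketch of this top-down ordering, with made-up function names. The caller sits at the top, the functions it calls sit right below it, and blank lines separate the thoughts.

```python
def build_report(lines):
    return format_header() + format_body(lines)


def format_header():
    return "REPORT\n"


def format_body(lines):
    return "\n".join(lines)
```

Python resolves the names when build_report is called, not when it is defined, so the high-level function can safely come first.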

Talking about horizontal formatting, you should strive to keep your lines short. You should never have to scroll to the right.

Talking about horizontal openness and density, horizontal white space is used to associate things that are strongly related and to disassociate things that are more weakly related. Assignment operators are surrounded with white spaces to accentuate them. Spaces are not provided between the function names and the opening parenthesis because the function and its arguments are closely related. White space is also used to accentuate the precedence of operators.
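
The following sketch shows these spacing choices on a quadratic formula. The function itself is just an example and assumes real roots.

```python
import math


def roots(a, b, c):
    # Spaces around "=" and around the low-precedence "-",
    # none around the high-precedence "*", none before "(".
    discriminant = b*b - 4*a*c
    return ((-b + math.sqrt(discriminant)) / (2*a),
            (-b - math.sqrt(discriminant)) / (2*a))
```

The factors are packed tightly because they bind tightly, and the terms are spread apart because they bind loosely, so the expression reads the way it is evaluated.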

Talking about indentation, a source file is a hierarchy: there is information that pertains to the file as a whole, to the individual classes within the file, to the methods within the classes, to the blocks within the methods, and recursively to the blocks within the blocks. To make this hierarchy of scopes visible, you should indent the lines of source code in proportion to their position in the hierarchy.

A good software system is composed of a set of documents that read nicely. They need to have a consistent and smooth style. The reader needs to be able to trust that the formatting gestures he or she has seen in one source file will mean the same thing in others.

Cleanse your Objects and Data Structures

Objects hide their data behind abstractions and expose functions that operate on that data. This makes it easy to add new kinds of objects without changing existing functions. It also makes it hard to add new functions, because every existing class must change.

Data structures expose their data and have no meaningful functions. This makes it easy to add new functions without changing existing data structures. It also makes it hard to add new data structures, because every existing function must change.
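
The trade-off is easier to see side by side. Below is a sketch using shapes; all class and function names are invented for the example.

```python
import math
from dataclasses import dataclass


# Data-structure style: plain data, behaviour lives in free functions.
# Adding a perimeter() function touches no class; adding a new shape
# means editing every such function.
@dataclass
class SquareData:
    side: float


@dataclass
class CircleData:
    radius: float


def area(shape):
    if isinstance(shape, SquareData):
        return shape.side ** 2
    if isinstance(shape, CircleData):
        return math.pi * shape.radius ** 2
    raise TypeError(f"unknown shape: {shape!r}")


# Object style: data is hidden, behaviour lives in the class.
# Adding a new shape touches no existing code; adding perimeter()
# means editing every class.
class Square:
    def __init__(self, side):
        self._side = side

    def area(self):
        return self._side ** 2


class Circle:
    def __init__(self, radius):
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2
```

Neither style is wrong. Pick the one that matches the kind of change you expect more often.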

Cleanse your Error Handling Skills

You should write code that is both clean and robust, that is, code that handles errors with grace and style.

Error handling is important as input can be abnormal and devices can fail. But if it obscures logic, it’s wrong.

Back in the distant past there were many languages that didn’t have exceptions. Programmers either set an error flag or returned an error code that the caller could check. The problem with these approaches is that they clutter the caller, which must check for errors immediately after the call. For this reason it is better to throw an exception when an error is encountered. In this way, the calling code is cleaner and its logic is not obscured by error handling.

It is good practice to start with a try-catch-finally statement when you are writing code that could throw exceptions. ‘try’ blocks are like transactions: your ‘catch’ block has to leave your program in a consistent state, no matter what happens in the ‘try’ block.
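
A rough comparison of the two styles in Python. The device states, error codes, and exception class are all hypothetical.

```python
E_OK = 0
E_SUSPENDED = 1


# Error-code style: every caller must remember to inspect the result.
def send_shutdown_with_code(device_state):
    if device_state == "suspended":
        return E_SUSPENDED
    return E_OK


class DeviceSuspendedError(Exception):
    pass


# Exception style: the happy path stays free of error checks.
def send_shutdown(device_state):
    if device_state == "suspended":
        raise DeviceSuspendedError("cannot shut down: device is suspended")


def shut_down_all(device_states):
    try:
        for state in device_states:
            send_shutdown(state)
    except DeviceSuspendedError as error:
        return f"shutdown failed: {error}"
    return "all devices shut down"
```

In shut_down_all the loop reads as plain logic, and the error handling sits in one place instead of after every call.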
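
In Python the same statement is spelled try, except, finally. The sketch below treats a transfer like a transaction; the balances dictionary and the audit log are assumptions made up for the example.

```python
def transfer(balances, source, target, amount, audit_log):
    snapshot = dict(balances)
    try:
        balances[source] -= amount
        balances[target] += amount  # raises KeyError if target is unknown
    except KeyError:
        # Leave the program in a consistent state: undo the partial debit.
        balances.clear()
        balances.update(snapshot)
        raise
    finally:
        # Runs whether the transfer succeeded or not.
        audit_log.append(f"transfer {source}->{target} attempted")
```

If the target account does not exist, the source has already been debited when the error occurs. The except block restores the snapshot before re-raising, so the caller never sees a half-finished transfer.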

Checked exceptions aren’t necessary for the production of robust software. If the lowest level function is modified in such a way that it must throw an exception and if that is a checked exception, then the function signature must add a throws clause. But this means that every function that calls your modified function must also be modified either to catch the new exception or to append the appropriate throws clause to its signature. The net result is a cascade of changes that work their way from the lowest levels of the software to the highest. Due to this, encapsulation is broken because all functions in the path of a throw must know about details of that low-level exception.

Create informative error messages and pass them along with your exceptions. Mention the operation that failed and the type of failure.
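
One possible shape for such an exception is sketched here; ConfigLoadError and its fields are hypothetical names.

```python
class ConfigLoadError(Exception):
    def __init__(self, path, reason):
        # Name the operation that failed and the type of failure.
        super().__init__(f"Failed to load config from {path!r}: {reason}")
        self.path = path
        self.reason = reason
```

Whoever reads the log sees what was attempted and why it failed, without having to open the source.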

Don’t return null. When you return null, you are essentially creating work for yourself and foisting problems upon your callers. All it takes is one missing null check to send an application spinning out of control. If you are calling a null-returning method from a third-party API, consider wrapping that method with a method that either throws an exception or returns a special case object.
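
In Python the null in question is None. A minimal sketch, with an invented employee directory, of returning an empty list as the special case:

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    pay: int


def employees_in(directory, department):
    # Return an empty list, not None, when the department is unknown.
    return directory.get(department, [])


def total_pay(directory, department):
    # No None check needed: summing over an empty list is simply 0.
    return sum(employee.pay for employee in employees_in(directory, department))
```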

Don’t pass null. In most programming languages there is no good way to deal with a null that is passed by a caller accidentally. Thus in this case, the rational approach is to forbid passing null by default.
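
If you do want to enforce the rule at a boundary, a guard at the top of the function is one option. The midpoint function here is just an illustration.

```python
def midpoint(a, b):
    if a is None or b is None:
        raise ValueError("midpoint() does not accept None")
    return (a + b) / 2
```

The guard fails early with a clear message, but somebody still has to handle that error. That is why forbidding None by convention is the more practical default.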

Clean code is readable, but it must also be robust. These are not conflicting goals. You can write robust clean code if you see error handling as a separate concern.

Cleanse your Classes

A class should begin with a list of variables. Public static constants should come first, then private static variables, followed by private instance variables. Public functions should follow the list of variables. Private utilities called by a public function should be placed right after the public function itself.

Classes should be small. With functions we measure size by counting lines; with classes we use a different measure: we count responsibilities. We want our systems to be composed of many small classes, not a few large ones. Each small class encapsulates a single responsibility, that is, has a single reason to change, and collaborates with a few others to achieve the desired system behaviors.

The name of a class should describe what responsibilities it fulfills. The more ambiguous the class name, the more likely it has too many responsibilities.

Classes should have a small number of instance variables. Each of the methods of a class should manipulate one or more of those variables. In general the more variables a method manipulates, the more cohesive that method is to its class. Cohesion should be high because it means that the methods and variables of the class are co-dependent and hang together as a logical whole.
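
A small, cohesive class could look like this sketch of a bounded stack (the class is made up for the example). It has two instance variables, and almost every method touches both.

```python
class BoundedStack:
    def __init__(self, capacity):
        self._capacity = capacity
        self._elements = []

    def is_full(self):
        return len(self._elements) >= self._capacity

    def push(self, element):
        if self.is_full():
            raise OverflowError(f"stack is full (capacity {self._capacity})")
        self._elements.append(element)

    def pop(self):
        if not self._elements:
            raise IndexError("pop from empty stack")
        return self._elements.pop()
```

If you find a group of methods that only ever uses a subset of the variables, that subset is often a smaller class trying to get out.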

There are concrete classes, which contain implementation details, and abstract classes, which represent concepts only. A client class depending upon concrete details is at risk when those details change. You can introduce interfaces and abstract classes to help to isolate the impact of those details.
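
Here is a sketch of a client that depends on a concept instead of on concrete details. StockExchange, FixedStockExchange, and Portfolio are hypothetical names.

```python
from abc import ABC, abstractmethod


class StockExchange(ABC):
    # The concept: something that can tell us a current price.
    @abstractmethod
    def current_price(self, symbol):
        ...


class FixedStockExchange(StockExchange):
    # One concrete detail; a live price feed would be another.
    def __init__(self, prices):
        self._prices = prices

    def current_price(self, symbol):
        return self._prices[symbol]


class Portfolio:
    # The client knows only the abstraction.
    def __init__(self, exchange):
        self._exchange = exchange
        self._holdings = {}

    def add(self, symbol, shares):
        self._holdings[symbol] = self._holdings.get(symbol, 0) + shares

    def value(self):
        return sum(shares * self._exchange.current_price(symbol)
                   for symbol, shares in self._holdings.items())
```

Portfolio does not change when the source of prices changes, and a fixed exchange like the one above makes it easy to test.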

Coupling should be low because the lack of coupling means that the elements of our system are better isolated from each other and from change. This isolation makes it easier to understand each element of the system.

Cleanse your Unit Tests

Unit tests are important because they preserve and enhance the flexibility, maintainability, and reusability of the production code.

Without a test suite you cannot be sure that changes to one part of the system do not break other parts. As a result the defect rate rises and the production code begins to rot. In the end you are left with no tests and with tangled, bug-riddled production code.

Tests must change as the production code evolves. The dirtier the tests are, the harder they are to change. As you modify the production code, old tests start to fail, and the mess in the test code makes it hard to get those tests to pass again. So the tests become an ever increasing liability.

There should be one assertion per test and a single concept per test. The rules for clean tests can be summed up as “FIRST”:

Fast – Tests should be fast. They should run quickly. When tests run slowly, you won’t want to run them frequently. If you don’t run them frequently, you won’t find problems early enough to fix them easily.
Independent – Tests should not depend on each other. One test should not set up the conditions for the next test. You should be able to run each test independently and run the tests in any order you like.
Repeatable – Tests should be repeatable in any environment. You should be able to run the tests in the production environment and in the QA environment.
Self-validating – The tests should have a boolean output. Either they pass or fail. If the tests aren’t self-validating, then failure can become subjective and running the tests can require a long manual evaluation.
Timely – The tests need to be written in a timely fashion. Unit tests should be written just before the production code that makes them pass. If you write tests after the production code, then you may find the production code to be hard to test.
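
As a rough sketch of these rules with Python’s built-in unittest module (the add function stands in for real production code):

```python
import unittest


def add(a, b):  # hypothetical production code
    return a + b


class AddTests(unittest.TestCase):
    # Fast and repeatable: no I/O, no clock, no network.
    # Independent: neither test relies on the other having run.
    # Self-validating: each one simply passes or fails.
    def test_adds_two_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adding_zero_returns_the_same_number(self):
        self.assertEqual(add(7, 0), 7)
```

Each test checks one concept with one assertion. Saved in a test module, tests like these can be run with python -m unittest.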

Test code is just as important as production code. It is not a second-class citizen. It requires thought, design, and care. It must be kept as clean as production code. Having an automated suite of unit tests that covers the production code is the key to keeping your design and architecture as clean as possible.

Always remember, the functionality that you create today has a good chance of changing in the next release, but the readability of your code will have a profound effect on all the changes that will ever be made.