Introduction to Fundamentals of Software Optimization

This is the introduction to a three-post series covering the fundamentals of software optimization. You can find Part I here. You can find the companion GitHub repository here.

Performance is a major topic in software engineering. A quick Google search for “performance” in GitHub issues comes up with about a million results. Everyone wants software to go fast – especially the users!

Gotta Go Fast!

However, as a general problem, software optimization isn’t easy or intuitive. It turns out that software performance tends to follow a Pareto-style distribution: roughly 90% of a program’s time is spent in just 10% of its code. It also turns out that in a large program, people — even professional software performance analysts who have spent their careers optimizing software — are really bad at guessing which 10% that is. So folks who try to make code go faster by guessing where it spends its time are much more likely to make things worse than better.

Fortunately, excellent tools exist to help people improve software performance these days, and many of them are free. This three-part series will explore these tools and how to apply them to optimize software performance quickly and reliably.

Software Performance Analysis vs Optimization

This series does not cover when to optimize software, which is a business decision, nor — given that optimization needs to happen — what to optimize or how to arrive at that conclusion. These broader questions belong to a topic called “software performance analysis.” They are a critically important part of the process of improving software performance, but they are also complex and beyond the scope of a short blog series.

Instead, this series merely covers how to optimize after these decisions have been made, and only at a high level. In short: the Fundamentals of Software Optimization.

Software Performance Optimization is Just Science

Changing a program’s code and shipping it in the hope that it goes faster is not optimization. Proper optimization requires a careful analysis of software performance and a reasoned argument for why the changed program will run faster than the original in production. This kind of study is called software performance analysis, and at its core, software performance analysis is just the scientific method applied to software performance.

Science!

The scientific method is applied to understand complex systems in the following way:

  • Observe system
  • Make one careful change
  • Observe system again to determine effect of change on system
  • Learn
  • Repeat

The scientific process of software optimization works the same way. First, a tool called a benchmark is used to observe and measure a program’s performance. Next, another tool called a profiler is used to identify promising places to change the program, and one change is made. Finally, the benchmark is re-run to determine whether the whole program got faster or slower. The developer learns from the result, and the process repeats until a set of program changes that achieves the optimization goals has been identified.
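As a concrete (and deliberately naive) sketch of the “observe” step, the loop above can be approximated with nothing but System.nanoTime(). This hand-rolled harness ignores statistical noise and most JIT effects, which is exactly why Part I uses a real benchmarking tool (JMH) instead; the workload and iteration count here are made up for illustration.

```java
// A deliberately naive "observe" step: time a workload with System.nanoTime().
// Real benchmarks need proper warmup, statistics, and JIT awareness; that is
// why Part I uses JMH rather than hand-rolled timing like this.
public class NaiveBenchmark {
  static long measure(Runnable workload, int iterations) {
    // Warm up first so the JIT compiler has a chance to kick in
    for (int i = 0; i < iterations; i++) workload.run();
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) workload.run();
    return (System.nanoTime() - start) / iterations; // average ns per run
  }

  public static void main(String[] args) {
    String text = "hello, world";
    long baseline = measure(() -> text.toUpperCase(), 100_000);
    System.out.println("baseline: ~" + baseline + " ns/op");
    // ...make one careful change to the workload, re-measure, compare, repeat...
  }
}
```

Even this crude version demonstrates the essential discipline: measure before the change, measure after, and let the numbers — not intuition — decide whether the change stays.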

This series will cover in detail how to apply the principles of science and the tools of the performance trade to make software go fast.

A Preview of What’s to Come

This series’ posts will cover the following topics:

  1. Benchmarking — Measuring Software Performance Using JMH
  2. Optimizing Wall Clock Performance — Making Code Go Fast Using VisualVM
  3. Optimizing Memory Allocation — Improving Software Allocation Performance Using VisualVM

This being a (mostly) Java blog, this series will use the Java platform and tools ecosystem to drive the discussion. However, the optimization process this series describes applies to all languages and all platforms. (Having been a software performance analyst for IBM in a past life, I can make that claim with some confidence!) So even if Java isn’t your main language, you should be able to learn some practical lessons from this series.

It’s hard to discuss software performance in the abstract, so this series will use a string processing algorithm that extracts the emoji from a string as the example production workload to optimize. It was chosen because the problem is defined by a standard (Unicode’s UTS #51, “Unicode Emoji”), so workload scope is clear; everybody is familiar with emoji in this day and age, so the concept is easy to grasp; and I just finished writing and optimizing emoji4j, an emoji processing library, so I know the problem has some good fodder for discussion.

For the purpose of this series, imagine that the production workload to optimize is an API endpoint like the one below:

  /**
   * @param text Text from a social media post
   * @return The number of emoji in the given text
   */
  @GET
  public Integer countEmoji(String text) {
    int count = 0;

    GraphemeMatcher m = new GraphemeMatcher(text);
    while (m.find()) {
      count++;
    }

    return count;
  }
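As an aside on why a real grapheme matcher is needed at all: a naive scan for code points in a single “emoji range” miscounts multi-code-point sequences such as ZWJ families and skin-tone modifiers. The sketch below is purely illustrative — it is not how emoji4j works, and the code-point range is an incomplete stand-in for Unicode’s real emoji property.

```java
public class NaiveEmojiCount {
  // Naive: count individual code points in one pictographic range. This is an
  // illustrative stand-in, NOT the emoji4j algorithm, and the range below is
  // an incomplete approximation of Unicode's real emoji property.
  static long naiveCount(String text) {
    return text.codePoints()
        .filter(cp -> cp >= 0x1F300 && cp <= 0x1FAFF)
        .count();
  }

  public static void main(String[] args) {
    System.out.println(naiveCount("Hello 👋"));   // prints 1, as hoped
    // A ZWJ family is one emoji to a human, but three code points in range:
    System.out.println(naiveCount("👨‍👩‍👧")); // prints 3, not 1
  }
}
```

The gap between “3” and the humanly-correct answer of “1” is precisely why the workload matches whole grapheme clusters rather than individual code points.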

Normally, software optimization is undertaken to achieve some business result: reduced server costs, improved UI responsiveness, etc. — which is captured in a formal planning document. This series will ignore the business context surrounding software performance analysis as a function and instead focus on how to actually optimize code for the purposes of education. I may cover the planning process in a separate post in the future — be sure to leave a comment if that sounds interesting!

For the purpose of this series, the optimization goal will be simply to improve the production workload’s wall clock performance and memory usage “a lot.”

The source code for this project will be added to this blog series’ companion GitHub repository. The skeleton of the workload is already committed to the main branch for curious readers. New code will be added as each blog post is published.

Next Steps

This series will continue in Part I — Benchmarking. See you there!