Haoyi's Programming Blog


From First Principles: Why Scala?

Posted 2021-02-05

Scala, first appearing in 2004, is neither an old stalwart nor a new player in the programming language market. This post will discuss the unique combination of features that Scala provides and how it compares to other languages on the market, diving beneath the superficial experience to explore the fundamentals of the language. From this, you will learn why you might consider including Scala as a valuable addition to your programming toolbox.


About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming


The Scala website summarizes the Scala pitch as follows:

Scala combines object-oriented and functional programming in one concise, high-level language. Scala's static types help avoid bugs in complex applications, and its JVM and JavaScript runtimes let you build high-performance systems with easy access to huge ecosystems of libraries.

Scala is a language that provides a unique combination of features that most programmers would appreciate. It combines the performance and large-scale maintainability associated with compiled languages, the tooling and ecosystem of the Java language and virtual machine, and conciseness and ease of use typically associated with scripting languages.

I have been working in the Scala community for the past decade: I maintain many of the open source libraries and tools that power the ecosystem, have used Scala professionally in a variety of environments, and have seen the language and community evolve over time. Scala has had its hurdles in the past - glacial compilation speeds, confusing libraries and frameworks, and a community more focused on hype than real work. But in the past five years, the Scala ecosystem has managed to move beyond many of these long-standing issues, emerging as a cleaner and more streamlined experience great for Getting Things Done™.

In this post, we will first discuss the user-facing selling points that a programmer may appreciate when using Scala, then dive into the fundamental principles that allow the Scala language to be what it is, before closing with a comparison of how these principles stack up against other programming languages you may be considering for your next project.

User-facing selling points of the Scala language

In this section, we will discuss some of the user-facing features of the Scala language. These are mentioned briefly in my book Hands-on Scala Programming, but I will go into more detail in this post.

A Compiled Language that feels Dynamic

Scala is a language that scales well from one-line snippets to million-line production codebases, with the convenience of a scripting language and the performance and scalability of a compiled language. Scala's conciseness makes rapid prototyping a joy, while its optimizing compiler and fast JVM runtime provide great performance to support your heaviest production workloads. Rather than being forced to learn a different language for each use case, Scala lets you re-use your existing skills so you can focus your attention on the actual task at hand.

The dichotomy between compiled languages and scripting languages is pervasive throughout the software industry. At a glance, the tradeoffs are as follows:

| Compiled Language | Scripting Language |
|---|---|
| C++, Java, C#, ... | Python, Ruby, JS, ... |
| Verbose | Concise |
| Excellent performance | Poor performance |
| Statically typed | Dynamically typed |
| Great IDE and Tooling Support | Poor IDE and Tooling Support |
| Heavyweight build setups | Minimal or Lightweight build setups |
| Inconvenient in small programs | Convenient in small programs |
| Manageable in large programs | Unmanageable in large programs |

It is always a dilemma as a programmer which one to pick: almost every large program starts off as a small program! It is often not obvious which small programs will become large and which ones never will. A lot of time and effort is wasted developing small programs in compiled languages that never grow big enough for the compiled language overhead to be worth it, or developing large programs in scripting languages because they grew slowly over time well beyond the size that the scripting language is suitable for.

Hybrid Languages

There is a third category of languages: compiled languages that provide type inference, convenient libraries, and other quality of life improvements that blur the line between the Compiled and Scripting languages:

| Compiled Language | Scripting Language | Hybrid Languages |
|---|---|---|
| C++, Java, C#, ... | Python, Ruby, JS, ... | Scala, F#, Kotlin, ... |
| Verbose | Concise | Concise |
| Excellent performance | Poor performance | Excellent performance |
| Statically typed | Dynamically typed | Statically typed with inference |
| Great IDE and Tooling Support | Poor IDE and Tooling Support | Great IDE and Tooling Support |
| Heavyweight build setups | Minimal or Lightweight build setups | Minimal or Lightweight build setups |
| Inconvenient in small programs | Convenient in small programs | Convenient for small programs |
| Manageable in large programs | Unmanageable in large programs | Manageable in large programs |

Scala is one of these "hybrid" languages, blending compiled and scripting languages in a way that aims to give you the best of both worlds. You can begin writing useful Scala in a single line of code in an interactive REPL such as Ammonite:

@ requests.post(
    "https://api.github.com/repos/lihaoyi/test/issues",
    data = ujson.Obj("title" -> "hello"),
    headers = Map("Authorization" -> s"token $token")
  )

You can then grow that into a Scala Script if you want to keep the code around:

#!/usr/bin/env amm
val token = os.read(os.pwd / "github-token.txt")
requests.post(
   "https://api.github.com/repos/lihaoyi/test/issues",
   data = ujson.Obj("title" -> "hello"),
   headers = Map("Authorization" -> s"token $token")
)

./script.sc # run the script

From there, you can grow it into a project with a build tool such as Mill, which provides incremental compilation, testing, packaging, and other features that become more and more necessary as your project grows into the 100,000s of lines of code.

Why Use a Hybrid Language?

In a traditional Compiled vs. Scripting language world, these three use cases would likely have called for totally different languages.

Converting between these languages usually involves a costly rewrite. It is always a headache trying to figure out when such a rewrite is necessary. We have all seen Bash scripts that have grown too large and would be better written in Python, or Python applications that have grown too large and would be better written in a compiled language like Java or C#!

Lightweight compiled languages like Scala aim to cut this Gordian Knot: you can use the same language for your first line of code in the interactive REPL as you use for your million-line backend distributed systems, and everything in between.

At small scales, the Scala REPL and scripts are just as convenient to use as Python or Ruby, while large scale Scala systems are just as maintainable and performant as large-scale systems written in Java or C#.

Scala is not the only language in this category: F# counts, and Swift and Kotlin straddle the line between hybrid and more traditional compiled languages. But Scala is, in my experience, largely successful at breaking this dichotomy and providing a language that scales from the smallest throwaway one-liner to the largest and most important production systems.

Easy Safety and Correctness

Scala's functional programming style and type-checking compiler help rule out entire classes of bugs and defects, saving you time and effort you can instead spend developing features for your users. Rather than fighting TypeErrors and NullPointerExceptions in production, Scala surfaces mistakes and issues early on during compilation so you can resolve them before they impact your bottom line. Deploy your code with the confidence that you won't get woken up by outages caused by silly bugs or trivial mistakes.

A large focus of Scala is on the correctness of the code you write.

Any software engineer would know that writing "a program" is not the difficult part of their job. The difficulty is making sure their program is correct. Not just doing what you think it should do for all inputs it is expected to receive, but also staying correct as your code evolves over a period of time and is worked on by many different people.

Scala helps with this problem in two ways:

  1. Static Typechecking: Scala's compiler lets you check for many "trivial" errors before your program even runs, preventing silly bugs from slipping through and letting the programmer focus on the more subtle and complex issues

  2. Functional Programming: a style of designing applications which makes it easier to reason about your application, how it works, and how it can be modified.

I will discuss each approach in turn.

Static Typechecking

Static Typechecking is having a program analyze your code to find issues before you even run it. In compiled languages this is often the compiler itself, but in other languages it may not be: e.g. the Python language provides this in the form of MyPy, a separate checker that can be run whenever you want, but isn't a requirement to execute your code.

Static typechecking is often a religious debate. At a small scale, it's often up to personal preference: some people prefer static types, some people prefer dynamic languages that neither need nor allow typechecking. Statically typed languages are often more verbose than dynamic languages, because they need type annotations in various parts of your code, although hybrid languages like Scala and F# have type inference that reduces this verbosity.

At small scale, the choice between static and dynamic typing comes down to personal preference and doesn't matter much. At larger scale, the lack of static typechecking is a huge pain point for anyone maintaining a system. While that was not so clear in the past, today we can see one dynamic language after another adding support for static typing.

We would not see so many large companies spending millions of dollars a year (5-10 full-time silicon valley software engineering salaries) creating type checkers for their dynamic languages if they did not see value in them. The fact is, once you're at a certain scale, the lack-of-static-typechecking in dynamic languages often ends up costing enough millions in lost productivity that this expenditure is worth it.

Another interesting data point is the empirical list of the "top 10 JavaScript errors" published by Rollbar, an error-logging company. These are as follows:

  1. Uncaught TypeError: Cannot read property
  2. TypeError: ‘undefined’ is not an object
  3. TypeError: null is not an object
  4. (unknown): Script error
  5. TypeError: Object doesn’t support property
  6. TypeError: ‘undefined’ is not a function
  7. Uncaught RangeError
  8. TypeError: Cannot read property ‘length’
  9. Uncaught TypeError: Cannot set property
  10. ReferenceError: event is not defined

8 of the 10 most common JavaScript errors, (1) (2) (3) (5) (6) (8) (9) (10), are all things that a static typechecker would typically catch. No matter how hard developers try to maintain the quality of their JavaScript code, it is clear that tons of "dumb" type errors are still slipping through to production. It's no wonder that TypeScript has taken off and become so popular so quickly!
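To make this concrete, here is a minimal sketch of how the "cannot read property of undefined" family of errors tends to surface in Scala. The `User` class and the `findUser` and `emailLength` functions are hypothetical names invented for this illustration. The idea is that a possibly-missing value is encoded as an `Option` in the type, so code that forgets to handle the missing case is rejected by the compiler rather than blowing up in production.

```scala
object MissingValues {
  // Hypothetical record type, used only for illustration
  case class User(name: String, email: Option[String])

  // The return type says up front that the user may not exist
  def findUser(users: Map[Int, User], id: Int): Option[User] = users.get(id)

  // Callers have to deal with both a missing user and a missing email
  def emailLength(users: Map[Int, User], id: Int): Int =
    findUser(users, id).flatMap(_.email).map(_.length).getOrElse(0)

  // This would be rejected at compile time, because `toUpperCase` is a
  // member of String and not of Option[String]:
  // def shout(u: User): String = u.email.toUpperCase

  def main(args: Array[String]): Unit = {
    val users = Map(
      1 -> User("alice", Some("alice@example.com")),
      2 -> User("bob", None)
    )
    println(emailLength(users, 1)) // user and email both present
    println(emailLength(users, 2)) // user present, email missing
    println(emailLength(users, 3)) // user missing entirely
  }
}
```

In JavaScript, the equivalent of the commented-out `shout` would run happily until the first user without an email shows up.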

While there is sometimes discussion about whether static types or unit tests are a better approach, in reality there is no dichotomy. Static typechecking complements unit tests, and benefits a well-tested codebase as much as it benefits one that is poorly or un-tested. Together with code review, continuous integration, functional programming (which I will discuss below) and other methodologies, these have a cumulative effect of improving your confidence in the correctness of your software.

Even for smaller companies that don't have millions to spend adding static typing to their language, static typechecking is often worth it if already available. With Scala, static typing is built-in for free.

Functional Programming

In short, "functional programming" is programming with a focus on functions that take parameters and return computed values, instead of mutating fields and variables. This tends to make it easier to see what a piece of code needs (its function parameters) and what it does (the value it computes), making it easier to understand, refactor, or add new features to a codebase.
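As a rough sketch of the difference, consider computing the total of a shopping cart in both styles. The `MutableCart` class and `cartTotal` function are hypothetical names made up for this example, and prices are in cents to keep the arithmetic exact.

```scala
object Totals {
  // Mutating style: the answer depends on a hidden field and on the
  // order and number of `add` calls that happened before you asked
  class MutableCart {
    private var total = 0
    def add(priceInCents: Int): Unit = total += priceInCents
    def currentTotal: Int = total
  }

  // Functional style: everything the code needs is in its parameters,
  // and everything it does is in the value it returns
  def cartTotal(pricesInCents: Seq[Int], shippingInCents: Int): Int =
    pricesInCents.sum + shippingInCents

  def main(args: Array[String]): Unit = {
    val cart = new MutableCart
    cart.add(100)
    cart.add(250)
    println(cart.currentTotal)
    println(cartTotal(Seq(100, 250), shippingInCents = 50))
  }
}
```

Both versions work, but only the signature of `cartTotal` tells you what it needs and what it produces without reading the body.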

For a more detailed discussion, check out the following post:

Functional programming is not unique to Scala, nor is Scala the language most deeply invested in functional programming. Nonetheless, Scala has a distinctly functional taste: the built-in collections are immutable by default, code often focuses on transforming values rather than mutating variables, and the standard library and ecosystem have many tweaks that allow this functional style to be both convenient and performant.
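A small sketch of what "immutable by default" looks like in practice, using only the standard library. The object and value names are arbitrary choices for this example.

```scala
object ImmutableCollections {
  val original = Vector(3, 1, 2)

  // Each operation returns a new collection and leaves `original` alone
  val sorted = original.sorted
  val doubled = original.map(_ * 2)
  val extended = original :+ 4

  // Transforming values rather than mutating counters: count occurrences
  val wordCounts = Seq("moo", "baa", "moo")
    .groupBy(identity)
    .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    println(original)
    println(sorted)
    println(doubled)
    println(extended)
    println(wordCounts)
  }
}
```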

Most languages are picking up functional-programming features these days.

Languages are picking up functional features because their core developers are finding them a great complement to existing object-oriented and procedural styles. In Scala, Functional Programming has always been a major part of the language and ecosystem, on equal footing with the Object Oriented style that more people may be familiar with. Most teams strike a balance using each approach where appropriate. Scala developers thus reap the benefits of functional programming in helping them organize their code to tackle difficult, complex problems.

A Broad and Deep Ecosystem

As a language running on the Java Virtual Machine, Scala has access to the large Java ecosystem of standard libraries and tools that you will inevitably need to build production applications. Whether you are looking for a Protobuf parser, a machine learning toolkit, a database access library, a profiler to find bottlenecks, or monitoring tools for your production deployment, Scala has everything you need to bring your code to production.

The last major user-facing selling point of Scala is the richness of the ecosystem. When working with code in production, having just a programming language compiler or interpreter is not enough. In a typical week, I will use:

  1. Hundreds of third-party libraries, for all sorts of functionality the language doesn't have built in.
  2. A package repository, for users to publish and share their code
  3. Build tools to incrementally compile your code or do so in parallel
  4. A profiler, to investigate and resolve performance issues
  5. Monitoring tools, to keep an eye on my code when it's been deployed to production

Scala has all of these, partially on the strengths of its own ecosystem, and partially on the strengths of the Java ecosystem it piggy-backs on.

Maven Central

The Maven Central Java package repository is one of the largest package repositories in the world, with open source libraries available to do literally anything that someone has thought of to do. e.g. if I suddenly need to manipulate PDF files, I can pull in Apache PDFBox off of Maven Central and be off to the races:

import $ivy.`org.apache.pdfbox:pdfbox:2.0.18`

val outPath = os.pwd / "combined.pdf"
val pdfFiles = os.list(os.pwd / "inputs")
val merger = new org.apache.pdfbox.multipdf.PDFMergerUtility
for (pdf <- pdfFiles) merger.addSource(pdf.toIO)
val out = os.write.outputStream(outPath)
try {
  merger.setDestinationStream(out)
  merger.mergeDocuments()
} finally out.close()

Profilers and Monitoring

Profilers like Java Flight Recorder allow low-overhead monitoring of performance profiles in production, while others like JProfiler or YourKit can be applied quickly to give immediate insight into the performance characteristics of your application.

Build Tools

Build tools like SBT, Mill, Gradle, Maven, or Bazel provide a menagerie of project management options, with different tradeoffs to suit your requirements.

Many person-decades of work have gone into each and every facet of the Scala/Java ecosystem, and a developer working with Scala benefits from all of it for free.

Fundamental Principles of the Scala Language

The previous section of this post discussed how Scala's unique features directly benefit end users. We will now cover the core principles behind Scala that make all of this possible.

All in on Static Analysis

Scala more than most languages focuses on static compilation and analysis. Almost all Scala language features are resolved statically, with even the "monkey-patch"-esque extension methods and implicit conversions implemented at compile-time rather than runtime. Even compared to other compiled languages like Java, Scala eschews the kind of runtime reflection common in Java code in favor of implicits and other compile-time features.

While doing things at compile time can sometimes be complex, with language features such as implicits and macros having a well-deserved reputation for complexity, such an approach nevertheless offers some fundamental advantages.
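As a minimal sketch of what "resolved statically" means for the extension methods mentioned above, here is the Scala 2 implicit-class form. `RichGreeting` and `shout` are hypothetical names chosen for this example. Nothing is patched onto `String` at runtime: the compiler sees that `String` has no `shout` method, finds the implicit class in scope, and rewrites the call during compilation.

```scala
object Extensions {
  // Adds a `shout` method to String without touching the String class
  implicit class RichGreeting(val s: String) extends AnyVal {
    def shout: String = s.toUpperCase + "!"
  }

  // The compiler rewrites this to roughly: new RichGreeting("hello").shout
  def demo: String = "hello".shout

  def main(args: Array[String]): Unit = println(demo)
}
```

Because the lookup happens at compile time, a typo like `"hello".shuot` is a compile error rather than a runtime surprise, and an IDE can jump straight to the definition of `shout`.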

Why Static Analysis

Fundamentally, static compilation and analysis means the computer can do more reasoning about your code than if the language did things dynamically at run time.

That means:

  1. Scala like most compiled languages is able to run an order of magnitude faster than dynamic scripting languages, due to the static nature of the code making it easier for the compiler (or JIT compiler) to optimize

  2. IDEs like IntelliJ are able to analyze Scala code precisely and accurately. Jump-to-definition, find-usages, refactorings, etc. work much better in statically-typed languages than in dynamic languages.

  3. Compile-time generation of things like JSON serializers has excellent performance, whereas runtime-reflection-based implementations tend to be extremely slow. This applies in every language, static or dynamic.

  4. The compiler can reason about what the code can do, what the code cannot do, and easily do things like compile your Scala to high-performance Javascript in Scala.js, or to a static binary using Graal Native Image, with perfect compatibility. In contrast, cross-compilers for dynamic languages like Clojure tend to have a long list of caveats and incompatibilities

In general, having a language focused around static compilation and static analysis means the computer can help you out more in whatever you are doing: whether you are trying to find usages of a variable, renaming some method, or just trying to make your code blazing fast. These are all things that software engineers do day-in and day-out, and it's great to have tools that are able to help automate this work as much as possible!

Other Languages Pushing Static Analysis

Scala isn't the only language which has realized the multi-faceted benefits of static compilation. Even Python, the poster-child of dynamic languages, has been moving in this direction, for example with the MyPy static checker mentioned earlier.

This is a story that has played out many times in the past, and we can expect it to play out many times in the future. Fundamentally, the promise of having the computer do more work to help the programmer understand, debug, refactor, or optimize their code is too great to ignore. But in order to realize those gains, we need a language that the computer is effectively able to statically analyze. Scala is already such a language.

Inference for Ease of Use

Perhaps the most interesting thing about Scala is that it leans heavily on inference in order to make things easy to use. Consider this snippet that uses Scala to render a simple HTML fragment:

os.write(
  os.pwd / "index.html",
  body(
    h1("Blog"),
    for (postTitle <- postTitles)
    yield h2(postTitle)
  )
)

Without type inference, it would look something like:

os.write(
  os.pwd / "index.html",
  body(
    h1(new StringFrag("Blog")),
    new SeqFrag(
      for (postTitle: String <- postTitles)
      yield h2(new StringFrag(postTitle))
    )
  )
)

While both snippets are understandable, it is clear how much work the inferred new StringFrag and new SeqFrag constructors and the inferred : String annotation are doing to keep our code clean and concise. The top snippet feels like something you'd see in a dynamic language, while the bottom snippet feels distinctly Enterprise Java-y.

These days, type-inference is becoming table stakes for compiled languages: newer typecheckers for dynamic languages like Ruby or Python use type inference heavily, and even more traditional compiled languages like Java are moving in that direction. Scala is a language built from the ground up around type inference, which ends up being much more coherent and well-designed in Scala than in other languages bolting it on after-the-fact.
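For a smaller, self-contained illustration of plain type inference that needs no HTML library, here is a sketch that builds the same value twice: once with every type spelled out, and once letting the compiler fill them in. The names are arbitrary.

```scala
object Inference {
  // Every type written out by hand
  val names: List[String] = List[String]("scala", "java")
  val pairs: List[(String, Int)] = names.map((s: String) => (s, s.length))
  val explicitLengths: Map[String, Int] = pairs.toMap[String, Int]

  // The same computation with the types inferred; the result is still
  // statically known to be a Map[String, Int]
  val inferredLengths = List("scala", "java").map(s => (s, s.length)).toMap

  def main(args: Array[String]): Unit = {
    println(explicitLengths)
    println(inferredLengths)
  }
}
```

The second version reads like a dynamic language, but passing `inferredLengths` to something expecting a `Map[String, String]` is still a compile error.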

Value Inference

Scala is perhaps unique among programming languages in that it doesn't just infer type annotations ("type inference"), but it can also infer constructors/conversions (as seen above) or other values.

Consider the following snippet, which defines an Example[T] trait with a single value, and a function exampleFor[T] that lets us grab the value for any T for which we have an example defined:

@ trait Example[T]{ def value: T }

@ def exampleFor[T: Example] = implicitly[Example[T]].value

We can define examples for Int and String using the implicit syntax below:

@ implicit object ExampleInt extends Example[Int]{def value = 1213}

@ implicit object ExampleString extends Example[String]{def value = "moo"}

@ exampleFor[Int]
res6: Int = 1213

@ exampleFor[String]
res7: String = "moo"

exampleFor[Int] looks for any implicit value with the type Example[Int], finds ExampleInt, and returns the value 1213. Ditto for exampleFor[String], so far so good.

Where this gets interesting is when you define Examples that depend on other Examples, e.g.

@ implicit def ExampleTuple[T: Example, V: Example]: Example[(T, V)] = new Example[(T, V)]{
    def value = (implicitly[Example[T]].value, implicitly[Example[V]].value)
  }

Here we are defining an Example[(T, V)], or an example of a 2-element tuple, for elements of any type for which there is already an Example defined. Now, not only can we ask for exampleFor[(Int, String)] or exampleFor[(String, Int)]:

@ exampleFor[(Int, String)]
res9: (Int, String) = (1213, "moo")

@ exampleFor[(String, Int)]
res10: (String, Int) = ("moo", 1213)

We can also ask for exampleFors of deeply nested tuples, as long as they are made of Ints, Strings and 2-element tuples:

@ exampleFor[((String, String), (Int, Int))]
res11: ((String, String), (Int, Int)) = (("moo", "moo"), (1213, 1213))

@ exampleFor[((String, Int), (Int, String))]
res12: ((String, Int), (Int, String)) = (("moo", 1213), (1213, "moo"))

@ exampleFor[((String, Int), (((Int, String), Int), String))]
res13: ((String, Int), (((Int, String), Int), String)) = (
  ("moo", 1213),
  (((1213, "moo"), 1213), "moo")
)

While most programming languages allow you to take an expression (1213, "moo") and infer the type (Int, String), Scala allows you to go the other direction: take the type (Int, String) and infer a value (1213, "moo")! This is tremendously useful in a whole range of different scenarios, and allows the Scala language to be as concise as dynamic languages while still preserving its statically-typed nature.
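One such scenario is serialization. The sketch below applies the same pattern as `Example[T]` above to a toy JSON writer. The `Writer` trait and `toJson` function are hypothetical names made up for this illustration and are not the API of any real JSON library; string escaping is deliberately left out to keep it short.

```scala
object Json {
  // Same shape as Example[T]: one typeclass, one function that summons it
  trait Writer[T] { def write(t: T): String }
  def toJson[T: Writer](t: T): String = implicitly[Writer[T]].write(t)

  // Base cases (assumption: no escaping of special characters)
  implicit object IntWriter extends Writer[Int] {
    def write(t: Int): String = t.toString
  }
  implicit object StringJson extends Writer[String] {
    def write(t: String): String = "\"" + t + "\""
  }

  // Writers that depend on other writers, just like ExampleTuple
  implicit def seqWriter[T: Writer]: Writer[Seq[T]] = new Writer[Seq[T]] {
    def write(ts: Seq[T]): String = ts.map(toJson(_)).mkString("[", ",", "]")
  }
  implicit def tupleWriter[T: Writer, V: Writer]: Writer[(T, V)] =
    new Writer[(T, V)] {
      def write(t: (T, V)): String =
        "[" + toJson(t._1) + "," + toJson(t._2) + "]"
    }

  def main(args: Array[String]): Unit = {
    println(toJson((1213, "moo")))
    println(toJson(Seq((1, "a"), (2, "b"))))
  }
}
```

The serializer for `Seq[(Int, String)]` is never written by hand: the compiler assembles it from the pieces at compile time, and asking for a type with no `Writer` fails to compile rather than failing at runtime.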

For a deeper dive into how this value-inference technique works and is useful, check out this section in my book on Typeclass Inference:

Do What I Mean

Being able to specify the type of the value you want, and have your compiler automatically generate the code to provide it, is something unique to Scala. Scala achieves this not through clever hacks or revolutionary AI, but instead through a well-defined set of rules governing where your program can infer types and values. Fundamentally, this kind of flexible program-inference is the closest we can get to a holy grail of a "do what I mean" language.

Lean Heavily on the Host Language

The last thing that Scala does well is to lean heavily on the host language for both its language semantics as well as its implementation. Scala is typically run on the JVM, which together with the Java ecosystem provides a host of useful things that Scala doesn't need to worry about:

  1. Multithreading
  2. Package Management and a Package Ecosystem
  3. IDEs and editors
  4. Profilers for CPU and Memory usage
  5. JIT compilation
  6. AOT compilation (e.g. using Graal Substrate VM)
  7. Debuggers
  8. Garbage collectors (several to choose from!)

While the JVM has limitations, I think it is clear that Scala has benefited massively from piggy-backing on the JVM and Java ecosystem. Consider the state of multi-threading in OCaml, which has been "work in progress" for years.

Piggy-backing on the JVM, Scala inherited a fully-working, well-defined and battle-tested multi-threading implementation. Rather than spending person-decades on re-implementing a multithreaded memory model, threading, atomics, locks, semaphores, volatiles, etc., the Scala community could focus on the usability and ergonomics that really make the language unique.
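As a small sketch of what this inheritance looks like, the code below uses the JVM's standard `java.util.concurrent` thread pools and atomics directly from Scala. The `countInParallel` function and the pool size of 4 are arbitrary choices for this example.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object Counting {
  // Runs `tasks` jobs on a thread pool, each bumping a shared counter
  def countInParallel(tasks: Int, incrementsPerTask: Int): Int = {
    val counter = new AtomicInteger(0)         // JVM atomic, safe across threads
    val pool = Executors.newFixedThreadPool(4) // JVM thread pool
    try {
      for (_ <- 0 until tasks) pool.execute(() => {
        for (_ <- 0 until incrementsPerTask) counter.incrementAndGet()
      })
    } finally pool.shutdown()                  // stop accepting new work
    pool.awaitTermination(1, TimeUnit.MINUTES) // wait for queued work to finish
    counter.get()
  }

  def main(args: Array[String]): Unit =
    println(countInParallel(tasks = 8, incrementsPerTask = 1000))
}
```

None of the threading machinery here had to be built by the Scala community: the Scala lambda is converted to a Java `Runnable` by the compiler, and everything else is plain JVM.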

These days Scala also runs in the browser via Scala.js, which is another example of successfully bootstrapping a rich community on top of an existing well-trodden ecosystem.

Conclusion: All Languages lead to Scala

Every programming language is subject to the same pressures: for better performance, tooling, convenience, correctness, or maintainability at scale. I have already discussed why I think Scala has both a good superficial featureset, as well as a sound underlying approach. Scala focuses on helping software engineers with their most practical challenges, with good fundamentals to ensure that it gets good mileage out of the efforts of the community.

Programming languages are diverse, but in many ways they are getting less diverse over time. Consider the state of the mainstream programming languages a decade ago, circa 2010:

| Language | Execution Style | Types | Perf | Records | Pattern Matching | Lambdas |
|---|---|---|---|---|---|---|
| C++ | Compiled | Static | Great | Yes | No | No |
| Java | Compiled + JIT | Static | Good | No | No | No |
| C# | Compiled + JIT | Static / Inferred | Good | No | No | Yes |
| Python | Interpreted | Dynamic | Poor | No | No | Yes |
| Ruby | Interpreted | Dynamic | Poor | No | No | Yes |
| JS | Interpreted + JIT | Dynamic | Ok | Yes | No | Yes |
| Scala | Compiled + JIT | Inferred | Good | Yes | Yes | Yes |

At the time, Scala was definitely the odd one out: with inferred types for conciseness, and functional programming features like records (case classes in Scala), pattern matching, and lambda functions. The divide between the compiled languages and the interpreted languages was stark. But over time, as the industry evolved, the landscape of the same languages looks very different today, post-2020:

| Language | Execution Style | Types | Perf | Records | Pattern Matching | Lambdas |
|---|---|---|---|---|---|---|
| C++ | Compiled | Static | Great | Yes | No | Yes |
| Java | Compiled + JIT | Static / Inferred | Good | Yes | Yes | Yes |
| C# | Compiled + JIT | Static / Inferred | Good | Yes | Yes | Yes |
| Python | Interpreted / Compiled | Dynamic / Inferred | Poor | Yes | Yes | Yes |
| Ruby | Interpreted / JIT | Dynamic / Inferred | Poor | No | Yes | Yes |
| JS | Interpreted + JIT | Dynamic / Inferred | Ok | Yes | No | Yes |
| Scala | Compiled + JIT | Inferred | Good | Yes | Yes | Yes |

The fact that every language is evolving should not come as a surprise to anyone, but what may be surprising is that every language is evolving towards the same place: they are all becoming JIT-compiled-ish, inferred-typed languages with functional programming features like records, pattern matching, and lambdas. Obviously the details of this diverse set of languages vary tremendously, but the direction they are heading is clear. Newer languages like Kotlin or Swift also tend to fit the same mold.
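For readers who have not seen these three features together, here is a short sketch of records (case classes), pattern matching, and lambdas in Scala. The `Shape` hierarchy is a hypothetical example invented for this illustration.

```scala
object Shapes {
  // Records: case classes get fields, equality, and `copy` for free
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // Pattern matching: take each record apart by its shape
  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // Lambdas: pass a function to `map` to transform every element
  def totalArea(shapes: Seq[Shape]): Double = shapes.map(s => area(s)).sum

  def main(args: Array[String]): Unit = {
    println(area(Rect(2, 3)))
    println(totalArea(Seq(Rect(2, 3), Rect(1, 1))))
    println(Rect(2, 3).copy(height = 4))
  }
}
```

Because `Shape` is `sealed`, the compiler can also warn if a `match` forgets one of the cases.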

Scala already satisfies most of these criteria. Whether you are building a programming language, a website, or a large-scale distributed system, Scala is an excellent foundation to make it happen.

In the last few years, we have seen the Ruby and Python core developers moving mountains trying to implement JITs and static-compilation for performance. The Java and C# folks have been trying hard to make their language more ergonomic and approachable. Everyone has been trying to graft on functional-programming features.

Despite these efforts, retro-fitting these features onto an existing language never fits quite as nicely as a language that was designed with them from scratch. The type system grafted onto Python doesn't have quite the same elegance or performance benefit of a true compiled language, and the ergonomic improvements to Java are but a small respite that doesn't bring it close to the convenience of a scripting language. Despite the herculean efforts, trying to catch up in this way is truly a sisyphean task.

If you trace their efforts to the ultimate conclusion, there is already a language there that does everything they'd want: it's Scala.



