Jars

Here’s a fairly common scenario: I have written a scala program and I want to run it. I know most computers can run java programs, and I know scala can run on the JVM, so I want to run my program using java. How do I do it?

Short answer: use the sbt-assembly plugin.

But… what does this plugin do? Why does it take so long to build the assembly? And why do I need to use SBT at all? Let’s find out! We’ll touch on a few topics:

  • Java Revision
    • Compiling java programs with javac
    • The java classpath
    • Compile vs runtime dependencies
    • Jar files and the manifest
    • Running with the java command
  • Scala
    • Compiling with scalac and the scala runtime
    • Building a jar

So for the few of you who are still here, let’s drag ourselves kicking and screaming to java land!

Java Revision

Compiling java programs can be embarrassingly easy to mess up. Avoid embarrassment by following these rules:

  1. Package name = path to source
  2. Dependencies go on the class path

Package Name = Path to Source

Let’s start by compiling a simple java class with no dependencies.

The class Printer (in Printer.java) is in package com.benmosheron.printing, which means the source file must sit in this exact directory structure:

/com
  /benmosheron
    /printing
      /Printer.java

Check your relative location. It’s easy to get confused, and while you can compile from any directory, you’ll save yourself some pain if you compile from the location which contains the first directory of the package. In this case, it’s the directory containing com, which I’m going to refer to as the project root.

It’s easier to show than tell, so have a look at the code samples for an example.
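As a rough illustration of the rule, here is a small java sketch that turns a fully qualified class name into the source path javac would expect, relative to the project root. The names PackagePathSketch and sourcePathFor are made up for this example and are not part of the code samples.

```java
class PackagePathSketch {
    // Maps a fully qualified class name to the source path expected
    // relative to the project root (the directory containing "com").
    static String sourcePathFor(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".java";
    }

    public static void main(String[] args) {
        // Prints: com/benmosheron/printing/Printer.java
        System.out.println(sourcePathFor("com.benmosheron.printing.Printer"));
    }
}
```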

We compile java programs using the java compiler: javac. We need to provide the path to a source file, and I’m also going to provide an output location - the directory named out.

Because it has no dependencies, Printer.java can be compiled like so:

javac -d out com/benmosheron/printing/Printer.java

  • -d out puts the compilation results (class files) in the out directory
  • com/benmosheron/printing/Printer.java is the path to the source file

The result of compilation is a class file, which will be nested under the exact same directory structure (create the out directory before compiling, as older versions of javac will not create it for you).

/com
  /benmosheron
    /printing
      /Printer.java [com.benmosheron.printing]
/out
  /com
    /benmosheron
      /printing
        /Printer.class

Dependencies Go on the Class Path

The Printer class doesn’t do anything on its own, so we can’t run it, but we can create an application - SimpleJavaApp - which imports Printer and uses it. It also has a main method which can be used as an entry point. I’m going to skip over the code (you can find it here).

In order to compile SimpleJavaApp, the compiler needs to know where to find Printer, and the list of paths it searches is called the class path.

With the line import com.benmosheron.printing.Printer; we are telling javac to search, starting from every directory (or jar file) listed on the class path, for the file com/benmosheron/printing/Printer.java. If it cannot find a source file, javac will also search for an already compiled com/benmosheron/printing/Printer.class file to use. Just remember that the search only ever starts from the class path.

By default, the class path contains the current directory (root). This works well in our case, as javac will find Printer.java and is smart enough to know to compile it first, so we can compile SimpleJavaApp like so:

# Using default class path:
javac -d out com/benmosheron/scalaforjava/SimpleJavaApp.java
# Equivalent to (current directory is already on the class path):
javac -d out -cp . com/benmosheron/scalaforjava/SimpleJavaApp.java
# Also equivalent to (Printer is found automatically):
javac -d out -cp . \
  com/benmosheron/printing/Printer.java \
  com/benmosheron/scalaforjava/SimpleJavaApp.java

We have the option of explicitly telling javac that it should use the current directory (root) as the class path, by specifying the -cp . parameter. The result of compilation will be two class files.

/com
  /benmosheron
    /printing
      /Printer.java [com.benmosheron.printing]
    /scalaforjava
      /SimpleJavaApp.java [com.benmosheron.scalaforjava]
/out
  /com
    /benmosheron
      /printing
        /Printer.class
      /scalaforjava
        /SimpleJavaApp.class

Now we have something to run! We can use the java command, specifying the full name of the class containing our main method. Again, java will look on the class path to find a class file whose exact path matches the package name, so we need to point the class path to the out directory, which holds our class hierarchy.

java -cp out com.benmosheron.scalaforjava.SimpleJavaApp

Compile Time Dependencies

What if we want to include some external dependencies? It’s a bit unruly to have all these class files with a tricky directory structure, so compiled java classes tend to be packaged up into jar files. A jar file is just a zipped up directory containing a bunch of classes organised to follow the “package name = path” rule.

The class JavaApp has a compile time dependency on the interface RuntimeInterface, which I’ve taken the liberty of bundling up into Compile.jar (source). It lives under a new directory:

/com
  /benmosheron
    /printing
      /Printer.java [com.benmosheron.printing]
    /scalaforjava
      /JavaApp.java [com.benmosheron.scalaforjava]
      /SimpleJavaApp.java [com.benmosheron.scalaforjava]
/lib
  /compile
    /Compile.jar

If you were to unzip Compile.jar, you would find it contains the class file com/benmosheron/runtime/RuntimeInterface.class, where RuntimeInterface’s package is com.benmosheron.runtime.
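To make the “a jar is just a zip” point concrete, here is a hedged java sketch that writes a tiny jar into memory with JarOutputStream and then reads it back with the plain zip reader. It does not touch the real Compile.jar. The class JarIsZipSketch and its methods are invented for illustration, and the entry is an empty placeholder rather than a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class JarIsZipSketch {
    // Builds an in-memory jar containing empty placeholder entries.
    static byte[] buildJar(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            for (String name : entryNames) {
                jar.putNextEntry(new JarEntry(name));
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Lists the entries using the generic zip reader, not the jar one.
    static List<String> listEntries(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] jar = buildJar("com/benmosheron/runtime/RuntimeInterface.class");
        System.out.println(listEntries(jar));
    }
}
```

The entry name inside the archive follows the same “package name = path” rule as the class files in the out directory.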

Javac needs a little help here: we just need to make sure Compile.jar is referenced on the classpath:

javac -d out -cp .:lib/compile/Compile.jar \
  com/benmosheron/scalaforjava/JavaApp.java

Here, the -cp .:lib/compile/Compile.jar is telling javac to put the root (.) on the classpath, as well as our jar dependency. The : is just a separator. If you’re running on Windows, you’ll need to use ; instead.
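If you want to see what actually ended up on the class path, a running program can ask the JVM. The sketch below is an illustration only (ClassPathSketch is a made-up name): it reads the standard java.class.path system property and splits it on File.pathSeparator, which is : on Linux and macOS and ; on Windows.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ClassPathSketch {
    // Splits a class path string into its individual entries.
    static List<String> entries(String classPath, String separator) {
        return Arrays.asList(classPath.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // The class path this JVM was started with (from -cp or the default).
        String classPath = System.getProperty("java.class.path");
        for (String entry : entries(classPath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```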

Once that’s done, we can run JavaApp. We need to tell java to add Compile.jar to the class path, or it won’t be able to load the interface.

java -cp out:lib/compile/Compile.jar com.benmosheron.scalaforjava.JavaApp

You should find that the app starts OK, but fails after printing “Failed! ClassNotFoundException”. Weird - it compiled fine, why doesn’t it run?

Runtime Dependencies

There are two types of dependency in java:

  • compile time dependencies, which must be present in order for a program to compile
  • runtime dependencies, which are only required when the program runs

In JavaApp, we load an interface RuntimeInterface from Compile.jar, but we don’t explicitly instantiate an implementation of this interface. Instead, using a system called reflection, we dynamically load an implementation from Runtime.jar.

This is a fairly complex process which I won’t elaborate on here. The important thing is that it allows us to swap in different implementations without recompiling the app. We could just swap Runtime.jar out for another jar containing a different implementation.
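For a feel of what loading by reflection looks like, here is a minimal sketch. It is not the actual JavaApp code: Greeter and DefaultGreeter are hypothetical stand-ins for RuntimeInterface and its implementation, kept in one file so the example runs on its own. The key point is that the implementation is named by a string, so the compiler never needs to see it.

```java
class ReflectionSketch {
    // Hypothetical stand-in for the interface that would live in Compile.jar.
    interface Greeter {
        String greet();
    }

    // Hypothetical stand-in for the implementation that would live in Runtime.jar.
    static class DefaultGreeter implements Greeter {
        public String greet() {
            return "Hello from the runtime implementation";
        }
    }

    // Looks the implementation up by name at runtime, on the class path.
    static String run(String implementationName) {
        try {
            Class<?> impl = Class.forName(implementationName);
            Greeter greeter = (Greeter) impl.getDeclaredConstructor().newInstance();
            return greeter.greet();
        } catch (ClassNotFoundException e) {
            // This is what happens when the jar is missing from the class path.
            return "Failed! ClassNotFoundException";
        } catch (ReflectiveOperationException e) {
            return "Failed! " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(ReflectionSketch.class.getName() + "$DefaultGreeter"));
        System.out.println(run("com.example.MissingGreeter"));
    }
}
```

The second call fails in the same way the app did: the code compiles without the implementation, and the problem only shows up when the class is looked up at runtime.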

/com
  /benmosheron
    /printing
      /Printer.java [com.benmosheron.printing]
    /scalaforjava
      /JavaApp.java [com.benmosheron.scalaforjava]
      /SimpleJavaApp.java [com.benmosheron.scalaforjava]
/lib
  /compile
    /Compile.jar
  /runtime
    /Runtime.jar

Now we can run it properly, without recompiling. We just need to include Runtime.jar on the classpath:

java -cp out:lib/compile/Compile.jar:lib/runtime/Runtime.jar \
  com.benmosheron.scalaforjava.JavaApp

This time the app should have everything it needs to successfully run.

Packaging JAR Files

We can use the jar command to package our app into a jar file, so we don’t have to keep track of the hierarchy of class files.

The big caveat when running jars is that you can’t specify the class path when you run them. This is a problem because we can’t include other jars inside our own - we have to provide them alongside it, and tell java to put them on the class path.

Instead, we use a special file to do this, called the manifest. You can set a few other things in the manifest, but the only ones we need are the class path and the app’s entry point (class containing the static main method).

The contents of our manifest (manifest.txt) are:

Class-Path: lib/compile/Compile.jar lib/runtime/Runtime.jar
Main-Class: com.benmosheron.scalaforjava.JavaApp

These are fairly self explanatory, with a few things to note:

  1. Dependencies are space separated (rather than : like the -cp argument).
  2. The newline at the end is mandatory. If you don’t include it, the last line will be ignored!
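The two manifest attributes can also be read back from java, which is a handy way to check what a manifest really says. This sketch parses the same two lines with the standard java.util.jar.Manifest class. ManifestSketch and mainAttributes are names invented for the example, and the text is held in a string (with its trailing newline) rather than read from manifest.txt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestSketch {
    // Same two lines as manifest.txt; note the "\n" after the last line.
    static final String MANIFEST_TEXT =
        "Class-Path: lib/compile/Compile.jar lib/runtime/Runtime.jar\n"
        + "Main-Class: com.benmosheron.scalaforjava.JavaApp\n";

    // Parses manifest text and returns its main attributes.
    static Attributes mainAttributes(String manifestText) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try {
            return new Manifest(new ByteArrayInputStream(bytes)).getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Attributes attributes = mainAttributes(MANIFEST_TEXT);
        System.out.println(attributes.getValue(Attributes.Name.MAIN_CLASS));
        // Class-Path entries are separated by spaces, not ":".
        for (String entry : attributes.getValue(Attributes.Name.CLASS_PATH).split(" ")) {
            System.out.println(entry);
        }
    }
}
```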

The command to create the jar is:

jar cvfm JavaApp.jar manifest.txt -C out com

Where cvfm is a bunch of flags:

  • c create a new jar
  • v verbose logging
  • f specify the output file name as the first argument
  • m specify the manifest file as the second argument
  • -C out com temporarily changes to the out directory, and includes all entities in the com directory in the jar.

This will create JavaApp.jar which we can run with:

java -jar JavaApp.jar

Just make sure you also provide the lib directory containing the two dependencies in the same directory as JavaApp.jar. For example, you could zip up the following directory structure to share your jar file:

/JavaApp.jar
/lib
  /compile
    /Compile.jar
  /runtime
    /Runtime.jar

Back to Scala

What a detour! You’ve probably cottoned on to why this matters for scala. To run scala programs with the java command, you need to include the scala library as a runtime dependency of your application.

I’ve set up a simple scala app here, with this structure:

/com
  /benmosheron
    /scalaforjava
      /ScalaApp.scala
/lib
  /compile
    /Compile.jar
  /runtime
    /scala-library.jar
/manifest.txt

I’m including the same Compile.jar from the java example as a compile time dependency, to show how everything behaves.

The scala runtime is just a jar file (scala-library.jar). You can get it here (look for the scala binaries for your platform). Copy scala-library.jar into the lib/runtime folder for the example code to work.

We can compile our scala app in much the same way as the java app:

scalac -d out -cp lib/compile/Compile.jar \
  com/benmosheron/scalaforjava/ScalaApp.scala

Everything here is exactly the same as it was with the java example (you’ll notice a few extra classes get generated). We can run our app just as easily.

Using the scala command:

scala -cp out:lib/compile/Compile.jar \
  com.benmosheron.scalaforjava.ScalaApp

Or using the java command, we just have to make sure the scala runtime is on the class path:

java -cp out:lib/compile/Compile.jar:lib/runtime/scala-library.jar \
  com.benmosheron.scalaforjava.ScalaApp
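If you are unsure whether the scala runtime made it onto the class path, one rough check from plain java is to try to locate a class that ships in scala-library.jar. The sketch below assumes scala.Predef as that class, and ScalaRuntimeCheck is a made-up name. It only looks the class up, without initialising it.

```java
class ScalaRuntimeCheck {
    // Returns true if the named class can be found on the class path.
    static boolean isOnClassPath(String className) {
        try {
            // initialize = false: locate the class without running its static initialisers.
            Class.forName(className, false, ScalaRuntimeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Expect "true" only when scala-library.jar is listed in -cp.
        System.out.println("scala library found: " + isOnClassPath("scala.Predef"));
    }
}
```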

Or we can package our classes into a jar and run that. Our dependencies, including the scala runtime, need to be listed in the manifest:

# Package jar
jar cvfm ScalaApp.jar manifest.txt -C out com
# Run the jar
java -jar ScalaApp.jar

Let’s Never do this Again!

All in all, it really isn’t that hard to compile and run scala programs manually. All you have to do is be vigilant with how your directories are set up… and make sure you’ve copied all your dependencies into the right place… and that you’ve listed them all in the manifest…

Thankfully we have SBT and maven and a bunch of other tools to help manage dependencies and class paths. They also remember all the intricacies of the javac, scalac, java and scala commands for us. But under the hood, these tools are just following the steps we have followed up to this point, and I think it’s nice to know what’s going on.

The major drawback with packaging your jar files up manually is that you have to provide all your dependencies along with them, in the exact directory structure you specify in the manifest. This is where tools like SBT assembly are a massive help, as they allow us to create one fat jar which contains all of our application code along with all its dependencies (like the scala runtime).

This isn’t a simple task. Remember, each jar is actually a zipped up directory with its own package hierarchies and a manifest. Tools like SBT assembly “explode” each of the jars into a great big mess of class files and manifests, then merge all the manifests and classes, and finally zip the resulting megastructure into one easy to handle jar. This is a massively simplified overview - they have to deal with different versions of dependencies, and clashing names, among other things.
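To show where clashes come from, here is a toy java sketch of the merge step. It is not how SBT assembly is implemented. It takes a map of jar names to the entry paths they contain and reports every path that appears in more than one jar. The jar contents in main are assumed for the sake of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FatJarMergeSketch {
    // Input: jar name -> entry paths. Output: entry path -> jars containing it.
    static Map<String, List<String>> owners(Map<String, List<String>> jars) {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> jar : jars.entrySet()) {
            for (String entry : jar.getValue()) {
                owners.computeIfAbsent(entry, k -> new ArrayList<>()).add(jar.getKey());
            }
        }
        return owners;
    }

    // Entry paths that more than one jar wants to write into the fat jar.
    static List<String> conflicts(Map<String, List<String>> jars) {
        List<String> clashing = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : owners(jars).entrySet()) {
            if (e.getValue().size() > 1) {
                clashing.add(e.getKey());
            }
        }
        return clashing;
    }

    public static void main(String[] args) {
        // Assumed contents, for illustration only.
        Map<String, List<String>> jars = new LinkedHashMap<>();
        jars.put("Compile.jar", Arrays.asList(
            "META-INF/MANIFEST.MF", "com/benmosheron/runtime/RuntimeInterface.class"));
        jars.put("scala-library.jar", Arrays.asList(
            "META-INF/MANIFEST.MF", "scala/Predef.class"));
        // Prints: [META-INF/MANIFEST.MF]
        System.out.println(conflicts(jars));
    }
}
```

Each clashing path needs a decision (keep one, merge, or discard), which is roughly the kind of choice a merge strategy has to make.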

If everything goes well, you won’t get any conflicts - but if you do get problems with merging, hopefully you have at least an idea of what’s going on now.

Thanks for reading.