Unit Testing with ScalaCheck

Unit testing is essential to writing good code. It lets us catch bugs as they are introduced rather than long after deployment, and a little time spent writing unit tests can save a lot of time debugging. Without unit tests, bugs become increasingly difficult and time-consuming to fix as the code grows in size and complexity.

However, unit testing can almost never be complete. We can only write a finite number of tests, while a function can usually take a practically unlimited number of different inputs. What we would really like to test is whether the code meets its specification, not just whether it gives the expected output for a handful of inputs. The specification is a description of what the program should do, usually given in a human language such as English. In some cases we can use logic to confirm that the code meets its specification, but we can make errors in that reasoning too, so we would still like unit tests to confirm it.

One way to test that code meets its specification is to test its properties. A property is something that is always true of the code. For example, if a function calculates the distance between two points in space, we expect the result to always be greater than or equal to zero. If the function ever returns a negative number, something is wrong.

ScalaCheck approximates testing a program's specification by testing its properties: it generates a number of random inputs (100 by default) and checks that the output satisfies user-defined properties. The example below shows code for finding the maximum of two numbers, along with an associated ScalaCheck unit test.

object MyMax {
  def myMax(x: Int, y: Int): Int = if (x > y) x else y
}

import org.scalacheck.Properties
import org.scalacheck.Prop.forAll

class MaxScalaCheckTest extends Properties("Max") {

  property("max") = forAll { (x: Int, y: Int) =>
    val z = MyMax.myMax(x, y)
    (z == x || z == y) && (z >= x && z >= y)
  }

}

The associated build.sbt file is:

name := "Max"

version := "0.1"

scalaVersion := "2.13.0"

libraryDependencies += "org.scalacheck" %% "scalacheck" % "1.14.0" % Test

Let’s look at the property. The first part, (z == x || z == y), says that myMax must always return one of its two inputs. If we passed in 87 and 123 and the function returned 793, something would be wrong. The second part, (z >= x && z >= y), says that the result is at least as large as both inputs. If x is larger than y, then z should equal x, so z >= x holds because z == x, and z >= y holds because z > y. If y is larger than x, then z should equal y, so z >= x holds because y > x, and z >= y holds because z == y. If x and y are equal, both comparisons hold trivially. Together the two parts pin down the maximum, so if the code has this property, it meets its specification.

Every time the unit test runs, it uses a different set of one hundred random pairs of integers. The numbers generated by ScalaCheck are not uniformly random: it deliberately favors values that commonly break code, such as 0, -1, 1, and very large positive and negative numbers. ScalaCheck can produce many different types of random input, and the programmer has a lot of control over it. For example, ScalaCheck can generate different types of random chars:

import org.scalacheck.Gen._ // brings numChar, alphaUpperChar, etc. into scope

property("GenCharExample") = forAll(numChar, // A random digit as a char
  alphaUpperChar, // A random upper case letter as char
  alphaLowerChar, // A random lower case letter as char
  alphaChar, // A random lower or upper case letter as char
  alphaNumChar) { // A random lower or upper case letter or digit as char
  (s1, s2, s3, s4, s5) =>
    // String interpolation avoids the deprecated Char + String concatenation
    println(s"$s1\t$s2\t$s3\t$s4\t$s5")
    true
}

ScalaCheck can generate random strings:

property("GenStringExample") = forAll(numStr, //A random sequence of digits as a string
  alphaUpperStr, //A random sequence of upper case letters
  alphaLowerStr, //A random sequence of lower case letters
  alphaStr, //A random sequence of lower or upper case letters
  identifier) { //A random lower case letter followed by alphanumeric characters
  (s1, s2, s3, s4, s5) =>
    println(s1 + "\t" + s2 + "\t" + s3 + "\t" + s4 + "\t" + s5)
    true
}

ScalaCheck can generate only negative numbers or only positive numbers:

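property("GenNumExample") = forAll(negNum[Int], posNum[Int]) {
  (n, p) =>
    // String interpolation avoids the deprecated Int + String concatenation
    println(s"$n\t$p")
    n < p // a negative number is always smaller than a positive one
}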

Preconditions can be used to filter out random input that does not conform to the desired input. For example, the following code only uses integers that are divisible by three:

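import org.scalacheck.Prop.propBoolean // enables ==> on a Boolean

property("GenPreconditionExample") = forAll { n: Int =>
  (n % 3 == 0) ==> {
    // Widen to Long so that n + 3 cannot overflow near Int.MaxValue
    (n.toLong + 3) % 3 == 0
  }
}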

Preconditions can be dangerous: if the precondition is rarely met, ScalaCheck discards most of the generated inputs and may give up before enough test cases have passed, which is reported as a failure. Therefore, preconditions should be avoided when possible. In the above example, the random inputs could be multiplied by three instead of being filtered by a precondition.
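As a minimal sketch of that alternative, the generator below builds multiples of three directly, so no input is ever discarded. The object name, the generator name and the bounded range are assumptions made for illustration; the range is kept small so that multiplying by three cannot overflow an Int.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object MultiplesOfThreeSpec extends Properties("MultiplesOfThree") {

  // Assumption: a bounded range, so that k * 3 stays well inside the Int range.
  val multipleOfThree: Gen[Int] = Gen.choose(-1000000, 1000000).map(_ * 3)

  // Every generated value is a multiple of three by construction,
  // so the property needs no precondition.
  property("adding three keeps divisibility") = forAll(multipleOfThree) { n =>
    (n + 3) % 3 == 0
  }
}
```

Because the constraint lives in the generator, all one hundred test cases exercise the property instead of being filtered out.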

ScalaCheck can produce random lists.

import org.scalacheck.Arbitrary.arbitrary // listOf needs a generator, not the Int type

property("GenListExample") = forAll(listOf(arbitrary[Int])) {
  xs => xs.length >= 0
}
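List generators become more useful with properties that say something about the contents. The sketch below is one possible illustration, not part of the original example: the object name and the two properties are assumptions, while Gen.listOf and Gen.nonEmptyListOf are standard ScalaCheck generators.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Arbitrary.arbitrary
import org.scalacheck.Prop.forAll

object ListSpec extends Properties("List") {

  // Lists of any length, including the empty list.
  val anyList: Gen[List[Int]] = Gen.listOf(arbitrary[Int])

  // Lists with at least one element, so that xs.max is always defined.
  val nonEmptyList: Gen[List[Int]] = Gen.nonEmptyListOf(arbitrary[Int])

  property("reversing twice gives the original list") = forAll(anyList) { xs =>
    xs.reverse.reverse == xs
  }

  property("max is an element and an upper bound") = forAll(nonEmptyList) { xs =>
    val m = xs.max
    xs.contains(m) && xs.forall(_ <= m)
  }
}
```

The second property mirrors the two-part myMax property from earlier: the result must come from the input, and nothing in the input may exceed it.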

You can write custom generators:

val myGen = for {
  n <- choose(1, 50)
  m <- choose(n, 2 * n)
} yield (n, m)

property("MyGenExample") = forAll(myGen) {
  pair => pair._2 >= pair._1
}
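Custom generators also work for your own types. As a sketch, here is the distance example from the introduction, restricted to two dimensions. Point, Geometry.distance, pointGen and the coordinate range are all hypothetical names and choices made for this illustration.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

// Hypothetical type and function, used only for illustration.
final case class Point(x: Double, y: Double)

object Geometry {
  def distance(a: Point, b: Point): Double =
    math.hypot(a.x - b.x, a.y - b.y)
}

object DistanceSpec extends Properties("Distance") {

  // Assumption: coordinates are kept in a bounded range to stay clear of infinities.
  val pointGen: Gen[Point] = for {
    x <- Gen.choose(-1000.0, 1000.0)
    y <- Gen.choose(-1000.0, 1000.0)
  } yield Point(x, y)

  property("distance is never negative") = forAll(pointGen, pointGen) { (a, b) =>
    Geometry.distance(a, b) >= 0.0
  }

  property("distance is symmetric") = forAll(pointGen, pointGen) { (a, b) =>
    Geometry.distance(a, b) == Geometry.distance(b, a)
  }
}
```

Neither property fully specifies the distance function, but each one would catch a whole class of mistakes, such as a sign error or swapped arguments.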

You can also generate random dataframes for testing Spark code.

import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.functions.col
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class SparkExample extends FunSuite with SharedSparkContext with Checkers {
  override implicit def reuseContextIfPossible: Boolean = true

  test("schemas should be the same") {
    val sqlContext = new SQLContext(sc)

    val schema = StructType(List(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    val newSchema = StructType(List(StructField("name", StringType, nullable = true)))

    val dataframeGen = DataframeGenerator.arbitraryDataFrame(sqlContext, schema)

    val property = forAll(dataframeGen.arbitrary) {
      df => {
        val newDf = df.select(col("name"))
        newDf.schema == newSchema
      }
    }
    check(property)
  }
}

If you can’t come up with a property that covers all the behavior of the code, come up with one that covers part of it. Can the output of the function ever be negative? If a function has an inverse, apply the inverse to the function's output; you should get the original input back, and if you don't, something is wrong. If you are testing an efficient implementation of an algorithm and you also have a brute-force method that you are confident works, compare them; they should agree. You can also use ScalaCheck to make sure that refactoring doesn’t fundamentally change how code works. And if you know that a function increases monotonically as one parameter increases, run it on pairs of inputs and check that this holds.

I hope you use ScalaCheck to write better code. Happy coding.
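The partial properties suggested above can be sketched as follows. The functions fastSum and slowSum are hypothetical stand-ins for "an efficient implementation" and "a brute-force method", and the input range is an assumption chosen to keep the brute-force version fast.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object PartialPropertiesSpec extends Properties("PartialProperties") {

  // Hypothetical efficient implementation: the sum 1 + 2 + ... + n in closed form.
  def fastSum(n: Int): Long = n.toLong * (n + 1) / 2

  // Brute-force reference that is slow but easy to trust.
  def slowSum(n: Int): Long = (1 to n).map(_.toLong).sum

  // Assumption: small non-negative inputs keep slowSum cheap to run.
  val smallN: Gen[Int] = Gen.choose(0, 10000)

  // Inverse: parsing the string form of an Int gives back the Int.
  property("toString and toInt are inverses") = forAll { (n: Int) =>
    n.toString.toInt == n
  }

  // Comparison: the efficient and brute-force versions agree.
  property("fastSum agrees with slowSum") = forAll(smallN) { n =>
    fastSum(n) == slowSum(n)
  }

  // Monotonicity: a larger input never gives a smaller result.
  property("fastSum does not decrease as n grows") = forAll(smallN, smallN) { (a, b) =>
    val (lo, hi) = if (a <= b) (a, b) else (b, a)
    fastSum(lo) <= fastSum(hi)
  }
}
```

The comparison property doubles as a refactoring check: keep the old implementation around as the reference while the new one is being written.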