Unit testing is essential to writing good code. Unit testing allows us to capture bugs as they are created, not long after they are deployed. A little time spent writing unit tests can save a lot of time debugging. If unit testing is not done it can become
object MyMax {
  def myMax(x: Int, y: Int): Int = if (x > y) x else y
}
import org.scalacheck.Properties
import org.scalacheck.Prop.forAll

class MaxScalaCheckTest extends Properties("Max") {
  property("max") = forAll { (x: Int, y: Int) =>
    val z = MyMax.myMax(x, y)
    (z == x || z == y) && (z >= x && z >= y)
  }
}
The associated build.sbt file is
name := "Max"
version := "0.1"
scalaVersion := "2.13.0"
libraryDependencies += "org.scalacheck" %% "scalacheck" % "1.14.0" % Test
Let’s look at the property. The first part, (z == x || z == y), means that myMax must always return one of the two integers that served as its inputs. If we passed in 87 and 123 and the function returned 793, something would be wrong. The second part of the property is (z >= x && z >= y). If x is larger than y, then z should equal x, so z >= x because z == x, and z >= y because z > y. If y is larger than x, then z should equal y, so z >= x because y > x, and z >= y because z == y. If x and y are equal, then z equals both and the property also holds. If the code has this property, then it meets its specification.

By default, every run of the unit test uses a different set of one hundred random pairs of integers. The numbers generated by ScalaCheck are not completely random: it deliberately includes values that commonly break code, such as 0, -1, 1, large positive numbers, and large negative numbers.

ScalaCheck can produce many different types of random input, and the programmer has a lot of control over that input. For example, ScalaCheck can generate different types of random chars.
import org.scalacheck.Gen._ // provides numChar, alphaUpperChar, and the other generators used in these examples

property("GenCharExample") = forAll(
  numChar,        // A random digit as a char
  alphaUpperChar, // A random upper case letter as a char
  alphaLowerChar, // A random lower case letter as a char
  alphaChar,      // A random lower or upper case letter as a char
  alphaNumChar    // A random lower or upper case letter or digit as a char
) { (s1, s2, s3, s4, s5) =>
  // Print the generated values, tab separated, to see what ScalaCheck produces
  println(s"$s1\t$s2\t$s3\t$s4\t$s5")
  true
}
ScalaCheck can generate random strings:
property("GenStringExample") = forAll(
  numStr,        // A random sequence of digits as a string
  alphaUpperStr, // A random sequence of upper case letters
  alphaLowerStr, // A random sequence of lower case letters
  alphaStr,      // A random sequence of lower or upper case letters
  identifier     // A random lower case letter followed by alphanumeric characters
) { (s1, s2, s3, s4, s5) =>
  println(s"$s1\t$s2\t$s3\t$s4\t$s5")
  true
}
ScalaCheck can generate only negative numbers or only positive numbers:
property("GenNumExample") = forAll(negNum[Int], posNum[Int]) { (n, p) =>
  println(s"$n\t$p")
  // A negative number is always smaller than a positive one
  n < p
}
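Preconditions can be used to filter out random input that does not conform to the desired input. For example, the following code only uses integers that are divisible by three:
import org.scalacheck.Prop.propBoolean // provides the ==> operator

property("GenPreconditionExample") = forAll { n: Int =>
  // Inputs that fail the precondition are discarded rather than tested
  (n % 3 == 0) ==> {
    (n + 3) % 3 == 0
  }
}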
Using preconditions can be dangerous: if the precondition is rarely met, ScalaCheck may give up before it finds enough valid inputs, and the unit test is then reported as a failure. Therefore, preconditions should be avoided when possible. In the above example, the random inputs could be multiplied by three instead of using a precondition.
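A minimal sketch of that alternative, building valid inputs directly instead of filtering them, might look like the following. The object name MultiplesOfThreeTest, the generator name multipleOfThree, and the bounds are assumptions made for illustration. The bounds are there so that multiplying by three cannot overflow Int, since an overflowed product is no longer guaranteed to be a multiple of three.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object MultiplesOfThreeTest extends Properties("MultiplesOfThree") {

  // Hypothetical generator: every generated value is a multiple of three,
  // so no input is ever discarded. The range is an arbitrary choice that
  // keeps the multiplication well inside the Int range.
  val multipleOfThree: Gen[Int] = Gen.choose(-1000000, 1000000).map(_ * 3)

  property("GenMultipleOfThreeExample") = forAll(multipleOfThree) { n =>
    (n + 3) % 3 == 0
  }
}
```

Because every generated value already satisfies the condition, all one hundred test cases are actually exercised and ScalaCheck never has to give up.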
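ScalaCheck can produce random lists:
import org.scalacheck.Arbitrary.arbitrary // generator for arbitrary values of a given type

// listOf needs a generator for the elements, not the type itself
property("GenListExample") = forAll(listOf(arbitrary[Int])) { xs =>
  xs.length >= 0
}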
You can write custom generators:
// Generates a pair (n, m) where n is in [1, 50] and m is in [n, 2 * n]
val myGen = for {
  n <- choose(1, 50)
  m <- choose(n, 2 * n)
} yield (n, m)

property("MyGenExample") = forAll(myGen) { pair =>
  pair._2 >= pair._1
}
You can also generate random DataFrames for testing Spark code, for example with the spark-testing-base library:
import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.functions.col
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers
class SparkExample extends FunSuite with SharedSparkContext with Checkers {
  override implicit def reuseContextIfPossible: Boolean = true

  test("schemas should be the same") {
    val sqlContext = new SQLContext(sc)
    // Schema of the randomly generated input DataFrames
    val schema = StructType(List(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))
    // Schema expected after selecting only the "name" column
    val newSchema = StructType(List(StructField("name", StringType, nullable = true)))
    val dataframeGen = DataframeGenerator.arbitraryDataFrame(sqlContext, schema)

    val property = forAll(dataframeGen.arbitrary) { df =>
      val newDf = df.select(col("name"))
      newDf.schema == newSchema
    }

    check(property)
  }
}
If you can’t come up with a property that covers all behaviors of the code, come up with one that covers some aspect of its behavior. Can the output of the function ever be negative? If a function has an inverse, apply the inverse to the output of the function and you should get your original input back. If not, something is wrong. If you are testing an efficient implementation of an algorithm for which you also have a brute-force method that you are confident works, compare them. They should agree. If they don’t, something is wrong. You can also use ScalaCheck to make sure that refactoring code doesn’t fundamentally change how it works. If you know that a function is monotonically increasing in one of its parameters, run the function on pairs of inputs and check that this holds.

I hope you use ScalaCheck to write better code. Happy coding.
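As a postscript, the partial properties described above (inverse, comparison with a brute-force method, and monotonicity) might be sketched as follows. All of the functions under test here (encode, decode, sumTo, sumToBruteForce, shippingCost) are hypothetical stand-ins invented for this example, as are the object name and the generator ranges; substitute your own code.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object PartialPropertiesTest extends Properties("PartialProperties") {

  // Hypothetical functions under test, defined here only so the sketch compiles.
  def encode(s: String): List[Int] = s.toList.map(_.toInt)
  def decode(codes: List[Int]): String = codes.map(_.toChar).mkString

  // Closed-form sum of 1..n and a slow but obviously correct version of it.
  def sumTo(n: Int): Long = n.toLong * (n + 1) / 2
  def sumToBruteForce(n: Int): Long = (1 to n).foldLeft(0L)(_ + _)

  // A function that should never decrease as its argument grows.
  def shippingCost(weightKg: Int): Int = 5 + 2 * weightKg

  // Inverse: decoding the encoded string should return the original input.
  property("roundTrip") = forAll { (s: String) =>
    decode(encode(s)) == s
  }

  // Brute force: the efficient version should agree with the slow one.
  property("matchesBruteForce") = forAll(Gen.choose(0, 10000)) { n =>
    sumTo(n) == sumToBruteForce(n)
  }

  // Pairs (a, b) with a <= b, used to probe monotonicity.
  val orderedPair: Gen[(Int, Int)] = for {
    a <- Gen.choose(0, 1000)
    b <- Gen.choose(a, 1000)
  } yield (a, b)

  // Monotonicity: a larger input should never give a smaller output.
  property("monotonic") = forAll(orderedPair) { case (lighter, heavier) =>
    shippingCost(lighter) <= shippingCost(heavier)
  }
}
```

None of these properties fully specifies its function, but each one rules out a class of bugs, and together they can catch regressions introduced by refactoring.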