/ EXERCISE, PROGRAMMING, STYLE

Exercises in Programming Style: spreadsheets

Last week, we solved the top 25 word frequencies problem with the help of the database. This week, we will get back to solve it with code alone. However, the model of the solution will be designed as a spreadsheet.

This is the 15th post in the Exercises in Programming Style focus series.Other posts include:

  1. Introducing Exercises in Programming Style
  2. Exercises in Programming Style, stacking things up
  3. Exercises in Programming Style, Kwisatz Haderach-style
  4. Exercises in Programming Style, recursion
  5. Exercises in Programming Style with higher-order functions
  6. Composing Exercises in Programming Style
  7. Exercises in Programming Style, back to Object-Oriented Programming
  8. Exercises in Programming Style: maps are objects too
  9. Exercises in Programming Style: Event-Driven Programming
  10. Exercises in Programming Style and the Event Bus
  11. Reflecting over Exercises in Programming Style
  12. Exercises in Aspect-Oriented Programming Style
  13. Exercises in Programming Style: FP & I/O
  14. Exercises in Relational Database Style
  15. Exercises in Programming Style: spreadsheets (this post)
  16. Exercises in Concurrent Programming Style
  17. Exercises in Programming Style: sharing data among threads
  18. Exercises in Programming Style with Hazelcast
  19. Exercises in MapReduce Style
  20. Conclusion of Exercises in Programming Style

The principle behind the spreadsheet’s model

The spreadsheet holds a number of cells, each cell having both a value and a formula. Just like in regular spreadsheet, the formula is a function that might reference another cell’s value, and computes the current cell’s value.

Here the list of candidate cells, as well as their description:

Cell Type Formula

allWords

List<String>

read(filename).flatMap {
    it.toLowerCase()
      .split("\\W|_".toRegex())
  }.filter {
    it.isNotBlank() && it.length >= 2
  }

stopWords

List<String>

read("stop_words.txt")
  .flatMap { it.split(",") }

nonStopWords

List<String>

allWords.value.filter {
  !stopWords.value.contains(it)
}

uniqueWords

List<String>

nonStopWords.value.distinct()

counts

List<Int>

uniqueWords.value.map { unique ->
  allWords.value.count { it == unique }
}

sortedData

List<Pair<String, Int>>

uniqueWords.value.zip(counts.value)
  .sortedByDescending {
    it.formula as Int
  }

The next step is to get all above cells into a collection, and to set the value sequentially according to the formulas.

data class Cell<T, V>(var value: T, var formula: V)                (1) (3)
typealias Spreadsheet = List<Cell<List<Any>, () -> List<Any>>>     (2) (3)

val spreadsheet: Spreadsheet =
  listOf(allWords, stopWords, nonStopWords, uniqueWords, counts, sortedData)

update(spreadsheet)

private fun update(cells: Spreadsheet) {
  cells.forEach {
    it.value = it.formula()                                        (4)
  }
}
1 Pair is immutable in Kotlin, so we need a custom class to model a cell
2 Kotlin’s type alias allows to give a name to a collection with generics, without creating a full-fledged type
3 A cell is named a "column" in the Python version. I’ve updated the semantics to better reflect the modeling of a spreadsheet: "a column" become "a cell", and "a collection of cells" becomes "a spreadsheet"
4 Update the cell’s value by invoking the cell’s function

Towards immutability

To prove once more that immutability can (should?) be used regardless of the chosen programming paradigm, let’s migrate to recursion and immutable data structures.

The main step is to drastically change the updating function:

private fun update(cells: Spreadsheet) {
  cells.forEach {
    it.value = it.formula()                                             (1)
  }
}

private fun update(columns: Spreadsheet): Spreadsheet {                 (2)
  tailrec fun recurseUpdate(todo: Spreadsheet, acc: Spreadsheet): Spreadsheet { (3)
    return if (todo.isEmpty()) acc
    else {
      val column = todo.first()
      val f = column.second
      if (f is () -> List<Any>)                                         (4)
          recurseUpdate(todo.takeLast(todo.size - 1), acc + (f() to f)) (5)
      else {
          val g = f as (List<Any>) -> List<Any>
          recurseUpdate(todo.takeLast(todo.size - 1),
                        acc + (g(acc.last().first) to f))               (6)
      }
    }
  }
  return recurseUpdate(columns, arrayListOf())
}

typealias Spreadsheet = List<Pair<List<Any>, Function<List<Any>>>>      (7)
1 Mutable version: the cell’s value is changed
2 Immutable version: the function returns the spreadsheet
3 Use tail-recursion for better performance
4 Check whether there are any argument to the function.
5 If the function doesn’t accepts any argument e.g. read the stop words, then just call the formula and return a new data structure with the updated cell’s value
6 If the function accepts an argument, then get the required argument from the last cell’s value. This obviously is a huge constraint, but it works in our case.
7 The typealias needs to be updated as well

At this point, one can change the code to use this new version:

val spreadsheet: Spreadsheet = listOf(allWords, nonStopWords, counts, sortedData)

return (update(spreadsheet)
  .last()
  .first
  .take(25) as List<Pair<String, Int>>)
  .toMap()

Conclusion

Modeling the problem as a spreadsheet is an interesting twist.

Note that, as opposed to regular spreadsheet software, one needs to explicitly update the values: there’s no implementation of the Observer pattern to update the value whenever a dependent value changes.

I believe the immutable version, though it’s a bit more complex to manage, allows better unit testing of each individual function.

The complete source code for this post can be found on Github.
Nicolas Fränkel

Nicolas Fränkel

Developer Advocate with 15+ years experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurances, large retail and public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Also double as a trainer and triples as a book author.

Read More
Exercises in Programming Style: spreadsheets
Share this