Zach Ahn

Deobfuscated topological sort📝

I have a really hard time with algorithm examples that are in books or available online. They might have a bunch of single-letter variable names. Or maybe they rely on language-specific features like list comprehensions. Or maybe lines do multiple things at once. Or even worse, maybe it’s multiple list comprehensions in one line. Or the pseudocode uses 1-indexed array indexes. And what’s the difference between x++ and ++x? I haven’t had to think about that since my second college computer science class.

Maybe it’s just me.

Anyway, I was trying to figure out how to compute topological sorts from a graph. Below is a heavily annotated explanation of it. It’s in Ruby cause it’s the language I’m most familiar with, but I tried avoiding unique features and conventions. So hopefully it makes some sense.

Oh, I should mention that tsort‘s runtime is O(number of nodes) since each node gets processed once. Oh and Ruby has an implementation of topological sort in its standard library. It’s called tsort.

=begin
This implementation of topological-sort requires that the graph is represented
as a Hash with the form of { source => [destination1, destination2, ...] }.
Here's an example of one:

    a ---> b ---> c
     \
      \--> d

    {
      a => [b, d],  # Because a points to b and d
      b => [c],     # Because b points to c
      c => [],      # Because c doesn't point to anything
      d => [],      # Because d doesn't point to anything either
    }

It returns an array of topologically sorted nodes. If it encounters a cycle, it
returns an empty array
=end
def tsort(graph)
  seen = {}
=begin
This algorithm fills out the array from the end to the start. Optionally you can
append nodes to the end of an array, then flip the array right before returning
it.
=end
  possible_ordering = [nil] * graph.size
  current_position = graph.size - 1

=begin
Lambdas are basically functions... In Python, you can define methods in methods,
and those inner methods have access to the variables that were defined in the
outer one. You can't do that in Ruby, but you can basically do the same thing
with lambdas

Anyway. This "inner function" is an implementation of depth first search with a
couple variations. If there are any circular dependencies, it returns false.
Otherwise it returns true.
=end
  depth_first_search = lambda do |source|
=begin
The `seen` hash is often implemented with "colors", where { node => color }.
This is basically that too, but it gives us a bit of a shortcut.

    seen[source] = false  # This means that a search is in progress. If this
                          # `depth_first_search` function comes across a
                          # `seen[source] == false`, that means that there is a
                          # circular dependency
    seen[source] = true   # This means that we already ran DFS successfully for
                          # this node and can be ignored while we continue to
                          # compute the tsort

In other words, `seen` is being used for two purposes. We can easily split them
into two different variables which might be cleaner to understand but would
require a bit more code.

    current_search = []  # This is a stack where we can `push` and `pop` the
                         # current node.
    seen = {}            # The value of `seen` will always be true

If we went with that, the code might look something like this. Note though that
`current_search.include?(source)` runs at `O(N)`, so that would increase the
worse case runtime to `O(N^2)`.

    if seen[source]
      if current_search.include?(source)
        return false  # Because there's a cycle
      else
        return true   # Since we can skip this
      end
    end

    seen[source] = true
    current_search.push(source)

    # The loop ...

    # And replace the `seen[source] = true` after the loop with:
    current_search.pop
=end
    if seen.key?(source)
      return seen[source]
    end

    seen[source] = false

    graph[source].each do |dest|
      is_tsortable = depth_first_search.call(dest)

      if !is_tsortable
        return is_tsortable
      end
    end

    seen[source] = true
    possible_ordering[current_position] = source
    current_position -= 1

    return seen[source]
  end

  node_is_tsortable = []

  graph.each do |source, dests|
    is_tsortable = depth_first_search.call(source)
    node_is_tsortable.push(is_tsortable)
  end

  if node_is_tsortable.all?
    possible_ordering
  else
    []
  end
end

I’m curious if this is helpful. Perhaps I’ll try to do some more.

Side note, this is un-code-golfed, but it isn’t code-bowling. What’s a sport that aims for “the middle”? Curling? Is this code curling?

Posted on 2020-06-21 11:10 PM -0400