Control Flow in Sorbet is Syntactic

People always ask me, “Why does Sorbet think this is nil? I just checked that it’s not!” So often, in fact, that it’s at the very top of the Sorbet FAQ.

That doc answers what’s happening and how to fix it, but it doesn’t really answer why Sorbet behaves this way. A common follow-up question looks something like this:

Having to use local variables as mentioned in Sorbet’s limitations of flow-sensitivity docs is annoying. Idiomatic Ruby doesn’t use local variables nearly as much as Sorbet requires. What gives?

TL;DR: Sorbet’s type inference algorithm requires being given a fixed data structure that models control flow inside a method. Type inference doesn’t get to change that structure, so the things Sorbet learns from inference don’t retroactively change Sorbet’s view of control flow. (This is in line with the other popular type systems for dynamically typed languages.) As a result, control flow must be a function of local syntax alone (variables), not global or semantic information (methods).

But that’s packing a lot in at once, so let’s take a step back.

In this post, whenever I say type inference I basically mean assigning types to variables, and using the types of variables to resolve calls to methods. Type inference in Sorbet needs two things: a symbol table and a control flow graph for the method.

Since type inference requires the control flow graph, clearly building the control flow graph can’t require type inference. Instead, it has to build a control flow graph using only the method’s abstract syntax tree (or AST). Since all Sorbet has is an AST, the control flow only reflects syntax-only observations, like “these two variables are the same” and “an if condition branches on the value of this variable.” Sorbet can draw these observations exclusively from the syntactic structure of the current method, with no need to consult the symbol table, let alone run inference.
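To make “a fixed data structure that models control flow” a little more concrete, here is a toy sketch in plain Ruby. It is not Sorbet’s actual representation: `BasicBlock`, its fields, and the `reachable` helper are hypothetical names invented for illustration. The point is only that the graph can be written down by looking at the shape of the code, without knowing any types.

```ruby
# Hypothetical toy model of a control flow graph. Sorbet's real
# representation is different and far richer.
BasicBlock = Struct.new(:name, :instructions, :successors)

# Source being modeled (syntax only, no types involved):
#
#   if cond
#     x = 0
#   else
#     x = nil
#   end
#   x.even?
#
exit_block = BasicBlock.new(:exit, ["<tmp> = x.even?"], [])
then_block = BasicBlock.new(:then, ["x = 0"], [exit_block])
else_block = BasicBlock.new(:else, ["x = nil"], [exit_block])
entry      = BasicBlock.new(:entry, ["branch on cond"], [then_block, else_block])

# Walk the graph from a starting block, visiting each block once
# (blocks are compared by identity, not by field values).
def reachable(block, seen = [])
  return seen if seen.any? { |b| b.equal?(block) }
  seen << block
  block.successors.each { |succ| reachable(succ, seen) }
  seen
end

CFG_BLOCK_NAMES = reachable(entry).map(&:name)
p CFG_BLOCK_NAMES # => [:entry, :then, :exit, :else]
```

Nothing in this structure records what `x.even?` resolves to. That is the job of inference, which runs over the graph after it has been built.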

This brings us to our central conflict: knowing which method (or methods!) a given call site resolves to is not a syntactic property. Consider this snippet:

if [true, false].sample
  x = 0
else
  x = nil
end

x.even?

The meaning of x.even? depends on the type of x, which depends on the earlier control flow in the method. That means that if a program branches on a method return value, Sorbet cannot draw any interesting observations about control flow.

This gets to be a problem for methods whose meaning involves some claim like, “I always return the same thing every time I’m called.” Sorbet can’t know whether x.foo refers to one of those constant methods or a method that returns a random number every time, so it has to assume the worst.
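A plain-Ruby sketch of what “the worst” looks like at runtime, and why storing the result in a local variable sidesteps it. The `Flaky` class and both helper methods are hypothetical, and no Sorbet signatures are involved:

```ruby
# Hypothetical class: `foo` alternates between an Integer and nil,
# so two consecutive calls never agree.
class Flaky
  def initialize
    @calls = 0
  end

  def foo
    @calls += 1
    @calls.odd? ? 42 : nil
  end
end

# Unsafe pattern: the check and the use are two separate calls.
def check_then_call(obj)
  if obj.foo
    obj.foo # may be nil even though the check passed
  end
end

# Safe pattern: call once, store in a local, branch on the local.
def check_local(obj)
  foo = obj.foo
  if foo
    foo # the same value that was checked
  end
end

p check_then_call(Flaky.new) # => nil
p check_local(Flaky.new)     # => 42
```

The second form is the one Sorbet can reason about, because “this variable was just checked” is visible from the syntax of the method alone.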

Here’s a pathological example:

class FooIsAttr
  sig {returns(T.nilable(Integer))}
  attr_accessor :foo
end

class FooIsMethod
  sig {returns(T.nilable(Integer))}
  def foo
    # Returns something different every call
    [0, nil].sample
  end
end

sig {params(x: Integer).void}
def takes_integer(x); end

# Have to run inference to get type of `x`
# (running inference requires control flow)
if [true, false].sample
  x = FooIsAttr.new
else
  x = FooIsMethod.new
end

# x.foo returns the same thing only if x is `FooIsAttr`
if x.foo
  takes_integer(x.foo) # error
end
→ View on sorbet.run

Note the two calls to x.foo at the very end of the snippet:

Properties and attributes in other languages

Unfortunately, this all means that Sorbet can only track control flow-sensitive types on variables, not methods. This is the exact same limitation that other popular gradual type checkers have, except for one difference: both JavaScript and Python make a syntactic distinction between method calls (which have parentheses) and property/attribute access (which doesn’t):

x.foo   # <- syntactically a property (JS) or attribute (Python)
x.foo() # <- syntactically a method call

In Ruby, both x.foo and x.foo() correspond to method calls (this is true even if foo was defined with attr_reader :foo!), so Sorbet models them as such. But in TypeScript, Flow, and Mypy (and maybe other control-flow sensitive type systems, too; feel free to send me more examples), that small, syntactic difference is enough to allow treating properties and attributes differently from methods.
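A small plain-Ruby sketch of the Ruby side of this: a reader generated by attr_reader is an ordinary method, callable with or without parentheses, and a subclass can swap it for arbitrary code. `Point` and `NoisyPoint` are hypothetical names used only for illustration:

```ruby
class Point
  attr_reader :x

  def initialize(x)
    @x = x
  end
end

# A subclass can replace the generated reader with arbitrary code.
class NoisyPoint < Point
  def x
    rand(100) # hypothetical override: no longer a stable value
  end
end

pt = Point.new(3)
p pt.x                # => 3
p pt.x()              # => 3, the same method call with explicit parens
p pt.respond_to?(:x)  # => true
p pt.method(:x).arity # => 0
p NoisyPoint.instance_method(:x).owner # => NoisyPoint
```

Given only a call site like `pt.x`, nothing in the syntax says which of these two definitions will run.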

→ View example in TypeScript Playground
→ View example in Try Flow
→ View example in mypy Playground

In all the above examples, we see that the type of variable.property is aware of control flow, while the types of expression().property and variable.method() are not.

Unfortunately, the direct analogue to properties in Ruby is instance variables like @property, which have the limitation that they can only be accessed inside their owning class. It’s as if JavaScript only allowed this.property instead of allowing the receiver to be any arbitrary expression like x.property. In Ruby, you can’t write x.@property. (You can do something similar with x.instance_variable_get(:@property), but again this is a method, not a property access: someone could have overridden the .instance_variable_get method!)
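A plain-Ruby sketch of that last point, with hypothetical `Account` and `SneakyAccount` classes. Reading `@balance` inside the class involves no method dispatch, whereas `instance_variable_get` is a regular method that a subclass is free to redefine:

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  # Inside the class, @balance is read directly: no method dispatch.
  def balance_inside
    @balance
  end
end

# Hypothetical subclass that overrides the reflection method itself.
class SneakyAccount < Account
  def instance_variable_get(_name)
    :overridden
  end
end

p Account.new(10).instance_variable_get(:@balance)       # => 10
p SneakyAccount.new(10).instance_variable_get(:@balance) # => :overridden
p SneakyAccount.new(10).balance_inside                   # => 10
```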

If you do use instance variables in Ruby with Sorbet, they behave comparably to their counterparts in other languages (there’s a known bug in the implementation at the time of writing, but it occurs somewhat rarely in practice so we haven’t prioritized fixing it):

→ View example on sorbet.run

Seen from this lens, I think it’s fair to say that Sorbet is doing the best it can with what it has. If you disagree and have a suggestion for how Sorbet could do better, feel free to reach out.

Extra thoughts

It’s maybe worth noting that even the Ruby VM itself cheats a little here: yes x.foo is technically a method call, but if that method was defined via attr_reader, the Ruby VM has special handling to make it run much, much faster than had the method been defined manually. So while you can think of these two things as doing the same thing, the first one will run much faster:

attr_reader :foo
def foo; @foo; end

I take this to mean that even the Ruby VM itself realizes that there is value in having something property-like. It just unfortunately didn’t make it into the language itself.

It’s interesting to imagine a future where Sorbet treats x.foo and x.foo() separately. For example, it could require that non-constant, nullary methods be written with trailing () even though Ruby doesn’t require it. Then a follow-up change might be able to build on that invariant, to treat x.foo like a property access instead of a method call.
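For what it’s worth, the two spellings are already distinguishable at the syntax level, even though they mean the same thing at runtime. The sketch below uses Ripper, the parser bundled with Ruby, purely as an illustration; it is not the parser Sorbet uses, and `top_level_node_type` is a hypothetical helper:

```ruby
require "ripper"

# Returns the node type of the first statement in `source`,
# according to Ruby's bundled Ripper parser (not Sorbet's parser).
def top_level_node_type(source)
  sexp = Ripper.sexp(source)
  # Shape: [:program, [first_statement, ...]]
  sexp[1][0][0]
end

p top_level_node_type("x.foo")   # => :call
p top_level_node_type("x.foo()") # => :method_add_arg
```

So a rule like “nullary calls without () must be constant” could in principle be checked from syntax alone, which is the property the rest of this post says control flow analysis needs.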

But not only are there some high-level design and low-level technical problems standing in the way of implementing this right now, there’s also a social problem: almost every Ruby style guide and linter requires the opposite, namely that nullary methods never be called with () explicitly. Solving social problems tends to involve waging holy wars, which is never all that fun.

And to throw another wrench into the picture: recent versions of JavaScript added getters, which allow executing an arbitrary method on property access. Python has had computed @property declarations since version 2.2. Notably, TypeScript, Flow, and mypy simply do not treat getters the same way as methods, even though they arguably should for soundness:

→ View example on TypeScript Playground
→ View example on Try Flow
→ View example on mypy Playground

If it were not so common in Ruby for all nullary methods to be called without (), instead of just those defined with attr_reader or something similar, maybe Sorbet could have chosen the same trade-off.