Swift ExpressibleBy protocols: What they are and how they work internally in the compiler

ExpressibleBy is a family of protocols in the Swift standard library that let you instantiate a type directly from a literal token, such as a string, an integer or a floating-point number, as long as the type can be "expressed" that way. For example, here's the regular way of creating a URL in Swift:

func getURL() -> URL {
    return URL(string: "https://swiftrocks.com")!
}

However, to avoid calling this initializer every time, you can declare that a URL can be represented directly by its string using ExpressibleByStringLiteral:

extension URL: ExpressibleByStringLiteral {
    public init(extendedGraphemeClusterLiteral value: String) {
        self = URL(string: value)!
    }

    public init(stringLiteral value: String) {
        self = URL(string: value)!
    }
}

This allows us to refactor getURL() to create a URL using nothing but a string literal:

func getURL() -> URL {
    return "https://swiftrocks.com"
}

The standard library contains the following ExpressibleBy protocols:

* ExpressibleByNilLiteral: Expressible by nil.
* ExpressibleByIntegerLiteral: Expressible by a number token like 10.
* ExpressibleByFloatLiteral: Expressible by a floating-point token like 2.5.
* ExpressibleByBooleanLiteral: Expressible by true/false.
* ExpressibleByUnicodeScalarLiteral: Expressible by a single Unicode scalar. Character and String are examples of conforming types.
* ExpressibleByExtendedGraphemeClusterLiteral: Similar to ExpressibleByUnicodeScalarLiteral, but the literal is a grapheme cluster (one or more scalars that form a single character) instead of a single scalar.
* ExpressibleByStringLiteral: Expressible by a string token like "SwiftRocks".
* ExpressibleByArrayLiteral: Expressible by an array token like [1,2,3].
* ExpressibleByDictionaryLiteral: Expressible by a dictionary token like ["name": "SwiftRocks"].
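
To make the list above more concrete, here is a minimal sketch of a few of these protocols applied to custom types. Meters and Tags are hypothetical types invented for this illustration; they are not part of the standard library or of any project mentioned in this article.

```swift
// Hypothetical type: a distance that can be written as an integer or a float literal.
struct Meters: ExpressibleByIntegerLiteral, ExpressibleByFloatLiteral, Equatable {
    let value: Double

    // Called when an integer literal such as 10 is used where Meters is expected.
    init(integerLiteral value: Int) {
        self.value = Double(value)
    }

    // Called when a floating-point literal such as 2.5 is used where Meters is expected.
    init(floatLiteral value: Double) {
        self.value = value
    }
}

// Hypothetical type: a list of tags that can be written as an array literal.
struct Tags: ExpressibleByArrayLiteral, Equatable {
    let names: [String]

    // The elements of the array literal arrive as variadic arguments.
    init(arrayLiteral elements: String...) {
        self.names = elements
    }
}

let short: Meters = 10                  // init(integerLiteral:)
let long: Meters = 2.5                  // init(floatLiteral:)
let tags: Tags = ["swift", "compiler"]  // init(arrayLiteral:)
```

Note that the type annotation is what triggers the conversion: without it, the same literals would simply become an Int, a Double and an array of String.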

In short, you can use these protocols to hide unnecessary implementation details and possibly ugly initializers of your more complex types. An example use case is how Apple's SourceKit-LSP represents arbitrary arguments. Because the Any type does not conform to Codable, a CommandArgumentType enum is used to represent unknown arguments:

public enum CommandArgumentType: Hashable, ResponseType {
  case null
  case int(Int)
  case bool(Bool)
  case double(Double)
  case string(String)
  case array([CommandArgumentType])
  case dictionary([String: CommandArgumentType])
}

However, because we're dealing with an enum, representing an argument will result in not-so-pretty lines of code:

func getCommandArguments() -> CommandArgumentType {
    return .dictionary(["line": .int(2),
                        "column": .int(1),
                        "name": .string("refactor"),
                        "args": .array([.string("file://a.swift"), .string("open")])])
}

Fortunately, we can use ExpressibleBy to provide better looking alternatives to the enum:

extension CommandArgumentType: ExpressibleByNilLiteral {
  public init(nilLiteral _: ()) {
    self = .null
  }
}

extension CommandArgumentType: ExpressibleByIntegerLiteral {
  public init(integerLiteral value: Int) {
    self = .int(value)
  }
}

extension CommandArgumentType: ExpressibleByBooleanLiteral {
  public init(booleanLiteral value: Bool) {
    self = .bool(value)
  }
}

extension CommandArgumentType: ExpressibleByFloatLiteral {
  public init(floatLiteral value: Double) {
    self = .double(value)
  }
}

extension CommandArgumentType: ExpressibleByStringLiteral {
  public init(extendedGraphemeClusterLiteral value: String) {
    self = .string(value)
  }

  public init(stringLiteral value: String) {
    self = .string(value)
  }
}

extension CommandArgumentType: ExpressibleByArrayLiteral {
  public init(arrayLiteral elements: CommandArgumentType...) {
    self = .array(elements)
  }
}

extension CommandArgumentType: ExpressibleByDictionaryLiteral {
  public init(dictionaryLiteral elements: (String, CommandArgumentType)...) {
    let dict = [String: CommandArgumentType](elements, uniquingKeysWith: { first, _ in first })
    self = .dictionary(dict)
  }
}

This allows us to rewrite getCommandArguments() with easier-to-read literals:

func getCommandArguments() -> CommandArgumentType {
    return ["line": 2,
            "column": 1,
            "name": "refactor",
            "args": ["file://a.swift", "open"]]
}
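
As a quick sanity check, the sketch below shows that the literal spelling and the explicit enum spelling produce the same value. It uses a reduced, hypothetical Argument enum rather than the real CommandArgumentType, so that it compiles on its own without SourceKit-LSP's ResponseType.

```swift
// Hypothetical, reduced stand-in for CommandArgumentType.
enum Argument: Equatable {
    case int(Int)
    case string(String)
    case array([Argument])
    case dictionary([String: Argument])
}

extension Argument: ExpressibleByIntegerLiteral {
    init(integerLiteral value: Int) { self = .int(value) }
}

extension Argument: ExpressibleByStringLiteral {
    init(stringLiteral value: String) { self = .string(value) }
}

extension Argument: ExpressibleByArrayLiteral {
    init(arrayLiteral elements: Argument...) { self = .array(elements) }
}

extension Argument: ExpressibleByDictionaryLiteral {
    init(dictionaryLiteral elements: (String, Argument)...) {
        // If a key appears twice in the literal, keep the first value.
        let dict = [String: Argument](elements, uniquingKeysWith: { first, _ in first })
        self = .dictionary(dict)
    }
}

// The explicit spelling, built from enum cases.
let explicit: Argument = .dictionary(["line": .int(2),
                                      "args": .array([.string("open")])])

// The literal spelling. Nested literals are converted recursively because
// every element is itself expected to be an Argument.
let literal: Argument = ["line": 2, "args": ["open"]]

assert(explicit == literal)
```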

How it works internally

But how can a token become a full type? As with all compiler magic, we can uncover what's going on by intercepting Swift's compilation steps.

Using the literal-based getURL() method as an example, let's first see how Swift treats ExpressibleBy objects. If we compile the code manually with the -emit-sil argument, we can extract the Swift Intermediate Language (SIL) version of the code, the last Swift-specific representation before the code is lowered to LLVM IR.

swiftc -emit-sil geturl.swift

The output, which I edited to make it easier to read, looks like this:

sil hidden @$s3bla6getURL10Foundation0C0VyF : $@convention(thin) () -> @out URL {
bb0(%0 : $*URL):
  %1 = string_literal utf8 "https://swiftrocks.com"
  // removed: creating a String type from the string_literal
  // function_ref URL.init(stringLiteral:)
  %8 = function_ref @$s10Foundation3URLV3blaE13stringLiteralACSS_tcfC : $@convention(method) (@owned String, @thin URL.Type) -> @out URL
  %9 = apply %8(%0, %6, %7) : $@convention(method) (@owned String, @thin URL.Type) -> @out URL
  %10 = tuple ()
  return %10 : $()
} // end sil function '$s3bla6getURL10Foundation0C0VyF'

Here's what the method is doing:

1: Create a string_literal token
2: Create a String type from the literal
3: Call URL.init(stringLiteral:) with the String
4: Return the URL

As one would expect, the compiler achieves this magic by replacing the string literal with a call to the relevant ExpressibleBy initializer. Hooray for compiler magic!
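
In source-level terms, the steps above mean that a literal is roughly equivalent to calling the ExpressibleBy initializer yourself. The sketch below illustrates this with a hypothetical Endpoint type (an assumption made for this example, used instead of URL so that it doesn't depend on Foundation):

```swift
// Hypothetical type standing in for URL.
struct Endpoint: ExpressibleByStringLiteral, Equatable {
    let path: String

    init(stringLiteral value: String) {
        self.path = value
    }
}

// What we write: a bare string literal where an Endpoint is expected.
let viaLiteral: Endpoint = "https://swiftrocks.com"

// Roughly what the compiler turns it into: an explicit initializer call.
let viaInitializer = Endpoint(stringLiteral: "https://swiftrocks.com")

assert(viaLiteral == viaInitializer)
```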

Now, to locate where this happens in the compiler, we can grep the Swift source for mentions of "ExpressibleBy", which will point us to several places inside CSApply.cpp. In short, all usages of literals get converted to their ExpressibleBy equivalent, including the "expressibles that are literals themselves" (for example, Int itself conforms to ExpressibleByIntegerLiteral). When Swift's type-checker reaches a literal, it gets hold of the relevant protocol and the name of the initializer, both of which can be determined from the kind of literal we're looking at:

Expr *visitNilLiteralExpr(NilLiteralExpr *expr) {
  auto type = simplifyType(cs.getType(expr));
  auto &tc = cs.getTypeChecker();
  auto *protocol = tc.getProtocol(expr->getLoc(),
                                  KnownProtocolKind::ExpressibleByNilLiteral);
  DeclName initName(tc.Context, DeclBaseName::createConstructor(),
                    { tc.Context.Id_nilLiteral });
  //...
}

With that info in hand, the type-checker calls convertLiteralInPlace to replace the full expression with the equivalent ExpressibleBy initializer. The method itself does a lot of stuff, but there's something interesting to note here: if we take a look at KnownProtocols.def, we can see that almost all literal protocols have a default type (ExpressibleByNilLiteral, which has none, is the exception):

EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByArrayLiteral, "Array", false)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByBooleanLiteral, "BooleanLiteralType", true)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByDictionaryLiteral, "Dictionary", false)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByExtendedGraphemeClusterLiteral, "ExtendedGraphemeClusterType", true)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByFloatLiteral, "FloatLiteralType", true)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByIntegerLiteral, "IntegerLiteralType", true)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByStringInterpolation, "StringLiteralType", true)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByStringLiteral, "StringLiteralType", true)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByNilLiteral, nullptr, false)
EXPRESSIBLE_BY_LITERAL_PROTOCOL(ExpressibleByUnicodeScalarLiteral, "UnicodeScalarType", true)

This means that if the expression has no type, or has a type that doesn't conform to the protocol, the literal falls back to the protocol's default type instead. For example, if we remove the ExpressibleByStringLiteral conformance from URL, the SIL for getURL() reveals that the internal String initializer is used instead:

func getURL() -> URL {
    return String.init(_builtinStringLiteral: "https://swiftrocks.com")
}

This not only allows you to write untyped expressions like let foo = "bar", but it also helps with diagnostics: thanks to it, a later pass reports the user-friendly Cannot convert value of type 'String' to specified type 'URL' compilation error for the previous getURL() example.
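
The default types can be observed directly from Swift code. The following sketch checks the inferred type of a few untyped literals with type(of:) and then shows how a type annotation overrides the default. The variable names are made up for this example.

```swift
// Without a type annotation, each literal falls back to its default type.
let foo = "bar"
let count = 10
let ratio = 2.5
let flag = true
let list = [1, 2, 3]
let table = ["name": "SwiftRocks"]

assert(type(of: foo) == String.self)
assert(type(of: count) == Int.self)
assert(type(of: ratio) == Double.self)
assert(type(of: flag) == Bool.self)
assert(type(of: list) == [Int].self)
assert(type(of: table) == [String: String].self)

// With an annotation, the same integer literal is routed to another
// conforming type: Double also conforms to ExpressibleByIntegerLiteral.
let wide: Double = 10
assert(type(of: wide) == Double.self)
```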

Follow me on my Twitter (@rockbruno_), and let me know of any suggestions and corrections you want to share.

References and Good reads

The Swift Source Code