Next Steps with Python Type System

[s01e02] Useful tips included

Paweł Święcki
Daftcode Blog


Illustration by Magdalena Tomczyk

This is a follow-up post about Python’s type system. The first part, describing typing basics, can be found here. In this one, I will show some more advanced features of Python’s typing. I will also include a number of tips on using particular typing features and a short guide on how to introduce typing into your codebase.

1. Constraining types

In the previous blog post I described the Optional type. Let’s get back to the snippet showing its usage:
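A minimal sketch of such a snippet could look as follows. Only the names get_user_id and process_user_id come from the text; the function bodies are hypothetical and exist just to make the example runnable.

```python
from typing import Optional


def get_user_id() -> Optional[int]:
    # Hypothetical body: a real implementation might return None,
    # e.g. when there is no logged-in user.
    return 42


def process_user_id(user_id: int) -> int:
    # Hypothetical body, just so the example does something.
    return user_id * 2


user_id = get_user_id()  # mypy infers Optional[int]

# mypy reports an error on this call: the argument has type
# Optional[int], but process_user_id expects an int.
result = process_user_id(user_id)
```

The code runs fine here because the hypothetical get_user_id() happens to return an int; mypy complains because the annotation says it may also return None.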

The error reported by the mypy type checker is indeed correct and useful. But what if you really know that in the current context get_user_id() will return an int and you just want to pass it to process_user_id()? First, consider whether the structure of your program is convoluted and needs refactoring. You still want to do it? Well, we need to somehow inform mypy that the type has changed. In our case this change is in fact constraining (limiting): from Optional[int] (i.e. Union[int, None]) to just int. First, let’s try what is probably the most obvious way to achieve it.

1.1. Type constraining with new type annotation [INCORRECT]

The simplest approach seems to be annotating a variable with a stricter type.
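A sketch of this attempt, assuming the same hypothetical get_user_id and process_user_id bodies as before:

```python
from typing import Optional


def get_user_id() -> Optional[int]:
    return 42  # hypothetical body


def process_user_id(user_id: int) -> int:
    return user_id * 2  # hypothetical body


# [INCORRECT] mypy rejects this assignment: a value of type
# Optional[int] cannot be assigned to a variable declared as int.
user_id: int = get_user_id()
result = process_user_id(user_id)
```

Nothing fails at runtime, since annotations are not enforced by the interpreter; the problem is reported only by the type checker.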

However, we cannot do that. Why? Because a type annotation does not force a type on a variable, it only declares what the type is supposed to be. If there is any inconsistency, the type checker will report it. In fact, if this approach were correct, the whole type checking idea would collapse.

It would collapse most obviously when the new type isn’t a subtype of the old one (like int and str). We may imagine a hypothetical situation where mypy would accept type constraining (changing a type from a more general one to a less general one) by using an annotation. In our case it would be constraining from Union[int, None] to int. However, currently it’s not supported.

There are at least two correct ways to inform the mypy type checker of a type different than the one it expects.

1.2. Type constraining with type checking [CORRECT]

A correct way to change the type would be by ensuring that isinstance returns True for a new type:
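A minimal sketch of this approach. The function bodies and the wrapper function handle_user are hypothetical, added only to make the example self-contained.

```python
from typing import Optional


def get_user_id() -> Optional[int]:
    return 42  # hypothetical body


def process_user_id(user_id: int) -> int:
    return user_id * 2  # hypothetical body


def handle_user() -> Optional[int]:
    user_id = get_user_id()  # Optional[int]
    if isinstance(user_id, int):
        # Inside this branch mypy treats user_id as int,
        # so the call type-checks.
        return process_user_id(user_id)
    return None
```

For an Optional value, a check like `if user_id is not None:` is also understood by mypy and constrains the type in the same way.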

Now mypy is sure that user_id has the correct type — otherwise, the call to process_user_id would not be executed.

Mind that using isinstance introduces a small runtime overhead. As a bonus, we get additional runtime checks, which may be useful.

1.3. Type constraining with type casting [CORRECT]

Another correct way of telling mypy that the type is constrained (or changed in some other way) is to use the cast function. This one is explicitly described in PEP 484.
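A short sketch of cast in use, again with hypothetical function bodies:

```python
from typing import Optional, cast


def get_user_id() -> Optional[int]:
    return 42  # hypothetical body


def process_user_id(user_id: int) -> int:
    return user_id * 2  # hypothetical body


# cast tells the type checker to treat the value as an int.
# Nothing is checked or converted at runtime.
user_id = cast(int, get_user_id())
result = process_user_id(user_id)

# cast returns its second argument unchanged, even if the
# claimed type is plainly wrong.
still_a_string = cast(int, 'not an int')
```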

This function is defined in the typing module. Typing should not have any impact on runtime, and this function keeps that promise (aside from the bare function call). In Python’s source it’s defined as an identity function (I removed the docstring):

def cast(typ, val):
    return val

So no runtime checks are performed. When cast is used, it’s an instruction for the type checker to blindly believe the new type.

Note that using cast may mask bugs: the previous type, which may well be the correct one, is simply ignored. So, in a way, it works similarly to Any and # type: ignore (see below). Therefore use it with caution. (Thanks robin-gvx for pointing that out.)

2. Combining types and defining type aliases

Python’s types can be freely combined. Want a list of integers or floats or strings or Nones? Just use:

List[Union[int, float, str, None]]

or

List[Optional[Union[int, float, str]]], whatever suits you.

What about a tuple composed of a string and a list of tuples composed of an integer and a list of tuples composed of an integer, a string and a list of strings? 😅 Also pretty straightforward:

Tuple[str, List[Tuple[int, List[Tuple[int, str, List[str]]]]]].

Looks like pure fun! Not really? As a matter of fact, we sometimes use these complex data types in our programs. How can we use type annotations with them without losing our minds?

To make types more manageable and readable, use type aliases. To create an alias just assign a type to a variable: alias = T. Now we can use alias instead of T.

The key here is to name the aliases properly. Making aliases that reflect the structure of the named type, like ListOfListsOfDictsFromStrToIntOrFloat, doesn’t really make sense most of the time. Use names that reflect the “business objects” inside. Also, nest aliases accordingly. Like this:
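A sketch of such nested aliases for the tuple type shown above. The names Item, Order, Shipment and ItemId appear in the text; the remaining alias names (ItemName, ItemTag, OrderId, ShipmentCode) are assumptions made for illustration.

```python
from typing import List, Tuple

# Simple aliases naming the meaning of each primitive value.
ItemId = int
ItemName = str       # hypothetical name
ItemTag = str        # hypothetical name
OrderId = int        # hypothetical name
ShipmentCode = str   # hypothetical name

# Nested aliases built from the simpler ones.
Item = Tuple[ItemId, ItemName, List[ItemTag]]
Order = Tuple[OrderId, List[Item]]
Shipment = Tuple[ShipmentCode, List[Order]]
```

An alias is just another name for the same type, so Shipment and the long spelled-out tuple type are fully interchangeable for the type checker.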

Using NamedTuple for defining Item, Order and Shipment would increase the readability of our code even more. Also, in real-life code we would probably use custom-defined classes instead. Type aliases could still be useful, though.

Shipment does look much better than Tuple[str, List[Tuple[int, List[Tuple[int, str, List[str]]]]]], doesn’t it? There is much less typing, too 😎

Now our code can be annotated with business-aware types, like this:
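For instance, a function working on shipments might be annotated as in the sketch below. The function count_items and the sample data are hypothetical; the aliases are repeated (in a shortened form) so the example stands on its own.

```python
from typing import List, Tuple

Item = Tuple[int, str, List[str]]
Order = Tuple[int, List[Item]]
Shipment = Tuple[str, List[Order]]


def count_items(shipment: Shipment) -> int:
    # Hypothetical helper: total number of items in all orders.
    orders = shipment[1]
    return sum(len(order[1]) for order in orders)


example_shipment: Shipment = (
    'PL-001',
    [
        (1, [(10, 'mug', ['kitchen'])]),
        (2, [(11, 'pen', []), (12, 'ink', ['office'])]),
    ],
)
total = count_items(example_shipment)
```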

The code is cleaner and the business logic is at the forefront. Also, errors in both the type annotations and the types themselves are easier to catch.

3. NewType function

In the last section, we saw assigning a simple type to an alias, like ItemId = int. Even this simple alias can make sense, since it indicates the meaning of this particular integer. It doesn’t protect us from the following mistakes, though:

Mypy is happy, the IDE is happy and we are happy. Let’s just hope the person doing the code review will catch the error!
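A sketch of the kind of mistake meant here. The alias OrderId and the function get_item_name are hypothetical; only ItemId = int is taken from the text.

```python
ItemId = int
OrderId = int  # hypothetical second alias


def get_item_name(item_id: ItemId) -> str:
    return f'item-{item_id}'  # hypothetical body


order_id: OrderId = 7

# An order ID is passed where an item ID is expected. Both aliases
# are just int, so the type checker sees nothing wrong.
name = get_item_name(order_id)
```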

To prevent this we can define additional subclasses (subtypes) inheriting directly from int:
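A sketch of the subclass approach, with the same hypothetical names as above:

```python
class ItemId(int):
    pass


class OrderId(int):  # hypothetical second ID type
    pass


def get_item_name(item_id: ItemId) -> str:
    return f'item-{item_id}'  # hypothetical body


item_name = get_item_name(ItemId(3))  # OK

# mypy reports an error on this call: OrderId is not a subtype
# of ItemId, even though both are subclasses of int.
wrong_name = get_item_name(OrderId(7))
```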

All good, the error was found. Unfortunately, passing a value through an additional constructor introduces runtime overhead. It’s especially painful when we have to process many instances. To deal with it, the typing module has the NewType function. It’s used to define a distinct subtype:

NewType gives us a callable that simply returns its argument (an identity function), so no subclass is defined at runtime and the overhead is minimal. (Since Python 3.10 NewType is implemented as a class rather than a function, but calling the resulting type still just returns the value passed in.) See the source code here.
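A sketch of the same example rewritten with NewType (OrderId and get_item_name are, as before, hypothetical):

```python
from typing import NewType

ItemId = NewType('ItemId', int)
OrderId = NewType('OrderId', int)  # hypothetical second ID type


def get_item_name(item_id: ItemId) -> str:
    return f'item-{item_id}'  # hypothetical body


item_name = get_item_name(ItemId(3))  # OK

# mypy reports an error on this call: OrderId is a different
# type than ItemId.
wrong_name = get_item_name(OrderId(7))

# At runtime the value is still a plain int; no subclass of int
# has been created.
is_plain_int = type(ItemId(3)) is int
```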

Using types defined with NewType we can add additional type “tracking” throughout our code. It may be handy in a security context, e.g. to distinguish between safe and (potentially) unsafe strings.
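A sketch of this idea. The names SafeStr, exec_code and user_provided_string appear in the text; the body of exec_code and the sample values are assumptions for illustration.

```python
from typing import NewType

SafeStr = NewType('SafeStr', str)


def exec_code(code: SafeStr) -> object:
    # Hypothetical body: evaluates a string that is trusted.
    return eval(code)


trusted = SafeStr('2 + 2')
result = exec_code(trusted)  # OK

# Imagine this value comes from a user (hypothetical).
user_provided_string = '2 + 2'

# mypy reports an error on this call: a plain str is passed
# where a SafeStr is expected.
unchecked_result = exec_code(user_provided_string)
```

The value is marked as safe in exactly one place, the SafeStr(...) call, which makes such spots easy to find and review.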

Mind that getting the value of `user_provided_string` may be far, far away from the call to `exec_code` so it could be hard to catch that it’s unsafe without the help of mypy.

At runtime, passing a value through a NewType, SafeStr('2 + 2') in our case, has almost no overhead and changes essentially nothing. For mypy it works like cast(SafeStr, '2 + 2').

4. Callable type

So far, we defined argument types and return types of functions. But what if we want to pass a function itself into another function (in Python functions are first-class objects, so it can be done just like that) — how would we express the type of the passed function?

Let’s digress for a second… In Python’s typing system functions have types just like any other kind of value (remember: a function is a first-class object, just like an integer or a string). It may be surprising at first, but it’s natural if you think about it. For instance, the type of a function annotated like this:

def fun(arg: str) -> int: ...

can be thought of as “str to int”. In fact, a function’s type is, in some way, very similar to the Dict type, which “maps” one value to another; Dict[str, int], for example, maps strs to ints. From a typing point of view, a function is more complex: it can take zero, one or multiple arguments, while a dict lookup always takes exactly one key. However, the very idea of mapping is the same.

To describe the types of functions (and other callables), Python uses the Callable type. It’s defined as follows:

Callable[[t1, t2, …, tn], tr]. A function with positional argument types t1 etc., and return type tr. [source]

Let’s see:
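A sketch of such an example. The function names are taken from the surrounding text; the bodies and the sample argument are assumptions.

```python
from typing import Callable


def apply_function_on_value(func: Callable[[str], int], value: str) -> int:
    return func(value)


def text_length(text: str) -> int:
    return len(text)


def append_parrot(text: str) -> str:
    return text + ' parrot'  # hypothetical body


length = apply_function_on_value(text_length, 'Ni!')  # OK

# mypy reports an error on this call: append_parrot has type
# Callable[[str], str], not Callable[[str], int].
not_an_int = apply_function_on_value(append_parrot, 'Ni!')
```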

apply_function_on_value takes a function func as its first parameter and applies it to its second parameter, value. The type of func is “str to int”, or Callable[[str], int]. So passing the text_length function (which is just Python’s len defined on strings) to it is correct, because text_length takes a str and returns an int, which is exactly what func requires.

append_parrot cannot be correctly passed to apply_function_on_value, since it returns a str, not an int, which is not compatible with func’s type.

5. Any type and turning off mypy checks

Python’s typing rules are quite strict, but Python provides a loophole for cases where you don’t want mypy to complain about a type. This loophole is the Any type. Any is consistent with every type and every type is consistent with Any.

[W]hen a value has type Any, the type checker will allow all operations on it, and a value of type Any can be assigned to a variable (or used as a return value) of a more constrained type. [source]

Let’s prove it 😏
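A small sketch (the function increment and the variable names are hypothetical). Every line below is accepted by mypy:

```python
from typing import Any


def increment(number: int) -> int:
    return number + 1


anything: Any = 1

# A value of type Any can be passed where an int is expected...
incremented = increment(anything)

# ...and assigned to a variable of a more constrained type, even
# though the value is not a str at runtime.
text: str = anything

# All operations are allowed on a value of type Any.
bits = anything.bit_length()
```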

As docs state:

Any can be considered a type that has all values and all methods. Combined with the definition of subtyping above, this places Any partially at the top (it has all values) and bottom (it has all methods) of the type hierarchy. [source]

(Compare it with subtyping definition in paragraph 3 of my previous blog post about typing.) It can be illustrated like this:

Mind that the relation between Any and other types is, strictly speaking, not a subtyping relationship but a being-consistent-with relationship. For a formal definition and more context see here.

In practice, Any simply turns off mypy checks for the item it’s annotating. A different, more brutal, way to disable mypy checks is to use # type: ignore. Use it on the line for which you want to have mypy errors silenced:
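A minimal sketch, with a hypothetical function double:

```python
def double(number: int) -> int:
    return number * 2


# Without the comment mypy would report an incompatible argument
# type; with it, errors on this line are silenced.
result = double('ab')  # type: ignore
```

The comment affects only the type checker. At runtime the call is executed as usual, here repeating the string.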

Using Any should be as rare as possible, while using # type: ignore should be the last resort. Especially when you are taking the whole typing thing seriously.

You definitely should not use them when:

  • You want to postpone annotating something. If you don’t want to add a type at the moment, just leave the thing untyped. Python typing is completely optional — you can freely annotate one part of the code and leave the other not annotated.
  • You don’t understand mypy’s error message and just want to get rid of it. Sometimes it can be cryptic or too generic, indeed, but it’s really worthwhile to get to the bottom of the issue. In my experience, almost every time mypy is onto something.

Where might Any be useful?

  • Use Any when you don’t really know what the type of a variable will be. The typical case is data created from an externally provided JSON. What is the type of the variable created this way? Is it a list, dict, int, float, bool or None? It might as well be Any. Mind that properly defining the JSON format using nested types (Dict, List, etc.) is impossible (or at least very difficult) due to the recursive structure of JSON.
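A sketch of the JSON case. The function parse_payload and the sample document are hypothetical.

```python
import json
from typing import Any


def parse_payload(raw: str) -> Any:
    # The shape of the result depends entirely on the input text,
    # so Any is an honest annotation here.
    return json.loads(raw)


payload = parse_payload('{"id": 1, "tags": ["a", "b"]}')

# mypy cannot verify this access; it is allowed because payload
# has type Any.
first_tag = payload['tags'][0]
```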

Where might # type: ignore be useful?

  • When mypy doesn’t really know what’s going on. E.g. something is changed at runtime, like a method being dynamically added to a class or a class hierarchy being changed. If mypy’s error is really inadequate and you cannot think of another way to handle this, use # type: ignore. But first, try to understand why mypy is complaining. Mypy is, in fact, a pretty good code-smell detector 👍
  • When something is not yet compatible with mypy (like enums, which were not supported at the beginning).
  • When there is a bug in mypy. Remember to report it on GitHub!

Q&A

If you are not yet convinced to start using typing in Python, read on.

I like my Python code duck-typed and dynamic. Typing will ruin that. Amirite?

First, it’s true that mypy won’t understand some runtime hacks, which are related to the dynamic nature of the language. However, in return, you get more reliable code.

Second, Python’s typing system supports protocols (aka “duck types”), both built-in and user-defined (this topic was not covered here). So no worries here.

Frankly, I don’t like this whole typing thing. Isn’t it slowly turning Python into Java?

Don’t worry, unlike in Java, typing in Python:

  • is completely optional (*),
  • does not affect the runtime (**).

(*) Okay, not really: dataclasses force you to annotate their fields. This is, hopefully, an anomaly.

(**) Okay, not really… First, imports from typing are made. Second, some values (in the case of cast and NewType) are passed through identity functions. Third, generic types (which were not covered in this blog post) use a custom metaclass, which may conflict with a user-defined metaclass; it’s no longer a problem in Python 3.7.

Right, but just look at typed Python code: more similar to Java than to good old Python.

That may be the first impression, but have you ever really seen an enterprise Java codebase? 🙃 I’m not denying that typed Python code looks different from untyped code and you need to get used to reading it. But when you do, and the code itself is typed in a smart way (e.g. by using aliases), it can become even more readable and understandable than before adding types. And it’s still Pythonic, as it adds to the “explicitness” and readability of the code.

Okay, let’s try this… How do I begin using type annotation in my codebase?

I recommend the following path:

  1. Start with running mypy on your untyped code and see what happens. You might be surprised that it already understands so much. (You might want to run it with the --ignore-missing-imports and --follow-imports skip options at first.)
  2. Fix all initial mypy errors.
  3. Now just start adding type annotations to your code. You can begin anywhere, but I think it’s best to start with the most important code. This way, your codebase will become more reliable earlier. Don’t worry that a wrong type annotation will break your code: typing is designed to affect Python’s runtime as little as possible.
  4. Keep adding type annotations. After some time, adding type annotations will start to pay off. You may get more and more mypy errors that catch real bugs. (Just don’t forget to run mypy from time to time.)
  5. The next step is to add mypy checks to your CI pipeline (and maybe to pre-commit hooks), so that your code stays reliable all the time.

Generally, if you are committed to typing, it’s a good idea to annotate all new code. Another approach is to skip the annotating phase when you just want to quickly prototype a functionality. When you know the code will stay, you can add type annotations then.

You can also try to start by using an automated tool for adding type annotations. The pyannotate library gets type information from observing actual types at runtime. Similarly, pytest-annotate adds type annotations by running tests and observing types. Also, the pytype checker can add annotations based on static analysis of code. I don’t have experience with those tools, but I think it’s a good idea to carefully verify the annotations produced by any of them. Studying those annotations may even lead to discovering some bugs in your code.

What to do when I don’t have time to understand a mypy error?

Find the time, it may pay off. Don’t reach for # type: ignore right away. Try to understand the problem first (sometimes searching mypy’s issue tracker is the only way), and use # type: ignore only if you are sure you don’t have any good alternative. Mypy might be, and probably is, onto something.

On the other hand, don’t try to please mypy no matter the cost. The rule of thumb is: don’t make your code worse just to silence a mypy error. It’s still only a tool.

Remember that mypy is in constant development. Its error messages are getting better and false positives (which are discovered from time to time) are getting eliminated. So remember to update mypy whenever a new version comes out.

Would you like to dig deeper into Python’s type system? You can! I’ve written a second typing blog post series 🤩 It’s devoted to subtyping relationships between complex types. Check it out if you want to know what “contravariance” is and why knowing that helps in writing safer code.
