Next Steps with Python Type System
[s01e02] Useful tips included
This is a follow-up post about the Python typing system. The first part, describing typing basics, can be found here. In this one, I will show some more advanced features of Python’s typing. I will also include a number of tips on using particular typing features and a short guide on how to introduce typing into your codebase.
1. Constraining types
In the previous blog post I described the Optional type. Let’s get back to the snippet showing its usage:
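A minimal sketch of such a snippet, assuming hypothetical get_user_id() and process_user_id() functions with the signatures the text implies (the bodies are made up for illustration):

```python
from typing import Optional


def get_user_id() -> Optional[int]:
    """Hypothetical lookup: returns an id, or None when no user is found."""
    return 42


def process_user_id(user_id: int) -> int:
    """Hypothetical consumer that requires a real int."""
    return user_id + 1


user_id = get_user_id()  # mypy infers Optional[int]

# mypy rejects this call: the argument may be None, but an int is expected.
# At runtime it still works here, because this stub happens to return 42.
result = process_user_id(user_id)
```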
The error reported by the mypy type checker is indeed correct and useful. But what if you really know that in the current context get_user_id() will return an int and you just want to pass it to process_user_id()? First, consider whether the structure of your program is convoluted and needs refactoring. You still want to do it? Well, we need to somehow inform mypy that the type has changed. In our case this change is in fact constraining (limiting): from Optional[int] (i.e. Union[int, None]) to just int. First, let’s try probably the most obvious way to achieve it.
1.1. Type constraining with new type annotation [INCORRECT]
The simplest approach seems to be annotating a variable with a stricter type.
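Roughly, the attempt could look like the sketch below (same hypothetical functions as before). The code runs, but mypy should flag the annotated assignment as incompatible, since the expression is Optional[int] and the variable is declared as int:

```python
from typing import Optional


def get_user_id() -> Optional[int]:
    """Hypothetical lookup: returns an id, or None when no user is found."""
    return 42


def process_user_id(user_id: int) -> int:
    """Hypothetical consumer that requires a real int."""
    return user_id + 1


# Attempted "constraining by annotation". The annotation does not convert
# or check anything at runtime, and mypy reports the mismatch on this line.
user_id: int = get_user_id()
result = process_user_id(user_id)
```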
However, we cannot do that. Why? Because a type annotation does not force a type on a variable; it only informs about the type. If there is any inconsistency, the type checker will report it. In fact, if this approach were correct, the whole type checking idea would collapse, especially when the new type is not a subtype of the old one (like int and str).
We may imagine a hypothetical situation where mypy would accept type constraining (changing a type from a more general one to a less general one) through an annotation. In our case it would be constraining from Union[int, None] to int. However, currently it’s not supported.
There are at least two correct ways to inform mypy type checker of a type different than expected.
1.2. Type constraining with type checking [CORRECT]
A correct way to change the type is to ensure that isinstance returns True for the new type:
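A sketch of the isinstance approach, again with the hypothetical get_user_id() and process_user_id() functions:

```python
from typing import Optional


def get_user_id() -> Optional[int]:
    """Hypothetical lookup: returns an id, or None when no user is found."""
    return 42


def process_user_id(user_id: int) -> int:
    """Hypothetical consumer that requires a real int."""
    return user_id + 1


result: Optional[int] = None
user_id = get_user_id()
if isinstance(user_id, int):
    # Inside this branch mypy narrows user_id from Optional[int] to int.
    result = process_user_id(user_id)
```

When the only other possibility is None, a plain `if user_id is not None:` check narrows the type in the same way.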
Now mypy is sure that user_id has the correct type; otherwise, the call to process_user_id would not be executed.
Mind that using isinstance introduces a small runtime overhead. As a bonus, we get an additional runtime check, which may be useful.
1.3. Type constraining with type casting [CORRECT]
Another correct way of telling mypy that the type is constrained (or changed in some other way) is using the cast function. This one is explicitly described in PEP 484.
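A sketch of the cast approach with the same hypothetical functions. The last line illustrates that cast performs no check at all: it returns its second argument unchanged, whatever it is.

```python
from typing import Optional, cast


def get_user_id() -> Optional[int]:
    """Hypothetical lookup: returns an id, or None when no user is found."""
    return 42


def process_user_id(user_id: int) -> int:
    """Hypothetical consumer that requires a real int."""
    return user_id + 1


# We promise mypy that the value is an int; mypy simply believes us.
user_id = cast(int, get_user_id())
result = process_user_id(user_id)

# No runtime verification happens: a None "cast" to int is still None.
still_none = cast(int, None)
```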
This function is, as we can see, defined in the typing module. Typing should not have any impact on runtime and this function keeps that promise (aside from the bare function call): in Python’s source it’s defined as an identity function (I removed the docstring):
def cast(typ, val):
    return val
So no runtime checks are performed. When cast is used, it’s an instruction for the type checker to blindly believe the new type.
Take note that using cast may mask bugs: the previous type, which may be correct, is simply ignored. So, in a way, it works similarly to Any and # type: ignore (see below). Therefore use it with caution. (Thanks robin-gvx for pointing that out.)
2. Combining types and defining type aliases
Python’s types can be freely combined. Want a list of integers or floats or strings or Nones? Just use: List[Union[int, float, str, None]] or List[Optional[Union[int, float, str]]], whatever suits you.
What about a tuple composed of a string and a list of tuples composed of an integer and a list of tuples composed of an integer, a string and a list of strings? 😅 Also pretty straightforward: Tuple[str, List[Tuple[int, List[Tuple[int, str, List[str]]]]]].
Looks like pure fun! Not really? As a matter of fact, we sometimes use these complex data types in our programs. How can we use type annotations with them without losing our minds?
To make types more manageable and readable, use type aliases. To create an alias, just assign a type to a variable: alias = T. Now we can use alias instead of T.
The key here is to name the aliases properly. Making aliases that reflect the structure of the named type, like ListOfListsOfDictsFromStrToIntOrFloat, most of the time doesn’t really make sense. Use names that reflect the “business objects” inside. Also, nest aliases accordingly. Like this:
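One possible decomposition is sketched below. Only Shipment and ItemId are names used in the text; the intermediate aliases and the meaning given to each tuple field are assumptions made for illustration.

```python
from typing import List, Tuple

# Simple aliases that name the meaning of a primitive value.
ItemId = int
BoxId = int          # hypothetical
Tag = str            # hypothetical

# Nested aliases, built bottom-up from the smaller ones.
Item = Tuple[ItemId, str, List[Tag]]   # hypothetical: (id, name, tags)
Box = Tuple[BoxId, List[Item]]         # hypothetical: (id, items)
Shipment = Tuple[str, List[Box]]       # (destination, boxes)
```

Aliases are fully interchangeable with the types they name, so Shipment here is exactly the long Tuple[...] expression written out by hand.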
Shipment does look much better than Tuple[str, List[Tuple[int, List[Tuple[int, str, List[str]]]]]], doesn’t it? There is much less typing, too 😎
Now our code can be annotated with business-aware types, like this:
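For instance, a function working on shipments might be annotated as in this sketch. The aliases other than Shipment and ItemId, and the count_items function itself, are hypothetical.

```python
from typing import List, Tuple

ItemId = int
Item = Tuple[ItemId, str, List[str]]   # hypothetical: (id, name, tags)
Box = Tuple[int, List[Item]]           # hypothetical: (id, items)
Shipment = Tuple[str, List[Box]]       # (destination, boxes)


def count_items(shipment: Shipment) -> int:
    """Return the total number of items in all boxes of a shipment."""
    _destination, boxes = shipment
    return sum(len(items) for _box_id, items in boxes)


shipment: Shipment = (
    "Warsaw",
    [
        (1, [(10, "mug", ["fragile"]), (11, "book", [])]),
        (2, []),
    ],
)
total = count_items(shipment)
```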
The code is cleaner and the business logic comes to the front. Also, errors, in both type annotations and the types themselves, are easier to catch.
3. NewType function
In the last section, we saw assigning a simple type to an alias, like ItemId = int. Even this simple alias can make sense, since it indicates the “sense” of this particular integer. It doesn’t protect us from the following mistakes, though:
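A sketch of the kind of mistake meant here, assuming a hypothetical second alias UserId and a hypothetical get_item_name function:

```python
ItemId = int
UserId = int  # hypothetical second alias for the same underlying type


def get_item_name(item_id: ItemId) -> str:
    """Hypothetical lookup that expects an item id."""
    return f"item-{item_id}"


user_id: UserId = 7

# A user id is passed where an item id is expected. Both aliases are
# just int, so mypy has no way to notice the mix-up.
name = get_item_name(user_id)
```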
Mypy is happy, IDE is happy and we are happy. Let’s just hope the person doing the code review will catch the error!
To prevent this we can define additional subclasses (subtypes) inheriting directly from int:
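A sketch with real subclasses (UserId and get_item_name are hypothetical names). The mistaken call is left commented out; mypy should reject it because UserId is not a subtype of ItemId.

```python
class ItemId(int):
    """Distinct subtype of int for item ids."""


class UserId(int):
    """Distinct subtype of int for user ids (hypothetical)."""


def get_item_name(item_id: ItemId) -> str:
    """Hypothetical lookup that expects an item id."""
    return f"item-{item_id}"


item_name = get_item_name(ItemId(3))  # OK

# get_item_name(UserId(7))  # mypy: UserId is not compatible with ItemId
```

The price is that every value now goes through a class constructor and really is an instance of the subclass at runtime.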
All good, the error was found. Unfortunately, passing a value through an additional constructor introduces runtime overhead. It’s especially painful when we have to process many instances. To deal with it, the typing module has the NewType function. It’s used to define a distinct subtype:
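The same example rewritten with NewType, as a sketch (UserId and get_item_name remain hypothetical):

```python
from typing import NewType

ItemId = NewType("ItemId", int)
UserId = NewType("UserId", int)  # hypothetical


def get_item_name(item_id: ItemId) -> str:
    """Hypothetical lookup that expects an item id."""
    return f"item-{item_id}"


item_id = ItemId(3)
item_name = get_item_name(item_id)  # OK

# get_item_name(UserId(7))  # rejected by mypy, just like with subclasses
# get_item_name(3)          # a bare int is rejected by mypy as well
```

For mypy, ItemId and UserId are distinct subtypes of int; at runtime item_id is an ordinary int.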
NewType simply returns an identity callable (a plain function in older Python versions, a small callable object since Python 3.10), so no subclass is defined at runtime. Also, this introduces only minimal overhead. See the source code here.
Using types defined with NewType we can add additional type “tracking” throughout our code. It may be handy in a security context, e.g. to distinguish between safe and (potentially) unsafe strings.
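A sketch of such tracking. SafeStr is the name used in the text; the sanitize and describe functions and the character whitelist are hypothetical, and the validation rule is only a placeholder for whatever "safe" means in a given application.

```python
from typing import NewType

SafeStr = NewType("SafeStr", str)

# Hypothetical whitelist: digits, arithmetic operators, parentheses, spaces.
ALLOWED_CHARS = set("0123456789+-*/() ")


def sanitize(raw: str) -> SafeStr:
    """Hypothetical validator: the one place where a str becomes a SafeStr."""
    if not set(raw) <= ALLOWED_CHARS:
        raise ValueError(f"unsafe expression: {raw!r}")
    return SafeStr(raw)


def describe(expression: SafeStr) -> str:
    """Hypothetical sink that accepts only vetted strings."""
    return f"expression: {expression}"


trusted = SafeStr('2 + 2')         # a literal we vouch for ourselves
checked = sanitize("(1 + 2) * 3")  # validated input
report = describe(trusted)

# describe("2 + 2") would be rejected by mypy: a plain str is not a SafeStr.
```

The type checker then makes sure that only values which passed through SafeStr(...) or sanitize(...) reach describe.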
At runtime, passing a value through a NewType (SafeStr('2 + 2') in our case) has almost no overhead and changes really nothing. For mypy it works like cast(SafeStr, '2 + 2').
4. Callable type
So far, we defined argument types and return types of functions. But what if we want to pass a function itself into another function (in Python functions are first-class objects, so it can be done just like that) — how would we express the type of the passed function?
Let’s digress for a second… In Python’s typing system functions have types just like any other kinds of values (remember: a function is a first-class object, just like an integer or a string). It may be surprising at first, but it’s natural if you think about it. For instance, the type of a function annotated like this:
def fun(arg: str) -> int: ...
can be thought of as “str to int”. In fact, a function’s type is, in some way, very similar to the Dict type, which “maps” one value to another; like Dict[str, int] maps strs to ints. From a typing point of view, a function is more complex: it can take zero, one or multiple arguments, while a dict has only one key type. However, the very mapping idea is the same.
In Python, the Callable type is used to describe the types of functions (and other callables). It’s defined as follows:
Callable[[t1, t2, …, tn], tr]. A function with positional argument types t1 etc., and return type tr. [source]
Let’s see:
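A sketch using the function names from the discussion that follows; the function bodies are assumptions.

```python
from typing import Callable


def apply_function_on_value(func: Callable[[str], int], value: str) -> int:
    """Apply func, a "str to int" function, to value."""
    return func(value)


def text_length(text: str) -> int:
    """Python's len, restricted to strings."""
    return len(text)


def append_parrot(text: str) -> str:
    """A "str to str" function (assumed body)."""
    return text + " parrot"


length = apply_function_on_value(text_length, "spam")  # OK

# apply_function_on_value(append_parrot, "spam")
# ^ rejected by mypy: Callable[[str], str] is not Callable[[str], int]
```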
apply_function_on_value takes a function func as its first parameter and applies it to its second parameter, value. The type of func is “str to int”, or Callable[[str], int]. So passing the text_length function (which is just Python’s len defined on strings) to it is correct, because func is defined as taking a str and returning an int.
append_parrot cannot be correctly passed to apply_function_on_value, since it returns a str, not an int, which is not compatible with func’s type.
5. Any type and turning off mypy checks
Python’s typing rules are quite strict, but Python provides a loophole for cases when you don’t want mypy to complain about a type. This loophole is the Any type. Any is consistent with every type and every type is consistent with Any.
[W]hen a value has type Any, the type checker will allow all operations on it, and a value of type Any can be assigned to a variable (or used as a return value) of a more constrained type. [source]
Let’s prove it 😏
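A small sketch: mypy should accept every line below without complaint, even though one assignment is plainly wrong and one method call fails at runtime.

```python
from typing import Any

value: Any = "spam"

# Any is consistent with every type, so both assignments pass type checking.
text: str = value
number: int = value  # wrong at runtime, silent in mypy

# All operations are allowed on Any; a bad one only fails when executed.
try:
    value.fly()
    failed = False
except AttributeError:
    failed = True
```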
As the docs state:
Any can be considered a type that has all values and all methods. Combined with the definition of subtyping above, this places Any partially at the top (it has all values) and bottom (it has all methods) of the type hierarchy. [source]
(Compare it with subtyping definition in paragraph 3 of my previous blog post about typing.) It can be illustrated like this:
In practice, Any simply turns off mypy checks for the item it’s annotating. A different, more brutal, way to disable mypy checks is to use # type: ignore. Use it on the line for which you want to have mypy errors silenced:
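A sketch with a hypothetical process_user_id function. The comment silences mypy for that single line only; the runtime behaviour is untouched, so the bug is still there.

```python
def process_user_id(user_id: int) -> int:
    """Hypothetical consumer that requires a real int."""
    return user_id + 1


try:
    # Without the comment mypy would report the str argument.
    process_user_id("7")  # type: ignore
    raised = False
except TypeError:
    # "7" + 1 is still a runtime error; ignoring the checker fixes nothing.
    raised = True
```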
Using Any should be as rare as possible, while using # type: ignore should be the last resort, especially when you are taking the whole typing thing seriously.
You definitely should not use them when:
- You want to postpone annotating something. If you don’t want to add a type at the moment, just leave the thing untyped. Python typing is completely optional — you can freely annotate one part of the code and leave the other not annotated.
- You don’t understand mypy’s error message and just want to get rid of it. Sometimes it can be cryptic or too generic, indeed, but it’s really worthwhile to get to the bottom of the issue. In my experience, almost every time mypy is onto something.
Where might Any be useful?
- Use Any when you don’t really know what the type of a variable will be. The typical case is data created from externally provided JSON. What is the type of the variable created this way? Is it a list, dict, str, int, float, bool or None? It might as well be Any. Mind that properly defining the JSON format using nested types (Dict, List, etc.) is impossible (or at least very difficult) due to the recursive structure of JSON.
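A sketch of the JSON case, assuming a hypothetical load_payload helper and payload shape. Any is used at the boundary, and the value is narrowed back to a concrete type as soon as the program knows what it expects.

```python
import json
from typing import Any, Optional


def load_payload(raw: str) -> Any:
    """Hypothetical helper: the parsed JSON could be almost anything."""
    return json.loads(raw)


def extract_user_id(payload: Any) -> Optional[int]:
    """Return the user id if the payload has the expected shape."""
    if not isinstance(payload, dict):
        return None
    user_id = payload.get("user_id")
    # Narrow from Any back to int before the value spreads further.
    return user_id if isinstance(user_id, int) else None


found = extract_user_id(load_payload('{"user_id": 7}'))
missing = extract_user_id(load_payload('["not", "a", "dict"]'))
```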
Where might # type: ignore be useful?
- When mypy doesn’t really know what’s going on, e.g. when something is changed at runtime, like a method dynamically added to a class or a changed class hierarchy. If mypy’s error is really inadequate and you cannot think of another way to handle this, use # type: ignore. But first, try to understand why mypy is complaining. Mypy is, in fact, a pretty good code-smell detector 👍
- When something is not yet compatible with mypy (like enums, which were not supported at the beginning).
- When there is a bug in mypy. Remember to report it on GitHub!
Q&A
If you are not yet convinced to start using typing in Python, read on.
I like my Python code duck-typed and dynamic. Typing will ruin that. Amirite?
First, it’s true that mypy won’t understand some runtime hacks, which are related to the dynamic nature of the language. However, in return, you get more reliable code.
Second, the Python typing system supports protocols (aka “duck types”), both built-in and user-defined (this topic was not covered here). So no worries here.
Frankly, I don’t like this whole typing thing. Isn’t it slowly turning Python into Java?
Don’t worry, unlike in Java, typing in Python:
- is completely optional (*),
- does not affect the runtime (**).
(*) Okay, not really: dataclasses force you to use type annotations. This is, hopefully, an anomaly.
(**) Okay, not really… First, imports from typing are made. Second, some values are passed through identity functions (in the case of cast and NewType). Third, generic types (which were not covered in this blog post) use a custom metaclass, which may conflict with a user-defined metaclass; it’s no longer a problem in Python 3.7.
Right, but just look at typed Python code: more similar to Java than to good old Python.
It may be the first impression, but have you ever really seen an enterprise Java codebase? 🙃 I’m not denying that typed Python code looks different than untyped code and that you need to get used to reading it. But when you do, and the code itself is typed in a smart way (e.g. by using aliases), it can become even more readable and understandable than before adding types. And it’s still Pythonic, as it adds to the “explicitness” and readability of the code.
Okay, let’s try this… How do I begin using type annotation in my codebase?
I recommend the following path:
- Start with running mypy on your untyped code and see what happens. You might be surprised that it already understands so much. (You might want to run it with the --ignore-missing-imports and --follow-imports skip parameters at first.)
- Fix all initial mypy errors.
- Now just start adding type annotations to your code. You can begin anywhere, but I think it’s best to start with the most important code. This way, your codebase will become more reliable earlier. Don’t worry that a wrong type annotation will break your code: typing is designed to affect Python’s runtime as little as possible.
- Keep adding type annotations. After some time, adding type annotations will start to pay off. You may get more and more mypy errors that catch real bugs. (Just don’t forget to run mypy from time to time.)
- The next step is adding mypy checks to the CI pipeline (and maybe to pre-commit hooks), so that your code stays reliable all the time.
Generally, if you are committed to typing, it’s a good idea to annotate all new code. Another approach is to skip the annotating phase when you just want to quickly prototype a functionality. When you know the code will stay, you can add type annotations then.
You can also try to start by using an automated tool for adding type annotations. The pyannotate library gets type information from observing actual types at runtime. Similarly, pytest-annotate adds type annotations by running tests and observing types. Also, the pytype checker can add annotations based on static analysis of code. I don’t have experience using those tools, but I think it’s a good idea to carefully verify the annotations produced by any of them. Studying those annotations may even lead to discovering some bugs in your code.
What to do when I don’t have time to understand a mypy error?
Find the time, it may pay off. Don’t reach for # type: ignore right away; try to understand the problem first (sometimes searching the mypy issue page is the only way), and use it only if you are sure you don’t have any good alternative. Mypy might be, and probably is, onto something.
On the other hand, don’t try to please mypy no matter the cost. The rule of thumb is: don’t make your code worse just to silence a mypy error. It’s still only a tool.
Remember that mypy is in constant development. Its error messages are getting better and false positives (which are discovered from time to time) are getting eliminated. So remember to always update mypy when the new version comes out.
Would you like to dig deeper into the Python type system? You can! I’ve written a second typing blog post series 🤩 It’s devoted to subtyping relationships between complex types. Check it out if you want to know what “contravariance” is and why knowing that helps in writing safer code: