Using Rust to Power Python Importing With oxidized_importer

May 10, 2020 at 01:15 PM | categories: Python, PyOxidizer

I'm pleased to announce the availability of the oxidized_importer Python package, a standalone version of the custom Python module importer used by PyOxidizer. oxidized_importer - a Python extension module implemented in Rust - enables Python applications to start and run quicker by providing an alternate, more efficient mechanism for loading Python resources (such as source and bytecode modules).

Installation instructions and detailed usage information are available in the official documentation. The rest of this post hopefully answers the questions of why are you doing this and why should I care.

In a traditional Python process, Python's module importer inspects the filesystem at run-time to find and load resources like Python source and bytecode modules. It is highly dynamic in nature and relies on the filesystem as a point-in-time source of truth for resource availability.

oxidized_importer takes a different approach to resource loading that is more static in nature and more suitable to application environments (where Python resources aren't changing). Instead of dynamically probing the filesystem for available resources, resources are instead indexed ahead of time. When Python goes to resolve a resource (say it is looking to import a module), oxidized_importer simply needs to perform a lookup in an in-memory data structure to locate said resource. This means oxidized_importer only has marginal reliance on the filesystem, which can make it much faster than Python's traditional importer. (Performance benefits of binaries built with PyOxidizer have already been clearly demonstrated.)

The oxidized_importer Python extension module exposes parts of PyOxidizer's packaging and run-time functionality to Python code, without requiring the full use of PyOxidizer for application packaging. Specifically, oxidized_importer allows you to:

  • Install a custom, high-performance module importer (OxidizedFinder) to service Python import statements and resource loading (potentially from memory, using zero-copy).
  • Scan the filesystem for Python resources (source modules, bytecode files, package resources, distribution metadata, etc) and turn them into Python objects, which can be loaded into OxidizedFinder instances.
  • Serialize Python resource data into an efficient binary data structure for loading into an OxidizedFinder instance. This facilitates producing a standalone resources blob that can be distributed with a Python application which contains all the Python modules, bytecode, etc required to power that application. See the docs on freezing an application with oxidized_importer.

oxidized_importer can be thought of as PyOxidizer-lite: it provides just enough functionality to allow Python application maintainers to leverage some of the technical advancements of PyOxidizer (such as in-memory module imports) without using PyOxidizer for application packaging. oxidized_importer can work with the Python distribution already installed on your system. You just pip install it like any other Python package.

By releasing oxidized_importer as a standalone Python package, my hope is to allow more people to leverage some of the technical achievements and performance benefits coming out of PyOxidizer. I also hope that having more users of PyOxidizer's underlying code will help uncover bugs and conformance issues, raising the quality and viability of the projects.

I would also like to use oxidized_importer as an opportunity to advance the discourse around Python's resource loading mechanism. Filesystem I/O can be extremely slow, especially in mobile and embedded environments. Dynamically probing the filesystem to service module imports can therefore be slow. (The Python standard library has the zipimport module for importing Python resources from a zip file. But in my opinion, we can do much better.) I would like to see Python move towards leveraging immutable, serialized data structures for loading resources as efficiently as possible. After all, Python resources like the Python standard library are likely not changing between Python process invocations. The performance zealot in me cringes thinking of all the overhead that Python's filesystem probing approach incurs - all of the excessive stat() and other filesystem I/O calls that must be performed to answer questions about state that is easily indexed and often doesn't change. oxidized_importer represents my vision for what a high-performance Python resource loader should look like. I hope it can be successful in steering Python towards a better approach for resource loading.

I plan to release oxidized_importer independently from PyOxidizer. While the projects will continue to be developed in the same repository and will leverage the same underlying Rust code, I view them as somewhat independent and serving different audiences.

While oxidized_importer evolved from facilitating PyOxidizer's run-time use cases, I'm not opposed to taking it in new directions. For example, I would entertain implementing Python's dynamic filesystem probing logic in oxidized_importer, allowing it to serve as a functional stand-in for the official importer shipped with the Python standard library. I have little doubt an importer implemented in 100% Rust would outperform the official importer, which is implemented in Python. There's all kinds of possibilities here, such as using a background thread to index sys.path outside the constraints of the GIL. But I don't want to get ahead of myself...

If you are a Python application maintainer and want to make your Python processes execute a bit faster by leveraging a pre-built index of available Python resources and/or taking advantage of in-memory module importing, I highly encourage you to take a look at oxidized_importer!