ENOSUCHBLOG

Programming, philosophy, pedaling.


Writing and publishing a Python module in Rust

Aug 2, 2020     Tags: devblog, programming, python, rust    

This post is at least a year old.

This post is a quick walkthrough of how I wrote a Python library, procmaps, in nothing but Rust. It uses PyO3 for the bindings and maturin to manage the build (as well as produce manylinux1-compatible wheels).

The code is, of course, available on GitHub, and can be installed directly with a modern Python (3.5+) via pip[1] without a local Rust install:

$ pip3 install procmaps

Procmaps?

procmaps is an extremely small Python library, backed by a similarly small Rust library[2].

All it does is parse “maps” files, best known for their presence under procfs on Linux[3], into a list of Map objects. Each Map, in turn, contains the basic attributes of the mapped memory region.

By their Python attributes:

import os
import procmaps

# also: from_path, from_str
# N.B.: named map_ instead of map to avoid shadowing the map function
map_ = procmaps.from_pid(os.getpid())[0]

map_.begin_address  # the begin address for the mapped region
map_.end_address    # the end address for the mapped region
map_.is_readable    # is the mapped region readable?
map_.is_writable    # is the mapped region writable?
map_.is_executable  # is the mapped region executable?
map_.is_shared      # is the mapped region shared with other processes?
map_.is_private     # is the mapped region private (i.e., copy-on-write)?
map_.offset         # the offset into the region's source that the region originates from
map_.device         # a tuple of (major, minor) for the device that the region's source is on
map_.inode          # the inode of the source for the region
map_.pathname       # the "pathname" field for the region, or None if an anonymous map

Critically: apart from the imports and the os.getpid() call, all of the code above calls directly into compiled Rust.
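The underlying job is small. Below is a rough, stdlib-only sketch that parses a single maps line into fields matching the attributes above. It is not rsprocmaps’ actual parser. The MapEntry struct and parse_line function are hypothetical. The sketch assumes the usual kernel layout: five single-space-separated fields, followed by an optional pathname that may be padded with spaces.

```rust
/// Hypothetical, simplified stand-in for one parsed maps entry.
#[derive(Debug, PartialEq)]
struct MapEntry {
    begin_address: u64,
    end_address: u64,
    is_readable: bool,
    is_writable: bool,
    is_executable: bool,
    is_shared: bool,
    is_private: bool,
    offset: u64,
    device: (u64, u64),
    inode: u64,
    pathname: Option<String>,
}

/// Parse one line such as
/// "00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon".
fn parse_line(line: &str) -> Option<MapEntry> {
    // At most six pieces: five fixed fields, then everything else as the pathname.
    let mut fields = line.splitn(6, ' ');
    let range = fields.next()?;
    let perms = fields.next()?;
    let offset = fields.next()?;
    let device = fields.next()?;
    let inode = fields.next()?;
    // Drop the column padding; an empty remainder means an anonymous map.
    let pathname = fields
        .next()
        .map(str::trim_start)
        .filter(|p| !p.is_empty())
        .map(String::from);

    let (begin, end) = range.split_once('-')?;
    let (major, minor) = device.split_once(':')?;
    let perms = perms.as_bytes();
    if perms.len() != 4 {
        return None;
    }

    Some(MapEntry {
        // Addresses, offset, and device numbers are hexadecimal; the inode is decimal.
        begin_address: u64::from_str_radix(begin, 16).ok()?,
        end_address: u64::from_str_radix(end, 16).ok()?,
        is_readable: perms[0] == b'r',
        is_writable: perms[1] == b'w',
        is_executable: perms[2] == b'x',
        is_shared: perms[3] == b's',
        is_private: perms[3] == b'p',
        offset: u64::from_str_radix(offset, 16).ok()?,
        device: (
            u64::from_str_radix(major, 16).ok()?,
            u64::from_str_radix(minor, 16).ok()?,
        ),
        inode: inode.parse().ok()?,
        pathname,
    })
}

fn main() {
    let line = "00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon";
    let entry = parse_line(line).expect("well-formed maps line");
    println!("{:?}", entry);
}
```

Splitting into at most six pieces keeps any spaces inside the pathname intact. A production parser would also need to handle the pathname quirks described in proc(5).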

Motivation

The motivations behind procmaps are twofold.

First: I do program analysis and instrumentation research at my day job. Time and time again, I need to obtain information about the memory layout of a program that I’m instrumenting (or would like to instrument). This almost always means opening /proc/<pid>/maps, writing an ad-hoc parser, getting the field(s) I want, and then getting on with my life.

Doing this over and over again has made me realize that it’s an ideal task for a small, self-contained Rust library:

Second: I started learning Rust about a year ago, and have been looking for new challenges in it. Interoperating with another language (especially one with radically different memory semantics, like Python) is an obvious choice.

Structure

The procmaps module is a plain old Rust crate. Really.

The only differences are in the Cargo.toml:

[lib]
crate-type = ["cdylib"]

[package.metadata.maturin]
classifier = [
  "Programming Language :: Rust",
  "Operating System :: POSIX :: Linux",
]

(Other settings under package.metadata.maturin are available for, e.g., managing Python-side dependencies, but procmaps doesn’t need them. More details are available in maturin’s documentation.)

In terms of code, the crate is structured like a normal Rust library. PyO3 only requires a few pieces of sugar to promote everything into Python-land:

Modules

Python modules are created by decorating a Rust function with #[pymodule].

This function then calls methods on the PyModule argument it receives to register the module’s functions and classes.

For example, here is the Python-visible procmaps module in its entirety:

#[pymodule]
fn procmaps(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Map>()?;
    m.add_wrapped(wrap_pyfunction!(from_pid))?;
    m.add_wrapped(wrap_pyfunction!(from_path))?;
    m.add_wrapped(wrap_pyfunction!(from_str))?;

    Ok(())
}

Functions

Module-level functions are trivial to create: they’re just normal Rust functions, marked with #[pyfunction]. They’re loaded into modules via add_wrapped + wrap_pyfunction!, as seen above. Alternatively, they can be created within a module definition (i.e., nested within the #[pymodule] function) via the #[pyfn] decorator.

Python-visible functions return a PyResult<T>, where T implements IntoPy<PyObject>. PyO3 helpfully provides an implementation of this trait for many core types; a full table is in the PyO3 guide. This includes Option<T>, making it painless to turn Rust-level functions that return Options into Python-level functions that can return None.

procmaps doesn’t make use of them, but PyO3 also supports variadic arguments and keyword arguments. Details on those are available in the PyO3 guide.

Here’s a trivial Python-exposed function that does integer division, returning None if division by zero is requested:

#[pyfunction]
fn idiv(dividend: i64, divisor: i64) -> PyResult<Option<i64>> {
  if divisor == 0 {
    Ok(None)
  } else {
    Ok(Some(dividend / divisor))
  }
}
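One caveat with the example above: dividend / divisor still panics on i64::MIN / -1, because the true result overflows. Below is a minimal plain-Rust sketch of the same logic using i64::checked_div, which returns None for both division by zero and overflow. Inside the #[pyfunction], the body would become Ok(dividend.checked_div(divisor)). Rust’s / also truncates toward zero, so this does not match Python’s floor division (//) for negative operands.

```rust
/// Plain-Rust core of the idiv example, without the PyO3 wrapper.
fn idiv(dividend: i64, divisor: i64) -> Option<i64> {
    // checked_div yields None when divisor == 0 *or* when the quotient
    // would overflow (i64::MIN / -1), instead of panicking.
    dividend.checked_div(divisor)
}

fn main() {
    println!("{:?}", idiv(7, 2)); // Some(3)
    println!("{:?}", idiv(7, 0)); // None
    println!("{:?}", idiv(i64::MIN, -1)); // None
    // Truncation toward zero: Python's -7 // 2 would give -4.
    println!("{:?}", idiv(-7, 2)); // Some(-3)
}
```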

Classes

Classes are loaded into modules via the add_class function, as seen in the module definition.

Just like modules, they’re managed almost entirely behind a single decorator, this time on a Rust struct. Here is the entirety of the procmaps.Map class definition:

#[pyclass]
struct Map {
    inner: rsprocmaps::Map,
}

procmaps doesn’t need them, but trivial getters and setters can be added to the members of a class with #[pyo3(get, set)]. For example, the following creates a Point class:

#[pyclass]
struct Point {
  #[pyo3(get, set)]
  x: i64,
  #[pyo3(get, set)]
  y: i64,
}

…for which the following would be possible in Python:

# get_unit_point not shown above
from pointlib import get_unit_point

p = get_unit_point()
print(p.x, p.y)

p.x = 100
p.y = -p.x
print(p.x, p.y)

Using #[pyclass] on Foo auto-implements IntoPy<PyObject> for Foo, making it easy to return your custom classes from any function (as above) or member method (as below).

Member methods

Just as Python-visible classes are defined via #[pyclass] on Rust structs, Python-visible member methods are declared via the #[pymethods] attribute on Rust impls for those structs.

Member methods return PyResult<T>, just like functions do:

#[pymethods]
impl Point {
  fn invert(&self) -> PyResult<Point> {
    Ok(Point { x: self.y, y: self.x })
  }
}

…allows for the following:

# get_unit_point not shown above
from pointlib import get_unit_point

p = get_unit_point()
p_inv = p.invert()

By default, PyO3 forbids the creation of Rust-defined classes within Python code. To allow their creation, just add a function with the #[new] attribute to the #[pymethods] impl block. This creates a __new__ Python method rather than __init__; PyO3 doesn’t support the latter[5].

For example, here’s a constructor for the contrived Point class above:

#[pymethods]
impl Point {
  #[new]
  fn new(x: i64, y: i64) -> Self {
    Point { x, y }
  }
}

…which allows for:

from pointlib import Point

p = Point(100, 0)
p_inv = p.invert()
assert p_inv.y == 100
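Since #[pyclass] and #[pymethods] decorate ordinary Rust items, the class’s behavior can be reasoned about as plain Rust. Below is a sketch of the same Point with the PyO3 attributes stripped. PyResult is also dropped, since nothing here can fail. The sketch makes explicit that invert returns a new Point and leaves the original untouched.

```rust
/// The contrived Point from above, minus the PyO3 attributes.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

impl Point {
    /// Plays the role of the #[new] constructor.
    fn new(x: i64, y: i64) -> Self {
        Point { x, y }
    }

    /// Returns a fresh Point with the coordinates swapped; self is not mutated.
    fn invert(&self) -> Point {
        Point { x: self.y, y: self.x }
    }
}

fn main() {
    let p = Point::new(100, 0);
    let p_inv = p.invert();
    assert_eq!(p.y, 0); // the original is unchanged
    assert_eq!(p_inv.y, 100); // the swap lands on the returned value
    println!("{:?} -> {:?}", p, p_inv);
}
```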

Exceptions and error propagation

As mentioned above, (most) Python-visible functions and methods return PyResult<T>.

The Err half of PyResult is PyErr, and these values get propagated as Python exceptions. The pyo3::exceptions module contains structures that parallel the standard Python exceptions, each of which provides a py_err(String) function to produce an appropriate PyErr.

Creating a brand new Python-level exception takes a single line with the create_exception! macro. Here’s how procmaps creates a procmaps.ParseError exception that inherits from the standard Python Exception class:

use pyo3::exceptions::Exception;

// N.B.: The first argument is the module name,
// i.e. the function declared with #[pymodule].
create_exception!(procmaps, ParseError, Exception);

Similarly, marshalling Rust Error types into PyErrs is as simple as implementing std::convert::From<ErrorType> for PyErr.

Here’s how procmaps turns some of its errors into standard Python IOErrors and others into the custom procmaps.ParseError exception:

// N.B.: The newtype here is only necessary because Error comes from an
// external crate (rsprocmaps).
struct ProcmapsError(Error);
impl std::convert::From<ProcmapsError> for PyErr {
    fn from(err: ProcmapsError) -> PyErr {
        match err.0 {
            Error::Io(e) => IOError::py_err(e.to_string()),
            Error::ParseError(e) => ParseError::py_err(e.to_string()),
            Error::WidthError(e) => ParseError::py_err(e.to_string()),
        }
    }
}
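The conversion chain can be illustrated without PyO3. In the hypothetical sketch below, external::Error stands in for rsprocmaps’ error type and FakePyErr stands in for PyErr; both are assumptions for illustration. Because both stand-ins are local, Rust’s orphan rule would not actually force the newtype here. It is kept to mirror the real case, where neither Error nor PyErr belongs to the procmaps crate. The key step is that map_err(ProcmapsError) followed by ? invokes the From impl automatically.

```rust
// Hypothetical stand-in for an error type owned by another crate
// (modeled on the Io / ParseError / WidthError variants matched above).
mod external {
    #[derive(Debug)]
    pub enum Error {
        Io(std::io::Error),
        ParseError(String),
        WidthError(String),
    }
}

// Hypothetical stand-in for PyErr: which Python exception to raise, plus its message.
#[derive(Debug, PartialEq)]
enum FakePyErr {
    IOError(String),
    ParseError(String),
}

// Newtype wrapper, mirroring the one in procmaps.
struct ProcmapsError(external::Error);

impl From<ProcmapsError> for FakePyErr {
    fn from(err: ProcmapsError) -> FakePyErr {
        match err.0 {
            external::Error::Io(e) => FakePyErr::IOError(e.to_string()),
            // Both parse-level failures map onto the same custom exception.
            external::Error::ParseError(msg) | external::Error::WidthError(msg) => {
                FakePyErr::ParseError(msg)
            }
        }
    }
}

// A fallible "library" function that returns the external error type.
fn parse_width(s: &str) -> Result<u32, external::Error> {
    s.parse::<u32>()
        .map_err(|e| external::Error::WidthError(e.to_string()))
}

// Analogous to a #[pyfunction] returning PyResult<u32>.
fn py_visible(s: &str) -> Result<u32, FakePyErr> {
    // Wrap in the newtype; `?` then applies From<ProcmapsError> for FakePyErr.
    let width = parse_width(s).map_err(ProcmapsError)?;
    Ok(width)
}

fn main() {
    println!("{:?}", py_visible("42")); // Ok(42)
    println!("{:?}", py_visible("abc"));
}
```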

Compilation and distribution

With everything above, cargo build just works — it produces a Python-loadable shared object.

Unfortunately, it does so using the cdylib naming convention, meaning that cargo build for procmaps produces libprocmaps.so rather than one of the filenames Python knows how to look for when searching $PYTHONPATH[6].
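The renaming itself is mechanical. Below is a hypothetical sketch (not maturin’s code) that strips Cargo’s lib prefix and rebuilds the filename from a CPython SOABI tag. The tag is passed in as an assumption here; on a real system it would come from sysconfig.get_config_var("SOABI"). Only the Linux .so case is covered.

```rust
/// Recover the module name from the file Cargo emits for a cdylib,
/// e.g. "libprocmaps.so" -> "procmaps". Returns None for other shapes.
fn module_name_from_cdylib(file_name: &str) -> Option<&str> {
    file_name.strip_prefix("lib")?.strip_suffix(".so")
}

/// Build the SOABI-tagged filename CPython searches for,
/// e.g. ("procmaps", "cpython-38-x86_64-linux-gnu").
fn extension_file_name(module: &str, soabi: &str) -> String {
    format!("{}.{}.so", module, soabi)
}

fn main() {
    let module = module_name_from_cdylib("libprocmaps.so").expect("cdylib-style name");
    let name = extension_file_name(module, "cpython-38-x86_64-linux-gnu");
    println!("{}", name); // procmaps.cpython-38-x86_64-linux-gnu.so
}
```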

This is where maturin comes in: once installed, a single maturin build in the crate root puts an appropriately named pip-compatible wheel in target/wheels.

It gets even better: maturin develop will install the compiled module directly into the current virtual environment, making local development as simple as:

$ python3 -m venv env
$ source env/bin/activate
(env) $ pip3 install maturin
(env) $ maturin develop
(env) $ python3
>>> import procmaps

procmaps has a handy Makefile that wraps all of that; running the compiled module locally is a single make develop away.

Distribution is slightly more involved: maturin develop builds wheels that are compatible with the local machine, but further restrictions on symbol versions and linkages are required to ensure that a binary wheel runs on a large variety of Linux versions and distributions[7].

Compliance with these constraints is normally enforced in one of two ways:

  1. Packages are compiled into binary wheels, and then audited (and potentially repaired) via the PyPA’s auditwheel before release.
  2. Packages are compiled into binary wheels within a wholly controlled runtime environment, such as the PyPA’s manylinux Docker containers.

Distribution with maturin takes the latter approach: the maturin developers have derived a Rust build container from the PyPA’s standard manylinux container, making fully compatible builds (again, from the crate root) as simple as:

# optional: do `build --release` for release-optimized builds
$ docker run --rm -v $(pwd):/io konstin2/maturin build

This command, like a normal maturin build, drops the compiled wheel(s) into target/wheels. Because it runs inside of the standard manylinux container, it can and does automatically build wheels for a wide variety of Python versions (Python 3.5 through 3.8, as of writing).

From here, distribution to PyPI is as simple as twine upload target/wheels/* or maturin publish. procmaps currently uses the former, as releases are handled via GitHub Actions using the PyPA’s excellent gh-action-pypi-publish action.

Voilà: a Python module, written completely in Rust, that can be installed on the vast majority of Linux distributions with absolutely no dependencies on Rust itself. Even the non-maturin metadata in Cargo.toml is propagated correctly!

procmaps on PyPI

Wrapup

I only ran into one small hiccup while working on procmaps — I tried to add a Map.__contains__ method to allow for inclusion checks with the in protocol, e.g.:

fn __contains__(&self, addr: u64) -> PyResult<bool> {
    Ok(addr >= self.inner.address_range.begin && addr < self.inner.address_range.end)
}

…but this didn’t work, for whatever reason, despite working when called manually:

>>> 4194304 in map_
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument of type 'Map' is not iterable

>>> map_.__contains__(4194304)
True

There’s probably a reasonable explanation for this in the Python data model that I haven’t figured out. Edit: a Redditor pointed me to the correct approach. I’ve cut a new release of procmaps that shows the __contains__ protocol in action.
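Whatever the protocol wiring, the membership test itself is a half-open interval check, which std::ops::Range::contains expresses directly. Below is a minimal sketch with a hypothetical AddressRange standing in for the wrapped rsprocmaps type. The PyO3 protocol setup is not shown.

```rust
/// Hypothetical stand-in for the address range a Map wraps.
struct AddressRange {
    begin: u64,
    end: u64,
}

impl AddressRange {
    /// Half-open containment: `begin` is inside the region, `end` is not.
    fn contains(&self, addr: u64) -> bool {
        (self.begin..self.end).contains(&addr)
    }
}

fn main() {
    let r = AddressRange { begin: 0x400000, end: 0x452000 };
    println!("{}", r.contains(4194304)); // true (4194304 == 0x400000)
    println!("{}", r.contains(0x452000)); // false: the end is exclusive
}
```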

By and large, the process of writing a Python module in Rust was extremely pleasant: I didn’t have to write a single line of Python (or even Python-specific configuration) until I wanted to add unit tests. Both PyO3 and maturin are incredibly polished, and the PyPA’s efforts to provide manylinux build environments made compatible builds a breeze.


  1. …on x86_64 only, for the time being. There’s nothing fundamentally blocking other architectures; it’s just a matter of hooking them up via a CI other than GitHub Actions. 

  2. My original goal with the Rust library was to teach myself Pest on a simple format. It turns out that there is already a high-quality equivalent package available on Crates. 

  3. Linux didn’t originate procfs but, as far as I can tell, no other Unices provide /proc/<pid>/maps. FreeBSD appears to provide a /proc/<pid>/map file of similar purpose. 

  4. Except in the “pathname” field; see the proc(5) manpage for details. 

  5. Presumably because Rust has no concept of a “created but uninitialized” object; the two are always conjoined. 

  6. Documentation for these is a little scarce, but strace -m procmaps indicates that the acceptable formats are procmaps.cpython-XX-target-triple.so, procmaps.so, and procmapsmodule.so.

  7. These are known as the “manylinux” constraints, and are documented in PEPs 513 (“manylinux1”), 571 (“manylinux2010”), 599 (“manylinux2014”), 600, and possibly others.