Simon Willison’s Weblog

Subscribe

Weeknotes: software carpentry, compiling modules for SQLite

26th September 2020

This week I completed the Software Carpentry instructor training course, added two foundational features to sqlite-utils and learned how to compile modules for SQLite.

Software Carpentry

This week I took the two day instructor training course put on by Software Carpentry. I’ve been wanting to do this for over a year now, so I’m excited to have finally found the time to attend the workshop.

The Carpentries is a really interesting non-profit organization. Their mission is to “teach foundational coding and data science skills to researchers worldwide”—but I’ve always thought of them as teaching software engineering fundamentals to scientists—things like version control!

It turns out I’ve been following the carpentries project for fifteen years now, ever since their founder Greg Wilson started writing about his efforts to bring software engineering to scientists on his blog.

As you would expect from an organization that specializes in workshop education, the workshop they put on teaching you how to run workshops is top notch. Their curriculum materials, forged by a process of continuing tweaks over hundreds if not thousands of presentations, are superb. I aspire to produce educational documentation this good some day.

An interesting aspect of Carpentries is that every workshop is presented using live coding—no slides! The instructor works through the same material as the learners, talking through each line of code as they run it. The kind of thing you would normally use slides for (diagrams etc) instead lives in the workshop handouts. Take a look through their Databases and SQL course to get an idea of how that works.

I have to complete a couple more steps, but once I’ve done that I’ll be a certified instructor for the program—which means I can volunteer to help run their workshops in the future.

SQLite3 modules

I had two encounters with compiled SQLite modules this week.

One of my favourite lesser-known features of PostgreSQL is trigram indexes. The short version is: they make where text like '%something%' queries run crazy-fast. Normal indexes just speed up prefix% like searches, but trigrams work for strings in the middle of other strings as well.

I started a thread on the official SQLite forum asking about trigram index support, and proposed that maybe it could be achieveable using a custom SQLite FTS tokenizer. Core SQLite maintainer Dan Kennedy replied with a complete, working implementation of exactly that, written in C!

With Dan’s permission I’ve published his code in a sqlite-fts5-trigram GitHub repository. In doing so I figured out how to build it for macOS, and then hooked that build script up to a GitHub Actions workflow that produces a compiled ftstri.so module ready for anyone on macOS to run themselves.

Just one problem: it turns out binary modules like that need to be signed for macOS before anyone can run them. That’s a step beyond me at the moment, but I may revisit that in the future.

The other complied module I learned to use this week is spellfix.c, see my TIL Compiling the SQLite spellfix.c module on macOS. I needed these because I’ve started expereminting with Plex as a home media server, and it turns out Plex runs on SQLite! The catch is that it uses the spellfix module, which means you can’t open the Plex database in Datasette without first compiling that module for use with --load-extension—see the TIL for details.

sqlite-utils

I shipped sqlite-utils 2.20 and 2.21, and wrote about the big new features in two blog entries here:

The extract operation in sqlite-utils 2.20 was taking 12 minutes to execute against a table with 680,000 rows. I had a rethink of how it worked and got that down to just 4 seconds in sqlite-utils 2.21!

I worked on these as part of my ongoing exploration of Datasette and sqlite-utils as tools for cleaning up and refactoring data. They fill some key holes in the tooling I’ve built so far.

I’m already using transform to build out the next phase of my datasette-edit-tables plugin. See issue #11 for notes on work-in-progress support for editing table schemas directly in Datasette.

Screenshot of datasette-edit-tables showing an interface for editing and re-ordering columns

TIL this week

Releases this week