Simon Willison’s Weblog

Subscribe

Weeknotes: Installing Datasette with Homebrew, more GraphQL, WAL in SQLite

13th August 2020

This week I’ve been working on making Datasette easier to install, plus wide-ranging improvements to the Datasette GraphQL plugin.

Datasette and Homebrew

Datasette is now part of the GitHub Discussions beta—which means the GitHub repository for the project now has a Datasette discussions area. I’ve been wanting to set up somewhere to talk about the project free of pressure to file issues or bug reports for a while, so I’m really excited to have this as a new community space.

One of the first threads there was about Making Datasette easier to install. This inspired me to finally take a look at issue #335 from July 2018—“Package datasette for installation using homebrew”.

I used the VisiData Homebrew Tap as a starting point, along with Homebrew’s Python for Formula Authors documentation. To cut a long story short, brew install datasette now works!

I wrote up some detailed notes on Packaging a Python CLI tool for Homebrew. I’ve also had my sqlite-utils CLI tool accepted into Homebrew, so you can now install that using brew install sqlite-utils as well.

datasette install, datasette uninstall

The updated Datasette installation instructions now feature a range of different options: Homebrew, pip, pipx and Docker.

Datasette Plugins need to be installed into the same Python environment as Datasette itself. If you installed Datasette using pipx or Homebrew figuring out which environment that is isn’t particularly straight-forward.

So I added two new commands to Datasette (released in Datasette 0.47): datasette install name-of-plugin and datasette uninstall name-of-plugin. These are very thin wrappers around the underlying pip, but with the crucial improvement that they guarantee they’ll run it in the correct environment. I derived another TIL from these on How to call pip programatically from Python.

datasette --get “/-/versions.json”

Part of writing a Homebrew package is defining a test block that confirms that the packaged tool is working correctly.

I didn’t want that test to have to start a Datasette web server just so it could execute an HTTP request and shut the server down again, so I added a new feature: datasette --get.

This is a mechanism that lets you execute a fake HTTP GET request against Datasette without starting the server, and outputs the result to the terminal.

This means that anything you can do with the Datasette JSON API is now available on the command-line as well!

I like piping the output to jq to get pretty-printed JSON:

% datasette github.db --get \
    '/github/recent_releases.json?_shape=array&_size=1' | jq
[
  {
    "rowid": 140912432,
    "repo": "https://github.com/simonw/sqlite-utils",
    "release": "https://github.com/simonw/sqlite-utils/releases/tag/2.15",
    "date": "2020-08-10"
  }
]

datasette-graphql improvements

I introduced datasette-graphql last week. I shipped five new releases since then, incorporating feedback from GraphQL advocates on Twitter.

The most significant improvement: I’ve redesigned the filtering mechanism to be much more in line with GraphQL conventions. The old syntax looked like this:

{
  repos(filters: ["license=apache-2.0", "stargazers_count__gt=10"]) {
    edges {
      node {
        full_name
      }
    }
  }
}

This mirrored how Datasette’s table page works (e.g. repos?license=apache-2.0&stargazers_count__gt=10), but it’s a pretty ugly hack.

The new syntax is much, much nicer:

{
  repos(filter: {license: {eq: "apache-2.0"}, stargazers_count: {gt: 10}}) {
    edges {
      node {
        full_name
      }
    }
  }
}

Execute this query.

The best part of this syntax is that the columns and operations are part of the GraphQL schema, which means tools like GraphiQL can provide auto-completion for them interactively as you type a query.

Another new feature: tablename_row can be used to return an individual row (actually the first matching item for its arguments). This is a convenient way to access rows by their primary key, since the primary key columns automatically become GraphQL arguments:

{
  users_row(id: 9599) {
    id
    name
    contributors_list(first: 5) {
      totalCount
      nodes {
        repo_id {
          full_name
        }
        contributions
      }
    }
  } 
}

Try that query here.

There are plenty more improvements to the plugin detailed in the datasette-graphql changelog.

Write-ahead logging in SQLite

SQLite’s Write-Ahead Logging feature improves concurrency by preventing writes from blocking reads. I was seeing the occasional “database is locked” error with my personal Dogsheep so I decided to finally figure out how turn this on for a database.

The breakthrough realization for me (thanks to a question I asked on the SQLite forum) was that WAL mode is a characteristic of the database file itself. Once you’ve turned it on for the file, all future connections to that file will take advantage of it.

I wrote about this in a TIL: Enabling WAL mode for SQLite database files. I also embedded what I learned in sqlite-utils 2.15, which now includes sqlite-utils enable-wal file.db and sqlite-utils disable-wal file.db commands (and accompanying Python API methods).

Datasette 0.46, with a security fix

Earlier this week I also released Datasette 0.46, with the key feature being a security fix relating to canned queries and CSRF protection.

I used GitHub’s security advisory mechanism for this one: CSRF tokens leaked in URL by canned query form. I’ve also included detailed information on the exploit (and the fix) in issue #918.

Also new in 0.46: the /-/allow-debug tool, which can be used to experiment with Datasette’s allow blocks permissions mechanism.

Releases this week

TIL this week