Simon Willison’s Weblog


Weeknotes: Covid-19, First Python Notebook, more Dogsheep, Tailscale

1st April 2020

My covid-19.datasettes.com project publishes information on COVID-19 cases around the world. The project started out using data from Johns Hopkins CSSE, but last week the New York Times started publishing high-quality USA county- and state-level daily numbers to their own repository. Here’s the change that added the NY Times data.
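For anyone who wants to do something similar with their own copy of the county-level data, here’s a minimal sketch of loading it into SQLite using only Python’s standard library. It assumes the CSV header is date,county,state,fips,cases,deaths, and the ny_times_us_counties table name and load_county_rows() function are illustrative. This is not the script that actually builds covid-19.datasettes.com.

```python
import csv
import sqlite3
from typing import Iterable, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    # Treat blank cells as NULL rather than silently turning them into 0.
    return int(value) if value else None


def load_county_rows(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Insert CSV rows (header line included) into a hypothetical table.

    Assumes the header is: date,county,state,fips,cases,deaths
    Returns the number of rows inserted.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ny_times_us_counties (
            date TEXT,
            county TEXT,
            state TEXT,
            fips TEXT,  -- TEXT so that leading zeros such as 01001 survive
            cases INTEGER,
            deaths INTEGER
        )
        """
    )
    rows = [
        (
            row["date"],
            row["county"],
            row["state"],
            row["fips"] or None,
            _to_int(row["cases"]),
            _to_int(row["deaths"]),
        )
        for row in csv.DictReader(lines)
    ]
    conn.executemany(
        "INSERT INTO ny_times_us_counties VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Passing an open file object works too, since csv.DictReader accepts any iterable of lines; open the file with newline="" as the csv module recommends.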

It’s very easy to use this data to accidentally build misleading things. I’ve been updating the README with links about this—my current favourite is Why It’s So Freaking Hard To Make A Good COVID-19 Model by Maggie Koerth, Laura Bronner and Jasmine Mithani at FiveThirtyEight.

First Python Notebook

Ben Welsh from the LA Times teaches a course called First Python Notebook at journalism conferences such as NICAR. He ran a free online version of the course last weekend, and I offered to help out as a TA.

Most of the help I provided came before the course: Ben asked attendees to confirm that they had working installations of Python 3 and pipenv, and if they didn’t, volunteers such as myself would step in to help. I had Zoom and email conversations with at least ten people to help them get their environments into shape.

This XKCD neatly summarizes the problem:

XKCD Python Environments

One of the most common problems I had to debug was PATH issues: people had installed the software, but due to various environmental differences, python3 and pipenv weren’t available on the PATH. Talking people through the obscurities of creating a ~/.bashrc file and using it to define a PATH override really helps emphasize how arcane this kind of knowledge is.
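A quick way to see what a PATH lookup actually finds is Python’s shutil.which(). Here’s a small diagnostic sketch (not part of Ben’s course materials; find_tools() is a hypothetical helper name) that you can run with whichever Python interpreter you can launch. Shell aliases or functions can still make your shell behave differently from this lookup.

```python
import os
import shutil
from typing import Dict, Optional, Sequence


def find_tools(
    names: Sequence[str] = ("python3", "pipenv"),
    path: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Map each command to the file a PATH lookup finds, or None if missing.

    path=None means "use the current PATH environment variable".
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
    print("Directories searched, in order:")
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        print("   ", directory)
```

If a tool shows up as missing, the printed directory list shows where the lookup searched, which is usually the fastest route to the right PATH override.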

I enjoyed this comment:

“Welcome to intro to Tennis. In the first two weeks, we’ll discuss how to rig a net and resurface a court.”—Claus Wilke

Ben’s course itself is hands-down the best introduction to Python from a data journalism perspective that I have ever seen. Within an hour of starting, the students are using Pandas in a Jupyter notebook to find interesting discrepancies in California campaign finance data.

If you want to check it out yourself, the entire four-hour workshop is now on YouTube and closely follows the material on firstpythonnotebook.org.

Coronavirus Diary

We are clearly living through a notable and very painful period of history right now. On the 19th of March (just under two weeks ago, but time is moving both really fast and incredibly slowly right now) I started a personal diary, something I’ve never done before. It lives in an Apple Note and I’m adding around a dozen paragraphs to it every day. I think it’s helping. I’m sure it will be interesting to look back on in a few years’ time.

Dogsheep

Much of my development work this past week has gone into my Dogsheep suite of tools for personal analytics.

  • I upgraded the entire family of tools for compatibility with sqlite-utils 2.x.
  • pocket-to-sqlite got a major upgrade: it now fetches items using Pocket’s API pagination (previously it just tried to pull in 5,000 items in one go) and can fetch only new items. As a result, I’m now running it from cron in my personal Dogsheep instance, so “Save to Pocket” is now my preferred Dogsheep-compatible way of bookmarking content.
  • twitter-to-sqlite got a couple of important new features in release 0.20. I fixed a nasty bug in the --since flag where retweets from other accounts could cause new tweets from an account to be ignored. I also added a new count_history table which automatically tracks changes to a Twitter user’s friend, follower and listed counts over time (#40).
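The general shape of paginated, incremental fetching (as in the pocket-to-sqlite change above) can be sketched like this. It’s a minimal sketch under stated assumptions: fetch_page() is a hypothetical stand-in for a single API request that takes an offset, a page size and an optional "since" value, and the page size of 500 is arbitrary. This is not the code pocket-to-sqlite actually uses.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical stand-in for one API request:
# fetch_page(offset, count, since) -> list of item dicts
FetchPage = Callable[[int, int, Optional[int]], List[Dict]]


def fetch_all(
    fetch_page: FetchPage,
    page_size: int = 500,
    since: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield every item by requesting one page at a time.

    Pass since (for example the timestamp of the previous run) to ask the
    hypothetical API for only newer items instead of everything.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size, since)
        yield from page
        if len(page) < page_size:
            # A short or empty page means there is nothing left to fetch.
            break
        offset += page_size
```

Because fetch_all() is a generator, each page can be written to the database as it arrives rather than holding thousands of items in memory at once.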

I’m also now using Dogsheep for some journalism! I’m working with the Big Local News team at Stanford to help track and archive tweets by a number of different US politicians and health departments relating to the ongoing pandemic. This collaboration resulted in the above improvements to twitter-to-sqlite.
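One way to track counts over time, in the spirit of the count_history table, is to store a new snapshot only when something has changed. This is a hypothetical sketch using Python’s sqlite3 module: the column names and the record_counts() function are assumptions for illustration, not twitter-to-sqlite’s actual schema or code.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def record_counts(
    conn: sqlite3.Connection,
    user_id: int,
    followers: int,
    friends: int,
    listed: int,
    when: Optional[str] = None,
) -> bool:
    """Append a count_history row only if a count changed since the last row.

    Returns True if a new row was written.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS count_history (
            user_id INTEGER,
            recorded_at TEXT,
            followers INTEGER,
            friends INTEGER,
            listed INTEGER
        )
        """
    )
    latest = conn.execute(
        "SELECT followers, friends, listed FROM count_history "
        "WHERE user_id = ? ORDER BY recorded_at DESC, rowid DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    if latest == (followers, friends, listed):
        return False  # nothing changed, so skip a duplicate snapshot
    recorded_at = when or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO count_history VALUES (?, ?, ?, ?, ?)",
        (user_id, recorded_at, followers, friends, listed),
    )
    conn.commit()
    return True
```

Skipping unchanged snapshots keeps the table small even when the fetch runs frequently from cron, while still recording every change.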

Tailscale

My personal Dogsheep is currently protected by client certificates, so only my personal laptop and iPhone (with the right certificates installed) can connect to the web server it is running on.

I spent a bit of time this week playing with Tailscale, and I’m really impressed by it.

Tailscale is a commercial company whose product is built on top of WireGuard, the new approach to VPN tunnels which just got merged into the Linux 5.6 kernel. Tailscale first caught my attention in January, when they hired Brad Fitzpatrick.

WireGuard lets you form a private network by having individual hosts exchange public keys with each other, while each host keeps its own private key secret. Tailscale provides software which manages those keys for you, making it trivial to set up a private network between different nodes.
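Only public keys need to travel between hosts because of Diffie-Hellman key agreement. Here’s a toy sketch of that idea using classic finite-field Diffie-Hellman and Python’s standard library. It is a swapped-in, deliberately insecure illustration: WireGuard’s real handshake uses Curve25519 within the Noise protocol framework, and this is not how Tailscale generates or manages keys.

```python
import secrets
from typing import Tuple

# Toy parameters for illustration only: WireGuard really uses Curve25519
# inside the Noise protocol framework, not this finite-field group.
P = 2**127 - 1  # a Mersenne prime
G = 3


def make_keypair() -> Tuple[int, int]:
    """Return (private_key, public_key); only the public key is ever shared."""
    private_key = secrets.randbelow(P - 2) + 1  # 1 <= private_key <= P - 2
    public_key = pow(G, private_key, P)
    return private_key, public_key


def shared_secret(my_private_key: int, their_public_key: int) -> int:
    # (G^b)^a == (G^a)^b (mod P), so both ends derive the same value.
    return pow(their_public_key, my_private_key, P)


laptop_private, laptop_public = make_keypair()
server_private, server_public = make_keypair()
assert shared_secret(laptop_private, server_public) == shared_secret(
    server_private, laptop_public
)
```

Each side combines its own private key with the other side’s public key, so the shared secret never crosses the network.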

How trivial? It took me less than ten minutes to get a three-node private network running between my iPhone, laptop and a Linux server. I installed the iPhone app, the Ubuntu package, the OS X app, signed them all into my Google account and I was done.

Each of those devices now has an additional IP address in the 100.x range which they can use to talk to each other. Tailscale guarantees that the IP address will stay constant for each of them.
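The "stable address per device" guarantee boils down to remembering which address each node was given. Here’s a hypothetical sketch of that bookkeeping, assuming the 100.x addresses come from the 100.64.0.0/10 shared address range. AddressRegistry is an invented name, and handing out addresses sequentially is a simplifying assumption, not Tailscale’s actual allocation scheme.

```python
import ipaddress
from typing import Dict, Iterator


class AddressRegistry:
    """Hypothetical bookkeeping: each node key keeps one stable address."""

    def __init__(self, network: str = "100.64.0.0/10") -> None:
        # 100.64.0.0/10 covers 100.64.0.0 to 100.127.255.255.
        self._free: Iterator[ipaddress.IPv4Address] = ipaddress.IPv4Network(network).hosts()
        self._by_key: Dict[str, ipaddress.IPv4Address] = {}

    def address_for(self, node_key: str) -> ipaddress.IPv4Address:
        # A node seen before gets its original address back, so it never changes.
        if node_key not in self._by_key:
            self._by_key[node_key] = next(self._free)
        return self._by_key[node_key]
```

A real coordination service would also need to persist this mapping so addresses survive restarts; this sketch only keeps it in memory.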

Since traffic between the nodes is encrypted using their public/private key pairs, Tailscale can’t see any of it; they’re purely acting as a key management mechanism. And it’s free: Tailscale charge for networks with multiple users, but a personal network like this costs nothing.

I’m not running my own personal Dogsheep on it yet, but I’m tempted to switch over. I’d love other people to start running their own personal Dogsheep instances, but I’m paranoid about encouraging this when securing them is so important. Tailscale looks like it might be a great solution for making secure personal infrastructure more easily and widely available.