Simon Willison’s Weblog


Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor

22nd March 2023

I’m a few weeks behind on my weeknotes, but it’s not through lack of attention to my blog. AI just keeps getting weirder and more interesting.

I’m beginning to expect that every Tuesday may be a write-off for the next few years, since the AI community seems to have decided that Tuesday is the day to launch everything.

Two Tuesdays ago we got a Google announcement, Anthropic’s Claude and GPT-4. On Tuesday this week we got Google Bard, Bing Image Creator and Adobe Firefly.

I’ve written about a bunch of that stuff this month:

Apparently this blog is now partly focused on AI! If you want to stay up-to-date with my writing on this (and other) subjects you can subscribe to my Atom feed, or you can sign up for my brand new Substack newsletter.

My blog as a newsletter

I know there are a lot of people out there who don’t habitually use a feed reader but do find great value from email newsletters.

simonw.substack.com is my new newsletter, which is effectively a way to subscribe to my blog via email.

I started it a few months ago when it looked like Twitter was about to collapse under the weight of its new mismanagement. I first promoted it at the bottom of my Large language models are having their Stable Diffusion moment post, and it’s since grown to 640 subscribers!

I plan to send it out around once a week, provided there’s material to send.

It will be mostly content from my blog, with maybe a paragraph or two of additional context added at the top highlighting themes of the past week (such as GPT-4).

The first two editions can be found here:

A fun detail about my newsletter is how I’m generating it.

Substack doesn’t have an API, but I wanted to automate as much of the process of copying in data from my blog as possible.

I built myself an automation around copy and paste!

observablehq.com/@simonw/blog-to-newsletter is an Observable notebook I wrote which assembles most of the newsletter for me.

It works by running this SQL query against my datasette.simonwillison.net Datasette instance, which runs against a SQLite copy of my blog content (a PostgreSQL/Django app) built by a GitHub Action in this repository.

The SQL query assembles a string of HTML which is rendered in the notebook. There’s also a “Copy to clipboard” button which uses this JavaScript pattern to copy a rich text representation of the HTML to the clipboard.
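As a rough illustration of the idea of building HTML inside a SQL query, here is a minimal sketch using Python's sqlite3 module. The `entries` table, its columns and the example URLs are hypothetical stand-ins, not the real schema of the blog database, and the actual query is more involved.

```python
import sqlite3

# Hypothetical schema and data: the real blog database has its own tables.
conn = sqlite3.connect(":memory:")
conn.execute("create table entries (title text, url text, created text)")
conn.executemany(
    "insert into entries values (?, ?, ?)",
    [
        ("First post", "https://example.com/first", "2023-03-20"),
        ("Second post", "https://example.com/second", "2023-03-21"),
    ],
)

# Build one HTML fragment per row with ||, then glue them together with
# group_concat. SQLite does not formally guarantee the concatenation order;
# feeding it from an ordered subquery works in practice.
sql = """
select group_concat(html, '')
from (
    select '<h3><a href="' || url || '">' || title || '</a></h3>' as html
    from entries
    order by created
)
"""
newsletter_html = conn.execute(sql).fetchone()[0]
print(newsletter_html)
```

The single string that comes back is what a notebook could then render and offer for copying.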

When I hit “paste” in the Substack editor interface it converts that representation into Substack’s chosen subset of HTML. Then I can edit it by hand in the Substack editor.

This is working really well so far. It’s easy to tweak the generated HTML in the Observable notebook, and once I’ve transferred it to Substack I can rearrange things and add my own extra commentary to the top of the newsletter before hitting send.

Datasette’s new JSON API

I finally landed a GIANT branch I’ve been working on for several months now: a complete redesign of Datasette’s default JSON format, one of the largest changes I need to land prior to releasing Datasette 1.0.

The previous default JSON format was a bit of a mess: it had dozens of keys, and presented the row data as an array of arrays (on the basis that the column names were available in a separate key, and rows as arrays would be more efficient in terms of bytes on the wire).
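To make the difference concrete, here is a small sketch of turning the older arrays-of-arrays shape into a list of objects. The key names `columns` and `rows` in the sample dictionary are assumptions for illustration, and the data is borrowed from the example later in this post.

```python
def rows_to_objects(columns, rows):
    """Pair each row array with the column names to get one dict per row."""
    return [dict(zip(columns, row)) for row in rows]


# Assumed shape of an old-style response: names in one key, values in another.
old_style = {
    "columns": ["pk1", "pk2", "content"],
    "rows": [["a", "a", "a-a"], ["a", "b", "a-b"]],
}

objects = rows_to_objects(old_style["columns"], old_style["rows"])
print(objects)
```

The arrays are more compact on the wire, but every client has to do this zipping step itself before the rows are pleasant to work with.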

I always found myself adding ?_shape=array to that URL to get a smaller format, which strongly indicated that the default I had picked was the wrong one.

The new format can now be previewed here—it looks like this (truncated):

{
  "ok": true,
  "next": "d,v",
  "rows": [
    {
      "pk1": "a",
      "pk2": "a",
      "content": "a-a"
    },
    {
      "pk1": "a",
      "pk2": "b",
      "content": "a-b"
    }
  ]
}

The default keys are "ok", "next" to indicate pagination (this is null if there are no extra pages) and "rows" with a list of JSON objects.
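A client could use the "next" key to walk through every page. The sketch below assumes a hypothetical `fetch_page(token)` callable that returns one decoded response for a given pagination token (and the first page for `None`). How the token is sent back to the server is deliberately left out, and the second page of fake data is invented for the example.

```python
def iter_all_rows(fetch_page):
    """Yield rows from every page, following "next" until it is null."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page["rows"]
        token = page["next"]
        if token is None:
            break


# Fake two-page dataset standing in for real HTTP requests.
fake_pages = {
    None: {"ok": True, "next": "d,v", "rows": [{"pk1": "a", "pk2": "a"}]},
    "d,v": {"ok": True, "next": None, "rows": [{"pk1": "d", "pk2": "w"}]},
}

all_rows = list(iter_all_rows(lambda token: fake_pages[token]))
print(all_rows)
```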

If you want extra keys, such as a total row count, a list of columns, or some suggested facets, you can request them using the new ?_extra= parameter. For example:

https://latest.datasette.io/fixtures/sortable.json?_extra=columns&_extra=count&_extra=suggested_facets
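Because ?_extra= is repeated once per requested key, a URL like this can be built from a list of names. This is a minimal sketch using the standard library; the helper name `url_with_extras` is made up for the example.

```python
from urllib.parse import urlencode


def url_with_extras(base_url, extras):
    """Append one _extra=name pair per requested extra key."""
    query = urlencode([("_extra", name) for name in extras])
    return f"{base_url}?{query}"


url = url_with_extras(
    "https://latest.datasette.io/fixtures/sortable.json",
    ["columns", "count", "suggested_facets"],
)
print(url)
```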

This returns a response that starts like this:

{
  "ok": true,
  "next": "d,v",
  "count": 201,
  "columns": [
    "pk1",
    "pk2",
    "content",
    "sortable",
    "sortable_with_nulls",
    "sortable_with_nulls_2",
    "text"
  ],
  "suggested_facets": [
    {
      "name": "pk1",
      "toggle_url": "https://latest.datasette.io/fixtures/sortable.json?_extra=columns&_extra=count&_extra=suggested_facets&_facet=pk1"
    },
    {
      "name": "pk2",
      "toggle_url": "https://latest.datasette.io/fixtures/sortable.json?_extra=columns&_extra=count&_extra=suggested_facets&_facet=pk2"
    },
    {
      "name": "text",
      "toggle_url": "https://latest.datasette.io/fixtures/sortable.json?_extra=columns&_extra=count&_extra=suggested_facets&_facet=text"
    }
  ],
  "rows": [

There’s still more work to do on this feature: I need to write the documentation for it, and figure out how it should affect the Datasette endpoint that returns results from an arbitrary SQL query. But it’s ready to preview, and I’m keen to get feedback on it as quickly as possible!

Please take a look, and provide feedback on this dedicated issue thread—or come and talk about it in the Datasette Discord.

Releases these weeks

TIL these weeks