BLOBs with Knex and PostgreSQL

My team and I recently decided that we should store user-generated binary files in our PostgreSQL database so that a user could download them at a later time. This is different from uploading those files to something like Amazon S3 (which we’ll likely consider migrating to at some point down the road).

Initially, we implemented this solution using a bytea column, but we found that the retrieval performance was unacceptable. It required the whole file to be read into memory before being sent back to the user (as opposed to streaming the content in smaller chunks).
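
For context, the bytea version looked roughly like this (a simplified sketch; the files table, data column, and fileId are illustrative rather than our actual schema):


// Sketch of the original bytea approach. node-postgres hands the bytea
// column back as a single Buffer, so the whole file sits in memory
// before any of it can be sent to the user.
const row = await knex('files')
  .select('data')
  .where({ id: fileId })
  .first();

response.send(row.data); // row.data is a Buffer containing the entire file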

We decided to switch to using PostgreSQL’s Large Object capability since that allows for reading smaller chunks of data. This enabled us to stream the content back to the user requesting the file.
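
With Large Objects, our table only stores a reference to the object (an oid); the file contents themselves live in the pg_largeobject system catalog. A hypothetical Knex migration for that kind of table might look like this (the table and column names are illustrative):


export const up = (knex) =>
  knex.schema.createTable('files', (table) => {
    table.increments('id');
    table.text('file_name');
    table.text('content_type');
    // oid pointing at the Large Object that holds the actual bytes
    table.specificType('blob_oid', 'oid');
  });

export const down = (knex) => knex.schema.dropTable('files');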

In this post, I’ll show you how we were able to use Knex with the pg-large-object npm package to quickly stream files from the database to the user in our Node.js application.

pg-large-object

We use Knex to access our PostgreSQL database from our Node.js server. Knex doesn’t have any built-in support for PostgreSQL’s Large Object functionality at this time. The only existing library I was able to find for Node.js that could handle Large Objects is Joris van der Wel’s pg-large-object package.

pg-large-object provides a callback interface (or a promise-based interface if you use pg-promise) for streaming data to and from Large Objects in the database. Here’s an example taken straight from the README (reformatted slightly) for streaming content to a file:


var pg = require('pg');
var LargeObjectManager = require('pg-large-object').LargeObjectManager;
var conString = "postgres://postgres:1234@localhost/postgres";

pg.connect(conString, function (err, client, done) {
  if (err) {
    return console.error('could not connect to postgres', err);
  }

  var man = new LargeObjectManager({ pg: client });

  // When working with Large Objects, always use a transaction
  client.query('BEGIN', function (err, result) {
    if (err) {
      done(err);
      return client.emit('error', err);
    }

    // A LargeObject oid, probably stored somewhere in one of your own tables.
    var oid = 123;

    // If you are on a high latency connection and working with
    // large LargeObjects, you should increase the buffer size.
    // The buffer should be divisible by 2048 for best performance
    // (2048 is the default page size in PostgreSQL, see LOBLKSIZE)
    var bufferSize = 16384;
    man.openAndReadableStream(oid, bufferSize, function (err, size, stream) {
      if (err) {
        done(err);
        return console.error('Unable to read the given large object', err);
      }

      console.log('Streaming a large object with a total size of', size);
      stream.on('end', function () {
        client.query('COMMIT', done);
      });

      // Store it as an image
      var fileStream = require('fs').createWriteStream('my-file.png');
      stream.pipe(fileStream);
    });
  });
});

Large Objects with Knex

In the above example, the LargeObjectManager takes care of a lot of complexity for us. But there are still some details in that example that Knex normally handles for us, like acquiring a connection and manually sending the BEGIN and COMMIT statements for a transaction. (PostgreSQL’s Large Object facility requires that all Large Object access happen within a transaction.)

pg-large-object does not explicitly integrate with Knex, but it is written to work with node-postgres, which happens to be what Knex uses under the hood as well.

It took a little poking around in the Knex source, but we eventually figured out how to extract the right object to pass to the LargeObjectManager’s constructor.

WARNING: Accessing the internal connection like this is not something that’s described in the Knex.js documentation. The following code could very well break if a future version of Knex changes its internal structures. Make sure you’re factoring that into your decision to use pg-large-object with Knex in a production application.


knex.transaction(async (trx) => {
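  // trx.client is Knex's internal client; acquireConnection() gives us the
  // underlying node-postgres connection that this transaction is bound to.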
  const connection = await trx.client.acquireConnection();
  const blobManager = new LargeObjectManager({ pg: connection });

  // ...
});

A Promise-Based Interface

Our last step was to make a nice Promise-based function that could be used to stream a BLOB of data directly to a Writable stream:


import { Writable } from 'stream';
import { Transaction } from 'knex';
import { LargeObjectManager } from 'pg-large-object';

const streamBlobToWritable = async (trx: Transaction, oid: number, writable: Writable): Promise<void> => {
  // Grab the underlying node-postgres connection from the Knex transaction.
  const connection = await trx.client.acquireConnection();
  const blobManager = new LargeObjectManager({ pg: connection });

  return new Promise<void>((resolve, reject) => {
    // 16384 is the same 16 KB buffer size used in the README example above.
    blobManager.openAndReadableStream(oid, 16384, (err, size, blobStream) => {
      if (err) {
        reject(err);
        return;
      }

      // Resolve once the entire Large Object has been read from the database.
      blobStream.on('end', function () {
        resolve();
      });
      blobStream.on('error', reject);

      blobStream.pipe(writable);
    });
  });
};
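
Getting files into the database in the first place follows the same pattern. pg-large-object also exposes createAndWritableStream (shown in its README), which creates a new Large Object and hands back its oid along with a writable stream. A write-side counterpart might look roughly like this (a sketch reusing the imports above; writeBlobFromReadable is our own name, not something the package provides):


import { Readable } from 'stream';

const writeBlobFromReadable = async (trx: Transaction, readable: Readable): Promise<number> => {
  const connection = await trx.client.acquireConnection();
  const blobManager = new LargeObjectManager({ pg: connection });

  return new Promise<number>((resolve, reject) => {
    blobManager.createAndWritableStream(16384, (err, oid, blobStream) => {
      if (err) {
        reject(err);
        return;
      }

      // The new Large Object is fully written once the stream finishes;
      // resolve with its oid so it can be stored in one of our own tables.
      blobStream.on('finish', () => {
        resolve(oid);
      });
      blobStream.on('error', reject);

      readable.pipe(blobStream);
    });
  });
};

On upload, we would call this inside a knex.transaction and then insert the returned oid, along with the file name and content type, into our own table.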

Then this could be used to stream directly from the database to an Express response:


await knex.transaction(async (trx) => {
  // Query the name of the file, the OID, the type of file, etc.
  // ...

  response.setHeader('Content-Type', 'application/pdf');
  response.setHeader('Content-Disposition', `attachment; filename="${fileName}"`);
  await streamBlobToWritable(trx, oid, response);
  response.end(undefined, 'binary');
});
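
Because all of this happens inside knex.transaction, Knex issues the BEGIN and COMMIT for us (or a ROLLBACK if the callback throws), which satisfies PostgreSQL’s requirement that Large Object access happen within a transaction.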

I hope you find this solution as efficient as we did.