Exposing records in CakePHP

… using UUIDs (universally unique identifiers) on top of the existing AIID (auto-increment ID) primary keys.

Introduction

I first have to explain a bit where I am coming from regarding this topic.

Over 13 years ago (~2007) I started some commercial sites and social networks in CakePHP 1/2 and at the time there was not so much available yet.
One of the requirements was the obfuscation of the records to mask the order of entry as well as how many entries per timeframe have been done.
Competitors can often use those primary key inserts to have some idea about someone else’s data.
Others randomly add dummy records or jump ahead here in the auto-increment position.
We started to use UUIDs though, for this very reason.
So all our records had this char(36) primary key as well as tons of those as foreign keys.
There wasn’t much available at the time, even with CakePHP 2.x in 2011 (binary-16 UUID was introduced in CakePHP 3 ~2015), so this is what we went with.

The database quickly filled up, and with a few million records and some joins, it also felt terribly slow. We tried, with some luck, to add more indexes here.
Remember ACL? A nightmare if you use UUIDs here too 😉 Don’t.
An article also describes why this not be such a good approach anymore these days.
Apart from that, we also had the difficulty of working with those IDs, as you cannot easily visualize or remember them.

There are cases were you need UUIDs, especially with multi-DB setups and alike. But we never had any such need.
So the rest of the article also assumes the lack of this particular requirement.

UUID comparison

There are two factors to this, as we already found out: Size and performance.
The first is quite a lot if you have a lot of records and on top a lot of relations and thus foreign key columns.
If you suddenly have char-36 (and thus 36 bytes) everywhere instead of unsigned int-10 (4 bytes), that sums up to a lot in the end.
With binary-16 (16 bytes), we can cut down here to the half already.

But what about the other topic: Performance?
This article nicely shows the issues with random UUIDs, exactly the ones we have been using.
For ordered UUIDs, it would become close to integers again (with the additional index overhead!), but those weren’t possible for the obfuscation reasons to begin with.
Real-life DBs show here quite the usability and performance issues, especially if you join across multiple such large tables.
Even indexes can not fully mitigate some of the issues.

Hashids

Then after some years and with new projects I explored Hashids and how it would be to keep the primary key for all its usefulness and just add a layer of obfuscation on top.
With different shorteners, the resulting IDs were even shorter than the actual integer number.
Applying them on top of the id column on the fly actually made it not necessary to change any code. All worked out of the box.

It looked amazing, despite the warning in our heads that this actually is not like "100% safe" since theoretically, you could find the right salt and algorithm to reverse the shortening. Once this was done, you would have full access to all the integer data that we wanted to hide. But given the complex salt, I am not sure how easy it would be to find this.

On top of this then some other issues appeared in more complex use cases, though.
Soon queries were harder to do due to the auto-shortened primary key before or after fetching or storing data.
You start to disable here and then, and manually write workarounds.

So, while having a minimal opt-in layer on top, it actually turned out to be neither fully obfuscating, nor easily usable to the full extend.

Expose through additional UUID column

Now we can dive into the actual topic.
After some years have passed it is time to upgrade apps from CakePHP 3 to CakePHP 4.
And during this challenge, I now also look into possible ways of getting rid of the legacy decisions or at least how to start something fresh in a more future proof way.

For this, I looked into how we apply slugs.
Slugs are often added as additional columns into records where you want to have a special (human reasonable) lookup.
So articles, tags, users usually have this dashed-version slug for this reason. On the other side, entities like comments, messages and records without a clear and concise name or title will not.
You can then expose the first only by that slug, hiding the ID completely and you are already done.
For the latter, we do need a different way, though.

It came to me, that combining both UUID and AIID (per record) as int-10 + binary-16 (4 + 16 = 20 bytes) could be the ideal compromise.
It keeps all internal relationships fast and small-sized, but it allows exposure through this additional UUID where needed.

This is now the Expose plugin and its behavior came to be.
If you want a demo, you can try the sandbox here.

I will skip all install and setup details, as those can be found in the plugin docs.

Migrating existing tables

So this will be one of the most important use cases. How we can add this functionality now to existing tables and data in a fast way?
Before we dive into that: If you want existing links and URLs to still work, this would need additional work in some cases in the form of 301 redirects.

Migrating AIID primary key records

Let’s imagine we want to expose an existing images table now through additional UUIDs instead of the AIIDs.
We have a row size of 2 million and we want to make the transition smooth and quick.

So first we add the uuid column to the records we want to expose this way.
The nice side effect of an additional column is that we can create and populate that one while the rest of the application still works as before.

We can use the provided command to create the migration:

bin/cake add_exposed_field Images -b -d

We want to use binary-16 UUIDs here for the mentioned advantages above.
Using -d we can see what it will create for us.

Based on an existing non-empty table, a generated migration could look somewhat like this:

    public function change() {
        $table = $this->table('images');
        // This is the field the plugin needs per entity to work
        $table->addColumn('uuid', 'binary', [
            'limit' => 16,
            'default' => null,
            'null' => true, // Add it as true for existing entities first, then fill/populate, then set to false afterwards.
        ]);

        // Besides primary key we will also want to have a unique index for quicker lookup on this exposed lookup field.
        $table->addIndex(['uuid'], ['unique' => true]);

        $table->update();
    }

Then we execute it using the Migrations plugin‘s migrate command:

bin/cake migrations migrate

Now we just need to fill the empty uuid columns. We can use the other provided command for this:

bin/cake populate_exposed_field Images

Once all records are populated, we can make a second migration file and set the field to be DEFAULT NOT NULL and add a UNIQUE constraint.
If you run here the first command again, it will display the code snippet for it:

bin/cake add_exposed_field Images -b

We don’t need the dry-run part here anymore since it will just output the migration content in this case.

That’s it.
You can still safely deploy this and run the current application in parallel to any data migration as it won’t use the column for the existing code yet.

Migrating from UUID primary key

This is BC (backward compatible) in terms of URL as you can just switch the fields around internally, to the outside it will still all look and behave the same.
The challenge here is to also migrate the foreign key columns along with the added integer primary key data.
The plugin offers a note on how to manage this.

Hiding the AIID primary keys

We can now refactor all the public actions (index, view, …) to different finder and adjust all templates to use ->uuid instead of ->id for generating links.
Piece by piece will this now effectively remove the primary key visibility, making all public interactions go through the new lookup column.

If we want to keep the code base free of changes, we can also look into the plugin’s Superimpose component.
The idea is to only add this where needed. It will for those few actions fully superimpose the ->id property on your entities with the uuid value.
This also goes for nested relations you fetch as long as they have the Expose behavior attached.

Wherever superimposition is a bit too much magic or limiting in terms of more complex cases you can still do the manual exposure, of course.
That’s it, it should all be working smoothly now.

Taking it a step further: Shortening

Now we have all this in place. But handling 32-char-long dash-separated strings is often a bit cumbersome.
What if we could shorten the UUIDs to something more concise?
I found out that UUIDs (length 36) can actually be safely transformed into a 22-char string.
The default shortener the plugin provides can do this out of the box. It can be used through a custom database type (ShortUuidType) which extends the binary one.

In this case, you need to specify the type-map in your bootstrap:

use Cake\Database\Type;

Type::map('binaryuuid', \Expose\Database\Type\ShortUuidType::class);

A UUID stored in the DB as 4e52c919-513e-4562-9248-7dd612c6c1ca would then be displayed and
used as fpfyRTmt6XeE9ehEKZ5LwF for example.
This can be a bit more user-friendly as it is shorter and selectable with a double click in most browsers and apps.

You can select a different converter than the default Short one or write your own one on plugin or app level using the given interface.
See the docs for details.

Summary

We found a great and fast way of adding exposure on top of an existing and simple DB schema.
It does not necessarily have to be a "random UUID v4" or alike, it can also be any other unique enough identifier.
Easy enough to adjust the behavior and underlying database Type classes to what you want to use.

Let me know what kind of UUID alternatives you have in mind in the comment section.

4.75 avg. rating (93% score) - 4 votes

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.