Adventures In Foreign Keys 1: Setting Up Foreign Keys in Stack Overflow

In A Foreign Key, In A Foreign Table

Much of what people want from foreign keys, like referential integrity and join elimination, are only as guaranteed as much as SQL Server can trust your constraints (and even then…). The same goes for check constraints, too.

Thankfully, things like Primary Keys and Unique Constraints are sort of self-trusting, in that there’s really no way to add them without checking the underlying data. Sure, you can ignore dupe keys (if SQLblog is down again, try the Wayback Machine version), but that’s a different sort of trick.

This week, I’ve got 5 posts about foreign keys to look at various aspects of implementing them, and their behavior once they’re in place. If you think I missed something, or didn’t adequately cover it, hang tight. It’ll probably show up eventually.

Pretty much maybe.

Bookworm

When you read books about relational design — Louis Davidson has an excellent one — the importance of things like this are emphasized quite a bit. If you’re starting an app from scratch, you’d be wise to heed the advice in them, especially if it’s an OLTP app. If you take care of it here, you don’t have to worry about enforcing it in your data warehouse.

If you ever want to feel poor design pain, try adding, enforcing, or loading large amounts of data into tables with foreign keys. While you do that, start learning a new language.

But what if you’ve got a database that’s been around for a while. Can you trust the data? Can you trust that no developer ever removed or disabled a constraint to “fix a problem” or “add some custom data” and then forgot to turn them back on for a week or two?

Didn’t think so.

Keying Time Had Come

Continuing to use Stack Overflow as a demo data set, let’s look at if the newest dump is foreign key-able.

Before we get started, I just want to mention that I don’t expect it to be. As far as I can tell, if a user deletes their account, their actions are mostly preserved.

In order to test our data, we’d be wise to create some indexes on important columns.

Next, let’s see what kind of nonsense we’ve got to clean up. We’re going to run some selects to get a count of rows in our “action” tables (Badges, Comments, Posts, Votes) to see which ones have orphaned Users or Posts. Since it’s long and repetitive, I’m going to put these at the end of the post. For now, let’s just look at the results.

Votes with no User sticks out, because that data is cleaned from the dump. If they published how people voted, it would be kind of terrible. Brent would find out what I really think about his Answers.

The rest are… Odd. I’m really not sure what Stack Overflow does  to remove User or Post actions when they’ve been deleted.

IRL

You’ll probably know what processes do (or don’t do) this kind of clean up work, which might make deciding what to do with dirty data easier.

You have a few options that I can think of quickly:

  1. Delete it
  2. Associate it with dummy Ids
  3. Move it to separate tables for reference

I’ll give you $100 if you guess which one I picked before I tell you that I deleted it. The delete scripts are at the end, too.

You were so close.

Deciding what to do with it (or if you’re going to do anything with it at all) is a business decision. It’s their data. If they decide to follow you on your crazy quest to implement referential integrity, it’s your job as a data person to work with them on the best way to end up with data that complies with your Utopian vision. If they want to keep it in some form, you have to respect that.

If they’re cool with deleting it, well..

Be Kind, Batchwind

Just read this post from Michael J. Swart about batching modifications.

Excellent Adventures

I’m going to do some more of these posts, and talk about the difficulties with implementing foreign keys, cascading actions, indexing, and join elimination.

Thanks for reading!

Scripts

Checking for bad data:

Deleting bad data:
Previous Post
[Video] Office Hours 2018/11/7 (With Transcriptions)
Next Post
Adventures In Foreign Keys 2: Common Foreign Key Errors

11 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.