Centralize Your Query Logic!

At a talk I gave earlier this month, an audience member asked if Massive supported joining information from multiple tables together. It's come up on the issue tracker before as well. Massive does not currently have this functionality, and while I'm open to suggestions it's not on my own radar.

The central reason for this is that join logic can be tricky to manage from the application architecture side. The ability to correlate and combine what you need when you need it is certainly powerful, but it also embeds assumptions about your database layout in client code. As the database and application evolve, these assumptions can easily fall out of date and out of sync with each other. In real terms, if your application's "model" (whether implicit or explicit) of a user loaded from the database includes only the user record itself sometimes, but other times looks for information in a separate profile table, adds current statistics, et cetera, and you have functionality that operates on A User, either you understand that users come in different shapes and handle them accordingly across the board or you are living on borrowed uptime.

Some application architectures approach this scenario by grouping the query logic together. In the enterprise world, n-tier applications frequently pull related queries into "services" or Data Access Objects (DAOs) so there's at least some kind of organizational schema. This reduces the maintenance overhead somewhat, but it's an imperfect solution, not least because there's nothing but fallible code reviews (if that) standing in the way of someone dropping data access code somewhere else.

Fortunately, there's already part of the application-database ecosystem dedicated to organizing things -- the database itself! And as an organizing principle, it already has its own way to manage complex queries. Sure, it'll involve writing a little SQL, but let's face it: you were going to wind up writing SQL eventually anyway.

If you've only scratched the surface of working with databases, you might not be familiar with views. The good news is they're pretty straightforward: a view is a stored SQL query with a name, given life with the statement CREATE VIEW myview AS SELECT.... You can SELECT from a view just like you can a table, optionally with JOINs and a WHERE clause and all the other trimmings, whereupon the database executes the query. Results are not stored so the information you get out of a view is always current, unless you intentionally sacrifice realtime data for speed by creating a materialized view which does persist results and has to be manually refreshed.

The reason views are underrated and underutilized in application development has mostly to do with the frameworks developers use to communicate with databases. When you have to provide a concrete implementation of a unary User model, odds are you only care about things you can both read and write to, so you back it up with tables instead of using views to shape data for your needs. There's little room for views in object/relational mapping, and when I've had to use O/RMs I've really only been able to take advantage of views to streamline the raw SQL queries you have to write anyway when you use O/RMs.

If you're not stuck with an object-relational mapper, though, you can really get your money's worth out of views! Retrieving user records from a view, or building more complex user-inclusive results by joining it into other views, ensures that you have a consistent definition of what information comprises a user built into your database. You can't always stop other developers from winging it, naturally, but having that central definition to point to eliminates at least one major potential ambiguity. Massive's omission of the join feature encourages developers using it to center their thinking on the database and the tools it offers for organizing information.

As with anything, there are tradeoffs. Here, it's flexibility. Views may be ephemeral stored queries, but they're still part of the database schema for all that, and the schema takes more planning and effort to change than does application code. But it's a good idea to be thinking carefully about this stuff in the first place.