Producing Data and Schemas in JSON array-of-array format.

JSON was initially designed for the informal transfer of data that has no schema. It has no concept of a table, or of an array of identical arrays. This means that it must tell you each key for each object, even if the original data object was a table. With the happy minimum of key-value pairs, the JSON document that is produced will produce tabular data that tells you the column to which the data object belongs, and it will do a reasonable job of implying a data type. Sadly, that just isn’t enough for us because JSON only recognises just four base data types; string, number, Boolean and null. In SQL Server, we’re used to a lot more than that. BSON and other extensions supplement this with functions that coerce the string into the correct datatype, but we don’t need to do that because all of the data for any one column will be under the same constraints. In fact, there is a lot of redundant data in a JSON document produced by SQL Server that represents a table or result. Not only that, but there isn’t enough of the right sort of metadata until we add a JSON Schema.

CSV provides the most economical way of transferring tabular data as an ASCII file, but SQL Server doesn’t always support it properly to the rfc4180 standard. However, JSON can transfer tabular data in a way that is almost as economical in space, and more reliably. Here, just to convince you that it is compact, are the first three records of Adventureworks humanResources.employee table in array-of-array JSON.

It is valid JSON (RFC 4627). It isn’t the usual way of storing table data, which is much more verbose, being an array of JSON objects of key-value pairs. We are no longer bound to do this the conventional way because we can now transfer the metadata with the data. The JSON Schema tells you how it is stored.

JSON can be persuaded into this array-of-array format, and if we can read and write it in SQL Server, then we can use it. To prove this, we need to be able to save a database in this format, and to save the schema in this format.

In this article, I’ll demonstrate how to produce an array-in-array JSON document, and the schema to go with it. With this, you have sufficient information to make it easy to transfer such data.

JSON Arrays to Relational Table

Before we turn to the task of producing this sort of data from a table or expression we ought to show to turn the specimen json document that I’ve just shown you back into a relational table. We’ll start by doing the task manually, without a schema, as if it were CSV. You might think I’m being ironic when you first see the code that performs this action, but no. We can then go on to generate all the laborious stuff automatically:

Let’s shred our sample JSON. We have two alternatives. We can do it the JSON_Value way, referencing array elements within the row rather than key/value pairs.

 

You can make sure that it is returning all the data in the correct format, even the pesky hierarchy ID, by doing a SELECT INTO. There is a big, big, problem here, though, that they don’t warn you about. JSON_Value has a maximum of 4000 for a string.

For data of any size, we will need to use the second alternative, OPENJSON. This has the rather neater Explicit Schema syntax. Note that, in our rendition of the explicit schema used by OpenJSON, we need to reference array elements rather than key/value pairs

Unfortunately, in this ‘explicit schema’ format, they haven’t yet got around to supporting CLR types so you can’t create a HierarchyID from JSON this way. Instead, you need to specify the CLR type as being NVARCHAR and coerce it into its CLR type in the result, thus with the OrganizationNode in the HumanResources.Employee table ….

This works fine but complicates the code. If you are inserting into a table, you can rely on implicit coercion of the datatype to convert the NVARCHAR into a hierarchyid or geography.

As well as getting this from the JSON Schema if it is available, we can get the explicit schema as well as the JSON_Value() column list. We can use either the schema or the sys.dm_exec_describe_first_result_set directly. The latter approach allows you to create a JSON Schema from any expression, which greatly extends the usefulness of this approach. if you already have the table that matches the JSON data, you merely specify a SELECT * from your table ( we used Humanresources.Employee) and the result contains your spec which you can then paste into the browser pane. This time, just so it is a bit more obvious what’s going on, I’ll do it without the aggregation of the lines. It is a bit more manual because you need to cut and paste the result, and don’t forget to nick out that last comma.

That select statement can be derived like this …

Relational table to JSON Array

You can store tables or results from SQL Server in this economical format, though the process of generating the data isn’t so pretty. FOR JSON doesn’t support this directly, which is sad. Somehow, I was hoping for FOR JSON RAW.

Just as with CSV, the array-within-array format is only valuable if both ends of the data transfer are aware of enough of the schema/metadata to transform the document into a table. We’ll go into the detail of how we do this with JSON Schema later. The first task is to squeeze the array-in-array format from the somewhat reluctant OpenJSON function.

To get the JSON in array-of-array format from a particular table, and we’ve chosen adventureworks2016.person.person, you do this in SQL Server 2017. I’ve used the String_Agg() aggregation function, but you can do it as easily, but more messily, via the XML trick if you are on SQL Server 2016. Here is a simplified version of the way that we’ll do it

Note that we need to specify ‘INCLUDE_NULL_VALUES’ in that OpenJSON expression. This is because we need all the columns to be converted in that array, even if they are null in some rows. If we don’t we get the array in the right order but with maybe one or more fields missing, but without being able to know which!

This is much faster than the old way we did the conversion to JSON, using XML.

Whereas the XML version unfortunately shares a vice with the JSON version,  because it also dislikes tables with CLR Datatypes and ‘errors out’.

This is a shame. Where there are no CLR types, it is possible to save the entire contents of a database with a routine like this, which is very satisfying. Note we have a hard-wired directory path which we ought to avoid. You’ll need to alter that to a suitable server directory if you want to try this out.

Instead of this, We’ll add supporting temporary procedure to do the difficult bits that I’ve already illustrated

Now this isn’t fast: it is two minutes rather than eighteen seconds to dump out Adventureworks in native mode or twenty-one seconds in tab-delimited mode. However we want JSON, especially as we can validate it and distinguish between blank strings and nulls, and attach a schema to the table

Adding a schema.

The schema that accompanies this is reasonably simple to generate, Here, just to illustrate how we do it, is a batch that does it for the query ‘SELECT * FROM adventureworks2016.HumanResources.Employee‘.

This turns out a JSON Schema that, nicely formatted, looks like this

Now, if you send the JSON schema with the JSON, either within the same JSON document or separately, you can create the OpenJSON explicit schema from the values in the JSON Schema. Let’s do this manually, just to show the ‘man behind the curtain’.

Now so far, we’ve Managed to create a schema for a single table. We need to automate this now to do any table. Let’s create this now and try it out

We can now try this out by writing the schemas of all the tables in AdventureWorks.

It takes 11 seconds to do them all on my machine. You aren’t limited to tables, you can do any SQL Query.

Conclusions

We seem to have used a lot of SQL to achieve our ends. However, we now have data that we can validate outside the database, share with JSON-friendly applications or import into JSON-savvy databases. We have a version of a JSON tabular document that is economical in storage.

The next stage is to use it to build a database. I’ve described elsewhere how to do it with the more conventional Object-within-array JSON document and schema but not in array-in-array JSON. That’s next.

SourceCode

The source to this article and various blogs on the topic of importing, validating and exporting both JSON Schema and data in SQL Server  is on github here