Thursday, June 27, 2019

The '$' As The JSON Document

Recently on Stackoverflow was a question about the handling of  a JSON document stored in a MySQL Database. The data looked like this:

[{"name":"cdennett","address":"123 street","Postcode":"ABCDE"}]

The data above is valid JSON. That data is in an array because it is surrounded by []s while objects are surrounded by {}s. And the author was trying to use the following JSON_TABLE function

SELECT people.* 
FROM t1, 
     JSON_TABLE(json_col, '$.people[*]' 
COLUMNS (
                name VARCHAR(40)  PATH '$.name',
                address VARCHAR(100) PATH '$.address')
     ) people;

Some you who have been using the JSON data type are probably smiling at the code above.It is a simple problem caused by confusion about the path of the JSON document. The problem is the way the data is referenced. Or to put it another way the path to the document is wrong. And, sadly, you probably only run into this after getting confused and having to learn the proper way to look at the path of a JSON document.

'$' is Your Document

The '$' character refers to the entire document. Is we use JSON_EXTRACT or the arrow operator it is easy to retrieve the entire document. 

 select json_extract(json_col,'$') from t1;
+----------------------------------------------------------------------+
| json_extract(json_col,'$')                                           |
+----------------------------------------------------------------------+
| [{"name": "cdennett", "address": "123 street", "Postcode": "ABCDE"}] |
+----------------------------------------------------------------------+

Or we could use the shortcut method  select json_col->"$" from t1; to get the same results.

Where this gets confusing is if you use a function like JSON_KEYS like such:

select JSON_KEYS(json_col) from t1x;
+---------------------+
| JSON_KEYS(json_col) |
+---------------------+
| NULL                |
+---------------------+

Why did we get a NULL and not the same output as from the JSON_EXTRACT?  Well, JSON_KEYS will return null if a) any argument is NULL, the document in question is not an object, or a path.  And an array is not an object.

But other functions are not bothered by the fact that the document is not an object.

select json_pretty(json_col) from t1;
+------------------------------------------------------------------------------------------+
| json_pretty(json_col)                                                                    |
+------------------------------------------------------------------------------------------+
| [
  {
    "name": "cdennett",
    "address": "123 street",
    "Postcode": "ABCDE"
  }
] |
+------------------------------------------------------------------------------------------+

So if '$' is [{"name":"cdennett","address":"123 street","Postcode":"ABCDE"}] how do we 'peel off one onion layer' to get to the {"name":"cdennett","address":"123 street","Postcode":"ABCDE"}?

The contents of the first array is denoted as $[0].

'$' is [{"name":"cdennett","address":"123 street","Postcode":"ABCDE"}]
and
'$[0] is {"name":"cdennett","address":"123 street","Postcode":"ABCDE"}
and
'$[0].name is "cdbennet"

So if we refer back to the code snippet from Stackoverflow, it becomes evident that the path was certainly not '$.people[*]' but '$[*]' or '$[0]'.

Conclusions

So we end up with two conclusions. First is that to remember that '$' refers to the entire document and walking down the document of the structure means walking down a path that starts at '$'. And second, you might want to consider not burring things in a top level array.