Friday, November 9, 2018

A Tale of Two JSON Implementations - MySQL and MariaDB

JSON has proven to be a very import data format with immense popularity. A good part of my time for the last two or so years has been dedicated to this area and I even wrote a book on the subject.  This is a comparison of the implementations of handling JSON data in MySQL and MariaDB. I had requests from the community and customers for this evaluation.


JSON Data Types Are Not All Equal


MySQL added a JSON data type in version 5.7 and it has proven to be very popular.  MariaDB has  JSON support  version 10.0.16 but is actually an alias to a longtext data type so that statement based replication from MySQL to MariaDB is possible.

MySQL stores  JSON documents are converted to an internal format that permits quick read access to document elements.  The binary format is structured to enable the server to look up sub-objects or nested values directly by key or array index without reading all values before or after them in the document.  From a practical standpoint the big thing most people notice is that the data is alphabetized.

MariaDB does no such optimization and their documentation states the claim that the performance is at least equivalent.

JSON Tables

The first comparison is 'how hard is it to put non-JSON data into a JSON column?'  The standard (see below under 'extra'), by the way, states that the data must be in the UTF8MB4 character set. And what is going on 'underneath the cover'?

The first step is to create tables to hold JSON data.


MySQL:

create table t1 (j json);
Query OK, 0 rows affected (0.1826 sec)

 MySQL  localhost:33060+ ssl  json  SQL > show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                   |
+-------+----------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `j` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.0149 sec)

 MySQL  localhost:33060+ ssl  json  SQL > desc t1;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| j     | json | YES  |     | NULL    |       |
+-------+------+------+-----+---------+-------+
1 row in set (0.0028 sec)


MariaDB:

MariaDB [json]> create table t2 (j json);
Query OK, 0 rows affected (0.046 sec)

MariaDB [json]> show create table t2;
+-------+----------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                     |
+-------+----------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
 `j` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

MariaDB [json]> desc t2;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| j     | longtext | YES  | | NULL |      |
+-------+----------+------+-----+---------+-------+
1 row in set (0.001 sec)

MariaDB [json]>





Conclusion: Both use UTF8MB4 and the underlying tables are roughly equivalent.

Checking Constraints

Ensuring that only JSON gets into the JSON columns is essential. MySQL does this by default for the JSON data type but MariaDB does not.

MySQL:

insert into t1 values('junk');

ERROR: 3140: Invalid JSON text: "Invalid value." at position 0 in value for column 't1.j'.

MariaDB:



MariaDB [json]> insert into t2 values ('junk');

Opps! We now have a NON-JSON value in the table.

To be fair, MariaDB does let you establish a constraint check on the column to avoid this issue.


MariaDB [json]> drop table t2;
Query OK, 0 rows affected (0.046 sec)
MariaDB [json]> create table t2 (j json, check (json_valid(j)));
Query OK, 0 rows affected (0.049 sec)


MariaDB [json]> insert into t2 values ('junk');
ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `json`.`t2`

Conclusion: MySQL does as advertised by rejecting non-JSON data by default while MariaDB can do so with a little extra work.

Loading Data


The only major complaint about JSON data is that there is just so much of it. So having a way to bulk load is important.

MySQL:

MySQL's new shell (mysqlsh) has Python, JavaScipt, and SQL modes.  It is very easy to use either the Python or JavaScript modes to write a quick script to read bulk data sets line by line.  Giuseppe 'Datacharmer' Maxia has a great example of converting data from MongoDB to MySQL using the shell and I have used that example code extensively in the past. But now 8.0.13 has a bulk loader built into the shell.
The New MySQL Shell's utility to bulk load JSON

This utility functions allows JSON data to be stores in a JSON Document Collection (MySQL Document Store) or in a JSON column of s structured table. Very handy.


MariaDB:

Maria does not have an equivalent to the new MySQL Shell nor does it have a bulk loader utility for JSON data. 

I tired to use the Connect Storage Engine (not installed by default) and follow the examples on the CONNECT JSON Table Type page without luck. The Connect engine is supposed to have the capability to auto discover a table structure and define the table itself.  Mea Cupla for not getting this to work and I would encourage those who do use this feature to send me pointers PLEASE! But after several hours and tedious attempts to follow the examples exactly it was time to move on to something else.

Conclusion: MySQL does better loading data, especially in bulk.

JSON Functions


Both databases have functions and there are some differences.


  1. Functions That Create JSON Values
    1. JSON_ARRAY, JSON_OBJECT and JSON_QUOTE  are found in both and work the same.
  2. Functions That Search JSON Values
    1. JSON_CONTAINS, JSON_CONTAINS_PATH, JSON_EXTRACT, JSON_KEYS, and JSON_SEARCH are found in both and work the same.  However only MySQL has the -> and ->>  shortcuts for JSON_EXTRACT and JSON_UNQUOTE(JSON_EXTRACT)).. 
  3. Functions That Modify JSON Values
    1. JSON_ARRAY_APPEND, JSON_ARRAY_INSERT, JSON_INSERT, JSON_REMOVE, JSON_REPLACE, JSON_SET, and JSON_UNQUOTE are in both and work the same.  
    2. MariaDB has JSON_MERGE which has been deprecated in MYSQL 8.0.3 and replaced with JSON_MERGE_PRESERVE & JSON_MERGE_PATCH.   MySQL 8.0 supports the JSON Merge Patch format defined in RFC 7396 function.
  4. Functions That Return JSON Value Attributes
    1. JSON_DEPTH, JSON_LENGTH, JSON_TYPE, and JSON_VALID are found in both and work the same.
  5. Table Functions
    1. JSON_TABLE which allows you to use unstructured JSON data in a structured temporary table is in MySQL only.
  6. JSON Utility Functions
    1. JSON_PRETTY, JSON_STORAGE_FREE, and JSON_STORAGE_SIZE are only in MySQL.
  7. Other Functions
    1. JSON_ARRAYAGG and JSON_OBJECTAGG are only in MySQL and are very handy for turning non JSON data into JSON.
    2. JSON_VALUE and JSON_QUERY are MariaDB specific and return an object/array or a scalar respectively.

Conclusion:  I tested both database's functions and found they worked as advertised.  JSON_PRETTY is much missed by my eyes when dealing with documents with several levels of embedding with MariaDB.  The merging functions are richer for MySQL especially for those who need to follow the merge patch standard. And JSON_TABLE is a valuable tool when trying to treat unstructured data as structured data temporarily especially combined with CTEs and Windowing Functions that were introduced in MySQL 8.   

Updating Data


Updating data can be expensive and JSON data can be expansive.

MySQL:

MySQL 5.7 required a complete re-write of the document.  If this is something you do a lot then you need to consider upgrading to MySQL 8.

MySQL 8.0's  the optimizer can perform a partial, in-place update of a JSON column instead of removing the old document and writing the new document in its entirety to the column. 
Replication. But there are conditions to this: 1) It has to be a JSON column, 2) The UPDATE statement uses any of the three functions JSON_SET(), JSON_REPLACE(), or JSON_REMOVE() to update the column but a direct set like UPDATE mytable SET jcol = '{"a": 10, "b": 25'}) does not work, 3) The input column and the target column must be the same column, 4) All changes replace existing array or object values with new ones, and do not add any new elements to the parent object or array, and 5) The value being replaced must be at least as large as the replacement value. In other words, the new value cannot be any larger than the old one (An exception to this requirement occurs when a previous partial update has left sufficient space for the larger value. You can use the function JSON_STORAGE_FREE() see how much space has been freed by any partial update). If you can follow those rules the optimizer will do partial rewrites.

MariaDB:

I searched the MariaDB docs and Jira but found no mention of partial JSON column updates.  If anyone has links, please send them to me. So it appears that MariaDB does a full rewrite.

Conclusion:  MySQL is more efficient here.

Replication 

Efficient replication is a must and goes double for JSON with the potential of very large document payloads having to cross network connections.

MySQL:

In MySQL 5.7 an update to a JSON column was written to the binary log as the complete document. In MySQL 8.0, it is possible to log partial updates to JSON documents.   In statement based replication JSON partial updates are always logged as partial updates. 

But in row based replication they are logged as complete documents. To enable the logging of partial updates, set binlog_row_value_options=PARTIAL_JSON. Please note that the replication master has this variable set, partial updates received from that master are handled and applied by a replication slave regardless of the slave's own setting for the variable.


MariaDB:

I searched the MariaDB docs and Jira but found no mention of partial JSON Replication updates.  If anyone has links, please send them to me. So expect the full document to be replicated.

Conclusion: MySQL can be more efficient.

X DevAPI / Document Store

Only MySQL has the Document Store and the underlying Document Store and it is proving to be very popular with customers.  The ability to use a database as a NoSQL Document Store and a relational database is popular.  Not having embedded strings in their code and having the IDEs help is driving developers to this approach. 

MariaDB's CTO said at Zendcon that they will support if customers demand.

Conclusion:  MySQL is the sole choice here.


Overall Conclusion

I have been using MySQL's JSON data type since the 5.7 DMRs and know them well.  MariaDB's implementation seems very familiar and works as expected.  MySQL is superior in the partial updates of data and replication, functions such as JSON_TABLE, the X DevAPI, and bulk loading of data. 

Extra


The IETF's standard is here and makes pretty quick reading.