Which format use to store dates in database with MySQL / MariaDB?

This question may seem weird at first, but depending on the use case, the answer is not necessarily so obvious that it may appear. In this article I will talk mostly about MySQL but the principles can apply to any database. I will also speak, for the sake of simplicity, only cases we want to store a date and time, the case of a single date being only a simplified version of it.

The most obvious answer: Datetime

Instinctively, until a few years ago, I was using the datetime type without asking myself a question. Indeed, it is very easy to use, its format is very similar to that of PHP and it allows many manipulations (I advise you besides the two other articles on the manipulation of date in PHP and SQL).
But, looking at it more closely, the datetime field is just a string, and all data for a date is not populated.Indeed, without the context of the recording, when one retrieves a datetime in base, there is no way of knowing in what timezone is this date.
And even if we know the timezone (from another field for example), if it must be displayed in another timezone, it is not necessarily easy handling.

The fake friend: Timestamp

One might think that the timestamp type is the solution, but in fact its name is misleading. The simplest way to explain it is to show how it works by example:

MariaDB [test]> CREATE TABLE `Test` (
    ->   `id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   `timestamp` timestamp NOT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.11 sec)
 
 
MariaDB [test]> INSERT Test (timestamp) VALUES (1515359139);
Query OK, 1 row affected, 1 warning (0.01 sec)
 
MariaDB [test]> INSERT Test (timestamp) VALUES ('2018-01-07 22:10');
Query OK, 1 row affected, 1 warning (0.02 sec)
 
MariaDB [test]> SELECT * FROM Test;
+----+---------------------+
| id | timestamp           |
+----+---------------------+
|  1 | 0000-00-00 00:00:00 |
|  2 | 2018-01-07 22:10:00 |
+----+---------------------+
3 rows in set (0.00 sec)

As you can see, despite its name, the timestamp field behaves just like the datetime field. It does not take a timestamp as input and does not return a timestamp. In fact, reading the documentation, we realize that the difference between datetime and timestamp lies in how to store the data internally in SQL. In the case of the timestamp, MySQL converts the date to timestamp for storage and retranslates it when the data is requested. For this, MySQL uses its current timezone (variable time_zone). This means that if the server configuration is changed between the record and the search, the date value will be changed.
Moreover, in the end, this format also does not allow to store information on the recording timezone.

Another problem of this format is that it is limited to the size of the int, which means that you can not store a date beyond 2038.

MariaDB [test]> INSERT Test (timestamp) VALUES ('2040-01-01 10:10');
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> SELECT * FROM Test;
+----+---------------------+
| id | timestamp           |
+----+---------------------+
|  3 | 0000-00-00 00:00:00 |
+----+---------------------+
1 rows in set (0.00 sec)

Alternative: timestamp as an integer

One solution is not to ask your data storage system to be smart. After all, its purpose is to store our information reliably, possibly ACID but not to guess in what timezone are our users.
The idea is to store our dates as a timestamp in a simple integer field and it is up to the application to manage this date. In fact we use a big int, otherwise the next bug of the year 2000 will be the bug of "2038-01-09 03:14:07" (max value of int in UTC).
And to have a readable value during an SQL query, just use the FROM_UNIXTIME function.

MariaDB [test]> CREATE TABLE `Test` (
    ->   `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   `timestamp` bigint UNSIGNED NOT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.07 sec)
 
MariaDB [test]> INSERT Test (timestamp) VALUES (1515359139);
Query OK, 1 row affected (0.02 sec)
 
MariaDB [test]> INSERT Test (timestamp) VALUES (UNIX_TIMESTAMP());
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> INSERT Test (timestamp) VALUES (253402210800);
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> SELECT timestamp FROM Test;
+----------------+
| timestamp      |
+----------------+
|     1515359139 |
|     1515362870 |
|   253402210800 |
+-----------------+
3 rows in set (0.00 sec)
 
MariaDB [test]> SELECT from_unixtime(timestamp) FROM Test;
+--------------------------+
| from_unixtime(timestamp) |
+--------------------------+
| 2018-01-07 22:05:39      |
| 2018-01-07 23:07:50      |
| NULL                     |
+--------------------------+
3 rows in set (0.00 sec)

Be careful, however, the functions on MySQL dates do not work for values larger than 32bits.
Fortunately our favorite language (PHP) is doing better.

$ php -r "echo (new DateTime('@253402210800'))->format('Y-m-d');"
9999-12-31

Advantages

This system allows the management of the display format to the application front and have a rather silly database, because ultimately its role is to store our data, not to have business intelligence.

A simple example of a datetime display problem is the management of comments on this blog. If someone on the island of Reunion publishes a message say January 27 at 1 am (insomniac), records '2018-01-27 01:00:00', in France it is only 22h, if I reply quickly to his comment, my answer is save to '2018-01-26 22:30:00'. So the answer will be displayed before the question! You tell me that if my site does not handle timezone and is still in France the order will be good, but my reader will see his message written a day in advance, which is not great either.

If all the dates are recorded in base in timestamp, the logic of the display is done front side according to the location of the user and everybody is happy.

Disadvantages

If you are used to letting your DBMS fill in your creation date fields itself, this is not possible by saving in an integer field. Indeed, the default current_timestamp only works for datetime and timestamp fields.

In addition, depending on the type of data you are recording, sometimes the display of a date does not depend on the location of the user but on the data itself. In this kind of case you have to trick a little.

For example, consider an event site that advertises different shows, concerts, or festivals.
The difference from the previous example is time, the schedule only makes sense in relation to the place where it takes place and is no longer linked to the user.
In this situation the timezone of the date depends on the data inserted and is therefore fixed for each event. The easiest way is to store the timestamp and the timezone in two separate columns so that you can get a complete date with all the information.
What would give in base:

MariaDB [test]> CREATE TABLE event(
    ->   `id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(100) NOT NULL,
    ->   `place` varchar(100) NOT NULL,
    ->   `date_ts` bigint(20) UNSIGNED NOT NULL,
    ->   `tz` varchar(50) NOT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [test]> INSERT event (name, place, date_ts, tz) VALUES ('Metallica', 'Accor Hotel Arena, Paris', 1504800000, 'Europe/Paris');
Query OK, 1 row affected (0.00 sec)
 
MariaDB [test]> SELECT * FROM event;
+----+-----------+--------------------------+------------+--------------+
| id | name      | place                    | date_ts    | tz           |
+----+-----------+--------------------------+------------+--------------+
|  1 | Metallica | Accor Hotel Arena, Paris | 1504800000 | Europe/Paris |
+----+-----------+--------------------------+------------+--------------+
1 row in set (0.00 sec)
 

And when recovering the data:

<?php
 
$conn = new PDO('mysql:dbname=test;host=127.0.0.1', 'root', null);
$sql =  'SELECT name, place, date_ts, tz FROM event LIMIT 1';
foreach  ($conn->query($sql) as $row) {
    $date = new DateTime('@'.$row['date_ts']); // if the first parameter is a timestamp, the second parameter is ignored
    $date->setTimezone(new DateTimeZone($row['tz']));
    var_dump($date);
}
/*
class DateTime#4 (3) {
  public $date =>
  string(26) "2017-09-07 18:00:00.000000"
  public $timezone_type =>
  int(3)
  public $timezone =>
  string(12) "Europe/Paris"
}
*/

Other ideas: strings

As long as we are using primary field types to store our dates, why not just store it as a string.

Timestamp in a varchar

As the integer, by its size, limits the timestamp that can be recorded, we can say that the store in a varchar would remove this limit.
But the bigint allows to go further than the DateTime (9999-12-31), which is more than enough. By then, computing will have evolved, making these date format issues obsolete.
Then, the varchar format is more greedy in memory. A bigint is stored on 4 bits while the varchar uses as many bits as the number of characters plus 1, for example 21bits for a varchar (20).
This format makes it possible to use the native MySQL functions as from_unixtime, however, I have some doubts about the performance of the sorting or looking for a date greater or less than another (which is very powerful with integers).
In the end this solution is very close and more greedy than that with integers and therefore has little interest.

Complete DateTime in a varchar

To solve the problem of lack of timezone information in a DateTime field, we could also store the date in ISO 8601 (1997-07-16T19: 20: 30.45 + 01: 00) format or another standard in a varchar field .Note, however, if the ISO 8601 format allows you to sort by date, because the lexicographic ordering is in chronological order, the selection of lines by date is not possible.
We must know its use cases before considering this solution.

MariaDB [test]> CREATE TABLE `test` (
    ->  `id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    ->  `timestamp` varchar(50) NOT NULL,
    ->  PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [test]> INSERT test (timestamp) VALUES ('1997-07-16T19:20:30.45+01:00');
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> INSERT test (timestamp) VALUES ('2004-02-12T15:19:21+00:00');
Query OK, 1 row affected (0.00 sec)
 
MariaDB [test]> INSERT test (timestamp) VALUES ('2004-02-12T15:19:00-05:00');
Query OK, 1 row affected (0.00 sec)
 
MariaDB [test]> INSERT test (timestamp) VALUES ('2004-02-12T15:19:21+01:00');
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> select * from test order by timestamp;
+----+------------------------------+
| id | timestamp                    |
+----+------------------------------+
|  1 | 1997-07-16T19:20:30.45+01:00 |
|  3 | 2004-02-12T15:19:00-05:00    |
|  2 | 2004-02-12T15:19:21+00:00    |
|  4 | 2004-02-12T15:19:21+01:00    |
+----+------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test]> select * from test where timestamp > '2004-02-12T15:19';
+----+---------------------------+
| id | timestamp                 |
+----+---------------------------+
|  2 | 2004-02-12T15:19:21+00:00 |
|  3 | 2004-02-12T15:19:00-05:00 |
|  4 | 2004-02-12T15:19:21+01:00 |
+----+---------------------------+
2 rows in set (0.00 sec)

Conclusion

I hope this post helped you to see a little more clearly and to identify some particular points.
Finally, I do not think there is a perfect solution and it depends primarily on your use cases.
But if I can only advise one thing, it's to decide on a format and stick to it for your entire base. There is nothing worse to manage and use than a non-homogeneous database in these types of fields or in its nomenclature (and I know what I'm talking about).

Add a comment