Storing Time Information: Timezone Required

Storing time information: Timezone required?

Whichever way you do it, it will fail in different ways depending on what is changing.

  1. If you store timestamps in the according timezone as 2013-12-29 12:34:56 America/New_York, this will fail if, say, the Bronx suddenly starts their own timezone America/New_York_Bronx with a different offset and your event happened to be in the Bronx.

    Decide how likely this is and how bad a failure would be.

  2. If you store timestamps in UTC and the timezone in which the event is happening is redefining their offset (e.g. shifting DST dates around, or entirely shifting to a different offset), the event time may differ from the actual wall clock time at that location. If you store 2013-12-29 12:34:56 UTC for an event at 13:34:56 in Berlin, Germany, and Berlin shifts their DST around, 2013-12-29 12:34:56 UTC may now correspond to 14:34:56 Berlin local time, while the event is still actually happening at 13:34 local time.

    Decide how likely this is and how bad a failure would be.

  3. If you store the UTC timestamp and link it to a physical location which you then link to a timezone, you can counteract both problems. But for this you'll have to store the precise physical location, not just "New York", otherwise you just have case 1. with one more intermediate step. If you do store the precise physical location and have a precise way to resolve this location to a timezone and you keep your timezone database up to date, you can handle pretty much all change scenarios.

    Decide how practical this is and how much this extra precision is worth to you.

Best practices with saving datetime & timezone info in database when data is dependant on datetime

Hugo's answer is mostly correct, but I'll add a few key points:

  • When you're storing the customer's time zone, do NOT store a numerical offset. As others have pointed out, the offset from UTC is only for a single point in time, and can easily change for DST and for other reasons. Instead, you should store a time zone identifier, preferably an IANA time zone identifier as a string, such as "America/Los_Angeles". Read more in the timezone tag wiki.

  • Your OrderDateTime field should absolutely represent the time in UTC. However, depending on your database platform, you have several choices for how to store this.

    • For example, if using Microsoft SQL Server, a good approach is to store the local time in a datetimeoffset column, which preserves the offset from UTC. Note that any index you create on that column will be based on the UTC equivalent, so you will get good query performance when doing your range query.

    • If using other database platforms, you may instead wish to store the UTC value in a timestamp field. Some databases also have timestamp with time zone, but understand that it doesn't mean it stores the time zone or offset, it just means that it can do conversions for you implicitly as you store and retrieve values. If you intend to always represent UTC, then often timestamp (without time zone) or just datetime is more appropriate.

  • Since either of the above methods will store a UTC time, you'll also need to consider how to perform operations that need an index of local time values. For example, you might need to create a daily report, based on the day of the user's time zone. For that, you'd need to group by the local date. If you try to compute that at query time from your UTC value, you'll end up scanning the entire table.

    A good approach to deal with this is to create a separate column for the local date (or perhaps even the local datetime depending on your needs, but not a datetimeoffset or timestamp). This could be a completely isolated column that you populate separately, or it could be a computed/calculated column based on your other column. Use this column in an index so you can filter or group by local date.

  • If you go for the computed-column approach, you'll need to know how to convert between time zones in the database. Some databases have a convert_tz function built-in that understands IANA time zone identifiers.

    If you're using Microsoft SQL Server, you can use the new AT TIME ZONE function in SQL 2016 and Azure SQL DB, but that only works with Microsoft time zone identifiers. To use IANA time zone identifiers, you'll need a third party solution, such as my SQL Server Time Zone Support project.

  • At query time, avoid using the BETWEEN statement. It is fully inclusive. It works ok for whole dates, but when you have time involved you're better off doing a half-open range query, such as:

    ... WHERE OrderDateTime >= @t1 AND OrderDateTime < @t2

    For example, if @t1 were the start of today, @t2 would be the start of tomorrow.

Regarding the scenario discussed in comments where the user's time zone has changed:

  • If you choose to calculate the local date in the database, the only scenario you need to worry about is if a location or business switches time zones without a "zone split" occurring. A zone split is when a new time zone identifier is introduced which covers the area that changed, including their old and new rules.

    For example, the latest zone added to the IANA tzdb at the time of writing this is America/Punta_Arenas, which was a zone split when the southern part of Chile decided to stay at UTC-3 when the rest of Chile (America/Santiago) went back to UTC-4 at the end of DST.

    However, if a minor locality on the border of two time zones decides to change which side they follow, and a zone split wasn't warranted, then you'd potentially be using the rules of their new time zone against their old data.

  • If you store the local date separately (computed in the application, not the DB), then you'll have no problems. The user changes their time zone to the new one, all old data is still intact, and new data is stored with the new time zone.

Handling timezones in storage?

You have to store in UTC - if you don't, your historic reporting and behaviour during things like Daylight Savings goes... funny. GMT is a local time, subject to Daylight Savings relative to UTC (which is not).

Presentation to users in different time-zones can be a real bastard if you're storing local time. It's easy to adjust to local if your raw data is in UTC - just add your user's offset and you're done!

Joel talked about this in one of the podcasts (in a round-about way) - he said to store your data in the highest resolution possible (search for 'fidelity'), because you can always munge it when it goes out again. That's why I say store it as UTC, as local time you need to adjust for anyone who's not in that timezone, and that's a lot of hard work. And you need to store whether, for example, daylight savings was in effect when you stored the time. Yuk.

Often in databases in the past I've stored two - UTC for sorting, local time for display. That way neither the user nor the computer get confused.

Now, as to display: Sure, you can do the "3 minutes ago" thing, but only if you store UTC - otherwise, data entered in different timezones is going to do things like display as "-4 hours ago", which will freak people out. If you're going to display an actual time, people love to have it in their local time - and if data's being entered in multiple timezones you can only do that with ease if you're storing UTC.

Is it always a good idea to store time in UTC or is this the case where storing in local time is better?

I think that in order to answer that question, we should think about the benefits of using UTC to store timestamps.

I personally think that the main benefit to that is that the time is always (mostly) guaranteed to be consistent. In other words, whenever the timezone is changed, or DST applied, you don't get back or forth in time. This is especially useful in filesystems, logs and so on. But is it necessary in your application?

Think of two things. Firstly, about the time of DST clock shift. Is it likely that your events are going to occur between 2 AM and 3 AM (on the day the clock shift is done)? What should happen then?

Secondly, will the application be subject to actual timezone changes? In other words, are you going to fly with it from London to Warsaw, and change your computer timezone appropriately? What should happen in that case?

If you answered no to both of those questions, then you're better with the local time. It will make your application simpler. But if you answered yes at least once, then I think you should give it more thinking.


And that was all about the database. The other thing is the time format used internally by the application, and that should depend on what actually you will be doing with that time.

You mentioned it exposing the time via an API. Will the application query the database on every request? If you store the time internally as UTC, you will either need to do that or otherwise ensure that on DST/timezone change the cached times will be adjusted/pruned.

Will it do anything with the time itself? Like printing the event will occur in 8 hours or suspending itself for circa that time? If yes, then UTC will probably be better. Of course, you need to think of all the forementioned issues.

Time zone storage in data type timestamp with time zone

This is just a misunderstanding stemming from the somewhat misleading type name. The time zone itself is not stored at all. It just acts as offset to compute a UTC timestamp (input), which is actually stored. Or as decorator in the display of a timestamp according to the current or given time zone (output). That's all according to the SQL standard.

Just the point in time is stored, no zone information. That's why 64 bit of information is enough. The timestamp is displayed to the client according to the current time zone setting of the session.

Details:

  • Ignoring time zones altogether in Rails and PostgreSQL

Also, since Jon mentioned it, time with time zone is defined in the SQL standard and thus implemented in Postgres, but its use is discouraged:

time with time zone is defined by the SQL standard, but the definition
exhibits properties which lead to questionable usefulness.

It's an inherently ambiguous type that cannot deal with DST properly.

What is the ideal way to store a timezones aware in a PostgreSQL database?

None of the above, use a timestamp with time zone data type. It can handle timezone info as either an offset, timezone name, or abbreviation.

How should I store data for events in different timezones?

If you mean that you are recording the time of events as they happen, or recording the times of events in the past, then paxdiablo's answer is correct. Usually, UTC will suffice for that. In a few cases, you might want to store a local datetime + offset (a "DateTimeOffset" in some platforms), but that depends on exactly what you're using the data for.

However, if you are asking about scheduling an event that will happen in the future, especially if it's a recurring event, then UTC is not entirely sufficient. I've already written about this several times, so I suggest you read these articles for details:

  • https://stackoverflow.com/a/20828804
  • https://stackoverflow.com/a/19627330
  • https://stackoverflow.com/a/19170823
  • https://serverfault.com/a/554761

Should I store UTC timestamps or localtime for shifts

First, understand that date-time is barely touched on by the SQL standard. Data types, definitions, and behavior vary widely across various database products.

/h3>

Fortunately for you, Postgres has exceptionally rich handling of date-time data types and date-time functions. But you must study the documentation carefully and experiment so you understand behavior.

07:00, regardless of daylight savings

Actually, that would not be regardless but respecting Daylight Saving Time (DST). DST is changing the meaning of 7 AM by sliding it later or earlier by an hour. Respecting DST correctly is the heart of your problem.

For the sake of other readers, a word about terminology: The “localtime” word in the context of the Question means a date and/or a time-of-day without regard for time zone or offset-from-UTC.

As such, a local-date-time does not represent a point on the timeline but rather a rough idea of possible points. In Auckland NZ, 7 AM arrives much earlier than in Paris FR. And in turn, 7 AM in Paris FR happens much sooner than 7 AM in Montréal CA. So without a time zone, a “local” value has no meaning. In Postgres, these “local” types are:

  • timestamp without time zone (date & time-of-day)
  • date (date only)
  • time without time zone (time-of-day only)

An actual moment on the timeline requires a zone or offset. Let’s call these “zoned” types, for lack of a better word. In Postgres that would be:

  • timestamp with time zone (date & time-of-day, adjusted to UTC)
  • time with time zone (time-of-day, adjusted to UTC)

Very important to understand that Postgres never saves any time zone information. Despite the “with time zone” name, the zone is not saved as part of your data. Instead, “with time zone” means “with respect for time zone” in that Postgres adjusts the input value to UTC.

  • For the zoned types, Postgres applies any specified zone/offset info in the input value to adjust from that zone/offset to UTC. The specified zone/offset info is then discarded.
  • In contrast, for the “local” types, any specified zone/offset info is ignored entirely.

Question 1: What format is recommended for storing the local timezone?

So, depending on your business needs and rules, you may want to separately record the zone/offset separately, in addition to the date-time value. Neither Postgres nor the SQL standard specify a data type for zone or offset. So I suggest storing as text.

  • Offset-from-UTC
    • Use the formats specified by the ISO 8601 standard for designators. These formats are primarily (a) Z for UTC (short for Zulu, means UTC) and (b) ±hh:mm where + means ahead of UTC (like India) and - (the MINUS character, or alternatively a HYPHEN-MINUS) means behind UTC (like the Americas).
    • While omitting the padded zero on the hour, and omitting the colon, are allowed by ISO 8601, I suggest you never do so. Many protocols and software implementations expect those bits and may break without them.
    • Example: For India, five and a half hours ahead of UTC, +05:30
  • Time zone
    • Use the formal name in format of continent/region defined by the IANA in the tzdata database, formerly known as the Olson database. See this recent list of zones.
    • Example: For India, Asia/Kolkata

More terminology: An offset-from-UTC is a number of hours and minutes and seconds ahead of, or behind, UTC. A time zone is an offset plus a set of rules for handling anomalies such as Daylight Saving Time (DST). So always better to use a time zone when you know it.

To answer the Question:
You need both “local” and “zoned” types for your solution.

To record the definition of a shift, you want a “local” time-of-day, the time without time zone. When you record that shift should normally start at 7 AM, you do not want Postgres to alter or adjust that value.

To record the concept of a particular shift, you would record (a) the start time as time without time zone, (b) the date as date. By applying an offset or zone, you can determine a UTC value. You may want to also/instead record the UTC value itself. But beware of doing so far into the future, as politicians everywhere are quite fond of redefining time zones with little advance notice.

To record the moment a work actually clocked-in, you want the UTC moment, the timestamp with time zone type. You can input a zoned value if need be, and Postgres will adjust into UTC. As mentioned in the Question, when working with actual moments on the timeline, it is almost always best to work and store data in UTC.

For both both of the types, you may or may not wish to additionally record the intended time zone as well.

A Postgres session has a default time zone. I suggest you never rely on that. Better to always specify the intended time zone in your SQL code and/or your input data. When manually perusing data, you may find it handy to set the session default to UTC or to a zone, but I would not do that in code.

Is there a way in which I can store the start/end times of shifts that allows me to write an SQL query that can compare those times with a supplied UTC time.

As mentioned above, you would record the general concept of a shift as a “local”. As in, “In 2015 the Dusseldorf and Detroit factories started at 6 AM while the Delhi factor started at 7 AM, but in 2016 all three factories start at 7 AM”. To record any one actual shift, record in UTC though you may also want to record the “local” values for readability by humans.

Should I just store the clock in/out times as Local to be the same as the shift start/end times?

No, no, certainly not. Any actual points on the time line, real moments, should be recorded in UTC. Use timestamp with time zone and let Postgres adjust inputs to UTC as needed. Though generally I suggest your app’s programming be in UTC beforehand.

Or should I do some magic and store everything as UTC times

No magic needed, just consistent handling of your data to always include time zone (or offset) data and adjust into UTC. Get this straight in your app programming as well as the database. For example, in Java, hand off objects rather than mere strings for date-time values. With JDBC 4.2 the database can exchange OffsetDateTime (and, optionally, Instant or ZonedDateTime) objects. (Avoid the troublesome java.util.Date & .Calendar classes, now legacy.)

Table of date-time types in Java (both legacy and modern) and in standard SQL

how to work out the current Local time start time of a shift if it is stored as "2:00 UTC"

If you have a date and a time-of-day in UTC, you can always apply a time zone (or offset) to see the “local” value.

Date-time handling is tricky slippery stuff. So think it through, give yourself time to learn, and practice, practice, practice. Tips: (a) Learn 24-hour time, (b) When at work programming, think in UTC, keep a UTC clock on your desk, and forget about your own personal local time zone. Thinking primarily in your own local time zone, and constantly translating back-and-forth with UTC, will drive you crazy and lead to errors.

Storing timezone information for a timestamp in Cassandra

We store millions of timestamps and their timezones in our Cassandra cluster per day.

Our recommended practice is to store timestamps as two values:

  1. Store the timestamp itself as a timestamp type or bigint and always align it to UTC (we use .NET datetimes, which start from year zero and not the UNIX epoch, so we have to use bigint.)
  2. Store the UTC offset as a bigint, in microseconds

That way all of our timestamps are, by default, using the same timezone across the board. However, whenever we need to do something in local time we can just fetch the UTC offset column and subtract it from the timestamp in order to get local time.



Related Topics



Leave a reply



Submit