How to Convert "2019-11-02T20:18:00Z" to Timestamp in HQL

How to convert 2019-11-02T20:18:00Z to timestamp in HQL?

If you want to preserve milliseconds, remove the Z, replace the T with a space, and convert the result to a timestamp:

select timestamp(regexp_replace("2019-11-02T20:18:00Z", '^(.+?)T(.+?)Z$','$1 $2'));

Result:

2019-11-02 20:18:00

It also works with milliseconds:

select timestamp(regexp_replace("2019-11-02T20:18:00.123Z", '^(.+?)T(.+?)Z$','$1 $2'));

Result:

2019-11-02 20:18:00.123
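
To see what the regular expression does outside of Hive, here is a minimal Java sketch. It assumes that Hive's regexp_replace follows Java regex syntax, so the same pattern and replacement string can be reused; the class and method names are hypothetical and exist only for this illustration.

```java
import java.sql.Timestamp;

final class IsoToHiveTimestamp {
    // Same pattern and replacement as in the regexp_replace call above:
    // group 1 is the date part, group 2 is the time part without the trailing Z.
    static String toHiveFormat(String iso) {
        return iso.replaceAll("^(.+?)T(.+?)Z$", "$1 $2");
    }

    public static void main(String[] args) {
        String converted = toHiveFormat("2019-11-02T20:18:00.123Z");
        System.out.println(converted);

        // The rewritten string is in the 'yyyy-MM-dd HH:mm:ss.fff' shape
        // that java.sql.Timestamp.valueOf accepts, fraction included.
        Timestamp ts = Timestamp.valueOf(converted);
        System.out.println(ts.getNanos());
    }
}
```

The fractional part survives because the string is only rearranged, never converted to whole seconds.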

The from_unixtime(unix_timestamp()) approach does not preserve milliseconds.
Demo:

select from_unixtime(unix_timestamp("2019-11-02T20:18:00.123Z", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"));

Result:

2019-11-02 20:18:00

Milliseconds are lost because unix_timestamp returns the number of whole seconds elapsed since the UNIX epoch (1970-01-01 00:00:00 UTC).
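
The same loss can be reproduced in plain Java. This is only an illustrative sketch, not Hive code: it uses java.time.Instant to stand in for the seconds-based round trip, and the class and method names are hypothetical.

```java
import java.time.Instant;

final class EpochSecondsDemo {
    // Whole seconds since the epoch, analogous to what unix_timestamp returns.
    static long epochSeconds(String iso) {
        return Instant.parse(iso).getEpochSecond();
    }

    // Milliseconds since the epoch, which still carry the fraction.
    static long epochMillis(String iso) {
        return Instant.parse(iso).toEpochMilli();
    }

    public static void main(String[] args) {
        String iso = "2019-11-02T20:18:00.123Z";
        long seconds = epochSeconds(iso);
        long millis = epochMillis(iso);

        // The difference is exactly the fraction that a seconds-based value cannot hold.
        System.out.println(millis - seconds * 1000);

        // Rebuilding an instant from whole seconds yields a value without the fraction.
        System.out.println(Instant.ofEpochSecond(seconds));
    }
}
```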

What version of Hive do I need to have timestamps with more than 6 decimal places in the fractional seconds?

Jira HIVE-21575 is about introducing SQL:2016-compatible datetime formats in place of the currently used SimpleDateFormat; it should also provide a simple way to convert non-standard timestamp formats without losing precision.
Its child Jira, HIVE-21868, was released in version 4.0.

What you can do if you have Hive < 4.0:

  1. The standard timestamp format is 'yyyy-MM-dd HH:mm:ss.SSSSSSSSS' (up to 9 digits of precision). If your strings are in this format, you can convert them using timestamp(str) or cast(str as timestamp). In most cases you do not need an explicit conversion: it is done implicitly without loss of precision, so you can insert such strings into a timestamp column and compare them with timestamps directly.

  2. Timestamps in a non-standard format with nanoseconds need conversion.
    The problem is that date_format does not work, because it expects a timestamp in the standard format. from_unixtime(unix_timestamp(str, format)) does not work either: unix_timestamp returns whole seconds, so milliseconds and nanoseconds are lost after this conversion.

The solution for non-standard formats is to extract the fractional part (milliseconds or nanoseconds) from the string, convert the rest with second precision using from_unixtime(unix_timestamp(str, format)), concatenate the result with the fractional part, and then convert to timestamp using timestamp() or cast (this last step is optional).

Demo (Hive 2.3.6):

with your_data as (
  select '16AUG2001:23:46:32.123456789' as ts -- non-standard format
)

select ts as original_string,
       timestamp( -- in most cases you can do without the final timestamp() conversion
         concat(
           from_unixtime(unix_timestamp(split(ts,'\\.')[0],'ddMMMyyyy:HH:mm:ss')), -- timestamp with seconds precision
           '.', split(ts,'\\.')[1] -- digits after the dot
         )
       ) as timestamp_converted
from your_data;

Result:

original_string                 timestamp_converted
16AUG2001:23:46:32.123456789    2001-08-16 23:46:32.123456789

As you can see, it works fine with nanosecond precision. The final timestamp(string) conversion is there only to show that the resulting string is compatible with timestamp; you can omit it.
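
For readers who want to try the split-and-concatenate idea without a Hive session, the steps can be sketched in Java. This is an illustration under stated assumptions, not the Hive implementation: java.time stands in for unix_timestamp/from_unixtime, the class and method names are hypothetical, and case-insensitive English month parsing is assumed so that "AUG" is accepted.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

final class NonStandardTimestamp {
    // Parses the part before the dot, e.g. "16AUG2001:23:46:32".
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("ddMMMyyyy:HH:mm:ss")
            .toFormatter(Locale.ENGLISH);

    // Standard layout with second precision; the fraction is re-attached as text.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String toStandardFormat(String ts) {
        // Step 1: split off the digits after the dot so they are never converted.
        String[] parts = ts.split("\\.", 2);
        // Step 2: convert the rest with second precision.
        String base = LocalDateTime.parse(parts[0], INPUT).format(OUTPUT);
        // Step 3: concatenate the untouched fraction back on, if there was one.
        return parts.length == 2 ? base + "." + parts[1] : base;
    }

    public static void main(String[] args) {
        System.out.println(toStandardFormat("16AUG2001:23:46:32.123456789"));
    }
}
```

Keeping the fraction as a string is the point of the technique: it is carried around the seconds-only conversion instead of through it.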


  1. If you initially have a bigint Unix timestamp in milliseconds and want to convert it to a Hive timestamp, see this recipe: https://stackoverflow.com/a/63672215/2700344

  2. For a slightly different method that handles strings like "2019-11-02T20:18:00.123Z", see: https://stackoverflow.com/a/58713989/2700344
    This method works if the string can be easily converted to the standard format using regexp_replace.

Hive - calculating string type timestamp differences in minutes

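-- timestamp is a reserved word in Hive, so the column name is quoted with backticks
select anonymousid,
       (max(unix_timestamp(`timestamp`, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
        min(unix_timestamp(`timestamp`, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
       ) / 60 as diff_minutes
from db1.formevent
group by anonymousid;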
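
The arithmetic in the query above can be checked on a pair of values with a small Java sketch. It is only an illustration: the class and method names are hypothetical, and Instant.getEpochSecond stands in for unix_timestamp, which also works in whole seconds.

```java
import java.time.Instant;

final class MinutesBetween {
    // Difference of two epoch-second values divided by 60, as in the query:
    // fractional seconds are dropped first, then the result may be a fractional number of minutes.
    static double minutesBetween(String first, String last) {
        long seconds = Instant.parse(last).getEpochSecond()
                     - Instant.parse(first).getEpochSecond();
        return seconds / 60.0;
    }

    public static void main(String[] args) {
        System.out.println(
            minutesBetween("2019-11-02T20:18:00.123Z", "2019-11-02T20:19:30.456Z"));
    }
}
```

Because the subtraction happens on whole seconds, sub-second differences between the two strings do not affect the result.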

Convert 2020-10-31T00:00:00Z String Date to long

By doing df.setTimeZone(TimeZone.getTimeZone("GMT+11"));, you are asking the date formatter to interpret your string in the GMT+11 time zone. However, your string shouldn't be interpreted in that timezone. See that Z in the string? That stands for the GMT time zone, so you should have done this instead:

df.setTimeZone(TimeZone.getTimeZone("GMT"));

In fact, your string is in the ISO 8601 format for an Instant (or a "point in time", if you prefer). Therefore, you could just parse it with Instant.parse, and get the number of milliseconds with toEpochMilli:

System.out.println(Instant.parse("2020-10-31T00:00:00Z").toEpochMilli());
// prints 1604102400000
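
To make the effect of the time zone setting visible, here is a self-contained comparison. The SimpleDateFormat pattern is an assumption (the original code is not shown here); it treats the trailing Z as a literal character, which is why the formatter's own time zone decides the result. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.TimeZone;

final class ZuluToEpochMillis {
    // Parses the text as if it were local time in the given zone.
    // Assumed pattern: 'Z' is quoted, so it carries no zone information by itself.
    static long parseWithZone(String text, String zoneId) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        df.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return df.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        String text = "2020-10-31T00:00:00Z";
        System.out.println(parseWithZone(text, "GMT"));     // matches Instant.parse
        System.out.println(parseWithZone(text, "GMT+11"));  // 11 hours earlier
        System.out.println(Instant.parse(text).toEpochMilli());
    }
}
```

With GMT the legacy formatter agrees with Instant.parse; with GMT+11 the same text is read as midnight in that zone, so the result is shifted by 11 hours.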

Warning: you shouldn't really use SimpleDateFormat anymore if the Java 8 APIs (i.e. Instant and such) are available. Even if they are not, you should use Joda-Time or something like that.


