How to record created_at and updated_at timestamps in Hive?
Hive does not provide such a mechanism. You can achieve this by using a UDF in your select: from_unixtime(unix_timestamp()) as created_at
Note that this is executed in each mapper or reducer and may return different values. If you need the same value for the whole dataset (for Hive versions before 1.2.0), pass a variable to the script and use it inside as: '${hiveconf:created_at}' as created_at
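A minimal sketch of the variable approach, assuming the Hive CLI is on the PATH. The helper names make_created_at and run_load and the script name load.hql are hypothetical, chosen only for illustration. The timestamp is captured once in the shell, so every row written by the script gets the same value:

```shell
#!/usr/bin/env bash
# Sketch only: make_created_at, run_load and load.hql are hypothetical names.

# Capture a single timestamp in the shell, formatted like a Hive timestamp string.
make_created_at() {
  date '+%Y-%m-%d %H:%M:%S'
}

# Pass the captured value to the Hive script as a hiveconf variable.
# Inside load.hql it is referenced as: '${hiveconf:created_at}' as created_at
run_load() {
  local created_at
  created_at="$(make_created_at)"
  hive --hiveconf created_at="$created_at" -f load.hql
}
```

Calling run_load then runs load.hql with one fixed created_at value for the whole job.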
Update: current_timestamp returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value.
unix_timestamp() gets the current Unix timestamp in seconds. This function is non-deterministic and prevents proper optimization of queries; it has been deprecated since 2.0 in favour of the CURRENT_TIMESTAMP constant. In other words, CURRENT_TIMESTAMP is a constant, not a function!
See the docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
For Hive queries, CURRENT_TIMESTAMP is preferable when you rewrite tables or partitions or insert into them, because whole files are rewritten rather than individual records, so the created_at timestamp should be the same for all rows.
Having both a Created and Last Updated timestamp columns in MySQL 4.0
From the MySQL 5.5 documentation:
One TIMESTAMP column in a table can have the current timestamp as the default value for initializing the column, as the auto-update value, or both. It is not possible to have the current timestamp be the default value for one column and the auto-update value for another column.
Changes in MySQL 5.6.5:
Previously, at most one TIMESTAMP column per table could be automatically initialized or updated to the current date and time. This restriction has been lifted. Any TIMESTAMP column definition can have any combination of DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP clauses. In addition, these clauses now can be used with DATETIME column definitions. For more information, see Automatic Initialization and Updating for TIMESTAMP and DATETIME.
Get full data view for two tables in Hive?
You can use UNION ALL:
select tr_id, res_id, info_json, created_at, updated_at, src
from
(select tr_id, res_id, info_json, created_at, updated_at, 'NoArch' as src
from Table2NoArch
union all
select tr_id, res_id, info_json, null created_at, null updated_at, 'Arch' as src
from Table1Arch
)u
where res_id in (111,333,444)
created_at and updated_at are absent in Table1Arch, so NULLs are selected for them; you can use current_timestamp or current_date instead.
The added src column lets you easily identify the source of each row.
Union two tables having unix_timestamp() function in both
unix_timestamp()
Gets current Unix timestamp in seconds. This function is non-deterministic and prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP.
Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
Use unix_timestamp(current_timestamp) instead, for example:
insert overwrite table TableC
select field1,field2, unix_timestamp(current_timestamp) as field3 from table_A
UNION
select field1,field2, unix_timestamp(current_timestamp) as field3 from table_B
Additional workarounds:
insert overwrite table TableC
select field1,field2,unix_timestamp() as field3
from ( select field1,field2 from table_A
union all select field1,field2 from table_B
) t
group by field1,field2
or
insert overwrite table TableC
select field1,field2,unix_timestamp() as field3
from ( select field1,field2 from table_A
union select field1,field2 from table_B
) t
Not able to reference Hive date variable in later set statements
Hive variable substitution is simple text replacement. This statement:
set my_date=select to_date(date_sub(last_day(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')),1));
will assign the string 'select to_date(date_sub(last_day(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')),1))' to the variable my_date. Unfortunately, Hive does not evaluate variables. Your final statement will be resolved as
select date_format('select to_date(date_sub(last_day(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')),1))','yyyy-MM');
This is not a valid select statement.
You can calculate the variable in a separate script and pass it to another script using the shell. See also https://stackoverflow.com/a/56450129/2700344
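As a rough sketch of that shell approach, assuming the Hive CLI is on the PATH: one Hive call computes the date, and the shell passes the result to a second script. The helper names compute_my_date and run_report and the file name report.hql are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compute_my_date, run_report and report.hql are hypothetical names.

# Let Hive evaluate the expression and print the result to stdout.
# -S (silent mode) keeps log output out of the captured value; -e runs the query.
compute_my_date() {
  hive -S -e "select date_sub(last_day(current_date), 1);"
}

# Pass the already-computed value to the second script as plain text.
# Inside report.hql it is referenced as: '${hiveconf:my_date}'
run_report() {
  local my_date
  my_date="$(compute_my_date)" || return 1
  hive --hiveconf my_date="$my_date" -f report.hql
}
```

Because the substitution is plain text replacement, report.hql receives a literal date string rather than an unevaluated select statement.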
You can print the variable inside a Hive script using the shell echo command:
! echo my_date contains '${hiveconf:my_date}';
Also, do not use FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'). Use current_date() instead; see this answer for more details: https://stackoverflow.com/a/41140298/2700344