Hive Insert Query Like SQL

Some of the answers here are out of date as of Hive 0.14, which added support for inserting values directly:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

It is now possible to insert using syntax such as:

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));

INSERT INTO TABLE students
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

Can I insert data into a table in Hive the way I would in SQL?

No. Hive versions before 0.14 do not support the INSERT INTO table VALUES format.

Instead, you load the data either from a flat file into a Hive table or from one Hive table into another.

Loading from a flat file can be done in two ways: from the local file system, or from the Hadoop file system.
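
The two flat-file options could look roughly like this. This is only a sketch: the table students_raw, its columns, the comma delimiter, and both file paths are made up for illustration.

```sql
-- Hypothetical staging table for a comma-delimited text file.
CREATE TABLE students_raw (name STRING, age INT, gpa DECIMAL(3, 2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 1. From the local file system: the file is copied into the table's location.
LOAD DATA LOCAL INPATH '/tmp/students.csv' INTO TABLE students_raw;

-- 2. From the Hadoop file system: the file is moved into the table's location.
LOAD DATA INPATH '/user/hive/staging/students.csv' INTO TABLE students_raw;
```

Adding OVERWRITE before INTO TABLE replaces the existing contents of the table instead of appending to them.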

You can also join two different tables and load the result into a new table. Overwriting an existing table is possible as well.
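
As a rough sketch of the table-to-table route, assuming three hypothetical tables students_raw(name, age, gpa), enrollments(name, course) and student_courses(name, age, course) that already exist:

```sql
-- Join two source tables and replace the contents of the target table.
-- All table and column names here are assumptions for illustration.
INSERT OVERWRITE TABLE student_courses
SELECT s.name, s.age, e.course
FROM students_raw s
JOIN enrollments e ON s.name = e.name;
```

Using INSERT INTO TABLE instead of INSERT OVERWRITE TABLE appends the rows rather than replacing what is already there.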

See the links below for the types of loading and the supported formats.

http://zacktutorials.blogspot.ca/2014/07/big-data-hadoop-hive-sql-query-hello.html

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

ConvertJsonToSQL for Hive Insert query

What version of Hive are you using? There are Hive 1.2 and Hive 3 versions of the streaming processor (PutHiveStreaming and PutHive3Streaming, respectively) that let you put the data directly into Hive without having to issue HiveQL statements. For external Hive tables in ORC format, there are also the ConvertAvroToORC (for Hive 1.2) and PutORC (for Hive 3) processors.

Assuming those don't work for your use case, you may also consider ConvertRecord with a FreeFormTextRecordSetWriter that generates the HiveQL, including the PARTITION clause and anything else you need. That gives you a lot more flexibility than trying to patch a SQL statement to turn it into HiveQL for a partitioned table.

EDIT: I forgot to mention that the Hive 3 NAR/components are not included with the NiFi release due to space reasons. You can find the Hive 3 NAR for NiFi 1.11.4 here.
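
For the ConvertRecord route, the statement the writer would need to emit for each record might look something like the following. This is only an illustration of the target HiveQL, not of the writer configuration: the table events, its columns, the partition column event_date, and all literal values are hypothetical, and it assumes Hive 0.14 or later so that INSERT ... VALUES is available.

```sql
-- Target shape of the generated statement; in practice the literals
-- would be filled in from the fields of each record.
INSERT INTO TABLE events PARTITION (event_date = '2020-05-01')
VALUES ('login', 'user-123');
```

The partition value goes in the PARTITION clause, and only the non-partition columns appear in VALUES.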

Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER

Partition columns must be the last ones in the SELECT. When you add a new column with ALTER, it is added as the last non-partition column, and the partition columns remain last. Partition columns are not stored in the data files; only the metadata contains information about partitions. The order of all other columns also matters: it must match the table DDL, which you can check using DESCRIBE FORMATTED table_name.

INSERT OVERWRITE TABLE final_table PARTITION (COLUMN4, COLUMN5)
SELECT
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column6, -- new column, now the last non-partition column
stg.Column4, -- partition column
stg.Column5  -- partition column
...
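
To see the column order Hive expects, you could run something like the following. The table name final_table and the STRING type for Column6 are assumptions made for this sketch.

```sql
-- Adding a column appends it after the existing non-partition columns.
ALTER TABLE final_table ADD COLUMNS (Column6 STRING);

-- Lists the regular columns in DDL order, followed by a separate
-- partition information section with the partition columns.
DESCRIBE FORMATTED final_table;
```

The SELECT list of the insert should follow that order: regular columns first, then the partition columns in the order given in the PARTITION clause.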

How to make dynamic insert in hive from a field?

Use dynamic partitioning:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table tab2
PARTITION (REFERENCE_DATE)
SELECT
from_unixtime(unix_timestamp('Sun Oct 22 05:35:03 2017', 'E MMM dd HH:mm:ss yyyy'), 'yyyyMMdd') as reference_date
FROM tab1 LIMIT 100;

It is better to use the yyyy-MM-dd date format, because that is Hive's native date format:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table tab2
PARTITION (REFERENCE_DATE)
SELECT
from_unixtime(unix_timestamp('Sun Oct 22 05:35:03 2017', 'E MMM dd HH:mm:ss yyyy'), 'yyyy-MM-dd') as reference_date
FROM tab1 LIMIT 100;
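
Before running the insert, it may help to check the date conversion on its own and, afterwards, to see which partitions were created. This is a sketch that assumes Hive 0.13 or later, where SELECT without a FROM clause is allowed. Note that the parse pattern has no spaces after the colons, so that it matches the input string.

```sql
-- Check that the string parses and formats as expected.
SELECT from_unixtime(
         unix_timestamp('Sun Oct 22 05:35:03 2017', 'E MMM dd HH:mm:ss yyyy'),
         'yyyy-MM-dd') AS reference_date;

-- After the insert, list the partitions that dynamic partitioning created.
SHOW PARTITIONS tab2;
```

If the pattern does not match the input, unix_timestamp returns NULL, which would send the rows to Hive's default partition rather than to a dated one.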

