Hive insert query like SQL
Some of the answers here are out of date as of Hive 0.14 (see the Language Manual DML page):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL
It is now possible to insert using syntax such as:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));
INSERT INTO TABLE students
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
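Because VALUES takes literals, a client that builds such a statement from its own data has to render every value as a Hive literal. The sketch below is a hypothetical Python helper (render_values_insert and quote_literal are illustrative names, not part of any Hive client library). It assumes a simple type mapping: None becomes NULL, bool becomes TRUE/FALSE, numbers are written as-is, strings are single-quoted with Hive's C-style backslash escaping, and anything else is rejected. It is meant for trusted data only.

```python
from decimal import Decimal


def quote_literal(value):
    """Render one Python value as a HiveQL literal (assumed type mapping)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, str):
        # Hive string literals use C-style backslash escaping.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def render_values_insert(table, rows):
    """Build INSERT INTO TABLE ... VALUES (...), (...); (Hive 0.14+ syntax)."""
    if not rows:
        raise ValueError("at least one row is required")
    tuples = ", ".join(
        "(" + ", ".join(quote_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO TABLE {table}\nVALUES {tuples};"


if __name__ == "__main__":
    print(render_values_insert("students", [
        ("fred flintstone", 35, Decimal("1.28")),
        ("barney rubble", 32, Decimal("2.32")),
    ]))
```

Run as a script, this prints exactly the two-line INSERT shown above. Decimal is used for the gpa values so they are printed without any floating-point rounding surprises.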
Can I insert data into a table in Hive the same way as in SQL?
Hive (before 0.14) doesn't support the INSERT INTO TABLE ... VALUES format.
You need to load the data either from a flat file into Hive, or from one Hive table into another.
Loading from a flat file can be done in two ways: from the local file system, or from the Hadoop file system (HDFS).
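The two flat-file routes correspond to LOAD DATA LOCAL INPATH (a path on the client's local file system; Hive copies the files) and LOAD DATA INPATH (a path on HDFS; Hive moves the files into the table's location). As a sketch, the hypothetical Python helper below assembles either form; the file paths and table name are made up for illustration, and it assumes paths never contain a single quote.

```python
def render_load_data(path, table, local=False, overwrite=False):
    """Build a Hive LOAD DATA statement.

    local=True  -> LOAD DATA LOCAL INPATH (local file system, files copied)
    local=False -> LOAD DATA INPATH (HDFS, files moved)
    overwrite=True replaces the table's existing data instead of appending.
    """
    if "'" in path:
        raise ValueError("paths containing single quotes are not handled")
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append(f"INPATH '{path}'")
    if overwrite:
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(render_load_data("/tmp/students.csv", "students", local=True))
    print(render_load_data("/user/hive/staging/students", "students", overwrite=True))
```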
You can also join two different tables and load the result into a new table; overwriting the existing data is possible too.
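Loading one Hive table from a join of two others is an INSERT ... SELECT, with INSERT INTO TABLE to append and INSERT OVERWRITE TABLE to replace the target's data. The hypothetical Python helper below only assembles that statement; the tables students(id, name) and grades(student_id, gpa) and the target student_gpa are made-up names for illustration.

```python
def render_join_insert(target, columns, left, right, on, overwrite=False):
    """Build INSERT INTO/OVERWRITE TABLE ... SELECT ... FROM a JOIN b ON ...

    overwrite=False appends to the target (INSERT INTO TABLE);
    overwrite=True replaces its existing data (INSERT OVERWRITE TABLE).
    """
    if not columns:
        raise ValueError("at least one column is required")
    mode = "OVERWRITE" if overwrite else "INTO"
    return (
        f"INSERT {mode} TABLE {target}\n"
        f"SELECT {', '.join(columns)}\n"
        f"FROM {left}\n"
        f"JOIN {right} ON {on};"
    )


if __name__ == "__main__":
    # Hypothetical tables: students(id, name) and grades(student_id, gpa).
    print(render_join_insert(
        "student_gpa",
        ["s.name", "g.gpa"],
        "students s",
        "grades g",
        "s.id = g.student_id",
        overwrite=True,
    ))
```

INSERT requires the target table to exist already; if it does not, Hive's CREATE TABLE ... AS SELECT creates and fills it in one statement.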
See the links below for the types of loading and the supported file formats.
http://zacktutorials.blogspot.ca/2014/07/big-data-hadoop-hive-sql-query-hello.html
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
ConvertJsonToSQL for Hive Insert query
What version of Hive are you using? There are Hive 1.2 and Hive 3 versions of PutHiveStreaming and PutHive3Streaming (respectively) that let you put the data directly into Hive without having to issue HiveQL statements. For external Hive tables in ORC format, there are also ConvertAvroToORC (for Hive 1.2) and PutORC (for Hive 3) processors.
Assuming those don't work for your use case, you could also consider ConvertRecord with a FreeFormTextRecordSetWriter that generates the HiveQL, including the PARTITION clause and so on. That gives a lot more flexibility than trying to patch a SQL statement into HiveQL for a partitioned table.
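To make the idea concrete, the Python sketch below shows the kind of per-record HiveQL text such a writer template would be set up to emit: a static-partition INSERT INTO TABLE ... PARTITION (...) VALUES (...). This is plain Python, not NiFi's API or template language; the table name events, the record fields, and the helper names are hypothetical, and the literal mapping (NULL, TRUE/FALSE, bare numbers, backslash-escaped strings) is an assumption.

```python
def hive_literal(value):
    """Render a Python value as a HiveQL literal (assumed type mapping)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else becomes a string; Hive uses C-style backslash escapes.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def record_to_hiveql(table, record, columns, partition_columns):
    """Turn one record (a dict) into a static-partition INSERT ... VALUES.

    Partition values go in the PARTITION clause; VALUES carries only the
    regular columns, in table order.
    """
    spec = ", ".join(f"{c}={hive_literal(record[c])}" for c in partition_columns)
    values = ", ".join(hive_literal(record[c]) for c in columns)
    return f"INSERT INTO TABLE {table} PARTITION ({spec}) VALUES ({values});"


if __name__ == "__main__":
    # Hypothetical record; 'dt' is the partition column.
    rec = {"id": 7, "name": "o'brien", "dt": "2020-05-01"}
    print(record_to_hiveql("events", rec, ["id", "name"], ["dt"]))
```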
EDIT: I forgot to mention that the Hive 3 NAR/components are not included with the NiFi release due to space reasons. You can find the Hive 3 NAR for NiFi 1.11.4 here.
Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER
Partition columns must be the last columns in the SELECT. When you add a new column with ALTER TABLE, it is added as the last non-partition column, and the partition columns stay at the end: they are not stored in the data files, and only the table metadata (the metastore) holds information about partitions. The order of all the other columns matters too. It must match the table DDL, which you can check with DESCRIBE FORMATTED table_name.
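-- Both partition columns are dynamic here, so this needs
-- set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE final_table PARTITION (COLUMN4, COLUMN5)
SELECT
  stg.Column1,
  stg.Column2,
  stg.Column3,
  stg.Column6,  -- new column: added by ALTER TABLE as the last non-partition column
  stg.Column4,  -- partition column
  stg.Column5   -- partition column
...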
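Hive maps the SELECT expressions to the target columns by position, not by name, which is why a column added later must go right before the partition columns. A hypothetical Python helper can derive that order mechanically, assuming you already have the non-partition columns in DDL order (for example copied from DESCRIBE FORMATTED) and the partition columns in PARTITION clause order.

```python
def select_list(ddl_columns, partition_columns, alias="stg"):
    """Order SELECT expressions for INSERT ... PARTITION (...) SELECT.

    ddl_columns: non-partition columns in table DDL order, as listed by
        DESCRIBE FORMATTED (a column added by ALTER TABLE ADD COLUMNS
        appears at the end of this list).
    partition_columns: partition columns in PARTITION clause order.
    """
    overlap = set(ddl_columns) & set(partition_columns)
    if overlap:
        raise ValueError(f"columns listed twice: {sorted(overlap)}")
    # Regular columns first, partition columns last.
    ordered = list(ddl_columns) + list(partition_columns)
    return ",\n".join(f"  {alias}.{name}" for name in ordered)


if __name__ == "__main__":
    print(select_list(
        ["Column1", "Column2", "Column3", "Column6"],
        ["Column4", "Column5"],
    ))
```

For the table in this answer, it reproduces the column order of the corrected SELECT: Column1, Column2, Column3, Column6, then Column4 and Column5.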
How to make dynamic insert in hive from a field?
Use dynamic partitioning:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table tab2
PARTITION (REFERENCE_DATE)
SELECT
-- tab2's non-partition columns go first, in table order; the partition column must be last
from_unixtime(unix_timestamp('Sun Oct 22 05:35:03 2017', 'E MMM dd HH:mm:ss yyyy'), 'yyyyMMdd') as reference_date
FROM tab1 LIMIT 100;
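With dynamic partitioning, the value of the last SELECT expression decides which partition each row goes to, and each distinct value becomes a directory named column=value under the table's location. The Python model below is a simplified illustration of that routing, not Hive code: route_to_partitions is a hypothetical name, the rows are made up, and it ignores Hive's escaping of special characters in partition values.

```python
from collections import defaultdict


def route_to_partitions(rows, partition_column):
    """Group rows by the value of the dynamic partition column.

    Each distinct value maps to one partition directory named
    '<column>=<value>'. The value is dropped from the stored row because
    Hive keeps it in the directory name and metastore, not in the data file.
    """
    partitions = defaultdict(list)
    for row in rows:
        data = dict(row)  # copy so the caller's rows are not modified
        value = data.pop(partition_column)
        partitions[f"{partition_column}={value}"].append(data)
    return dict(partitions)


if __name__ == "__main__":
    sample = [
        {"id": 1, "reference_date": "20171022"},
        {"id": 2, "reference_date": "20171022"},
        {"id": 3, "reference_date": "20171023"},
    ]
    for directory, stored in route_to_partitions(sample, "reference_date").items():
        print(directory, stored)
```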
It is better to use the yyyy-MM-dd format, because that is Hive's native date format:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table tab2
PARTITION (REFERENCE_DATE)
SELECT
-- tab2's non-partition columns go first, in table order; the partition column must be last
from_unixtime(unix_timestamp('Sun Oct 22 05:35:03 2017', 'E MMM dd HH:mm:ss yyyy'), 'yyyy-MM-dd') as reference_date
FROM tab1 LIMIT 100;
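To check what the two queries produce for the sample string, the Python sketch below applies the same parse-and-format step with strptime/strftime codes that correspond to the Java pattern letters (E to %a, MMM to %b, dd to %d, HH:mm:ss to %H:%M:%S, yyyy to %Y). It assumes English day and month names, as in the sample, and skips the epoch round trip that unix_timestamp and from_unixtime perform in Hive; reference_date is a hypothetical name.

```python
from datetime import date, datetime

# Hive/Java pattern 'E MMM dd HH:mm:ss yyyy' as Python strptime codes.
# Assumes English day/month names (C locale), like the sample string.
SOURCE_FORMAT = "%a %b %d %H:%M:%S %Y"


def reference_date(text, compact=False):
    """Return 'yyyyMMdd' (compact=True) or 'yyyy-MM-dd' for a timestamp string."""
    parsed = datetime.strptime(text, SOURCE_FORMAT)
    return parsed.strftime("%Y%m%d" if compact else "%Y-%m-%d")


if __name__ == "__main__":
    sample = "Sun Oct 22 05:35:03 2017"
    print(reference_date(sample, compact=True))        # 20171022
    print(reference_date(sample))                      # 2017-10-22
    # The dashed form parses straight into a date value.
    print(date.fromisoformat(reference_date(sample)))  # 2017-10-22
```

The dashed form is the one that ISO-style date parsers accept directly, just as Hive can cast a yyyy-MM-dd string to DATE without a custom pattern.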