What is the difference between LOAD DATA INPATH and LOAD DATA LOCAL INPATH in Hive?
I got the answer:
- 'LOCAL' signifies that the input file is on the local file system.
- If 'LOCAL' is omitted then it looks for the file in HDFS.
Source: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations
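A minimal sketch of the two forms (the table name and file paths are hypothetical):

```sql
-- LOCAL: copies the file from the local file system into the table's storage location
LOAD DATA LOCAL INPATH '/tmp/users.csv' INTO TABLE users;

-- Without LOCAL: the path is resolved on HDFS, and the file is moved into the table
LOAD DATA INPATH '/data/staging/users.csv' INTO TABLE users;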
hive external table location vs load path
The LOCATION clause in the DDL of an external table specifies the HDFS location where the data is stored; later, when we query the table, the data is read from this path. LOAD DATA INPATH takes the path of the source file from which the data is loaded into the table. The source can be either a local file path or an HDFS path.
Hope I have cleared your confusion.
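To illustrate the distinction above, a sketch with assumed table, columns, and paths:

```sql
-- LOCATION names the folder the table reads from
CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales';

-- LOAD DATA INPATH names the source file; Hive moves it into /data/sales
LOAD DATA INPATH '/staging/sales_2021.csv' INTO TABLE sales;
```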
When executing LOAD DATA in Hive, does it copy the data?
It is explained in the documentation:
If the keyword LOCAL is not specified, then Hive will either use the full URI of filepath, if one is specified, or will apply the following rules:
[...]
Hive will move the files addressed by filepath into the table (or partition)
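One way to observe the move behaviour from the shell (paths hypothetical, assuming a Hadoop/Hive installation):

```shell
hadoop fs -ls /staging/sales_2021.csv   # the file exists before the load
hive -e "LOAD DATA INPATH '/staging/sales_2021.csv' INTO TABLE sales;"
hadoop fs -ls /staging/sales_2021.csv   # no longer there: the file was moved, not copied
```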
Loading data into a table
Either upload that file into HDFS and run the same command with the HDFS path, or use the LOCAL keyword as below:
load data local inpath "D:\data files\sample.txt" into table sample;
Check this for more details.
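The HDFS variant mentioned above might look like this (the HDFS path is an assumption):

```sql
-- After uploading the file to HDFS, e.g. with: hadoop fs -put sample.txt /tmp/sample.txt
-- load it without the LOCAL keyword
LOAD DATA INPATH '/tmp/sample.txt' INTO TABLE sample;
```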
Does Hive need an explicit command to load data into the table from HDFS
The first CREATE statement is completely wrong: STORED AS should be TEXTFILE, ORC, Parquet, etc.; it is not a location, and of course when you create a table you should not provide a file name. Tables in Hive are created on locations (folders), not files, and the property for a table's location is LOCATION, not STORED AS. See this recent example: https://stackoverflow.com/a/68095278/2700344
The second CREATE statement creates the table without a location specified (the default location will be used for managed tables, like /user/hive/warehouse/dbo/table1); see more details here: https://stackoverflow.com/a/67073849/2700344
Execute DESC FORMATTED dbo.table1
and check LOCATION.
Yes, you need LOAD DATA to be executed because your file is not located in the table location. If you place the file into some dedicated location for that table, you can CREATE EXTERNAL TABLE and specify LOCATION. But your file is currently in a folder which should not be used as a table location: /usr/hive. /usr/hive/table1 looks much better. Alternatively, you can create the table as in the second CREATE statement and just copy the file into its location using the hadoop fs -cp command. LOAD DATA does the same.
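Putting the corrections together, a sketch of valid DDL (column names and the path are assumptions for illustration):

```sql
-- STORED AS takes a file format; LOCATION takes a folder (not a file name)
CREATE EXTERNAL TABLE dbo.table1 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/usr/hive/table1';
```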
LOAD DATA INPATH loads same CSV-base data into two different and external Hive tables
It looks like you just need to specify a different LOCATION for the second table. When you run LOAD DATA INPATH, Hive actually moves the data into that path. If both tables have the same LOCATION, they will share the same data.
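A sketch of keeping the two external tables independent (names and paths hypothetical):

```sql
-- Give each external table its own folder so their data stays separate
CREATE EXTERNAL TABLE t1 (c1 STRING) STORED AS TEXTFILE LOCATION '/data/t1';
CREATE EXTERNAL TABLE t2 (c1 STRING) STORED AS TEXTFILE LOCATION '/data/t2';

-- Each load moves its source file into that table's own folder
LOAD DATA INPATH '/staging/file1.csv' INTO TABLE t1;
LOAD DATA INPATH '/staging/file2.csv' INTO TABLE t2;
```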