Difference Between 'LOAD DATA INPATH' and 'LOCATION' in Hive

What is the difference between LOAD DATA INPATH and LOAD DATA LOCAL INPATH in Hive?

I got the answer:

  • 'LOCAL' signifies that the input file is on the local file system.
  • If 'LOCAL' is omitted, Hive looks for the file in HDFS.

Source: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations
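
As a minimal sketch of the two forms, assuming a table named sample already exists and that both file paths (hypothetical, chosen for illustration) point to real files:

```sql
-- Hypothetical paths and table name, for illustration only.

-- With LOCAL: the path is resolved on the local file system,
-- and the file is copied into the table's directory.
LOAD DATA LOCAL INPATH '/home/me/sample.txt' INTO TABLE sample;

-- Without LOCAL: the path is resolved in HDFS,
-- and the file is moved into the table's directory.
LOAD DATA INPATH '/user/me/staging/sample.txt' INTO TABLE sample;
```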

Hive external table LOCATION vs. LOAD DATA path

  1. The LOCATION clause in the DDL of an external table specifies the
    HDFS directory where the table's data is stored. When the table is
    queried, the data is read from this path.

  2. The path in LOAD DATA INPATH is the source file from which the data
    is loaded into the table. The source can be either a local file
    path (with LOCAL) or an HDFS file path.

Hope I have cleared your confusion.

When executing LOAD DATA in Hive, does it copy the data?

It is explained in the documentation:

If the keyword LOCAL is not specified, then Hive will either use the full URI of filepath, if one is specified, or will apply the following rules:
[...]
Hive will move the files addressed by filepath into the table (or partition)
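
One way to observe the move is to list the source directory before and after the load from the Hive CLI. This is only a sketch: the staging path is hypothetical, and sample is assumed to be a managed table stored under the default warehouse directory.

```sql
-- Hypothetical paths; `sample` is assumed to live in the default warehouse directory.
dfs -ls /user/me/staging/;              -- sample.txt should be listed here
LOAD DATA INPATH '/user/me/staging/sample.txt' INTO TABLE sample;
dfs -ls /user/me/staging/;              -- sample.txt should no longer be listed
dfs -ls /user/hive/warehouse/sample/;   -- sample.txt should now appear here
```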

Loading data into a table

Either upload the file to HDFS and run the same command with the HDFS path,

or

use the LOCAL keyword, as below.

load data local inpath "D:\data files\sample.txt" into table sample;


Does Hive need an explicit command to load data into the table from HDFS

The first CREATE statement is wrong: STORED AS takes a file format (TEXTFILE, ORC, PARQUET, etc.), not a location, and you should not provide a file name when you create a table. Tables in Hive are created on locations (folders), not files, and the property for a table's location is LOCATION, not STORED AS. See a recent example: https://stackoverflow.com/a/68095278/2700344

The second CREATE statement creates a table without a location specified (the default location will be used for managed tables, like this: /user/hive/warehouse/dbo/table1). See more details here: https://stackoverflow.com/a/67073849/2700344
Execute DESC FORMATTED dbo.table1 and check LOCATION.

Yes, you need to execute LOAD DATA because your file is not in the table's location. If you place the file in a dedicated location for that table, you can CREATE EXTERNAL TABLE and specify LOCATION. But your file is currently in a folder that should not be used as a table location: /usr/hive. Something like /usr/hive/table1 looks much better. Alternatively, you can create the table as in the second CREATE statement and copy the file into its location using the hadoop fs -cp command. LOAD DATA achieves the same result, except that it moves the file rather than copying it when the source is in HDFS.
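
A rough sketch of the external-table approach, assuming the file has been placed under /usr/hive/table1 and is comma-delimited text. The column names and the delimiter are assumptions, since the original CREATE statements are not shown here:

```sql
-- Columns and delimiter are hypothetical; the location is the one suggested above.
CREATE EXTERNAL TABLE dbo.table1 (
  col1 STRING,
  col2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE            -- a file format, not a path
LOCATION '/usr/hive/table1';  -- a folder, not a file

-- Check the Location field in the output.
DESC FORMATTED dbo.table1;
```

With this layout no LOAD DATA is needed: any file placed in the folder is read when the table is queried.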

LOAD DATA INPATH loads same CSV-base data into two different and external Hive tables

It looks like you just need to specify a different 'LOCATION' for the second table. When you do the 'LOAD DATA', Hive actually moves the data into that path (or copies it, if 'LOCAL' is used). If both tables have the same 'LOCATION', they will share the same data.
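
A minimal sketch of two external tables kept apart by distinct locations. All table names, columns, and paths here are hypothetical:

```sql
-- Hypothetical names and paths; each table gets its own directory.
CREATE EXTERNAL TABLE table_a (line STRING)
LOCATION '/data/table_a';

CREATE EXTERNAL TABLE table_b (line STRING)
LOCATION '/data/table_b';

-- Each load places its source file under that table's own LOCATION.
LOAD DATA INPATH '/staging/a/data.csv' INTO TABLE table_a;
LOAD DATA INPATH '/staging/b/data.csv' INTO TABLE table_b;
```

Because a load from HDFS takes the source file out of the staging directory, each table is given its own source file in this sketch.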


