How to Access Files in Hadoop HDFS

View contents of a file in Hadoop HDFS

I believe hadoop fs -cat <file> should do the job.
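
For example, a minimal sketch (the file path below is hypothetical):

hadoop fs -cat /user/hadoop/input.txt          # print the whole file to stdout
hadoop fs -cat /user/hadoop/input.txt | head   # peek at the first lines of a large file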

How to copy file from HDFS to the local file system

  1. bin/hadoop fs -get /hdfs/source/path /localfs/destination/path
  2. bin/hadoop fs -copyToLocal /hdfs/source/path /localfs/destination/path
  3. Point your web browser at the HDFS Web UI (namenode_machine:50070), browse to the file you intend to copy, scroll down the page, and click the link to download the file.
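
If the HDFS source is a directory of part files and you want a single local file, -getmerge concatenates them on the way down (paths below are hypothetical):

hadoop fs -getmerge /hdfs/source/dir /tmp/merged_output.txt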

View contents of a file in HDFS

The simplest way to see the content of a file is hadoop fs -cat /path/to/your/file. You have to provide the path to a file, not to a folder. It looks like you used hadoop fs -cat /tej/, which will not work because /tej/ is a directory.
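
If the file is large, you may want only a peek; -tail prints its last kilobyte (the file name below is hypothetical):

hadoop fs -tail /tej/part-00000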

How to navigate directories in Hadoop HDFS

There is no cd (change directory) command in the HDFS shell. You can only list directories and use the paths shown to reach the next level.

You have to navigate by providing the complete path to the ls command, for example:

hdfs dfs -ls /user/username/app1/subdir/
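
To avoid issuing one ls per level, -ls -R walks the whole tree recursively (path again hypothetical):

hdfs dfs -ls -R /user/username/app1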

How files and directories are stored in Hadoop HDFS

HDFS is a distributed storage system in which the storage location is virtual, built from the disk space of all the DataNodes. While installing Hadoop, you must have specified paths for dfs.namenode.name.dir and dfs.datanode.data.dir. These are the locations at which all HDFS-related files are stored on the individual nodes.
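
If you do not remember what was configured, you can read the effective values back from the running configuration:

hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.datanode.data.dir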

When data is stored in HDFS, it is written as blocks of a specified size (128 MB by default in Hadoop 2.x). When you use hdfs dfs commands you will see complete files, but internally HDFS stores these files as blocks. If you check the above-mentioned paths on your local file system, you will see a bunch of files which correspond to the files on your HDFS. But again, you will not see them as actual files, as they are split into blocks.
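
For instance, on a DataNode the blocks live under dfs.datanode.data.dir as blk_* files. A sketch with an illustrative data directory (the exact layout can vary by Hadoop version):

# replace /data/hdfs/datanode with your dfs.datanode.data.dir value
ls /data/hdfs/datanode/current/BP-*/current/finalized/subdir0/subdir0/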

Check the output of the commands below for details on how much space from each DataNode is used to build the virtual HDFS storage:

hdfs dfsadmin -report

or, if the report has to be run as the HDFS superuser:

sudo -u hdfs hdfs dfsadmin -report
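
For a quick capacity summary without the per-node detail, hdfs dfs -df works as well:

hdfs dfs -df -h /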


URI to access a file in HDFS

The default port is 8020; this is the NameNode RPC port used when fs.defaultFS does not specify one.

You can access HDFS paths in three different ways.

  1. Simply use "/" as the root path

    For example:

    E:\HadoopTests\target>hadoop fs -ls /
    Found 6 items
    drwxrwxrwt - hadoop hdfs 0 2015-08-17 18:43 /app-logs
    drwxr-xr-x - mballur hdfs 0 2015-11-24 15:36 /tmp
    drwxrwxr-x - mballur hdfs 0 2015-10-20 15:27 /user
  2. Use "hdfs:///"

    For example:

    E:\HadoopTests\target>hadoop fs -ls hdfs:///
    Found 6 items
    drwxrwxrwt - hadoop hdfs 0 2015-08-17 18:43 hdfs:///app-logs
    drwxr-xr-x - mballur hdfs 0 2015-11-24 15:36 hdfs:///tmp
    drwxrwxr-x - mballur hdfs 0 2015-10-20 15:27 hdfs:///user
  3. Use "hdfs://{NameNodeHost}:8020/"

    For example:

    E:\HadoopTests\target>hadoop fs -ls hdfs://MBALLUR:8020/
    Found 6 items
    drwxrwxrwt - hadoop hdfs 0 2015-08-17 18:43 hdfs://MBALLUR:8020/app-logs
    drwxr-xr-x - mballur hdfs 0 2015-11-24 15:36 hdfs://MBALLUR:8020/tmp
    drwxrwxr-x - mballur hdfs 0 2015-10-20 15:27 hdfs://MBALLUR:8020/user

    In this case, "MBALLUR" is the name of my NameNode host.
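
All three forms resolve to the same file system as long as fs.defaultFS points at that NameNode; you can check what the scheme-less form resolves against with:

hdfs getconf -confKey fs.defaultFS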

Find a file in the Hadoop filesystem

If you are looking for an equivalent of the Linux locate command, then no such option exists in Hadoop. But if you want to find a specific file, you can use the -name parameter of the fs -find command:

hadoop fs -find /some_directory -name some_file_name

If you are looking for the actual location of an HDFS file on the DataNodes' local file systems, you can use the fsck command:

hdfs fsck /some_directory/some_file_name -files -blocks -locations
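
Note that -name also accepts glob patterns, so you can search by extension across the whole namespace (the pattern below is illustrative):

hadoop fs -find / -name "*.csv" -print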

