Reading a Remote HDFS File with Java

How does a Java client upload/download a file to/from a remote HDFS server?

Try the using-filesystem-api-to-read-and-write-data-to-hdfs example with the Maven configuration below:

<properties>
    <hadoop.version>2.7.0</hadoop.version>
    <hadoop.core>1.2.1</hadoop.core>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>${hadoop.core}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
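
With those dependencies in place, the upload and download themselves go through the org.apache.hadoop.fs.FileSystem API. Here is a minimal sketch of both directions; the NameNode URI and the file paths are placeholders you would replace with your own:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Placeholder NameNode URI; point this at your own cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Upload: copy a local file into HDFS.
        fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/demo/remote.txt"));

        // Download: copy the HDFS file back to the local filesystem.
        fs.copyToLocalFile(new Path("/user/demo/remote.txt"), new Path("/tmp/downloaded.txt"));

        fs.close();
    }
}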

Accessing the HDFS filesystem via the Java API vs. running HDFS commands via Java Runtime

This question basically boils down to "API vs CLI" and isn't necessarily Hadoop-specific.

At the end of the day both API calls and CLI calls will hit the same underlying code and do the same thing. The advantage of using an API is that you get endpoints and responses that are automatically in a format that Java can work with.

If you call hdfs commands from the CLI in Java, you have to manually parse the response string to figure out whether the command did what you expected. You also have to download the hdfs binaries and put them on the system PATH. Compare that to using the HDFS API: the results come back as actual usable objects, and any errors throw an exception that you can handle.
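
To make the contrast concrete, here is a rough sketch of the API side; the paths are made up for illustration. Notice that the results are typed objects and failures are exceptions, while the CLI equivalent (left as comments) would hand you raw text to parse:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ApiVsCli {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/user/demo/data.txt");

        // API: a boolean, not a string you have to grep.
        if (fs.exists(p)) {
            // API: a real object with typed accessors.
            FileStatus status = fs.getFileStatus(p);
            System.out.println(status.getLen() + " bytes, owner " + status.getOwner());
        }

        // Errors surface as exceptions you can catch and handle...
        try {
            fs.getFileStatus(new Path("/no/such/file"));
        } catch (IOException e) {
            System.err.println("Handled: " + e.getMessage());
        }

        // ...versus the CLI route, where you exec a binary and parse text:
        // Process proc = Runtime.getRuntime().exec(new String[]{"hdfs", "dfs", "-ls", "/user/demo"});
        // ...then read proc.getInputStream() and interpret the output yourself.
    }
}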

Uploading a local file to a remote HDFS with the Java API, but it connects to localhost

You need to change the http-address in your configuration to your local IP address instead of 0.0.0.0.
0.0.0.0 gets resolved to localhost, and that is what the client ends up connecting to; with dfs.client.use.datanode.hostname => true, your local IP address instead gets resolved to its DNS name, and that hostname is what the client uses.
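
For reference, the client half of that setup might look like the sketch below. The property name dfs.client.use.datanode.hostname comes from the answer above; the NameNode URI is a placeholder:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RemoteClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Ask the client to address DataNodes by hostname instead of the
        // IP the NameNode hands back (which may be a 0.0.0.0 bind address).
        conf.setBoolean("dfs.client.use.datanode.hostname", true);

        // Placeholder NameNode URI; use your cluster's real host.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}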

Since it works, I will post this as an answer, though I don't know whether my reasoning for the solution is correct. If anybody knows the exact reason, please add it as a comment or edit my answer.


