Hadoop No FileSystem for Scheme: File

java.io.IOException: No FileSystem for scheme: hdfs

I got past this problem after some detailed searching and trial and error. Basically, the problem seems to be due to the unavailability of the hadoop-hdfs jars: when submitting the Spark application, the dependent jars could not be found, even after using maven-assembly-plugin or the maven-jar-plugin/maven-dependency-plugin combination.

With the maven-jar-plugin/maven-dependency-plugin combination, the main class jar and the dependent jars are created, but supplying the dependent jars with the --jars option still led to the same error with the following command:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars ../apps/Spark_App_Target_Jar_Name-dep.jar ../apps/Spark_App_Target_Jar_Name.jar

Using maven-shade-plugin, as suggested by "krookedking" in hadoop-no-filesystem-for-scheme-file, hit the problem at the right point: creating a single jar comprising the main class and all dependent classes eliminated the classpath issues.

My final working spark-submit command is as follows:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar

The maven-shade-plugin configuration in my project pom.xml is as follows:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
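For context, Hadoop discovers FileSystem implementations through the META-INF/services/org.apache.hadoop.fs.FileSystem entries shipped by hadoop-common and hadoop-hdfs; without the ServicesResourceTransformer, the shaded jar keeps only one artifact's copy of that file and the hdfs entry can be lost. A rough sketch of checking which implementations a jar actually registers (the class name ListFileSystems is just for illustration):

import java.util.ServiceLoader;
import org.apache.hadoop.fs.FileSystem;

public class ListFileSystems {
    public static void main(String[] args) {
        // Hadoop itself loads FileSystem implementations via ServiceLoader,
        // which reads the (merged) META-INF/services entries from the jar.
        for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
            System.out.println(fs.getClass().getName());
        }
    }
}

Running this against the shaded jar should list DistributedFileSystem if the service files were merged correctly.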

Note: The excludes in the filter help get rid of

java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

No FileSystem for scheme: hdfs and Class org.apache.hadoop.DistributedFileSystem not found

DistributedFileSystem is part of hadoop-core.

To fix this problem, you also need to include hadoop-core-1.2.1.jar (note: I am using Maven for the build):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>

Overall, I am using the following Maven dependencies:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.7.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>
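With these dependencies on the classpath, a plain FileSystem call like the sketch below is the kind of code that previously failed with "No FileSystem for scheme: hdfs" (the NameNode address hdfs://localhost:9000 and the class name HdfsCheck are hypothetical):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This lookup is where "No FileSystem for scheme: hdfs" is thrown
        // when no hdfs implementation can be resolved from the classpath.
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}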

No FileSystem for scheme: sftp

The exception occurs because Hadoop is not able to find a file system implementation for the scheme sftp.

The exception is thrown in FileSystem.java: the framework looks up the configuration parameter fs.sftp.impl and, when it does not find a value, throws this exception.

As far as I know, Hadoop does not support the sftp file system by default. The JIRA ticket "Add SFTP FileSystem" (https://issues.apache.org/jira/browse/HADOOP-5732) indicates that SFTP support is available starting from Hadoop 2.8.0.

To fix this, you need to do two things:

  1. Add a jar containing an sftp file system implementation to your Hadoop deployment.
  2. Set the config parameter fs.sftp.impl to the fully qualified class name of the sftp implementation.

I came across this Git repository, which contains an sftp implementation for Hadoop: https://github.com/wnagele/hadoop-filesystem-sftp. To use it, set the property fs.sftp.impl to org.apache.hadoop.fs.sftp.SFTPFileSystem.
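As an illustration, here is a minimal sketch of those two steps from the client side, assuming the jar from that repository is already on the classpath; the class name SftpListing and the host, user, and path in the sftp URI are hypothetical:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SftpListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Step 2: point the sftp scheme at the implementation class
        // provided by the external jar (step 1 puts that jar on the classpath).
        conf.set("fs.sftp.impl", "org.apache.hadoop.fs.sftp.SFTPFileSystem");

        // Hypothetical host and path, only to show the URI shape.
        FileSystem fs = FileSystem.get(new URI("sftp://user@sftp.example.com"), conf);
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}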


