java.io.IOException: No FileSystem for scheme : hdfs
I got through this problem after some detailed searching and several trial methods. Basically, the problem seems to be that the hadoop-hdfs jars are unavailable at runtime: when submitting the Spark application, the dependent jars could not be found, even after using maven-assembly-plugin or the maven-jar-plugin/maven-dependency-plugin combination.
With the maven-jar-plugin/maven-dependency-plugin combination, the main class jar and the dependent jars are created, but supplying the dependent jars with the --jars option still led to the same error:
./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars ../apps/Spark_App_Target_Jar_Name-dep.jar ../apps/Spark_App_Target_Jar_Name.jar
Using maven-shade-plugin, as suggested by "krookedking" in hadoop-no-filesystem-for-scheme-file, hits the problem at the right point: creating a single jar comprising the main class and all dependent classes eliminated the classpath issues.
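A likely explanation, for context: hadoop-common and hadoop-hdfs each ship a META-INF/services/org.apache.hadoop.fs.FileSystem service file, and naive jar merging keeps only one of them, losing the hdfs registration; the ServicesResourceTransformer in the shade configuration below concatenates these files instead. If rebuilding the jar is not an option, a commonly used alternative (a sketch, assuming your deployment reads core-site.xml) is to declare the implementations explicitly in configuration:

```
<!-- core-site.xml: declare FileSystem implementations explicitly, so scheme
     resolution no longer depends on the merged service-loader files -->
<configuration>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
  </property>
</configuration>
```

The same two properties can equally be set programmatically on the Hadoop Configuration object before the first FileSystem.get() call.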
My final working spark-submit command stands as follows:
./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar
The maven-shade-plugin configuration in my project pom.xml is as follows:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
Note: The excludes in the filter get rid of the following error:
java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
No FileSystem for scheme: hdfs and Class org.apache.hadoop.DistributedFileSystem not found
DistributedFileSystem is part of hadoop-core. To fix this problem, you also need to include hadoop-core-1.2.1.jar (Note: I am using Maven for building):
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>
Overall, I am using the following Maven dependencies:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.7.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>
No FileSystem for scheme: sftp
The exception occurs because Hadoop cannot find a file system implementation for the scheme sftp. It is thrown from FileSystem.java: the framework looks up the configuration parameter fs.sftp.impl and, when no value is found, throws this exception.
As far as I know, Hadoop does not support the sftp file system by default. The JIRA ticket [Add SFTP FileSystem](https://issues.apache.org/jira/browse/HADOOP-5732) indicates that SFTP is available from Hadoop version 2.8.0.
To fix this, you need to do two things:
- Add a jar containing an sftp file system implementation to your Hadoop deployment.
- Set the config parameter fs.sftp.impl to the fully qualified class name of the sftp implementation.
I came across this Git repository, which contains an sftp implementation for Hadoop: https://github.com/wnagele/hadoop-filesystem-sftp. To use it, set the property fs.sftp.impl to org.apache.hadoop.fs.sftp.SFTPFileSystem.