Can sparklyr be used with Spark deployed on a YARN-managed Hadoop cluster?
Yes, sparklyr can be used against a YARN-managed cluster. To connect to a YARN-managed cluster, you need to:
- Set the SPARK_HOME environment variable to point to the correct Spark home directory.
- Connect to the Spark cluster using the appropriate master location, for instance:
sc <- spark_connect(master = "yarn-client")
See also: http://spark.rstudio.com/deployment.html
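Put together, the two steps above might look like the following minimal sketch. The SPARK_HOME path is an assumption for illustration only; use the Spark home directory of your own installation.

```r
library(sparklyr)

# Assumed path, for illustration only: point this at your cluster's Spark home.
Sys.setenv(SPARK_HOME = "/usr/lib/spark")

# Connect using the YARN client master, as in the answer above.
sc <- spark_connect(master = "yarn-client")

# Query the Spark version to confirm the connection works.
spark_version(sc)

# Close the connection when finished.
spark_disconnect(sc)
```

Setting SPARK_HOME with Sys.setenv() only affects the current R session; exporting it in the shell or in .Renviron before starting R should work as well.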
ERROR sparklyr: Gateway xxxxx failed calling take on xxx when running spark_apply
Solved with help from the package's GitHub issues page: https://github.com/rstudio/sparklyr/issues/1121
The relevant part:
Still not sure why, but adding config = list() to spark_connect did it. The idea came from "Can sparklyr be used with Spark deployed on a YARN-managed Hadoop cluster?".
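As a rough sketch of that workaround, the call might look like this. The master value is an assumption, since the post does not say which one was used, and the spark_apply call is only a hypothetical smoke test, not the original failing code.

```r
library(sparklyr)

# Assumption: "yarn-client" is illustrative; the post does not state the master used.
sc <- spark_connect(
  master = "yarn-client",
  config = list()  # an empty list instead of the default spark_config()
)

# Hypothetical check: apply an identity function to a small Spark data frame.
result <- spark_apply(sdf_len(sc, 10), function(df) df)
print(result)

spark_disconnect(sc)
```

Note that passing an empty list replaces the default spark_config() settings, so any options you rely on from a config.yml file would presumably need to be set explicitly.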