Can sparklyr Be Used with Spark Deployed on a YARN-Managed Hadoop Cluster?

Can sparklyr be used with Spark deployed on a YARN-managed Hadoop cluster?

Yes, sparklyr can be used against a YARN-managed cluster. To connect to a YARN-managed cluster you need to:

  1. Set the SPARK_HOME environment variable to point to the correct Spark home directory.
  2. Connect to the Spark cluster using the appropriate master location, for instance: sc <- spark_connect(master = "yarn-client")

See also: http://spark.rstudio.com/deployment.html
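The two steps above can be sketched as follows. This is a minimal illustration, not a definitive setup: the SPARK_HOME path is a placeholder assumption, and your cluster's actual Spark installation directory will differ.

```r
library(sparklyr)

# Step 1: point SPARK_HOME at the cluster's Spark installation.
# "/usr/lib/spark" is a hypothetical path; substitute your own.
Sys.setenv(SPARK_HOME = "/usr/lib/spark")

# Step 2: connect using the YARN client master. ("yarn-client" is the
# Spark 1.x style master string; on newer Spark versions the equivalent
# is master = "yarn" with client deploy mode.)
sc <- spark_connect(master = "yarn-client")

# ... run Spark jobs via sc ...

spark_disconnect(sc)
```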

ERROR sparklyr: Gateway xxxxx failed calling take on xxx when running spark_apply

Solved with assistance on the package's GitHub issues page: https://github.com/rstudio/sparklyr/issues/1121

The relevant part:

Still not sure why, but adding config = list() to spark_connect() did it. The idea came from "Can sparklyr be used with Spark deployed on a YARN-managed Hadoop cluster?".
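The workaround quoted above can be sketched as below. This is only an illustration of the fix reported in the linked issue: passing an explicit empty config list to spark_connect() replaces the defaults that would otherwise be read from a config file, which apparently avoided the gateway error in that case.

```r
library(sparklyr)

# Workaround from the issue thread: supply config = list() so no
# external configuration is picked up on connect.
sc <- spark_connect(master = "yarn-client", config = list())
```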


