Installing of SparkR

You can install directly from a GitHub repository:

if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/spark@v2.x.x', subdir='R/pkg')

Choose the tag (v2.x.x above) that corresponds to the version of Spark you use. You can find the full list of tags on the project page, or query it directly from R using the GitHub API:

jsonlite::fromJSON("https://api.github.com/repos/apache/spark/tags")$name
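If you are unsure which tag matches a local installation, a small shell sketch like the one below may help. It assumes a binary distribution whose RELEASE file begins with a line of the form "Spark X.Y.Z built for ..."; that file layout and the helper name spark_tag_from_release are assumptions made for illustration, so check them against your own download.

```shell
# Hypothetical helper: print the GitHub tag ("vX.Y.Z") for a Spark install.
# Assumes the first line of the RELEASE file looks like
# "Spark 2.4.1 built for Hadoop 2.7.3" (an assumption, not guaranteed).
spark_tag_from_release() {
  sed -n '1s/^Spark \([0-9][0-9.]*\).*/v\1/p' "$1"
}

# Usage: only runs when the RELEASE file is actually present.
release_file="${SPARK_HOME:-/path/to/spark/directory}/RELEASE"
if [ -f "$release_file" ]; then
  spark_tag_from_release "$release_file"
fi
```

The printed value can then be substituted for v2.x.x in the install_github call. If the first line does not match the assumed format, the helper prints nothing.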

If you've downloaded a binary package from the downloads page, the R library is in the R/lib/SparkR subdirectory. It can be used to install SparkR directly. For example:

$ export SPARK_HOME=/path/to/spark/directory
$ cd $SPARK_HOME/R/pkg/
$ R -e "devtools::install('.')"

You can also add the R lib directory to .libPaths:

Sys.setenv(SPARK_HOME='/path/to/spark/directory')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
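The .libPaths approach only works if SPARK_HOME really contains the bundled library at R/lib/SparkR. A minimal sketch of a pre-flight check, with a hypothetical helper name (has_bundled_sparkr) and the same placeholder path as above:

```shell
# Hypothetical helper: succeed only if the given Spark directory
# contains the bundled SparkR library at R/lib/SparkR.
has_bundled_sparkr() {
  [ -d "$1/R/lib/SparkR" ]
}

# Usage: falls back to the placeholder path when SPARK_HOME is unset.
if has_bundled_sparkr "${SPARK_HOME:-/path/to/spark/directory}"; then
  echo "SparkR library found"
else
  echo "no R/lib/SparkR under SPARK_HOME" >&2
fi
```

If the check fails, SPARK_HOME most likely points at the wrong directory, or at a source checkout where the R library has not been built yet.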

Finally, you can use the sparkR shell without any additional steps:

$ /path/to/spark/directory/bin/sparkR

Edit

According to the Spark 2.1.0 release notes, SparkR should be available on CRAN in the future:

Standalone installable package built with the Apache Spark release. We will be submitting this to CRAN soon.

You can follow SPARK-15799 to check the progress.

Edit 2

While SPARK-15799 has been merged, satisfying CRAN requirements proved to be challenging (see, for example, the discussions about 2.2.2, 2.3.1, and 2.4.0), and the package was subsequently removed (see, for example, "SparkR was removed from CRAN on 2018-05-01" and "CRAN SparkR package removed?"). As a result, the methods listed in the original post are still the most reliable solutions.

Edit 3

OK, SparkR is back on CRAN as of v2.4.1. install.packages('SparkR') should work again (it may take a couple of days for the mirrors to reflect this).

How to update to SparkR 2.0.0 package in R

SparkR requires not just an R package but an entire Spark backend to be pulled in. When you want to upgrade SparkR, you are upgrading Spark, not just the R package.
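Because the R package and the Spark backend travel together, one quick sanity check is to read the version recorded in the bundled SparkR package and compare it with the Spark version you expect. The sketch below assumes the package lives at R/lib/SparkR under the Spark directory; the helper name sparkr_version is hypothetical.

```shell
# Hypothetical helper: print the Version field from the DESCRIPTION file
# of the SparkR package bundled under <spark dir>/R/lib/SparkR.
sparkr_version() {
  sed -n 's/^Version:[[:space:]]*//p' "$1/R/lib/SparkR/DESCRIPTION"
}

# Usage: only runs when the bundled package is actually present.
spark_dir="${SPARK_HOME:-/path/to/spark/directory}"
if [ -f "$spark_dir/R/lib/SparkR/DESCRIPTION" ]; then
  echo "bundled SparkR version: $(sparkr_version "$spark_dir")"
fi
```

If the printed version is older than the one you want, replacing only the R package will not help; the Spark installation itself needs to be upgraded.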

Nowadays you may want to refer to the sparklyr package as it makes all of this a whole lot easier.

install.packages("devtools")
devtools::install_github("rstudio/sparklyr")
library(sparklyr)
spark_install(version = "1.6.2")
spark_install(version = "2.0.0")

It also offers more functionality than SparkR.

Install SparkR that comes with Spark 1.4

@DavidArenburg put me on the right track.

Following the Windows documentation in C:\spark-1.4.0\R\WINDOWS.md, I installed RTools and added R.exe and RTools to my computer's PATH.

Then, I ran install-dev.bat in C:\spark-1.4.0\R. This added the lib\SparkR\ installation that I was missing.

Then, from the command prompt, I ran:

mklink /D "C:\Program Files\R\R-3.1.3\library\SparkR" "C:\spark-1.4.0\R\lib\SparkR"

This added a link in my R packages directory to the installation in the spark folder.

library(SparkR) # this should run now.

Error while Installing sparkR

Read before proceeding:

amplab-extras/SparkR-pkg is no longer maintained. Current versions of SparkR are shipped with Spark itself. See also "Installing of SparkR".


The sbt download links in the repository are invalid, and what you get is actually an HTML file. You can either correct URL2 in pkg/src/sbt/sbt so it points to:

http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.6/sbt-launch.jar

or download and install sbt, clone the repository:

git clone https://github.com/amplab-extras/SparkR-pkg.git

go to src:

cd SparkR-pkg/pkg/src

build the assembly:

sbt assembly

and install:

R -e "devtools::install('.')"
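Since the failure described above comes from an HTML page being saved in place of the sbt launcher jar, it can be worth checking the downloaded file before building. Jar files are ZIP archives and start with the two bytes "PK", so a rough check might look like the sketch below. The helper name looks_like_jar, the SBT_LAUNCH_JAR variable, and the default file name are hypothetical; point it at wherever your copy of the launcher ended up.

```shell
# Hypothetical helper: succeed if the file starts with the ZIP magic "PK",
# which every jar does; an HTML error page will not.
looks_like_jar() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Usage: SBT_LAUNCH_JAR is an illustrative variable, not part of the repository.
jar="${SBT_LAUNCH_JAR:-sbt-launch.jar}"
if [ -f "$jar" ] && ! looks_like_jar "$jar"; then
  echo "$jar is not a jar (possibly an HTML page); download it again" >&2
fi
```

This only inspects the first two bytes, so it catches the HTML case but does not verify that the archive is complete.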

