Installing of SparkR
You can install directly from a GitHub repository:
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/spark@v2.x.x', subdir='R/pkg')
You should choose the tag (v2.x.x above) corresponding to the version of Spark you use. You can find a full list of tags on the project page or directly from R using the GitHub API:
jsonlite::fromJSON("https://api.github.com/repos/apache/spark/tags")$name
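If the tag list is long, it can help to narrow it down to a single release line. What follows is only a sketch in shell: the function name tags_for_line is made up for illustration, and it assumes that final release tags look like v2.4.1 while release candidates carry a suffix such as -rc1. It reads tag names on standard input, one per line, and makes no network calls itself.

```shell
# Hypothetical helper: keep only final release tags for one Spark line.
# Usage: <tag names on stdin, one per line> | tags_for_line 2.4
tags_for_line() {
  # Escape the dots so that "2.4" is matched literally.
  line=$(printf '%s' "$1" | sed 's/\./\\./g')
  # Assumption: release tags look like vMAJOR.MINOR.PATCH with no suffix.
  grep -E "^v${line}\.[0-9]+$"
}

# Example with a hand-written list; prints v2.4.1 and v2.4.0.
printf 'v2.4.1\nv2.4.1-rc1\nv2.4.0\nv2.3.1\n' | tags_for_line 2.4
```

Whichever tag you settle on goes after the @ in the install_github call above.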
If you've downloaded a binary package from the downloads page, the R library is in the R/lib/SparkR subdirectory. It can be used to install SparkR directly. For example:
$ export SPARK_HOME=/path/to/spark/directory
$ cd $SPARK_HOME/R/pkg/
$ R -e "devtools::install('.')"
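Before installing, it may be worth confirming that the Spark directory really contains the bundled library and seeing which version it is. A minimal sketch, assuming the R/lib/SparkR layout described above; the function name sparkr_bundled_version is hypothetical, and it relies only on the fact that an installed R package keeps its version in a DESCRIPTION file:

```shell
# Hypothetical helper: print the version of the SparkR library bundled
# under a Spark directory, or fail if it is not in R/lib/SparkR.
# Usage: sparkr_bundled_version "$SPARK_HOME"
sparkr_bundled_version() {
  desc="$1/R/lib/SparkR/DESCRIPTION"
  if [ ! -f "$desc" ]; then
    echo "no SparkR library found under $1/R/lib" >&2
    return 1
  fi
  # Installed R packages record their version in a "Version:" field.
  sed -n 's/^Version:[[:space:]]*//p' "$desc"
}
```

If the function reports a missing library, SPARK_HOME most likely points at something other than an unpacked binary distribution.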
You can also add the R lib directory to .libPaths (taken from here):
Sys.setenv(SPARK_HOME='/path/to/spark/directory')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
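Something similar can be done from the shell through the R_LIBS environment variable, which R reads at startup as a list of library directories (colon-separated on Unix-like systems). This is a sketch rather than a recommended setup, and the function name add_sparkr_to_r_libs is made up for illustration:

```shell
# Hypothetical helper: put <spark dir>/R/lib at the front of R_LIBS,
# unless it is already listed there.
# Usage: add_sparkr_to_r_libs /path/to/spark/directory
add_sparkr_to_r_libs() {
  lib="$1/R/lib"
  case ":${R_LIBS:-}:" in
    *":$lib:"*) ;;                           # already present, nothing to do
    *) R_LIBS="$lib${R_LIBS:+:$R_LIBS}" ;;   # prepend, keeping existing entries
  esac
  export R_LIBS
}
```

An R session started from the same shell afterwards should then be able to find the package with library(SparkR), just as with the .libPaths call.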
Finally, you can use the sparkR shell without any additional steps:
$ /path/to/spark/directory/bin/sparkR
Edit
According to the Spark 2.1.0 Release Notes, SparkR should be available on CRAN in the future:
Standalone installable package built with the Apache Spark release. We will be submitting this to CRAN soon.
You can follow SPARK-15799 to check the progress.
Edit 2
While SPARK-15799 has been merged, satisfying CRAN requirements proved to be challenging (see for example discussions about 2.2.2, 2.3.1, 2.4.0), and the package was subsequently removed (see for example SparkR was removed from CRAN on 2018-05-01, CRAN SparkR package removed?). As a result, the methods listed in the original post are still the most reliable solutions.
Edit 3
OK, SparkR is back up on CRAN again, v2.4.1. install.packages('SparkR') should work again (it may take a couple of days for the mirrors to reflect this).
How to update to SparkR 2.0.0 package in R
SparkR requires not just an R package but an entire Spark backend to be pulled in. When you want to upgrade SparkR, you are upgrading Spark, not just the R package.
Nowadays you may want to refer to the sparklyr package, as it makes all of this a whole lot easier.
install.packages("devtools")
devtools::install_github("rstudio/sparklyr")
library(sparklyr)
spark_install(version = "1.6.2")
spark_install(version = "2.0.0")
It also offers more functionality than SparkR.
Install SparkR that comes with Spark 1.4
@DavidArenburg put me on the right track.
Following the Windows documentation in C:\spark-1.4.0\R\WINDOWS.md, I installed RTools and added R.exe and RTools to my computer's PATH.
Then, I ran install-dev.bat in C:\spark-1.4.0\R. This added the lib\SparkR\ installation that I was missing.
Then, from the command prompt, I ran
mklink /D "C:\Program Files\R\R-3.1.3\library\SparkR" "C:\spark-1.4.0\R\lib\SparkR"
This added a link in my R packages directory to the installation in the spark folder.
library(SparkR) # this should run now.
Error while Installing sparkR
Read before proceeding:
amplab-extras/SparkR-pkg is no longer maintained. Current versions of SparkR are shipped with Spark itself. See also Installing of SparkR
The sbt download links in the repository are invalid, and what you get is actually an HTML file. You can either correct URL2 in pkg/src/sbt/sbt so it points to:
http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.6/sbt-launch.jar
or download and install sbt, clone the repository:
git clone https://github.com/amplab-extras/SparkR-pkg.git
go to src:
cd SparkR-pkg/pkg/src
build the assembly:
sbt assembly
and install:
R -e "devtools::install('.')"
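Since the failure described above comes from an HTML page being saved in place of the sbt launcher, a quick check of the downloaded file can confirm whether that is what happened. A minimal sketch: the function name looks_like_jar is hypothetical, and the check relies only on the fact that a JAR is a ZIP archive and therefore starts with the two bytes PK.

```shell
# Hypothetical helper: succeed only if the file starts with the ZIP
# signature "PK", which every JAR has and an HTML page does not.
# Usage: looks_like_jar sbt-launch.jar || echo "not a JAR" >&2
looks_like_jar() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}
```

This only looks at the first two bytes, so it tells an HTML page from an archive but says nothing about whether the archive is complete or the right version.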