Loading CSV files in SparkR
`Liste` is a local list, which can be written with `write.csv`. `data` is a SparkR DataFrame, which can't be written with `write.csv`: the call serializes only the pointer to the DataFrame, not its contents. That's why the file is only 33 kB.
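To actually write out the DataFrame's contents, either collect it into a local data.frame first or let Spark write it. A minimal sketch, assuming a running SparkR (1.x) session and a SparkR DataFrame named `data` as in the question:

```r
# Option 1: pull the distributed data into a local data.frame,
# then write it with base R's write.csv.
local_df <- collect(data)
write.csv(local_df, file = "data.csv", row.names = FALSE)

# Option 2: let Spark write the DataFrame itself. Note that Spark
# writes a directory of part files, not a single CSV file.
write.df(data, path = "data_out", source = "com.databricks.spark.csv")
```

Option 1 is fine for small results; for large DataFrames, `collect` pulls everything onto the driver, so prefer option 2.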
using sparklyr in RStudio, can I upload a LOCAL csv file to a spark cluster
You cannot. The file has to be reachable from every machine in your cluster, either as a local copy on each node or placed on a distributed file system / object storage.
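One common way to make a local file reachable by all executors is to push it to HDFS first. A sketch, assuming the `hdfs` CLI is available; the paths and table name below are placeholders, not from the question:

```shell
# Copy a local CSV into HDFS so every node in the cluster can read it.
hdfs dfs -mkdir -p /user/me/data
hdfs dfs -put /local/path/flights.csv /user/me/data/flights.csv

# Then point sparklyr at the HDFS path instead of the local one, e.g.:
# spark_read_csv(sc, "flights", "hdfs:///user/me/data/flights.csv")
```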
How to read csv into sparkR ver 1.4?
You have to start the SparkR console each time like this:
sparkR --packages com.databricks:spark-csv_2.10:1.0.3
Empty output when reading a csv file into Rstudio using SparkR
Pre-built Spark distributions are still built with Scala 2.10, not 2.11. So, if you use such a distribution (which I think you do), you also need a spark-csv build for Scala 2.10, not for Scala 2.11 (the one you use in your code). The following code should then work fine:
library(rJava)
library(SparkR)
library(nycflights13)
df <- flights[1:4, 1:4]
df
year month day dep_time
1 2013 1 1 517
2 2013 1 1 533
3 2013 1 1 542
4 2013 1 1 544
write.csv(df, file="~/scripts/temp.csv", quote=FALSE, row.names=FALSE)
sc <- sparkR.init(sparkHome= "/usr/local/bin/spark-1.5.1-bin-hadoop2.6/",
master="local",
sparkPackages="com.databricks:spark-csv_2.10:1.2.0") # 2.10 here
sqlContext <- sparkRSQL.init(sc)
df_spark <- read.df(sqlContext, "/home/vagrant/scripts/temp.csv", "com.databricks.spark.csv", header="true")
head(df_spark)
year month day dep_time
1 2013 1 1 517
2 2013 1 1 533
3 2013 1 1 542
4 2013 1 1 544
importing csv file in rstudio from hdfs using sparkR
You can use the `fread` function of the `data.table` library to read from HDFS. You'd have to specify the path of the `hdfs` executable on your system. For instance, assuming that the path to `hdfs` is /usr/bin/hdfs, you can try something like this:
your_table <- fread("/usr/bin/hdfs dfs -text /afs/Accounts.csv")
If your "Accounts.csv" is a directory, you can use a wildcard as well: /afs/Accounts.csv/*
You can also specify the column classes. For instance:
your_table <- fread("/usr/bin/hdfs dfs -text /afs/Accounts.csv", fill = TRUE, header = TRUE,
colClasses = c("numeric", "character", ...))
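Note that in newer `data.table` releases, passing a shell command as the first argument of `fread` is deprecated; the command goes in the `cmd` argument instead. A sketch, reusing the HDFS path from the answer above:

```r
library(data.table)

# Newer data.table versions want shell commands passed via `cmd`,
# not as the first positional argument.
your_table <- fread(cmd = "/usr/bin/hdfs dfs -text /afs/Accounts.csv",
                    fill = TRUE, header = TRUE)
```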
I hope this helps.
Spark 2.0.0: SparkR CSV Import
I had the same problem, and the same issue appears with this simple code:
createDataFrame(iris)
Maybe something is wrong with the installation?
UPDATE: Yes! I found a solution.
The solution is based on this: Apache Spark MLlib with DataFrame API gives java.net.URISyntaxException when createDataFrame() or read().csv(...)
For R, just start the session with this code:
sparkR.session(sparkConfig = list(spark.sql.warehouse.dir="/file:C:/temp"))
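As an aside, in Spark 2.x the SparkR API changed: `read.df` takes the path as its first argument (no `sqlContext`), and a built-in "csv" source replaces the external com.databricks:spark-csv package. A sketch; the file path is a placeholder:

```r
library(SparkR)

# Spark 2.x style: no sqlContext, and the built-in "csv" source
# means no external spark-csv package is needed.
sparkR.session()
df <- read.df("nycflights13.csv", source = "csv",
              header = "true", inferSchema = "true")
head(df)
```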
Loading com.databricks.spark.csv via RStudio
This is the right syntax (after hours of trying):
(Note: focus on the first line, and pay attention to the double quotes.)
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
library(SparkR)
library(magrittr)
# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)
# The SparkSQL context should already be created for you as sqlContext
sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1
# Load the flights CSV file using `read.df`. Note that we use the CSV reader Spark package here.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header="true")