Cannot install RHDFS for RStudio (R version 3.3.1)
For recent versions of R, the 'rmr' package is probably no longer maintained.
You may still be able to get it as described in this answer, but the surprising thing is that you seem to need it at all.
Based on the documentation in the comment by @abhiieor, you should need 'rmr2', not 'rmr'.
My suggestions:
- Install 'rmr2' and check whether that allows you to install 'rhdfs'.
- If that fails with the method you are using, install the packages as described on this site: https://github.com/RevolutionAnalytics/RHadoop/wiki/Installing-RHadoop-on-RHEL (it also contains files for Windows).
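The two suggestions above could be sketched roughly as follows. This is only a sketch under assumptions: `install_rhadoop` is a hypothetical helper, the tarball paths are placeholders for files you have downloaded yourself, the dependency list is my reading of what 'rmr2' and 'rhdfs' need and may differ by version, and the default `hadoop_cmd` path is borrowed from a typical /usr/local/hadoop layout.

```r
# Hypothetical helper (not part of RHadoop): installs rmr2 and rhdfs
# from source tarballs that were downloaded beforehand.
install_rhadoop <- function(rmr2_tarball, rhdfs_tarball,
                            hadoop_cmd = "/usr/local/hadoop/bin/hadoop") {
  # Fail early, before touching anything, if a tarball is missing.
  stopifnot(file.exists(rmr2_tarball), file.exists(rhdfs_tarball))

  # Assumption: rhdfs is commonly reported to need HADOOP_CMD when it loads,
  # which also happens during the install-time load test.
  Sys.setenv(HADOOP_CMD = hadoop_cmd)

  # Assumed CRAN dependencies; check the DESCRIPTION files of your tarballs.
  deps <- c("Rcpp", "RJSONIO", "digest", "functional", "reshape2",
            "stringr", "plyr", "caTools", "rJava")
  missing_deps <- setdiff(deps, rownames(installed.packages()))
  if (length(missing_deps) > 0) install.packages(missing_deps)

  # Install 'rmr2' first, then 'rhdfs', both from the local source files.
  install.packages(rmr2_tarball, repos = NULL, type = "source")
  install.packages(rhdfs_tarball, repos = NULL, type = "source")
  invisible(TRUE)
}
```

You would call it with the paths to your own downloaded files, for example `install_rhadoop("path/to/rmr2.tar.gz", "path/to/rhdfs.tar.gz")`.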
Rhadoop - wordcount using rmr
First, you have to set the HADOOP_STREAMING environment variable in your code (HADOOP_CMD is set the same way below).
Try the code below. Note that it assumes you have already copied your text file to the HDFS folder example/wordcount/data.
R Code:
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
# load libraries
library(rmr2)
library(rhdfs)
# initiate rhdfs package
hdfs.init()
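# map: split each input line on whitespace and emit (word, 1) pairs
map <- function(k, lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return( keyval(words, 1) )
}
# reduce: sum the counts collected for each word
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
# wrap the MapReduce job; input is read as plain text lines
wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
}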
## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')
## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit job
out <- wordcount(hdfs.data, hdfs.out)
## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
Output:
word count
AS 16
As 5
B. 1
BE 13
BY 23
By 7
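To see what the map and reduce steps compute without a Hadoop cluster, the same logic can be mimicked in base R. This is only an illustration, not the rmr2 job itself: the sample `lines` are made up, `tapply` stands in for the shuffle and reduce phase, and I split on `\\s+` so that runs of whitespace do not produce empty "words".

```r
# Made-up sample input standing in for the lines of the HDFS text file.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map step: split every line into words and pair each word with a count of 1.
words <- unlist(strsplit(lines, "\\s+"))
ones <- rep(1, length(words))

# Reduce step: group the 1s by word and sum them.
counts <- tapply(ones, words, sum)

# Same shape as the results.df built from the HDFS output.
local.df <- data.frame(word = names(counts),
                       count = as.vector(counts),
                       stringsAsFactors = FALSE)
print(local.df)
```

Like the cluster output, the counting is case-sensitive, so "AS" and "As" would be separate rows.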
For reference, here is another example of running an R word count MapReduce program.
Hope this helps.
RHadoop - Rstudio - Install arulesViz library
The curl package has the following system requirements (see https://cran.r-project.org/web/packages/curl/index.html):
SystemRequirements: libcurl: libcurl-devel (rpm) or libcurl4-openssl-dev (deb).
Install the system library that matches your distribution, then retry installing the R package; it will probably work.
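A quick way to check from within R whether the libcurl development files seem to be present is sketched below. It rests on an assumption: that the development package puts a `curl-config` program on the PATH, which is usual but not guaranteed. `has_libcurl_dev` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if a 'curl-config' program is found on the PATH.
# Assumption: the libcurl development package ships curl-config.
has_libcurl_dev <- function() {
  unname(nzchar(Sys.which("curl-config")))
}

if (has_libcurl_dev()) {
  message("curl-config found; try install.packages(\"curl\"), ",
          "then install.packages(\"arulesViz\").")
} else {
  message("curl-config not found; install libcurl-devel (rpm) or ",
          "libcurl4-openssl-dev (deb) with your system package manager first.")
}
```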
Install RHadoop on 32-bit Ubuntu
Unfortunately, 32-bit is not supported, and there are reports that it actually does not work because of details in the serialization code. There are some fixes in dev, but we do not test on 32-bit, so they may or may not work. Lastly, we have a dedicated Google group for RHadoop where we are trying to build a community. We are a small community, so we can't afford to be dispersed over GitHub, SO, FB, Quora, and so on.
String character in RHDFS output
If you look at the hdfs.write function in the source code, you can see that it can take raw bytes instead of having R serialize the object for you. So for character data you can essentially do this:
ofile = hdfs.file("brian.txt", "w")
hdfs.write(charToRaw("hi"), ofile)
hdfs.close(ofile)
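What charToRaw hands over can be inspected in plain R, without HDFS. The snippet below only uses base R and shows that the string becomes its bytes, with no R serialization header around them:

```r
# Convert a string to its raw bytes: one byte per ASCII character.
bytes <- charToRaw("hi")
print(bytes)              # 68 69 (hex codes for 'h' and 'i')

# The conversion is reversible, so a reader of the file gets the text back.
text <- rawToChar(bytes)
print(text)               # "hi"

# For comparison, R's own serialization of the same string is much longer
# and is not readable as plain text.
print(length(serialize("hi", NULL)) > length(bytes))   # TRUE
```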