How to Install RHadoop Packages (rmr, rhdfs, rhbase)

Cannot install RHDFS for RStudio (R version 3.3.1)

The 'rmr' package is probably no longer maintained for the latest version of R.
You may be able to get it as described in this answer, but the surprising thing is that you seem to need it at all.

Based on the documentation in the comment by @abhiieor, you should need 'rmr2' and not 'rmr'.

My suggestions:

  1. Install 'rmr2' and check whether that allows you to install 'rhdfs'.
  2. If that fails with the method you are using, install the packages as described on this site: https://github.com/RevolutionAnalytics/RHadoop/wiki/Installing-RHadoop-on-RHEL (it also contains files for Windows).
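
Before retrying the install, it can help to see which R-level prerequisites are missing. The following is only a sketch under assumptions: the dependency names are the ones I recall from the RHadoop wiki and may differ for your version, and `missing_pkgs` is a hypothetical helper, not part of any RHadoop package.

```r
# Hypothetical helper: report which of the given packages are not installed.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Assumed dependency lists (verify against the RHadoop wiki for your version)
rmr2_deps  <- c("Rcpp", "RJSONIO", "bitops", "digest", "functional",
                "stringr", "plyr", "reshape2", "caTools")
rhdfs_deps <- c("rJava")

# Packages that still need to be installed before rmr2 and rhdfs
print(missing_pkgs(c(rmr2_deps, rhdfs_deps)))

# rhdfs also expects HADOOP_CMD to point at the hadoop executable;
# an empty string here means the variable is not set
print(Sys.getenv("HADOOP_CMD"))
```

If the first call prints an empty character vector and the second prints a path, the R side is probably ready for the rmr2 and rhdfs source packages.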

Rhadoop - wordcount using rmr

First, you have to set the HADOOP_CMD and HADOOP_STREAMING environment variables in your code. Adjust the paths to match your own Hadoop installation.

Try the code below. Note that it assumes you have copied your text file to the HDFS folder example/wordcount/data.

R Code:

Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

# load libraries
library(rmr2)
library(rhdfs)

# initialize the rhdfs package
hdfs.init()

map <- function(k, lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return(keyval(words, 1))
}

reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}

## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')

## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')

## Submit job
out <- wordcount(hdfs.data, hdfs.out)

## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c('word', 'count')

head(results.df)

Output:

word count
AS 16
As 5
B. 1
BE 13
BY 23
By 7
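
The map and reduce logic above can be sanity-checked on a small sample without a cluster. This is a minimal base-R sketch, not rmr2 itself: `local_wordcount` is a hypothetical helper that mimics the map step with `strsplit` and the reduce step with `tapply`. Unlike the map function above, it also drops the empty strings that appear when two whitespace characters are adjacent. It shows why `AS` and `As` are separate rows in the output: the keys are case-sensitive.

```r
# Hypothetical helper: emulates the map and reduce steps locally in base R.
local_wordcount <- function(lines) {
  # "Map": split each line on single whitespace characters
  words <- unlist(strsplit(lines, "\\s"))
  # Repeated whitespace yields "" entries; drop them (an addition to the
  # original map function, which keeps them)
  words <- words[nzchar(words)]
  # "Reduce": sum a count of 1 per occurrence, grouped by word
  counts <- tapply(rep(1, length(words)), words, sum)
  data.frame(word = names(counts), count = as.vector(counts),
             stringsAsFactors = FALSE)
}

sample_lines <- c("AS IS as is", "As  is")
print(local_wordcount(sample_lines))
```

If you want case-insensitive counts, one option is to wrap the words in `tolower()` before counting.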

For your reference, here is another example of running an R word count MapReduce program.

Hope this helps.

RHadoop - Rstudio - Install arulesViz library

Package curl has the following requirements (see https://cran.r-project.org/web/packages/curl/index.html):

SystemRequirements: libcurl: libcurl-devel (rpm) or libcurl4-openssl-dev (deb).

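Install that system library with your distribution's package manager (libcurl-devel on rpm-based systems, libcurl4-openssl-dev on deb-based systems), then retry the installation in R. It will probably work.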

Install RHadoop on 32-bit Ubuntu

Unfortunately, 32-bit is not supported, and there are reports that it actually doesn't work because of details in the serialization code. There are some fixes in dev, but we do not test on 32-bit, so they may or may not work. Lastly, we have a dedicated Google group for RHadoop where we are trying to build a community. We are a small community, so we can't afford to be dispersed over GitHub, SO, FB, Quora, etc.

String character in RHDFS output

If you look at the hdfs.write function in the source code, you can see that it can take raw bytes instead of having R serialize the object for you. So essentially you can do this for character strings:

ofile = hdfs.file("brian.txt", "w")
hdfs.write(charToRaw("hi"), ofile)
hdfs.close(ofile)
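
To see why passing raw bytes matters, you can compare the two representations in base R. This sketch does not touch HDFS; it only contrasts `charToRaw` with `serialize`, which I am assuming here as a stand-in for what happens when R serializes the object for you.

```r
# Raw bytes of the string itself: one byte per ASCII character
plain <- charToRaw("hi")
print(plain)

# R's serialized form of the same string carries a header and type
# information, so it is longer than the text it encodes
wrapped <- serialize("hi", connection = NULL)
print(length(wrapped))

# Round trip: the raw bytes convert straight back to the original string
print(rawToChar(plain))
```

The extra bytes in the serialized form are what show up as stray characters when the file is read back as plain text.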
