Doing a Plyr Operation on Every Row of a Data Frame in R

Doing a plyr operation on every row of a data frame in R

Just treat it like an array and work on each row:

library(plyr)
adply(df, 1, transform, max = max(x, y))  # df is assumed to have numeric columns x and y

Applying a function to every row of a table using dplyr?

As of dplyr 0.2 (I think) rowwise() is implemented. It makes mutate() evaluate max() once per row instead of once over the whole columns, so the answer to this problem becomes:

library(dplyr)

iris %>%
  rowwise() %>%
  mutate(Max.Len = max(Sepal.Length, Petal.Length))

Non rowwise alternative

Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.

The most straightforward way I have found is based on one of Hadley's examples using pmap:

iris %>%
  mutate(Max.Len = purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))

Using this approach, you can give an arbitrary number of arguments to the function (.f) inside pmap.

pmap is a good conceptual approach because it reflects the fact that when you're doing row-wise operations you're actually working with tuples from a list of vectors (the columns in a data frame).
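To make the tuple idea concrete without purrr, here is a minimal base R sketch. It uses mapply as a stand-in for pmap (a swapped-in base function, not the purrr call above) and walks the two columns in parallel, handing one pair of values at a time to max. The names max_len and max_len_vec are only illustrative.

```r
# Base R stand-in for purrr::pmap_dbl: mapply() calls max() once per
# (Sepal.Length, Petal.Length) pair, i.e. once per row of iris.
max_len <- mapply(max, iris$Sepal.Length, iris$Petal.Length)

# For this particular function a vectorized equivalent exists: pmax()
# computes the same element-wise maximum without a per-row function call.
max_len_vec <- pmax(iris$Sepal.Length, iris$Petal.Length)

head(max_len)
```

When a vectorized function such as pmax exists it is usually the simpler choice; the per-tuple approach matters when the function only accepts one row's worth of values at a time.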

Call apply-like function on each row of dataframe with multiple arguments from each row

You can call apply on a subset of the columns of the original data:

dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
apply(dat[, c('x', 'z')], 1, function(x) sum(x))

Or, if your function is just sum, use the vectorized version:

rowSums(dat[,c('x','z')])
[1] 6 8

If you want to use testFunc:

testFunc <- function(a, b) a + b
apply(dat[, c('x', 'z')], 1, function(x) testFunc(x[1], x[2]))

EDIT: To access columns by name rather than by index, you can do something like this:

testFunc <- function(a, b) a + b
apply(dat[, c('x', 'z')], 1, function(y) testFunc(y['z'], y['x']))

Apply a function to every row of a matrix or a data frame

You simply use the apply() function:

R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16

This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
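As a small sketch of passing extra arguments, the same computation can be written with the weights supplied through apply() instead of being hard-coded. The function weighted_row and its parameters w1 and w2 are hypothetical names introduced only for this illustration.

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical helper: x is one row of M; w1 and w2 are extra arguments.
weighted_row <- function(x, w1, w2) w1 * x[1] + w2 * x[2]

# Everything after FUN is forwarded to weighted_row on every call.
res <- apply(M, 1, weighted_row, w1 = 2, w2 = 1)
res
```

Naming the extra arguments (w1 = 2, w2 = 1) rather than relying on position keeps them from being confused with apply()'s own parameters.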

Applying a function on each row of a data frame in R

You may have to use lapply instead of apply to force the result to be a list.

> rhymesWithBrave <- function(x) substring(x,nchar(x)-2) =="ave"
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+   if(rhymesWithBrave(dfr[i,"name"])) dfr[i,] else NULL,
+   dfr))
  id size name
1  1  100 dave

But in this case, subset would be more appropriate:

> subset(dfr,rhymesWithBrave(name))
  id size name
1  1  100 dave
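The snippets in this answer rely on a data frame dfr from the original question, which is not reproduced here. A minimal sketch for trying them out, assuming a hypothetical dfr with id, size and name columns (only the dave row is taken from the output shown; the other rows are made up):

```r
# Hypothetical data: only the first row appears in the output above.
dfr <- data.frame(
  id   = 1:3,
  size = c(100, 150, 300),
  name = c("dave", "bob", "steve"),
  stringsAsFactors = FALSE
)

rhymesWithBrave <- function(x) substring(x, nchar(x) - 2) == "ave"

# Row-by-row version: keep a row or return NULL, then bind the survivors.
kept <- do.call(rbind, lapply(seq_len(nrow(dfr)), function(i)
  if (rhymesWithBrave(dfr[i, "name"])) dfr[i, ] else NULL))

# Vectorized version: the predicate is applied to the whole name column.
via_subset <- subset(dfr, rhymesWithBrave(name))
```

Both kept and via_subset should contain the single dave row; the subset form works because rhymesWithBrave is already vectorized over its input.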

If you want to perform additional transformations before returning the result, you can go back to the lapply approach above:

> add100tosize <- function(x) within(x,size <- size+100)
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+   if(rhymesWithBrave(dfr[i,"name"])) add100tosize(dfr[i,])
+   else NULL,dfr))
  id size name
1  1  200 dave

Or, in this simple case, apply the function to the output of subset.

> add100tosize(subset(dfr,rhymesWithBrave(name)))
  id size name
1  1  200 dave

UPDATE:

To select rows whose size does not fall between any start and end pair, you might construct a different function (note: when logical values are summed, TRUE counts as 1 and FALSE as 0):

test <- function(x)
  rowSums(mapply(function(start, end, x) x >= start & x <= end,
                 start = c(100, 250, 698, 1988),
                 end = c(200, 400, 1520, 2147),
                 MoreArgs = list(x = x))) == 0  # x must be passed on to the inner function

subset(dfr,test(size))
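To see how the range test behaves, here is a small sketch on made-up data. The data frame dfr and its sizes are hypothetical; the start and end values are the ones used above. mapply builds one logical column per range, and rowSums counts how many ranges each size falls into.

```r
starts <- c(100, 250, 698, 1988)
ends   <- c(200, 400, 1520, 2147)

# Hypothetical data frame with a size column.
dfr <- data.frame(id = 1:5, size = c(50, 150, 300, 500, 2000))

# One column per (start, end) pair, one row per size; the sizes are
# supplied through MoreArgs so every call sees the whole vector.
in_range <- mapply(function(start, end, x) x >= start & x <= end,
                   start = starts, end = ends,
                   MoreArgs = list(x = dfr$size))

# TRUE counts as 1, so a row sum of 0 means "inside none of the ranges".
outside_all <- rowSums(in_range) == 0
dfr[outside_all, ]
```

Here sizes 50 and 500 lie outside every range and are kept, while 150, 300 and 2000 are dropped. Note that this shape-based approach assumes more than one size: with a single value, mapply simplifies to a plain vector and rowSums no longer applies.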

Creating a data frame with lapply (and plyr package)

I think this deals with what you're trying to do. Basically, I think you need to read all the data in once, then deal with that data.frame. There are several questions dealing with how to read it all in; here is how I would do it, so that I keep a record of which file each row in the data.frame comes from, which can also be used for grouping:

filenames <- list.files(".", pattern="^[2-3].txt")
import <- mdply(filenames, read.table, header = T, quote = "\"")
import$file <- filenames[import$X1]

Now import is a big dataframe with all your files in it (I'm assuming your pattern recognition etc for reading in files is correct). You can then do summaries based on whatever criteria you like.

I'm not sure what you're trying to achieve in line 3 of your code above, but for the ddply below that, you just need to do:

ddply(import[import$Reqresponse == 9, ], .(Condition, Reqresponse, file), summarise, Score = mean(Score))

There's so much going on in the rest of your code that it's hard to make out exactly what you want.

I think the important thing is that to make this efficient, and easier to follow, you need to read your data in once, then work on that dataset - making subsets if necessary, doing summary stats or whatever else it is.

As an example of how you can work with this, here's an attempt at your problem of handling trials (rows?) that have reqresponse == 9, together with the two trials that follow each of them. There are probably more efficient ways of doing this, but it is loosely based on how you were doing it, to show you briefly how to work with the larger dataframe. Now modified to also remove the first two trials of each file:

import.clean <- ddply(import, .(file), function(x) {
  index <- which(x$reqresponse == 9)
  if (length(index) > 0) {
    index <- unique(c(index, index + 1, index + 2, 1, 2))
  } else {
    index <- c(1, 2)
  }
  x <- x[-index, ]
  return(x)
})
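The per-file logic can be checked on a toy data set. The sketch below is not the plyr call itself: it swaps in base R's split(), lapply() and rbind as a stand-in for ddply so that it runs without plyr. The data frame contents, the trial column, and the names drop_trials and import_clean are all hypothetical.

```r
# Hypothetical stand-in for the imported data: two files, one trial per row.
import <- data.frame(
  file        = c(rep("a", 6), rep("b", 5)),
  trial       = c(1:6, 1:5),
  reqresponse = c(1, 1, 9, 1, 1, 1,   1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Drop the first two trials, every trial with reqresponse == 9,
# and the two trials that follow each of those.
drop_trials <- function(x) {
  index <- which(x$reqresponse == 9)
  index <- unique(c(index, index + 1, index + 2, 1, 2))
  x[-index, ]
}

# Base R stand-in for ddply(import, .(file), drop_trials).
import_clean <- do.call(rbind, lapply(split(import, import$file), drop_trials))
import_clean$trial  # file a keeps trial 6; file b keeps trials 3, 4, 5
```

The separate else branch is not needed in this version because, when no trial has reqresponse == 9, the combined index already reduces to c(1, 2).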

Is there a way to apply plyr's count() function to every column individually?

You can use lapply:

lapply(df, plyr::count)

R: converting each row of a data frame into a list item

Just use split. It's a few times faster than your adply line.

> system.time(myList <- alply( df, 1, function(x) data.frame(x) ))
   user  system elapsed
   7.53    0.00    7.57
> system.time( splitList <- split(df, 1:NROW(df)) )
   user  system elapsed
   1.73    0.00    1.74
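For reference, a minimal sketch of what split produces on a small, made-up data frame (the columns a and b and the name rowList are assumptions for illustration only):

```r
# Hypothetical data frame with mixed column types.
df <- data.frame(a = c(0.5, 1.5, 2.5), b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# Splitting on the row index gives one list element per row,
# each of which is still a one-row data frame with the original columns.
rowList <- split(df, seq_len(nrow(df)))

str(rowList[[2]])
```

Because each element stays a data frame, column types are preserved, which is not the case once the data are converted to a matrix.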

I suspect the parallel backend on adply is only for function evaluation (not splitting and re-combining).

UPDATE:

If you can convert your data.frame to a matrix, the solution below will be über-fast. You may be able to use split, but it will drop names and return a vector in each list element.

> m <- as.matrix(df)
> system.time( matrixList <- lapply(1:NROW(m), function(i) m[i,,drop=FALSE]) )
   user  system elapsed
   0.02    0.00    0.02
> str(matrixList[[1]])
 num [1, 1:2] -0.0956 -1.5887
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "a" "b"
> system.time( matrixSplitList <- split(m, 1:NROW(m)) )
   user  system elapsed
   0.01    0.00    0.02
> str(matrixSplitList[[1]])
 num [1:2] -0.0956 -1.5887
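The difference between the two matrix approaches is easier to see on a tiny example. This is a sketch on a made-up 3 x 2 matrix; the values and the names byRow and bySplit are hypothetical.

```r
# Hypothetical numeric matrix with named columns; rows are (1,4), (2,5), (3,6).
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(NULL, c("a", "b")))

# drop = FALSE keeps each row as a 1 x 2 matrix with its column names.
byRow <- lapply(seq_len(nrow(m)), function(i) m[i, , drop = FALSE])

# split() treats the matrix as a plain vector; the row index is recycled
# down each column, so element i collects row i as an unnamed vector.
bySplit <- split(m, seq_len(nrow(m)))

byRow[[2]]
bySplit[[2]]
```

Which form is preferable depends on whether the column names are needed downstream.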

