How to Print the Name of Current Row When Using Apply in R

How to print the name of current row when using apply in R?

As far as I know you cannot do that with apply, but you could loop through the rownames of your data frame. Lame example:

lapply(rownames(mtcars), function(x) sprintf('The mpg of %s is %s.', x, mtcars[x, 1]))
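
To see why the row name is out of reach inside apply, here is a minimal sketch (the object names m and msgs are made up for illustration). Each row reaches the function as a plain vector named by column, so iterating over rownames() is the simplest workaround:

```r
# Small numeric matrix taken from the built-in mtcars data
m <- as.matrix(mtcars[1:3, 1:2])

# Each row arrives as a vector named by *column*; the row name is not passed
apply(m, 1, function(r) names(r))

# Iterating over the row names keeps the name available inside the function
msgs <- vapply(rownames(m),
               function(nm) sprintf("The mpg of %s is %s.", nm, m[nm, "mpg"]),
               character(1))
print(unname(msgs))
```

vapply is used here instead of lapply only so that the result is a character vector rather than a list.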

Print column name in an r apply and save as new column on a dataframe

I think a lot of people will run into this same issue, so I'm answering my own question (having eventually found the answers). As listed below, there are other answers to both parts (thanks!) but none combining these issues (and some of the examples are more complex).

First, it seems the column name really isn't something you can get at from inside the function (seems weird to me!), so you 'loop' over the column names instead, and within the function you fetch the actual vectors by name [c(x)].

Then the key thing is that to assign, i.e. to create your new columns, from within an apply-style call, you use '<<-'

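# lapply (not apply) iterates over a character vector of column names
invisible(lapply(colnames(ChISEQCIS[c("a", "b", "c")]), function(x) {
  # ratio of column x to column V1
  z <- ChISEQCIS[[x]] / ChISEQCIS[["V1"]]
  # <<- modifies ChISEQCIS in the enclosing environment, not a local copy
  ChISEQCIS[[paste0(x, "ind")]] <<- z
}))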

The <<- operator is discussed e.g. at https://stackoverflow.com/questions/2628621/how-do-you-use-scoping-assignment-in-r
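
If you would rather avoid the scoping assignment altogether, one alternative is to let lapply return the new columns and assign them in a single step. This is only a sketch: the data frame df and its values are invented for illustration, with V1 as the divisor column as in the answer above.

```r
# Hypothetical data: V1 is the divisor, a/b/c are the columns to scale
df <- data.frame(V1 = c(2, 4), a = c(1, 2), b = c(4, 8), c = c(6, 12))
cols <- c("a", "b", "c")

# lapply over the selected columns returns one vector per column;
# assigning the list to new names creates aind, bind, cind in one go
df[paste0(cols, "ind")] <- lapply(df[cols], function(v) v / df$V1)
print(df)
```

Because the assignment happens outside the function, no side effects on the calling environment are needed.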

I got confused because I had only vaguely thought about saving the outputs at first. I figured I needed the column itself (I incorrectly assumed apply worked like a loop, so that I could use a counter as an index or something) and that there should be some way to get its name separately (something like colname(x)).

There are a couple of related stack questions:

  • https://stackoverflow.com/questions/9624866/access-to-column-name-of-dataframe-with-apply-function
  • https://stackoverflow.com/questions/21512041/printing-a-column-name-inside-lapply-function
  • https://stackoverflow.com/questions/10956873/how-to-print-the-name-of-current-row-when-using-apply-in-r
  • https://stackoverflow.com/questions/7681013/apply-over-matrix-by-column-any-way-to-get-column-name (easiest to understand)

How to get row names from the apply() function output?

Here, sd returns a single value, and because apply is called with MARGIN = 2 (i.e. column-wise), the result is a named vector. So names(out) gets the names, rather than row.names. A reproducible example using the built-in iris dataset:

data(iris)
out <- apply(iris[1:4], 2, sd, na.rm = TRUE)
names(out)
#[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

Also, by wrapping the output of apply with data.frame, we can use the row.names

out1 <- data.frame(val = out)
row.names(out1)
#[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

If we need a data.frame as output, this can be created directly with a data.frame call

data.frame(names = names(out), values = out)

Also, this can be done in tidyverse

library(dplyr)
library(tidyr)
iris %>%
  summarise_if(is.numeric, sd, na.rm = TRUE) %>%
  gather
#           key     value
#1 Sepal.Length 0.8280661
#2  Sepal.Width 0.4358663
#3 Petal.Length 1.7652982
#4  Petal.Width 0.7622377

Or convert to a list and enframe

library(tibble)
iris %>%
  summarise_if(is.numeric, sd, na.rm = TRUE) %>%
  as.list %>%
  enframe
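
Note that summarise_if() and gather() are marked as superseded in current dplyr and tidyr releases. A rough equivalent with across() and pivot_longer() might look like the sketch below; the result name sd_long is made up, and the column names key and value are chosen only to mirror the gather output.

```r
library(dplyr)
library(tidyr)

# Standard deviation of every numeric column, reshaped to one row per column
sd_long <- iris %>%
  summarise(across(where(is.numeric), ~ sd(.x, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "key", values_to = "value")

print(sd_long)
```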

Row/column counter in 'apply' functions

What I usually do is to run sapply on the row numbers 1:nrow(test) instead of test, and use test[i,] inside the function:

t(sapply(1:nrow(test), function(i) test[i,]^(1/i)))

I am not sure this is really efficient, though.
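
If test is a numeric matrix, a vectorised sketch that avoids the per-row function call is to use row(), which returns a matrix of row indices with the same shape. The example values below are invented for illustration:

```r
# Hypothetical 3 x 2 matrix
test <- matrix(c(1, 4, 27, 2, 16, 8), nrow = 3)

# row(test) holds the row number of every cell, so this raises
# each element to the power 1 / (its row number) in one step
vectorised <- test^(1 / row(test))

# Same computation with the sapply-over-indices approach, for comparison
looped <- t(sapply(seq_len(nrow(test)), function(i) test[i, ]^(1 / i)))

print(vectorised)
```

Both forms should agree; the row() version simply pushes the loop into vectorised arithmetic. The analogous helper for column indices is col().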

Get column names in apply function with data frame (R)

Here is how I would do it with purrr::iwalk():

purrr::iwalk(airquality, ~ message(sprintf("%s has %s cases.\nNA values: %s",
                                           .y,
                                           sum(!is.na(.x)),
                                           sum(is.na(.x)))))

Output:

Ozone has 116 cases.
NA values: 37
Solar.R has 146 cases.
NA values: 7
Wind has 153 cases.
NA values: 0
Temp has 153 cases.
NA values: 0
Month has 153 cases.
NA values: 0
Day has 153 cases.
NA values: 0
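
For anyone who prefers to stay in base R, a comparable sketch uses mapply to walk over the columns and their names in parallel. The name msgs is made up for illustration; the strings are collected first and then printed with message():

```r
# Pair each column with its name; mapply keeps the column names on the result
msgs <- mapply(function(col, nm) {
  sprintf("%s has %d cases.\nNA values: %d",
          nm, sum(!is.na(col)), sum(is.na(col)))
}, airquality, names(airquality))

# Emit one message per column
invisible(lapply(msgs, message))
```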

R print out matrix with row and column names using apply

One way is to use the melt function from the reshape2 package.

x <- matrix(1:4, nrow = 2, ncol = 2,
dimnames = list(dim1 = c("a", "b"), dim2 = c("a", "b")))

library(reshape2)
melt(x)
#   dim1 dim2 value
# 1    a    a     1
# 2    b    a     2
# 3    a    b     3
# 4    b    b     4

Edit
If your data is so big that speed is an issue, I would also suggest:

data.frame(dim1 = rep(rownames(x), ncol(x)),
dim2 = rep(colnames(x), each = nrow(x)),
value = c(x))

Edit2

After testing with relatively big data, I would not rule out melt:

x <- matrix(runif(9e6), nrow = 3000, ncol = 3000,
dimnames = list(dim1 = paste0("x", runif(3000)),
dim2 = paste0("y", runif(3000))))

system.time(y1 <- melt(x))
# user system elapsed
# 1.17 0.44 1.61

system.time(y2 <- data.frame(dim1 = rep(rownames(x), ncol(x)),
dim2 = rep(colnames(x), each = nrow(x)),
value = c(x)))
# user system elapsed
# 1.98 0.37 2.36
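
A third option that needs no extra package is to go through as.table(). This is a sketch on the small 2 x 2 example from above; I have not timed it against the other two approaches.

```r
x <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(dim1 = c("a", "b"), dim2 = c("a", "b")))

# as.data.frame on a table yields one row per cell; responseName
# renames the default "Freq" column
long <- as.data.frame(as.table(x), responseName = "value")
print(long)
```

Note that the dim1 and dim2 columns come back as factors here, which may or may not be what you want.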

Access to column name of dataframe with *apply function

In this case apply is what you need. All of the data columns are of the same type and you don't have any worries about losing attributes, which is where apply causes problems. You will need to write your function differently so it just takes one vector of length 4:

fDist <- function(vec) {
  return(0.1 * ((vec[1] - vec[2])^2 + (vec[3] - vec[4])^2)^0.5)
}
data$f_dist <- apply(data, 1, fDist)
data
   X1  Y1  X2  Y2    f_dist
1 3.5 2.1 4.1 2.9 0.1843909
2 3.1 1.2 0.8 4.3 0.3982462

If you wanted to use the names of the columns in 'data' then they need to be spelled correctly:

fDist <- function(vec) {
  return(0.1 * ((vec['X1'] - vec['X2'])^2 + (vec['Y1'] - vec['Y2'])^2)^0.5)
}
data$f_dist <- apply(data, 1, fDist)
data
#--------
   X1  Y1  X2  Y2    f_dist
1 3.5 2.1 4.1 2.9 0.1000000
2 3.1 1.2 0.8 4.3 0.3860052
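
As an aside, this particular distance can also be computed without apply, since the arithmetic is already vectorised over columns. A sketch, with the data frame rebuilt from the values printed above:

```r
# Data reconstructed from the printed output
data <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                   X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# with() lets the column names be used directly; sqrt() replaces ^0.5
data$f_dist <- with(data, 0.1 * sqrt((X1 - X2)^2 + (Y1 - Y2)^2))
print(data)
```

This avoids the row-by-row coercion that apply performs, which matters once the data frame contains non-numeric columns.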

Your updated (and very different) question is easy to resolve. When you use apply it coerces to the lowest common mode denominator, in this case 'character'. You have two choices: either 1) add as.numeric to all of your arguments inside the functions, or 2) only send the columns that are needed which I will illustrate:

data2$f_dist <- apply(data2[ , c("X2", "Y2") ], 1, function(coords) 
{fDist2(data2[1,]$X1,data2[1,]$Y1, coords)} )

I really do not like how you are passing parameters to this function. Using "[" and "$" within the formals list "just looks wrong." And you should know that "df" will not be a dataframe, but rather a vector. Because it's not a dataframe (or a list) you should alter the function inside so that it uses "[" rather than "[[". Since you only want two of the coordinates, then only pass the two (numeric) ones that you would be using.
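
The coercion is easy to check for yourself. In this sketch the data frame d and its id column are invented for illustration; the point is only that one character column makes every row arrive as character:

```r
# Hypothetical data frame with one character column and two numeric ones
d <- data.frame(id = c("p", "q"), X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# apply converts the data frame to a matrix first, so each row is character
all_cols <- apply(d, 1, class)

# Passing only the numeric columns keeps the rows numeric
num_cols <- apply(d[, c("X2", "Y2")], 1, class)

print(all_cols)
print(num_cols)
```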

R: rownames, colnames, dimnames and names in apply

I think your confusion stems from the fact that apply does not pass an array (or matrix) to the function specified in FUN.

It passes each row of the matrix in turn. Each row is itself "only" a (named) vector:

> m[1,]
         a          b          c
0.48768161 0.61447934 0.08718875

So your function has only this named vector to work with.

For your middle example, as documented in apply:

If each call to FUN returns a vector of length n, then apply returns
an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1,
apply returns a vector if MARGIN has length 1 and an array of
dimension dim(X)[MARGIN] otherwise.

So function(x) names(x) returns a vector of length 3 for each row, so the final result is the matrix you see. But that matrix is being constructed at the end of the apply function, on the results of FUN being applied to each row individually.
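
A small sketch makes the shape of the result concrete. The matrix m here is a made-up 2 x 3 example, not the one from the question:

```r
# Hypothetical matrix with named rows and columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# FUN sees one row at a time, as a vector named by column
res <- apply(m, 1, function(x) names(x))

# Each call returned a vector of length 3, and there are 2 rows,
# so the result is a 3 x 2 matrix with one column per original row
print(dim(res))
print(res)
```

The row names of m only reappear as column names of the result, which apply attaches after all the calls to FUN have finished.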

R: Accessing and using names of data.frames within a list during an `apply` function

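This is really, really, really dirty, but I think it works as you described. Of course, for more than nine data.frames the substitute part needs adjusting. It also depends on how sapply builds its internal call (something like X[[1L]]); as far as I know, recent R versions show X[[i]] there instead, so test it before relying on it.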

csvout <- function (y, csvnames) {
write.table(y, file = paste0("test",
csvnames[as.numeric(substr(deparse(substitute(y)),4,4))],
".csv"),
sep = ",",
row.names = FALSE)
}

sapply(somelist, FUN=csvout, names(somelist))

I suppose you know that, but if you implemented a FOR-loop instead of sapply this would be much easier because you could directly reference the data.frame names with the names function.

Edit:
This is the FOR-loop solution which works no matter how many data.frames you've got:

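csvout <- function (y) {
  # seq_along() is safe even when y is empty
  for (i in seq_along(y)) {
    # y[[i]] extracts the data.frame itself; y[i] would be a one-element list
    write.table(y[[i]], file = paste0("test",
                                      names(y)[i],
                                      ".csv"),
                sep = ",",
                row.names = FALSE)
  }
}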

csvout(somelist)
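
If you still want an apply-style call, one sketch is to hand Map both the data frames and the file paths, so the names never have to be recovered inside the function. The contents of somelist below are invented, and the files go to tempdir() purely to keep the example tidy:

```r
# Hypothetical named list of data frames
somelist <- list(first = data.frame(x = 1:2), second = data.frame(y = 3:4))

# One output path per list element, built from the element names
paths <- file.path(tempdir(), paste0("test", names(somelist), ".csv"))

# Map walks over the data frames and their paths in parallel
invisible(Map(function(df, path) {
  write.table(df, file = path, sep = ",", row.names = FALSE)
}, somelist, paths))

print(basename(paths))
```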

