Getting the Column Names of a Data Frame with Sapply

sapply + if - retain column names

The root of your issue is that you are using apply on a data frame. apply is built to work on matrices, so the first thing it does is convert your data frame to a matrix, which is unnecessary. Then, when you convert the result back, the default data frame methods "fix" the column names in a way you don't like. You may be able to fix this by adding check.names = FALSE to your as.data.frame() call, but a better approach is to use lapply on a data frame and apply on a matrix, and to have the function work on a plain vector input as well.
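As a small illustration of the name mangling described above, here is a sketch on a made-up two-column data frame. The column names and the x / max(x) transformation are assumptions chosen only for the demo. It shows that apply returns a matrix, that rebuilding a data frame with data.frame() rewrites non-syntactic names unless check.names = FALSE is given, and that assigning the lapply result into df[] avoids the round trip entirely.

```r
# Toy data frame with non-syntactic column names (hypothetical example data)
df <- data.frame(`weird column` = c(1, 2, 3), `a-b` = c(4, 5, 6),
                 check.names = FALSE)

# apply() converts the data frame to a matrix before doing any work
m <- apply(df, 2, function(x) x / max(x))
print(is.matrix(m))

# Rebuilding a data frame runs make.names() on the column names by default
mangled <- names(data.frame(m))
kept    <- names(data.frame(m, check.names = FALSE))

# lapply() works column by column; assigning into df[] keeps the data frame
# structure, including the original column names
df[] <- lapply(df, function(x) x / max(x))

print(mangled)
print(kept)
print(names(df))
```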

I'd also strongly recommend not overwriting the built-in scale function with a similar-but-different function. That could easily cause bugs. I've rewritten your function calling it scale01() to make the distinction clear.

I also modified it so if the input is a constant vector with missing values, only the non-missing values will be filled in with 0.5, which seems safer.

I use S3 dispatch to work appropriately based on the input class, built on a default method that works on numeric vectors. Here it is, demonstrated on vector, data.frame, and matrix inputs:

## defining the functions
scale01 = function(x, ...) {
  UseMethod("scale01")
}

scale01.numeric = function(x, ...) {
  minx = min(x, na.rm = TRUE)
  maxx = max(x, na.rm = TRUE)
  if (minx == maxx) {
    x[!is.na(x)] = 0.5
    return(x)
  }
  (x - minx) / (maxx - minx)
}

scale01.data.frame = function(x, ...) {
  x[] = lapply(x, scale01)
  x
}

scale01.matrix = function(x, ...) {
  apply(x, MARGIN = 2, FUN = scale01)
}

## demonstrating usage

scale01(rnorm(5))
# [1] 0.0000000 1.0000000 0.4198958 0.6104154 0.2108150

scale01(mtcars[1:5, ])
#                         mpg cyl      disp        hp       drat        wt      qsec vs am gear      carb
# Mazda RX4         0.5609756 0.5 0.2063492 0.2073171 1.00000000 0.2678571 0.0000000  0  1    1 1.0000000
# Mazda RX4 Wag     0.5609756 0.5 0.2063492 0.2073171 1.00000000 0.4955357 0.1879195  0  1    1 1.0000000
# Datsun 710        1.0000000 0.0 0.0000000 0.0000000 0.93902439 0.0000000 0.7214765  1  1    1 0.0000000
# Hornet 4 Drive    0.6585366 0.5 0.5952381 0.2073171 0.00000000 0.7991071 1.0000000  1  0    0 0.0000000
# Hornet Sportabout 0.0000000 1.0 1.0000000 1.0000000 0.08536585 1.0000000 0.1879195  0  0    0 0.3333333

scale01(as.matrix(mtcars[1:5, ]))
#                         mpg cyl      disp        hp       drat        wt      qsec vs am gear      carb
# Mazda RX4         0.5609756 0.5 0.2063492 0.2073171 1.00000000 0.2678571 0.0000000  0  1    1 1.0000000
# Mazda RX4 Wag     0.5609756 0.5 0.2063492 0.2073171 1.00000000 0.4955357 0.1879195  0  1    1 1.0000000
# Datsun 710        1.0000000 0.0 0.0000000 0.0000000 0.93902439 0.0000000 0.7214765  1  1    1 0.0000000
# Hornet 4 Drive    0.6585366 0.5 0.5952381 0.2073171 0.00000000 0.7991071 1.0000000  1  0    0 0.0000000
# Hornet Sportabout 0.0000000 1.0 1.0000000 1.0000000 0.08536585 1.0000000 0.1879195  0  0    0 0.3333333

weird_name_df = data.frame(`weird column` = rnorm(5), `INL_Avg(S-B0-ETC-CDS-06C~PM_CD1_D_B0_SI_P0V_B.NM)` = rnorm(5), check.names = FALSE)
scale01(weird_name_df)
#   weird column INL_Avg(S-B0-ETC-CDS-06C~PM_CD1_D_B0_SI_P0V_B.NM)
# 1    0.6135744                                         0.2237905
# 2    0.0000000                                         0.4086837
# 3    1.0000000                                         1.0000000
# 4    0.7061441                                         0.2803262
# 5    0.7693184                                         0.0000000
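The constant-vector behavior mentioned earlier (only non-missing values are set to 0.5) is easy to check in isolation. The sketch below copies the body of scale01.numeric into a standalone function, here called scale01_vec (a name made up for this example), so that it runs on its own.

```r
# Standalone copy of the numeric method, renamed so this snippet is self-contained
scale01_vec <- function(x) {
  minx <- min(x, na.rm = TRUE)
  maxx <- max(x, na.rm = TRUE)
  if (minx == maxx) {
    # constant input: fill only the non-missing positions, NAs stay NA
    x[!is.na(x)] <- 0.5
    return(x)
  }
  (x - minx) / (maxx - minx)
}

constant_with_na <- scale01_vec(c(3, NA, 3))
varying_with_na  <- scale01_vec(c(1, NA, 3))

print(constant_with_na)  # 0.5  NA 0.5
print(varying_with_na)   # 0.0  NA 1.0
```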

If you want to transform all the numeric columns of a data frame, I would suggest:

## base version
numeric_cols = sapply(your_data, is.numeric)
your_data[numeric_cols] = scale01(your_data[numeric_cols])

## dplyr version
library(dplyr)
your_data %>%
  mutate(across(where(is.numeric), scale01))
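To see the base-R column selection at work on a data frame with mixed column types, here is a sketch using the built-in iris data. To keep the snippet self-contained it uses rescale_simple, a bare min-max rescaler invented for this example, in place of scale01; it assumes no missing values and no constant columns.

```r
# Stand-in for scale01(): assumes no NAs and no constant columns
rescale_simple <- function(x) (x - min(x)) / (max(x) - min(x))

your_data <- iris

# Logical vector marking which columns are numeric
numeric_cols <- sapply(your_data, is.numeric)

# Only the numeric columns are replaced; Species is left untouched
your_data[numeric_cols] <- lapply(your_data[numeric_cols], rescale_simple)

print(numeric_cols)
print(sapply(your_data[numeric_cols], range))
```

Because the assignment targets your_data[numeric_cols], the column names and the non-numeric columns are preserved.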

Get column names in apply function with data frame (R)

Here is how I would do it with purrr::iwalk():

purrr::iwalk(airquality, ~ message(sprintf("%s has %s cases.\nNA values: %s",
                                           .y,
                                           sum(!is.na(.x)),
                                           sum(is.na(.x)))))

Output:

Ozone has 116 cases.
NA values: 37
Solar.R has 146 cases.
NA values: 7
Wind has 153 cases.
NA values: 0
Temp has 153 cases.
NA values: 0
Month has 153 cases.
NA values: 0
Day has 153 cases.
NA values: 0
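If purrr is not available, roughly the same thing can be done in base R by passing the columns and their names to Map() in parallel. This is only a sketch of one alternative, not part of the purrr answer; the single-line message format is a simplification chosen for the example.

```r
# Map() walks over the columns and their names together,
# so the function receives both the data and the column name
msgs <- unlist(Map(
  function(col, name) {
    sprintf("%s has %d cases. NA values: %d",
            name, sum(!is.na(col)), sum(is.na(col)))
  },
  airquality, names(airquality)
))

writeLines(msgs)
```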

Get column name in apply function

Well, if nobody answers, you've got to find out yourself... I found out that you can call sapply with a vector of column indices and use each index inside the function to look up the column name. So the solution is:

x <- c(1,1,2,2,2,3)
y <- c(2,3,4,5,4,4)
Tb <- data.frame(x,y)
Dq_Hist <- function(i){
  Name <- colnames(Tb)[i]
  Ttl <- paste('Variable: ',Name,'')
  hist(Tb[,i],main=Ttl,col=c('grey'),xlab=Name)
}
D <- sapply(1:ncol(Tb),Dq_Hist)
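The same index trick can be tried without plotting. The sketch below rebuilds the small Tb data frame and returns a text summary per column; describe_col is a name invented for this example. It uses seq_along() rather than 1:ncol(), which also behaves sensibly for a data frame with zero columns, and shows that iterating over names(Tb) directly gives a result that is already labeled by column.

```r
Tb <- data.frame(x = c(1, 1, 2, 2, 2, 3), y = c(2, 3, 4, 5, 4, 4))

# The function receives a column index and looks up the name itself
describe_col <- function(i) {
  sprintf("Variable: %s, mean = %.2f", names(Tb)[i], mean(Tb[[i]]))
}
by_index <- sapply(seq_along(Tb), describe_col)

# Alternative: iterate over the names; sapply() then names the result
by_name <- sapply(names(Tb), function(nm) mean(Tb[[nm]]))

print(by_index)
print(by_name)
```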

Access to column name of dataframe with *apply function

In this case apply is what you need. All of the data columns are of the same type and you don't have any worries about losing attributes, which is where apply causes problems. You will need to write your function differently so it just takes one vector of length 4:

fDist <- function(vec) {
  return (0.1*((vec[1] - vec[2])^2 + (vec[3]-vec[4])^2)^0.5)
}
data$f_dist <- apply(data, 1, fDist)
data
#    X1  Y1  X2  Y2    f_dist
# 1 3.5 2.1 4.1 2.9 0.1843909
# 2 3.1 1.2 0.8 4.3 0.3982462

If you wanted to use the names of the columns in 'data' then they need to be spelled correctly:

fDist <- function(vec) {
  return (0.1*((vec['X1'] - vec['X2'])^2 + (vec['Y1']-vec['Y2'])^2)^0.5)
}
data$f_dist <- apply(data, 1, fDist)
data
#--------
#    X1  Y1  X2  Y2    f_dist
# 1 3.5 2.1 4.1 2.9 0.1000000
# 2 3.1 1.2 0.8 4.3 0.3860052

Your updated (and very different) question is easy to resolve. When you use apply, the data frame is first coerced to a matrix, so every value is converted to the lowest common mode, in this case 'character'. You have two choices: either 1) add as.numeric to all of your arguments inside the function, or 2) only send the columns that are needed, which I will illustrate:

data2$f_dist <- apply(data2[ , c("X2", "Y2")], 1, function(coords) {
  fDist2(data2[1, ]$X1, data2[1, ]$Y1, coords)
})

I really do not like how you are passing parameters to this function. Using "[" and "$" within the formals list "just looks wrong." And you should know that "df" will not be a dataframe, but rather a vector. Because it's not a dataframe (or a list) you should alter the function inside so that it uses "[" rather than "[[". Since you only want two of the coordinates, then only pass the two (numeric) ones that you would be using.
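The coercion to 'character' is easy to reproduce. The sketch below builds a small data2 with an extra character column; this layout is an assumption, since the original data2 is not shown here. It then checks what each row looks like inside apply, and computes the distance after restricting to the numeric coordinate columns. A vectorized with() call is included for comparison, as it avoids apply altogether.

```r
# Hypothetical data2: the same coordinates as above plus a character id column
data2 <- data.frame(id = c("a", "b"),
                    X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                    X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# With the id column included, every row reaches the function as character
row_modes <- unname(apply(data2, 1, function(row) mode(row)))

fDist <- function(vec) {
  unname(0.1 * sqrt((vec["X1"] - vec["X2"])^2 + (vec["Y1"] - vec["Y2"])^2))
}

# Passing only the numeric columns keeps the rows numeric
coord_cols <- c("X1", "Y1", "X2", "Y2")
d_apply <- unname(apply(data2[coord_cols], 1, fDist))

# Vectorized equivalent, no apply() needed
d_vec <- with(data2, 0.1 * sqrt((X1 - X2)^2 + (Y1 - Y2)^2))

print(row_modes)
print(d_apply)
print(d_vec)
```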

R - refer to column names rather than column index when using lapply with data frame

You can use sapply() as follows. The problem in this example is that you cannot set ranges of columns by name easily.

cols <- c("A", "B", "D", "F", "G", "H")

df[,cols] <- sapply(df[,cols], \(x) (5:1)[x])

The easiest way to select by a range of columns is to use eval_select() to return their positions by number. But if you do this, you might as well just use the dplyr solution. This is essentially an under the hood look at it.

library(tidyselect)

col_pos <- eval_select(rlang::expr(c(A:B, D, F:H)), df)

df[,col_pos] <- sapply(df[,col_pos], \(x) (5:1)[x])
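For a base-R take on selecting a range of columns by name, the names can be translated to positions with match(). The sketch below is only an illustration: name_range is a helper invented for this example, and the four-column df with values in 1 to 5 is made-up data. It reverse-codes columns A to B and D, leaving C alone.

```r
# Made-up data: four items scored 1 to 5
df <- data.frame(A = c(1, 5), B = c(2, 4), C = c(3, 3), D = c(5, 1))

# Hypothetical helper: positions of all columns from `from` to `to`, inclusive
name_range <- function(data, from, to) {
  idx <- match(c(from, to), names(data))
  stopifnot(!anyNA(idx))  # both names must exist
  idx[1]:idx[2]
}

col_pos <- c(name_range(df, "A", "B"), match("D", names(df)))

# Reverse-code the selected columns (1 <-> 5, 2 <-> 4)
df[col_pos] <- lapply(df[col_pos], function(x) (5:1)[x])

print(col_pos)
print(df)
```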

Using lapply to set column names for a list of data frames?

It seems you want to update the original data frames. In that case, your list MUST be named, because list2env uses the list names as the object names to assign to. See the code below.

List <- list(a = a, b = b, c = c, d = d)
list2env(lapply(List, setNames, nm = headers), globalenv())

Now if you call a you will note that it has been updated.
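Here is a self-contained sketch of the same idea. The data frames a and b and the headers vector are made-up stand-ins, and the results are written to a fresh environment rather than globalenv() so the example has no side effects on the workspace.

```r
# Made-up inputs
a <- data.frame(V1 = 1:2, V2 = 3:4)
b <- data.frame(V1 = 5:6, V2 = 7:8)
headers <- c("id", "value")

# The list must be named: list2env() assigns each element under its list name
List <- list(a = a, b = b)
renamed <- lapply(List, setNames, nm = headers)

# Write into a scratch environment; use globalenv() to overwrite a and b themselves
env <- new.env()
list2env(renamed, envir = env)

print(names(env$a))
print(names(env$b))
```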


