Dynamically Select Data Frame Columns Using $ and a Character Value

You can't select a column with $ when its name is stored in a character variable. The R source code (R/src/main/subset.c) states:

/* The $ subset operator.
   We need to be sure to only evaluate the first argument.
   The second will be a symbol that needs to be matched, not evaluated.
*/

Second argument? What?! You have to realise that $, like everything else in R (including, for instance, (, +, ^ etc.), is a function that takes arguments and is evaluated. df$V1 could be rewritten as

`$`(df , V1)

or indeed

`$`(df , "V1")

But...

`$`(df , paste0("V1") )

...for instance, will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a symbol or a literal string, neither of which is evaluated.

Instead use [ (or [[ if you want to extract only a single column as a vector).

For example,

var <- "mpg"
# Doesn't work: $ looks for a column literally named "var" and returns NULL
mtcars$var
# These both work, but note that what they return is different:
# the first is a vector, the second is a data.frame
mtcars[[var]]
mtcars[var]
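
The behaviour described above can be checked directly. This is a minimal sketch using the built-in mtcars data; the names same, res and by_name are only illustrative.

```r
# A symbol and a literal string as the second argument are equivalent
same <- identical(`$`(mtcars, mpg), `$`(mtcars, "mpg"))

# A call in the second argument is not evaluated, so R rejects it;
# tryCatch() captures the error instead of stopping the script
res <- tryCatch(`$`(mtcars, paste0("mpg")), error = function(e) e)

# [[ evaluates its argument, so a computed name works
by_name <- mtcars[[paste0("m", "pg")]]

print(same)
print(inherits(res, "error"))
print(identical(by_name, mtcars$mpg))
```

All three checks should print TRUE if $ behaves as the source comment says.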

If the goal is to order the data frame by columns named in a character vector, you can do it without loops, using do.call to construct the call to order. Here is a reproducible example:

# Set seed for reproducibility
set.seed(123)
df <- data.frame(col1 = sample(5, 10, replace = TRUE),
                 col2 = sample(5, 10, replace = TRUE),
                 col3 = sample(5, 10, replace = TRUE))

# We want to sort by 'col3' then by 'col1'
sort_list <- c("col3", "col1")

# Use 'do.call' to call order. The second argument in do.call is a list of
# arguments to pass to the first argument, in this case 'order'.
# Since a data.frame is really a list, we just subset the data.frame
# according to the columns we want to sort by, in that order
df[do.call(order, df[, match(sort_list, names(df))]), ]

   col1 col2 col3
10    3    5    1
9     3    2    2
7     3    2    3
8     5    1    3
6     1    5    4
3     3    4    4
2     4    3    4
5     5    1    4
1     2    5    5
4     5    3    5
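
The same idea can be wrapped in a small function. This is only a sketch: sort_by_cols and the toy data frame are hypothetical names, not part of the answer above. It uses df[cols] rather than df[, cols] so that a single column name still yields a list, and unname() so that a column called, say, "decreasing" is not matched to an argument of order.

```r
# Hypothetical helper: sort a data frame by the columns named in 'cols'
sort_by_cols <- function(df, cols) {
  stopifnot(all(cols %in% names(df)))
  # Drop the names so columns are passed to order() positionally
  keys <- unname(as.list(df[cols]))
  df[do.call(order, keys), , drop = FALSE]
}

toy <- data.frame(a = c(2, 1, 2, 1),
                  b = c("x", "y", "w", "z"),
                  stringsAsFactors = FALSE)

# Sort by 'a', breaking ties with 'b'
sorted <- sort_by_cols(toy, c("a", "b"))
print(sorted)
```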

Dynamically select data frame columns using $

How about this:

dfList <- split(mtcars, mtcars$cyl)

cols <- c('8', '6','4')

dfList[[cols[[1]]]]$mpg
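
To go through every group rather than just the first, the same [[ lookup can be used inside sapply. A minimal sketch, assuming the goal is a per-group summary; mean_mpg is an illustrative name.

```r
dfList <- split(mtcars, mtcars$cyl)
cols <- c("8", "6", "4")

# [[ takes the group name as a string; $mpg then picks the fixed column
mean_mpg <- sapply(cols, function(k) mean(dfList[[k]]$mpg))
print(mean_mpg)
```

The result is a numeric vector named by the entries of cols.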

How to select a column in a dataframe dynamically in R

df %>% select( {{ col_name }} )
#or
df %>% select( !!col_name )
#or
df[[col_name]]

In the last case you get a vector instead of a data frame.

operating R dataframe using variables for the column name

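You could use the following (lag() here is presumably dplyr::lag(); base R's stats::lag() does not shift the values of a plain vector):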

df[[calc_col]] <- abs(df[[var]] - lag(df[[var]], 12))
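
For a version that needs no packages, the lag can be built by hand. This is a sketch under assumptions: the data frame, the column names and the lag of 2 (instead of 12, to keep the example short) are made up for illustration.

```r
df <- data.frame(sales = c(10, 12, 9, 15, 11))
var <- "sales"           # column to read, held in a variable
calc_col <- "abs_change" # column to create, held in a variable
k <- 2                   # number of rows to lag by

x <- df[[var]]
# Shift x down by k rows, padding the start with NA
lagged <- c(rep(NA, k), head(x, -k))
df[[calc_col]] <- abs(x - lagged)
print(df)
```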

Using a string from a list to select a column in R

Try this:

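# Named var_names rather than 'list' to avoid shadowing the base function
var_names <- list("Var1", "Var2", "Var3")
df1 <- data.frame("Var1" = 1:2, "Var2" = c(21, 15), "Var3" = c(10, 9))
df2 <- data.frame("Var1" = 1, "Var2" = 16, "Var3" = 8)
# Sum
df1$Var4 <- df1[, var_names[[1]]] + df2[, var_names[[1]]]
df1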

  Var1 Var2 Var3 Var4
1    1   21   10    2
2    2   15    9    3
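
The same pattern extends to every name in the list. A minimal sketch, assuming a sum is wanted for each variable; the "_sum" suffix for the new columns is a made-up convention.

```r
var_names <- c("Var1", "Var2", "Var3")
df1 <- data.frame(Var1 = 1:2, Var2 = c(21, 15), Var3 = c(10, 9))
df2 <- data.frame(Var1 = 1, Var2 = 16, Var3 = 8)

for (v in var_names) {
  # [[ works on both sides: reading the inputs and creating the new column.
  # df2 has one row, so its value is recycled across the rows of df1.
  df1[[paste0(v, "_sum")]] <- df1[[v]] + df2[[v]]
}
print(df1)
```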

pyspark - Dynamically select column content based on other column from the same row

One way is to create a map out of column names and values for each row, and then access the map with the value defined in a desired column.

What's cool about this is that it can work for as many columns as you want.

Example:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

data = [
    {"categoryName": "catA", "catA": 0.25, "catB": 0.75},
    {"categoryName": "catB", "catA": 0.5, "catB": 0.7},
]

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data)
df = (
    # Build a map of column name -> column value for every row
    df.withColumn(
        "map", F.expr("map(" + ",".join([f"'{c}', {c}" for c in df.columns]) + ")")
    )
    # Look up the entry whose key is the value in the row's categoryName column
    .withColumn("score", F.expr("map[categoryName]"))
    .drop("map")
)

Result:

+----+----+------------+-----+
|catA|catB|categoryName|score|
+----+----+------------+-----+
|0.25|0.75|catA        |0.25 |
|0.5 |0.7 |catB        |0.7  |
+----+----+------------+-----+

How to use a string variable to select a data frame column using $ notation

If a variable x holds the name of a column in tmp, then tmp[,x] or tmp[[x]] are the correct ways to extract it. You cannot get R to treat tmp$x as tmp$"Q5.3". tmp$x will always refer to the item named "x" in "tmp".
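
A small sketch makes the difference visible. The contents of tmp are invented for illustration; the data frame deliberately has a column that is literally called x.

```r
tmp <- data.frame(Q5.3 = c(4, 2), x = c("a", "b"), stringsAsFactors = FALSE)
x <- "Q5.3"

via_dollar  <- tmp$x    # the column literally named "x"
via_double  <- tmp[[x]] # the column whose name is stored in x
via_bracket <- tmp[, x] # same column, via matrix-style indexing

print(via_dollar)
print(via_double)
print(via_bracket)
```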

How to refer to columns in a table via a character variable in r?

> A
# A tibble: 2 x 2
  hello world
  <dbl> <dbl>
1     5     6
2     5     7
> col = 'hello'
> A[[col]]
[1] 5 5

Dynamically select column from data frame in function

Use deparse(substitute(name)), i.e.

f2 <- function(data, name)
  data[deparse(substitute(name))]

For example:

data <- data.frame(id=1:5, val=letters[1:5])
f2(data, val)
##   val
## 1   a
## 2   b
## 3   c
## 4   d
## 5   e

For multiple columns selection you may use:

f3 <- function(data, ...) {
  cols <- sapply(as.list(match.call())[-(1:2)], as.character)
  data[cols]
}

Which gives e.g.:

f3(data, val, id)
##   val id
## 1   a  1
## 2   b  2
## 3   c  3
## 4   d  4
## 5   e  5
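
One limitation of f2 is that a quoted name such as f2(data, "val") fails, because deparse keeps the quotation marks. A possible variant, sketched here under the hypothetical name f4, swaps deparse for as.character so that both forms are accepted.

```r
# Hypothetical variant of f2: accepts a bare name or a quoted string
f4 <- function(data, name) {
  # substitute() captures the unevaluated argument; as.character() turns
  # both the symbol val and the string "val" into "val"
  col <- as.character(substitute(name))
  data[col]
}

data <- data.frame(id = 1:5, val = letters[1:5])
a <- f4(data, val)
b <- f4(data, "val")
print(identical(a, b))
```

Like f2, this does not evaluate its argument: if a variable holding a column name is passed, the variable's own name is used, not its value. For that case plain data[x] is the simpler choice.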

