Dynamically select data frame columns using $ and a character value
You can't do that kind of subsetting with $. In the source code (R/src/main/subset.c) it states:
/*The $ subset operator.
We need to be sure to only evaluate the first argument.
The second will be a symbol that needs to be matched, not evaluated.
*/
Second argument? What?! You have to realise that $, like everything else in R (including, for instance, (, + and ^), is a function that takes arguments and is evaluated. df$V1 could be rewritten as
`$`(df , V1)
or indeed
`$`(df , "V1")
But...
`$`(df , paste0("V1") )
...for instance will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a symbol or a literal string, neither of which is ever evaluated.
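The behaviour described above can be checked with a short sketch. The data frame here is made up for illustration, and tryCatch() is used only so that the failing call does not stop the script:

```r
# Hypothetical data frame for illustration
df <- data.frame(V1 = 1:3, V2 = c("a", "b", "c"))

# A symbol or a literal string as the second argument: both work
by_symbol <- `$`(df, V1)
by_string <- `$`(df, "V1")

# A call as the second argument is never evaluated, so `$` signals an error;
# tryCatch() returns the condition object instead of halting
by_call <- tryCatch(`$`(df, paste0("V1")), error = function(e) e)

print(by_symbol)
print(inherits(by_call, "error"))
```

Here by_call ends up holding an error condition rather than the column, because paste0("V1") is passed to $ as an unevaluated call.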
Instead use [ (or [[ if you want to extract only a single column as a vector).
For example,
var <- "mpg"
# Doesn't work: $ looks for a column literally named "var" and returns NULL
mtcars$var
#These both work, but note that what they return is different
# the first is a vector, the second is a data.frame
mtcars[[var]]
mtcars[var]
You can perform the ordering without loops, using do.call to construct the call to order. Here is a reproducible example:
# set seed for reproducibility
set.seed(123)
df <- data.frame( col1 = sample(5, 10, replace = TRUE) , col2 = sample(5, 10, replace = TRUE) , col3 = sample(5, 10, replace = TRUE) )
# We want to sort by 'col3' then by 'col1'
sort_list <- c("col3","col1")
# Use 'do.call' to call order. Second argument in do.call is a list of arguments
# to pass to the first argument, in this case 'order'.
# Since a data.frame is really a list, we just subset the data.frame
# according to the columns we want to sort in, in that order
df[ do.call( order , df[ , match( sort_list , names(df) ) ] ) , ]
   col1 col2 col3
10    3    5    1
9     3    2    2
7     3    2    3
8     5    1    3
6     1    5    4
3     3    4    4
2     4    3    4
5     5    1    4
1     2    5    5
4     5    3    5
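The same idea can be wrapped in a small function. This is a sketch, not part of the original answer: the name sort_by_cols and the sample data are hypothetical. It subsets with data[cols] rather than data[ , cols], so a single sort column still arrives as a list (data[ , "col1"] would collapse to a plain vector, which do.call cannot take as its argument list), and it drops the names so that a column called, say, decreasing is not matched to an argument of order.

```r
# Hypothetical helper: sort a data frame by the columns named in 'cols'
sort_by_cols <- function(data, cols) {
  # data[cols] stays a list even for one column; unname() prevents column
  # names from being matched to named arguments of order()
  keys <- unname(as.list(data[cols]))
  data[do.call(order, keys), , drop = FALSE]
}

# Small made-up data frame
df <- data.frame(
  col1 = c(2, 1, 2, 1),
  col2 = c("a", "b", "c", "d"),
  col3 = c(9, 9, 8, 8)
)

sorted_two <- sort_by_cols(df, c("col3", "col1"))  # by col3, then col1
sorted_one <- sort_by_cols(df, "col1")             # single sort column
print(sorted_two)
```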
Dynamically select data frame columns using $
How about this:
dfList <- split(mtcars, mtcars$cyl)
cols <- c('8', '6','4')
dfList[[cols[[1]]]]$mpg
How to select a column in a dataframe dynamically in R
df %>% select( {{ col_name }} )
#or
df %>% select( !!col_name )
#or
df[[col_name]]
In the last case you will obtain a vector instead of a data frame.
operating R dataframe using variables for the column name
You could use:
df[[calc_col]] <- abs(df[[var]] - dplyr::lag(df[[var]], 12))
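A minimal base-R sketch of the same pattern, for the case where no packages are loaded. The data frame, the column names sales and abs_change, and the lag of 2 are all made up for illustration; the lagged vector is built by hand instead of calling a lag function.

```r
# Hypothetical data and column names
df <- data.frame(sales = c(10, 12, 9, 15, 11))
var <- "sales"
calc_col <- "abs_change"
k <- 2  # number of rows to look back

# Shift the column down by k rows, padding the start with NA
lagged <- c(rep(NA, k), head(df[[var]], -k))

# [[ accepts a character variable on both sides of the assignment
df[[calc_col]] <- abs(df[[var]] - lagged)
print(df)
```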
Using a string from a list to select a column in R
Try this:
# Avoid naming the object 'list', which would mask the base function
col_names <- list("Var1", "Var2", "Var3")
df1 <- data.frame("Var1" = 1:2, "Var2" = c(21,15), "Var3" = c(10,9))
df2 <- data.frame("Var1" = 1, "Var2" = 16, "Var3" = 8)
#Sum
df1$Var4 <- df1[, col_names[[1]]] + df2[, col_names[[1]]]
df1
  Var1 Var2 Var3 Var4
1    1   21   10    2
2    2   15    9    3
pyspark - Dynamically select column content based on other column from the same row
One way is to create a map out of column names and values for each row, and then access the map with the value defined in a desired column.
What's cool about this is that it can work for as many columns as you want.
Example:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
data = [
    {"categoryName": "catA", "catA": 0.25, "catB": 0.75},
    {"categoryName": "catB", "catA": 0.5, "catB": 0.7},
]
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data)
df = (
    df.withColumn(
        "map", F.expr("map(" + ",".join([f"'{c}', {c}" for c in df.columns]) + ")")
    )
    .withColumn("score", F.expr("map[categoryName]"))
    .drop("map")
)
Result:
+----+----+------------+-----+
|catA|catB|categoryName|score|
+----+----+------------+-----+
|0.25|0.75|catA |0.25 |
|0.5 |0.7 |catB |0.7 |
+----+----+------------+-----+
How to use a string variable to select a data frame column using $ notation
If you have a variable x with a column name in tmp, tmp[,x] or tmp[[x]] are the correct ways to extract it. You cannot get R to treat tmp$x as tmp$"Q5.3". tmp$x will always refer to the item named "x" in "tmp".
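To make the difference concrete, here is a small sketch. The contents of tmp are invented for illustration; it deliberately includes a column that really is named x, so that tmp$x returns something other than NULL.

```r
# Hypothetical data frame with a column named "Q5.3" and one named "x"
tmp <- data.frame(Q5.3 = c(1, 2, 3), x = c("a", "b", "c"))

x <- "Q5.3"

via_double <- tmp[[x]]   # evaluates x, returns the Q5.3 column
via_single <- tmp[, x]   # same column, selected by name
via_dollar <- tmp$x      # ignores the variable, returns the column named "x"

print(via_double)
print(via_dollar)
```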
How to refer to columns in a table via a character variable in r?
> A
# A tibble: 2 x 2
  hello world
  <dbl> <dbl>
1     5     6
2     5     7
> col = 'hello'
> A[[col]]
[1] 5 5
Dynamically select column from data frame in function
Use deparse(substitute(name)), i.e.
f2 <- function(data, name)
  data[deparse(substitute(name))]
For example:
data <- data.frame(id=1:5, val=letters[1:5])
f2(data, val)
## val
##1 a
##2 b
##3 c
##4 d
##5 e
For multiple columns selection you may use:
f3 <- function(data, ...) {
  # Drop the function name and 'data', keep the remaining arguments as strings
  cols <- sapply(as.list(match.call())[-(1:2)], as.character)
  data[cols]
}
Which gives e.g.:
f3(data, val, id)
## val id
## 1 a 1
## 2 b 2
## 3 c 3
## 4 d 4
## 5 e 5