How to Use a String Variable to Select a Data Frame Column Using $ Notation

How to use a string variable to select a data frame column using $ notation

If you have a variable x with a column name in tmp, tmp[,x] or tmp[[x]] are the correct ways to extract it. You cannot get R to treat tmp$x as tmp$"Q5.3". tmp$x will always refer to the item named "x" in "tmp".

Dynamically select data frame columns using $ and a character value

You can't do that kind of subsetting with $. In the source code (R/src/main/subset.c) it states:

/*The $ subset operator.

We need to be sure to only evaluate the first argument.

The second will be a symbol that needs to be matched, not evaluated.

*/

Second argument? What?! You have to realise that $, like everything else in R, (including for instance ( , + , ^ etc) is a function, that takes arguments and is evaluated. df$V1 could be rewritten as

`$`(df , V1)

or indeed

`$`(df , "V1")

But...

`$`(df , paste0("V1") )

...for instance will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a string which is never evaluated.

Instead use [ (or [[ if you want to extract only a single column as a vector).

For example,

var <- "mpg"
#Doesn't work
mtcars$var
#These both work, but note that what they return is different
# the first is a vector, the second is a data.frame
mtcars[[var]]
mtcars[var]

You can perform the ordering without loops, using do.call to construct the call to order. Here is a reproducible example below:

#  set seed for reproducibility
set.seed(123)
df <- data.frame( col1 = sample(5,10,repl=T) , col2 = sample(5,10,repl=T) , col3 = sample(5,10,repl=T) )

# We want to sort by 'col3' then by 'col1'
sort_list <- c("col3","col1")

# Use 'do.call' to call order. Seccond argument in do.call is a list of arguments
# to pass to the first argument, in this case 'order'.
# Since a data.frame is really a list, we just subset the data.frame
# according to the columns we want to sort in, in that order
df[ do.call( order , df[ , match( sort_list , names(df) ) ] ) , ]

col1 col2 col3
10 3 5 1
9 3 2 2
7 3 2 3
8 5 1 3
6 1 5 4
3 3 4 4
2 4 3 4
5 5 1 4
1 2 5 5
4 5 3 5

select columns in a dataframe via indexing notation and not column names

Step 0 : Initialize an empty list k

Step 1 : Iterate through all the columns using for loop on df.shape[1] "START"

Step 2 : Iterate through all rows in each column i did this using df. shape[0]

Step 3: search for "START"

Step 4: once found store the column no. and row number in variables.

Step 5: use those variables to index all rows and columns you want. so you use row+1 since you want everything below START and col ,col+ 1 so on.

step 6: add the dataframe to the list k

final step : you can see k[0] gives first instance of start, k[1] giving second instance of start, you can use this as more general code.
If you don't want all instances use break as soon as first dataframe is found.

k=[]
for i in range(df.shape[0]):
for y in range(df.shape[0]):
if df.iloc[y,i] == 'START':
col = i
row = y
k.append(df.iloc[row+1:,[col,col+1,col+2,-3,-2,-1]])
print("first START")
print(k[0])
print("\n Second START")
print(k[1])

access data frame column using variable

After creating your data frame, you need to use ?colnames. For example, you would have:

d = data.frame(a=c(1,2,3), b=c(4,5,6))
colnames(d) <- c("col1", "col2")

You can also name your variables when you create the data frame. For example:

d = data.frame(col1=c(1,2,3), col2=c(4,5,6))

Further, if you have the names of columns stored in variables, as in

a <- "col1"

you can't use $ to select a column via d$a. R will look for a column whose name is a. Instead, you can do either d[[a]] or d[,a].

How to read Pandas Dataframe column information through string variable iteration

You won't be able to do it with '.' notation, but you should be able to do this in square brackets with a 'f' string.

for i in range(1, N + 1):
Access_Matrix.append(df[f"Var_{i}_Access"])

Or, perhaps a better approach would be to build up a list of the column names and extract them into a new dataframe in one go from df, e.g.:

cols = [f"Var_{i}_Access" for i in range(1, N+1)]
all_cols = df[cols]

Reference R data frame column name from string

You can access the variables using the get() command, like this:

var1 = "wt"
var2 = "mpg"
p <- ggplot(data=mtcars, aes(get(var1), get(var2)))
p + geom_point()

which outputs: Sample Image

get is a way of calling an object using a character string. e.g.

e<-c(1,2,3,4)

print("e")
[1] "e"
print(get("e"))
[1] 1 2 3 4
print(e)
[1] 1 2 3 4

identical(e,get("e"))
[1] TRUE
identical("e",e)
[1] FALSE

How to format string to place variable in python

You should definetly check on these indexing basic in pandas. So, about your answer you can use the most basic indexing by brackets [] and string column name, for example c_data['ABC'], so you can iterate like this:

c_data = pd.read_csv("/home/fileName.csv")
list = ['ABC', 'DEF']
for f in list:
print(c_data[f].unique())

If you want/need to use format method, you can just replace column name with formatted string:

c_data = pd.read_csv("/home/fileName.csv")
list = ['ABC', 'DEF']
for f in list:
print(c_data['{0}'.format(f)].unique()])

Also, you can use bracket indexing with a list of string, which will give you another DataFrame. Then you can iterate over DataFrame itself which will give you column names:

c_data = pd.read_csv("/home/fileName.csv")
f_data = c_data[['ABC', 'DEF']]
for f in f_data:
print(f_data[f].unique())


Related Topics



Leave a reply



Submit