How to use a string variable to select a data frame column using $ notation
If you have a variable x
with a column name in tmp
, tmp[,x]
or tmp[[x]]
are the correct ways to extract it. You cannot get R to treat tmp$x
as tmp$"Q5.3"
. tmp$x
will always refer to the item named "x" in "tmp".
Dynamically select data frame columns using $ and a character value
You can't do that kind of subsetting with $
. In the source code (R/src/main/subset.c
) it states:
/*The $ subset operator.
We need to be sure to only evaluate the first argument.
The second will be a symbol that needs to be matched, not evaluated.
*/
Second argument? What?! You have to realise that $
, like everything else in R, (including for instance (
, +
, ^
etc) is a function, that takes arguments and is evaluated. df$V1
could be rewritten as
`$`(df , V1)
or indeed
`$`(df , "V1")
But...
`$`(df , paste0("V1") )
...for instance will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a string which is never evaluated.
Instead use [
(or [[
if you want to extract only a single column as a vector).
For example,
var <- "mpg"
#Doesn't work
mtcars$var
#These both work, but note that what they return is different
# the first is a vector, the second is a data.frame
mtcars[[var]]
mtcars[var]
You can perform the ordering without loops, using do.call
to construct the call to order
. Here is a reproducible example below:
# set seed for reproducibility
set.seed(123)
df <- data.frame( col1 = sample(5,10,repl=T) , col2 = sample(5,10,repl=T) , col3 = sample(5,10,repl=T) )
# We want to sort by 'col3' then by 'col1'
sort_list <- c("col3","col1")
# Use 'do.call' to call order. Seccond argument in do.call is a list of arguments
# to pass to the first argument, in this case 'order'.
# Since a data.frame is really a list, we just subset the data.frame
# according to the columns we want to sort in, in that order
df[ do.call( order , df[ , match( sort_list , names(df) ) ] ) , ]
col1 col2 col3
10 3 5 1
9 3 2 2
7 3 2 3
8 5 1 3
6 1 5 4
3 3 4 4
2 4 3 4
5 5 1 4
1 2 5 5
4 5 3 5
select columns in a dataframe via indexing notation and not column names
Step 0 : Initialize an empty list k
Step 1 : Iterate through all the columns using for loop on df.shape[1] "START"
Step 2 : Iterate through all rows in each column i did this using df. shape[0]
Step 3: search for "START"
Step 4: once found store the column no. and row number in variables.
Step 5: use those variables to index all rows and columns you want. so you use row+1 since you want everything below START and col ,col+ 1 so on.
step 6: add the dataframe to the list k
final step : you can see k[0] gives first instance of start, k[1] giving second instance of start, you can use this as more general code.
If you don't want all instances use break as soon as first dataframe is found.
k=[]
for i in range(df.shape[0]):
for y in range(df.shape[0]):
if df.iloc[y,i] == 'START':
col = i
row = y
k.append(df.iloc[row+1:,[col,col+1,col+2,-3,-2,-1]])
print("first START")
print(k[0])
print("\n Second START")
print(k[1])
access data frame column using variable
After creating your data frame, you need to use ?colnames. For example, you would have:
d = data.frame(a=c(1,2,3), b=c(4,5,6))
colnames(d) <- c("col1", "col2")
You can also name your variables when you create the data frame. For example:
d = data.frame(col1=c(1,2,3), col2=c(4,5,6))
Further, if you have the names of columns stored in variables, as in
a <- "col1"
you can't use $
to select a column via d$a
. R will look for a column whose name is a
. Instead, you can do either d[[a]]
or d[,a]
.
How to read Pandas Dataframe column information through string variable iteration
You won't be able to do it with '.' notation, but you should be able to do this in square brackets with a 'f' string.
for i in range(1, N + 1):
Access_Matrix.append(df[f"Var_{i}_Access"])
Or, perhaps a better approach would be to build up a list of the column names and extract them into a new dataframe in one go from df
, e.g.:
cols = [f"Var_{i}_Access" for i in range(1, N+1)]
all_cols = df[cols]
Reference R data frame column name from string
You can access the variables using the get()
command, like this:
var1 = "wt"
var2 = "mpg"
p <- ggplot(data=mtcars, aes(get(var1), get(var2)))
p + geom_point()
which outputs:
get
is a way of calling an object using a character string. e.g.
e<-c(1,2,3,4)
print("e")
[1] "e"
print(get("e"))
[1] 1 2 3 4
print(e)
[1] 1 2 3 4
identical(e,get("e"))
[1] TRUE
identical("e",e)
[1] FALSE
How to format string to place variable in python
You should definetly check on these indexing basic in pandas. So, about your answer you can use the most basic indexing by brackets [] and string column name, for example c_data['ABC'], so you can iterate like this:
c_data = pd.read_csv("/home/fileName.csv")
list = ['ABC', 'DEF']
for f in list:
print(c_data[f].unique())
If you want/need to use format method, you can just replace column name with formatted string:
c_data = pd.read_csv("/home/fileName.csv")
list = ['ABC', 'DEF']
for f in list:
print(c_data['{0}'.format(f)].unique()])
Also, you can use bracket indexing with a list of string, which will give you another DataFrame. Then you can iterate over DataFrame itself which will give you column names:
c_data = pd.read_csv("/home/fileName.csv")
f_data = c_data[['ABC', 'DEF']]
for f in f_data:
print(f_data[f].unique())
Related Topics
Argument Is of Length Zero in If Statement
Logical Operators (And, Or) with Na, True and False
Making a Stacked Bar Plot for Multiple Variables - Ggplot2 in R
Remove Groups with Less Than Three Unique Observations
Convert Named Character Vector to Data.Frame
Using Cut and Quartile to Generate Breaks in R Function
Sparklyr: How to Center a Spark Table Based on Column
How to Loop/Repeat a Linear Regression in R
Arrange Base Plots and Grid.Tables on the Same Page
How to Tell Lapply to Ignore an Error and Process the Next Thing in the List
Calculate Cumulative Average (Mean)
Reverse Order of Discrete Y Axis in Ggplot2
Cumulative Sum That Resets When 0 Is Encountered