Select values from different columns based on a variable containing column names
An excuse to use the obscure .BY:
DT[, newval := .SD[[.BY[[1]]]], by=new]
   col1 col2 col3  new newval
1:    1    4   55 col1      1
2:    2    3   44 col2      3
3:    3   34   35 col2     34
4:    4   44   87 col3     87
How it works. This splits the data into groups based on the strings in new. The value of the string for each group is available as .BY[[1]]; call it newname. We use this string to select the corresponding column of .SD via .SD[[newname]]. .SD stands for Subset of Data.
Alternatives. get(.BY[[1]]) should work just as well in place of .SD[[.BY[[1]]]]. According to a benchmark run by @David, the two ways are equally fast.
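For readers who do not use data.table, the group-then-select logic can be sketched in plain Python. This is only an illustration of the idea, not data.table's implementation: the function name add_newval and its parameters are made up for this sketch, and the rows mirror the table printed above.

```python
from collections import defaultdict

# Rows mirroring the DT printed above (column "new" names the column to read).
rows = [
    {"col1": 1, "col2": 4, "col3": 55, "new": "col1"},
    {"col1": 2, "col2": 3, "col3": 44, "new": "col2"},
    {"col1": 3, "col2": 34, "col3": 35, "new": "col2"},
    {"col1": 4, "col2": 44, "col3": 87, "new": "col3"},
]


def add_newval(rows, name_col="new", out_col="newval"):
    """Hypothetical helper: group rows by name_col, then read that column."""
    # Step 1: split row indices into groups keyed by the string in name_col.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[name_col]].append(i)

    # Step 2: within each group the key plays the role of .BY[[1]];
    # use it to pick the matching column for every row of the group.
    result = [dict(row) for row in rows]
    for newname, indices in groups.items():
        for i in indices:
            result[i][out_col] = rows[i][newname]
    return result


result = add_newval(rows)
newvals = [row["newval"] for row in result]
print(newvals)
```

Grouping first means the column name is resolved once per distinct name rather than once per row, which is the point of using by=new in the original one-liner.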
How do I return values from multiple columns when the column names are based on a variable result
The following query using a dynamic UNPIVOT operation will do the work:
CREATE TABLE #yourTable (
    [record id] INT, [current stage] VARCHAR(255), [met client] DATE,
    [contract agreed] DATE, [service completed] DATE, [on hold] DATE
);
INSERT INTO #yourTable VALUES
(11111, 'met client', '2019-01-02', NULL, NULL, NULL),
(22222, 'contract agreed', '2019-01-02', '2019-01-20', NULL, NULL),
(33333, 'on hold', '2019-01-02', '2019-01-20', NULL, '2019-02-10'),
(44444, 'service completed', '2019-01-02', '2019-01-20', '2019-03-01', '2019-02-10');
DECLARE @col NVARCHAR(MAX) = '';
SELECT @col += ',' + QUOTENAME([current stage]) FROM #yourTable;
SET @col = STUFF(@col, 1, 1, '');
EXEC ('SELECT unpiv.[record id], unpiv.[current stage], [Date] AS [Date_of_current_stage]
       FROM #yourTable
       UNPIVOT ([Date] FOR [Stage] IN (' + @col + ')) unpiv
       WHERE [current stage] = [Stage]');
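To see what the UNPIVOT plus WHERE filter is doing, here is a rough plain-Python sketch of the same two steps on the sample rows above. It is an illustration only: the names STAGE_COLUMNS, unpivot and date_of_current_stage are invented for this sketch, and dates are kept as strings for brevity.

```python
# Sample rows copied from the INSERT statement above (None stands for NULL).
rows = [
    {"record id": 11111, "current stage": "met client",
     "met client": "2019-01-02", "contract agreed": None,
     "service completed": None, "on hold": None},
    {"record id": 22222, "current stage": "contract agreed",
     "met client": "2019-01-02", "contract agreed": "2019-01-20",
     "service completed": None, "on hold": None},
    {"record id": 33333, "current stage": "on hold",
     "met client": "2019-01-02", "contract agreed": "2019-01-20",
     "service completed": None, "on hold": "2019-02-10"},
    {"record id": 44444, "current stage": "service completed",
     "met client": "2019-01-02", "contract agreed": "2019-01-20",
     "service completed": "2019-03-01", "on hold": "2019-02-10"},
]

# Hypothetical constant: the date columns that get turned into rows.
STAGE_COLUMNS = ["met client", "contract agreed", "service completed", "on hold"]


def unpivot(rows, stage_columns):
    """Turn each non-NULL stage column into its own (Stage, Date) row."""
    for row in rows:
        for stage in stage_columns:
            if row[stage] is not None:  # UNPIVOT skips NULLs as well
                yield {"record id": row["record id"],
                       "current stage": row["current stage"],
                       "Stage": stage,
                       "Date": row[stage]}


# Keep only the unpivoted row whose Stage matches the record's current stage.
date_of_current_stage = {
    r["record id"]: r["Date"]
    for r in unpivot(rows, STAGE_COLUMNS)
    if r["current stage"] == r["Stage"]
}
print(date_of_current_stage)
```

Each record ends up with exactly one date: the one stored in the column named by its current stage.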
Create new column containing names of other columns based on values of those columns
You can use the following code:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(V4 = paste0(names(.)[c_across() == 1], collapse = ','))
Output:
# A tibble: 4 × 4
# Rowwise:
     V1    V2    V3 V4
  <dbl> <dbl> <dbl> <chr>
1     1     0     1 "V1,V3"
2     0     1     1 "V2,V3"
3     0     0     0 ""
4     1     1     1 "V1,V2,V3"
Data
df <- data.frame(
V1 = c(1,0,0,1),
V2 = c(0,1,0,1),
V3 = c(1,1,0,1)
)
Calling a column name based on a different column's values?
A DataFrame usually contains multiple rows (and columns).
So if you ask whether a particular column (say xx) has some value:
df.xx == 20
you will get a boolean Series with:
- indices copied from df,
- values stating whether the xx column in that row == 20.
So I assume that your question about a particular value in a given column
should actually be expressed as: Does any element in this column
have a particular value?
You can check it with the any() method:
(df.xx == 22).any()
This time the result will be a single boolean.
In your case you can write:
if (df.column_name == '1-5').any():
    result = df.Minutes
Of course, that leaves open what should happen otherwise:
do you want another column in the result variable?
Another approach is to set the column name in some variable,
say src_col, based on your own logic.
Then, having this variable set, you can refer to the required column as:
result = df[src_col]
Note that this time:
- the column name is between brackets,
- but it is not surrounded with apostrophes,
so the target column name is expressed by the value of this variable.
And a remark about the comment by Chris90:
If you write df.loc[df['column_name'] == '1-5','Minutes']
you will get a Series of values (a single element if only one row matches) from:
- rows containing 1-5 (string) in column_name,
- Minutes column.
But you wrote that you wanted all values from this column.
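A small runnable sketch may make the difference between these variants concrete. The DataFrame contents and the Hours fallback column are invented for illustration; only column_name, Minutes and src_col come from the discussion above.

```python
import pandas as pd

# Hypothetical data; only the column names column_name and Minutes
# come from the question.
df = pd.DataFrame({
    "column_name": ["1-5", "6-10", "1-5"],
    "Minutes": [12, 30, 7],
    "Hours": [0.2, 0.5, 0.1],
})

# Element-wise comparison: a boolean Series with one entry per row.
mask = df.column_name == "1-5"

# Collapse it to a single boolean: does any row match?
has_match = bool(mask.any())

# Pick the column name through a variable, then index with brackets.
src_col = "Minutes" if has_match else "Hours"
whole_column = df[src_col]

# By contrast, .loc with the mask keeps only the matching rows.
matching_only = df.loc[mask, "Minutes"]

print(whole_column.tolist())
print(matching_only.tolist())
```

The first result holds every value of the chosen column, while the .loc variant holds only the values from rows where the condition is true.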
Extract data frame columns based on multiple criteria on column names
It can be as straightforward as
df[c("id", grep("col", names(df), value = TRUE), "Gender")]
Select column dynamically based on value from another column in R
By looping through the sequence of rows, extract the value with get and assign it to create 'y':
dt[, y := .SD[, get(x), seq_len(.N)]$V1]
dt
#   a b c x y
#1: 2 5 1 a 2
#2: 3 7 2 b 7
#3: 5 7 3 c 3
select columns based on columns names containing a specific string in pandas
Alternative methods:
In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
alp1 alp2
0 0.357564 0.108907
1 0.341087 0.198098
2 0.416215 0.644166
3 0.814056 0.121044
4 0.382681 0.110829
5 0.130343 0.219829
6 0.110049 0.681618
7 0.949599 0.089632
8 0.047945 0.855116
9 0.561441 0.291182
In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
alp1 alp2
0 0.357564 0.108907
1 0.341087 0.198098
2 0.416215 0.644166
3 0.814056 0.121044
4 0.382681 0.110829
5 0.130343 0.219829
6 0.110049 0.681618
7 0.949599 0.089632
8 0.047945 0.855116
9 0.561441 0.291182
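The two calls return the same columns above only because every matching name happens to begin with 'alp'. A minimal sketch with made-up column names (beta and x_alp are assumptions added for contrast) shows where they differ:

```python
import pandas as pd

# Hypothetical frame: x_alp contains "alp" but does not start with it.
df = pd.DataFrame({
    "alp1": [0.1, 0.2],
    "alp2": [0.3, 0.4],
    "beta": [0.5, 0.6],
    "x_alp": [0.7, 0.8],
})

# Prefix match: only names beginning with "alp".
starts = df.loc[:, df.columns.str.startswith("alp")]

# Substring match: names with "alp" anywhere. The pattern is treated as a
# regular expression by default; pass regex=False for a literal match.
contains = df.loc[:, df.columns.str.contains("alp")]

print(list(starts.columns))
print(list(contains.columns))
```

Use startswith when the prefix matters and contains when the string may appear anywhere in the name.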
Select column using the value in other row of a data.table in R
We can use get after looping through the sequence of rows:
DT[, W := get(Z) , 1:nrow(DT)]
Or with eval(as.name()):
DT[, W := eval(as.name(Z)) , 1:nrow(DT)]
Data Table - Select Value of Column by Name From Another Column
Another option:
d[ , value.of.col := diag(as.matrix(.SD)), .SDcols = d[ , name.of.col]]
> d
   value.1 value.2 name.of.col value.of.col
1:     one     two     value.1          one
2:     uno     dos     value.2          dos
3:       1       2     value.1            1
EDIT add a faster solution:
d[ , value.of.col :=
melt(d,id.vars='name.of.col')[name.of.col==variable, value]]