Extract column value based on another column in Pandas
You could use loc to get a Series of the values that satisfy your condition, and then iloc to get the first element:
In [2]: df
Out[2]:
    A  B
0  p1  1
1  p1  2
2  p3  3
3  p2  4
In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2 p3
Name: A, dtype: object
In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
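One caveat with the .iloc[0] step: if no row matches, the selection is an empty Series and .iloc[0] raises an IndexError. Below is a minimal sketch of guarding against that. It uses the same sample data and a hypothetical helper named first_match, which is not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

def first_match(frame, key_col, key, value_col, default=None):
    """Hypothetical helper: first value of value_col in rows where key_col == key."""
    matches = frame.loc[frame[key_col] == key, value_col]
    # .iloc[0] would raise IndexError on an empty Series, so check first
    return matches.iloc[0] if not matches.empty else default

print(first_match(df, "B", 3, "A"))   # p3
print(first_match(df, "B", 99, "A"))  # None (no row has B == 99)
```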
extracting value based on another column
As @Justin said, you might be working with a data.frame instead of a matrix. All the better. Using your example above, if your data.frame looks as follows:
df <- data.frame(x=c(FALSE,TRUE), freq=c(40, 6))
> df
      x freq
1 FALSE   40
2  TRUE    6
The following will get you what you want irrespective of whether there are FALSE values or not.
> df$freq[df$x==TRUE]
[1] 6
EDIT: As @DWin points out, you can simplify further by using the fact that df$x is logical:
> df$freq[df$x]
[1] 6
> df$freq[!df$x]
[1] 40
For example, with a data.frame that contains no FALSE values:
> df2 <- data.frame(x=TRUE, freq=46)
> df2
     x freq
1 TRUE   46
Still works:
> df2$freq[df2$x==TRUE]
[1] 46
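If you are doing the same thing in pandas instead of R, the idea carries over directly: a boolean column can be used as a row mask, and ~ negates it. Here is a minimal sketch, assuming the same x/freq data rebuilt as a pandas DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"x": [False, True], "freq": [40, 6]})

# A boolean column works directly as a row mask, like df$freq[df$x] in R
true_freq = df.loc[df["x"], "freq"]
# ~ negates the mask, like df$freq[!df$x]
false_freq = df.loc[~df["x"], "freq"]

print(true_freq.tolist())   # [6]
print(false_freq.tolist())  # [40]

# With no False rows the mask still works; the negated selection is simply empty
df2 = pd.DataFrame({"x": [True], "freq": [46]})
print(df2.loc[df2["x"], "freq"].tolist())   # [46]
print(df2.loc[~df2["x"], "freq"].tolist())  # []
```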
extract the top values from one column based on another column
I think this is what you are looking for:
Data:
ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456
Input:
df = df.groupby(['ID','genre'])['plays'].sum().reset_index()
df.sort_values(by=['plays'], ascending=False)
Output:
      ID  genre  plays
1  12345    pop   1055
2  12345  world     45
0  12345  dance     41
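The answer above sorts the summed plays but does not trim them. The sketch below assumes that "top values" means the n genres with the most total plays for each ID; n = 2 is an arbitrary choice. It rebuilds the sample data from the CSV above, sorts the summed frame, and keeps the first n rows per ID:

```python
import io

import pandas as pd

csv_data = """ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456
"""
df = pd.read_csv(io.StringIO(csv_data))

# Total plays per (ID, genre), as in the answer above
totals = df.groupby(["ID", "genre"], as_index=False)["plays"].sum()

# Assumption: "top" = the n genres with the highest total plays for each ID
n = 2
top = (totals.sort_values("plays", ascending=False)
             .groupby("ID")
             .head(n))  # head keeps the first n rows of each ID in sorted order
print(top)
```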
Extract pattern from a column based on another column's value
You can use a regex with str.extract in a groupby + apply:
import re
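# group_keys=False keeps the original row index so the result aligns with df;
# expand=False makes str.extract return a Series instead of a one-column DataFrame
df['match'] = (df.groupby('root', group_keys=False)['word']
                 .apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})', expand=False))
              )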
Or, if you expect few repeated "root" values:
import re
df['match'] = df.apply(lambda r: m.group()
                       if (m := re.match(f'.*{re.escape(r["root"])}', r['word']))
                       else None,
                       axis=1)
output:
         word   root   match
0      replay   play  replay
1    replayed   play  replay
2    playable   play    play
3     thinker  think   think
4       think  think   think
5  thoughtful  think     NaN
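Neither snippet shows how df is built. The sketch below reconstructs the sample data from the word and root columns of the output. It then uses a plain list comprehension over zip instead of DataFrame.apply, which is a swapped-in variant and not the answer's code. Like the row-wise version, it returns None rather than NaN where the root does not occur:

```python
import re

import pandas as pd

# Sample data reconstructed from the "word" and "root" columns of the output above
df = pd.DataFrame({
    "word": ["replay", "replayed", "playable", "thinker", "think", "thoughtful"],
    "root": ["play", "play", "play", "think", "think", "think"],
})

def prefix_through_root(word, root):
    # Greedy .* keeps everything up to the LAST occurrence of root in word;
    # re.match anchors at the start, and None is returned when root is absent
    m = re.match(f".*{re.escape(root)}", word)
    return m.group() if m else None

df["match"] = [prefix_through_root(w, r) for w, r in zip(df["word"], df["root"])]
print(df)
```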
extract a column in dataframe based on condition for another column R
You can use the subset function in base R:
subset(df, g == 'b', select = b)
# b
#bb 2
#cc 3
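For comparison, here is a pandas sketch of the same subset call. The original df is not shown, so the data below is hypothetical: row labels, a g column, and a b column, chosen so that the result matches the output above:

```python
import pandas as pd

# Hypothetical data shaped like the R example: row labels, a grouping column g, a value column b
df = pd.DataFrame({"g": ["a", "b", "b"], "b": [1, 2, 3]}, index=["aa", "bb", "cc"])

# Like subset(df, g == 'b', select = b): filter rows, then keep only column b.
# Passing the list ['b'] returns a one-column DataFrame (as subset does) rather than a Series.
sub = df.loc[df["g"] == "b", ["b"]]
print(sub)
```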
How to access another column's value from a given id number in pandas DataFrame?
Like this:
In [21]: df.set_index('id').loc['b', 'label']
Out[21]: 'sal'
Or, use df.query:
In [28]: df.query('id == "b"')['label']
Out[28]:
1 sal
Name: label, dtype: object
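Here is a self-contained sketch of both lookups. Only the row with id "b" and label "sal" comes from the answer; the other rows are hypothetical. The .loc lookup returns a scalar only while id is unique. The query lookup returns a Series, so the first value is taken explicitly. The dict at the end is an extra option, not from the answer, that may be convenient for many repeated lookups:

```python
import pandas as pd

# Hypothetical data: only id "b" -> label "sal" comes from the answer
df = pd.DataFrame({"id": ["a", "b", "c"], "label": ["foo", "sal", "bar"]})

# With a unique id, .loc on the new index returns a scalar
by_index = df.set_index("id").loc["b", "label"]

# query returns a Series (one row per match); take the first value explicitly
by_query = df.query('id == "b"')["label"].iloc[0]

# A dict built once gives cheap repeated lookups by id
id_to_label = dict(zip(df["id"], df["label"]))

print(by_index, by_query, id_to_label["b"])  # sal sal sal
```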
How to search for and extract unique values from one column in another column?
I think this works for you:
library(dplyr)
mutate(df, Col_C = stringr::str_extract(
  Col_A,
  paste0("\\b(", paste0(unique(Col_B), collapse = "|"), ")\\b")))
#                Col_A  Col_B  Col_C
# 1   blue shovel 1024   blue   blue
# 2    red shovel 1022    red    red
# 3  green bucket 3021  green  green
# 4    green rake 3021   blue  green
# 5 yellow shovel 1023 yellow yellow
Breakdown:
- paste0(unique(Col_B), collapse = "|") takes the words in Col_B, de-duplicates them, and concatenates them with | symbols; that is, c("blue", "red", "green") --> "blue|red|green". In regex, the | symbol is an "OR" operator.
- \\b( and )\\b are word boundaries, meaning that there is no word-like character immediately before (first) or after (second) the pattern. Adding them around the words prevents a partial match of blu on blue (in case that ever happens). While it is not apparent that this changes anything here, it is a more defensive and specific pattern. The parens add grouping, which is more evident in the next bullet.
- With all of that, our overall pattern looks something like "\\b(blue|red|green)\\b" (abbreviated). This translates into "find blue or red or green such that there is a word boundary on both ends of whichever one(s) you find".
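The same word-boundary alternation can be built in pandas with Series.str.extract. The sketch below rebuilds the data from the Col_A and Col_B columns of the output above. Wrapping each word in re.escape is an addition not found in the R answer; it guards against regex metacharacters in the Col_B values:

```python
import re

import pandas as pd

# Data reconstructed from the Col_A / Col_B columns shown above
df = pd.DataFrame({
    "Col_A": ["blue shovel 1024", "red shovel 1022", "green bucket 3021",
              "green rake 3021", "yellow shovel 1023"],
    "Col_B": ["blue", "red", "green", "blue", "yellow"],
})

# Same idea as the R pattern: \b(word1|word2|...)\b from the unique Col_B values
words = "|".join(re.escape(w) for w in df["Col_B"].unique())
pattern = rf"\b({words})\b"

# expand=False returns a Series for the single capture group
df["Col_C"] = df["Col_A"].str.extract(pattern, expand=False)
print(df)
```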