Extracting Value Based on Another Column

Extract column value based on another column in Pandas

You can use loc to get the Series of values that satisfy your condition, and then iloc to get its first element:

In [2]: df
Out[2]:
    A  B
0  p1  1
1  p1  2
2  p3  3
3  p2  4

In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2    p3
Name: A, dtype: object

In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
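
One caveat with the snippet above: .iloc[0] raises an IndexError when no row satisfies the condition. The following is a minimal sketch of a guarded lookup, assuming you want a default value in that case. The helper name first_match and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

# Same frame as in the example above
df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})


def first_match(frame, cond_col, value, target_col, default=None):
    """Hypothetical helper: first target_col value where cond_col == value."""
    matches = frame.loc[frame[cond_col] == value, target_col]
    # .iloc[0] would raise IndexError on an empty Series, so check first
    return matches.iloc[0] if not matches.empty else default


found = first_match(df, "B", 3, "A")     # a row with B == 3 exists
missing = first_match(df, "B", 99, "A")  # no such row, falls back to default
print(found, missing)
```

Returning a default is a design choice; if a missing row signals a bug in your data, letting the IndexError propagate may be preferable.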

Extracting value based on another column

As @Justin said, you might be working with a data.frame instead of a matrix. All the better. Using your example above, if your data.frame looks as follows:

> df <- data.frame(x=c(FALSE,TRUE), freq=c(40, 6))
> df
      x freq
1 FALSE   40
2  TRUE    6

The following will get you what you want irrespective of whether there are FALSE values or not.

> df$freq[df$x==TRUE]
[1] 6

EDIT: As @DWin points out, you can simplify further by using the fact that df$x is logical:

> df$freq[df$x]
[1] 6
> df$freq[!df$x]
[1] 40

For example, with a data.frame that contains no FALSE rows:

> df2 <- data.frame(x=TRUE, freq=46)
> df2
     x freq
1 TRUE   46

Still works:

> df2$freq[df2$x==TRUE]
[1] 46
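
For readers working in pandas rather than R, the same logical-indexing idea can be sketched as follows. This is only an analogue of the R answer above, not part of it; the variable names are chosen for illustration.

```python
import pandas as pd

# pandas counterpart of the R data.frame above
df = pd.DataFrame({"x": [False, True], "freq": [40, 6]})

# A boolean column can be used directly as a mask; ~ negates it
true_freq = df.loc[df["x"], "freq"].tolist()
false_freq = df.loc[~df["x"], "freq"].tolist()

# A frame with no False rows still works; the negated mask selects nothing
df2 = pd.DataFrame({"x": [True], "freq": [46]})
only_true = df2.loc[df2["x"], "freq"].tolist()
no_false = df2.loc[~df2["x"], "freq"].tolist()

print(true_freq, false_freq, only_true, no_false)
```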

Extract the top values from one column based on another column

I think this is what you are looking for:

Data:

ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456

Input:

df = df.groupby(['ID','genre'])['plays'].sum().reset_index()
df.sort_values(by=['plays'], ascending=False)

Output:

      ID  genre  plays
1  12345    pop   1055
2  12345  world     45
0  12345  dance     41
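
A self-contained sketch of the steps above, reading the sample data from a string. The final step, keeping only the top N genres per ID, is an assumption about what "top values" means in the question; top_n is a hypothetical parameter.

```python
import io

import pandas as pd

csv_text = """ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456
"""
df = pd.read_csv(io.StringIO(csv_text))

# Total plays per (ID, genre), then rank from most to least played
totals = df.groupby(["ID", "genre"])["plays"].sum().reset_index()
ranked = totals.sort_values(by="plays", ascending=False)

# Assumption: "top values" means the N most-played genres for each ID.
# groupby(...).head(n) keeps the first n rows of each group in ranked order.
top_n = 2
top_genres = ranked.groupby("ID").head(top_n)
print(top_genres)
```

Note that sort_values returns a new DataFrame, so the result has to be assigned (as with ranked here) if you want to keep the sorted order.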

Extract pattern from a column based on another column's value

You can use a regex with str.extract in a groupby+apply:

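import re
# group_keys=False keeps the original row index so the result aligns with df
# (pandas >= 2.0 would otherwise prepend the group key to the index);
# expand=False makes str.extract return a Series rather than a one-column DataFrame.
df['match'] = (df.groupby('root', group_keys=False)['word']
                 .apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})', expand=False))
              )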

Or, if you expect few repeated "root" values:

import re
df['match'] = df.apply(lambda r: m.group()
                       if (m:=re.match(f'.*{re.escape(r["root"])}', r['word']))
                       else None, axis=1)

Output:

         word   root   match
0      replay   play  replay
1    replayed   play  replay
2    playable   play    play
3     thinker  think   think
4       think  think   think
5  thoughtful  think     NaN
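
To try the row-wise idea end to end, here is a minimal runnable sketch that rebuilds the frame from the output table. It uses a plain function and a list comprehension instead of apply; the function name match_up_to_root is hypothetical.

```python
import re

import pandas as pd

# Frame reconstructed from the output table above
df = pd.DataFrame({
    "word": ["replay", "replayed", "playable", "thinker", "think", "thoughtful"],
    "root": ["play", "play", "play", "think", "think", "think"],
})


def match_up_to_root(word, root):
    """Return word up to and including the last occurrence of root, else None."""
    # re.escape guards against regex metacharacters in root
    m = re.match(f".*{re.escape(root)}", word)
    return m.group() if m else None


df["match"] = [match_up_to_root(w, r) for w, r in zip(df["word"], df["root"])]
print(df)
```

Rows where the root does not occur in the word end up as a missing value in the match column.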

Extract a column in dataframe based on condition for another column R

You can use the subset function in base R:

subset(df, g == 'b', select = b)

#    b
# bb 2
# cc 3

How to access another column's value from a given id number in pandas DataFrame?

Like this:

In [21]: df.set_index('id').loc['b', 'label']
Out[21]: 'sal'

Or, use df.query:

In [28]: df.query('id == "b"')['label']
Out[28]:
1    sal
Name: label, dtype: object
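
The two approaches return different shapes, which the following sketch makes explicit. The frame is hypothetical: only the pairing of id "b" with label "sal" at row 1 comes from the output above, and the other rows are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only id "b" -> label "sal" is taken from the answer
df = pd.DataFrame({"id": ["a", "b", "c"], "label": ["foo", "sal", "bar"]})

# set_index + loc returns a scalar when the id is unique
by_index = df.set_index("id").loc["b", "label"]

# query returns a Series (possibly empty or with several rows)
by_query = df.query('id == "b"')["label"]

# Take the first element if a scalar is needed
scalar_from_query = by_query.iloc[0]
print(by_index, scalar_from_query)
```

If ids can repeat, the set_index version returns a Series as well, so it is worth checking uniqueness before relying on a scalar result.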

How to search for and extract unique values from one column in another column?

I think this works for you:

mutate(df, Col_C = stringr::str_extract(
  Col_A,
  paste0("\\b(", paste0(unique(Col_B), collapse = "|"), ")\\b")))
#                Col_A  Col_B  Col_C
# 1   blue shovel 1024   blue   blue
# 2    red shovel 1022    red    red
# 3  green bucket 3021  green  green
# 4    green rake 3021   blue  green
# 5 yellow shovel 1023 yellow yellow

Breakdown:

  • paste0(unique(Col_B), collapse="|") takes the words in Col_B, de-duplicates them, and concatenates them with | symbols; that is, c("blue","red","green") --> "blue|red|green". In regex, the | symbol is an "OR" operator.
  • \\b( and )\\b wrap the alternatives in word boundaries (\\b), meaning that there is no word-like character immediately before (first) or after (second) the match. Adding these prevents a partial match such as blu inside blue; it makes no visible difference on this data, but it is a more defensive, specific pattern. The parentheses add grouping, which is more evident in the next bullet.
  • With all of that, our overall pattern looks something like "\\b(blue|red|green)\\b" (abbreviated). This translates into "find blue or red or green such that there is a word-boundary on both ends of whichever one(s) you find".
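
The same pattern-building steps can be sketched in pandas. This is an analogue of the R answer, not a translation the author provided; the re.escape call is an addition that protects against regex metacharacters in Col_B, and the data is reconstructed from the output shown above.

```python
import re

import pandas as pd

# Frame reconstructed from the output table above
df = pd.DataFrame({
    "Col_A": ["blue shovel 1024", "red shovel 1022", "green bucket 3021",
              "green rake 3021", "yellow shovel 1023"],
    "Col_B": ["blue", "red", "green", "blue", "yellow"],
})

# De-duplicate Col_B, escape each word, and join the words with the regex OR operator
words = [re.escape(w) for w in df["Col_B"].unique()]
pattern = r"\b(" + "|".join(words) + r")\b"

# expand=False returns a Series holding the single capture group
df["Col_C"] = df["Col_A"].str.extract(pattern, expand=False)
print(pattern)
print(df)
```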

