Extract a Column Value Based on Another Column in a Pandas DataFrame

Extract column value based on another column in Pandas

You could use loc to get the Series of values that satisfy your condition, and then iloc to get the first element:

In [2]: df
Out[2]:
    A  B
0  p1  1
1  p1  2
2  p3  3
3  p2  4

In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2    p3
Name: A, dtype: object

In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
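
For reference, here is a minimal self-contained sketch of the same lookup. The frame is rebuilt from the output shown above. The helper name first_match and its default argument are hypothetical, added only to show one way to avoid the IndexError that .iloc[0] raises when no row matches.

```python
import pandas as pd

# Rebuild the frame shown in the session above.
df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})


def first_match(frame, value, default=None):
    """Return the first 'A' where 'B' == value, or default if nothing matches.

    Hypothetical helper, not part of pandas.
    """
    matches = frame.loc[frame["B"] == value, "A"]
    # .iloc[0] raises IndexError on an empty Series, so check first.
    return matches.iloc[0] if not matches.empty else default


result = first_match(df, 3)
missing = first_match(df, 99)
print(result, missing)
```

Returning a default is just one design choice; raising a clearer error may suit code where a missing match is a bug.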

How to access another column's value from a given id number in pandas DataFrame?

Like this:

In [21]: df.set_index('id').loc['b', 'label']
Out[21]: 'sal'

Or, use df.query:

In [28]: df.query('id == "b"')['label']
Out[28]:
1    sal
Name: label, dtype: object
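
The frame behind these two snippets is not shown, so the sketch below makes one up: only the pair id "b" and label "sal" comes from the outputs above, and the other rows are assumptions. It also shows how to reduce the query result to a scalar.

```python
import pandas as pd

# Hypothetical data; only id "b" -> label "sal" is taken from the outputs above.
df = pd.DataFrame({"id": ["a", "b", "c"], "label": ["tom", "sal", "ann"]})

# Scalar lookup through the index. This assumes id values are unique;
# with duplicate ids, .loc would return a Series instead of a scalar.
by_index = df.set_index("id").loc["b", "label"]

# query returns a Series, so take the first element to get a scalar.
by_query = df.query('id == "b"')["label"].iloc[0]

print(by_index, by_query)
```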

Extract pattern from a column based on another column's value

You can use a regex with str.extract in a groupby+apply:

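import re
# group_keys=False keeps the original row index so the assignment aligns;
# expand=False makes str.extract return a Series rather than a DataFrame.
df['match'] = (df.groupby('root', group_keys=False)['word']
                 .apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})', expand=False))
              )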

Or, if you expect few repeated "root" values:

import re
df['match'] = df.apply(lambda r: m.group()
                       if (m := re.match(f'.*{re.escape(r["root"])}', r['word']))
                       else None, axis=1)

Output:

         word   root   match
0      replay   play  replay
1    replayed   play  replay
2    playable   play    play
3     thinker  think   think
4       think  think   think
5  thoughtful  think     NaN
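
To try this end to end, the sketch below rebuilds the word and root columns from the output above. It is a variation on the groupby idea rather than the exact code from the answer: it loops over the groups explicitly and concatenates the pieces, which avoids relying on how apply combines its results.

```python
import re

import pandas as pd

# Input columns reconstructed from the output table above.
df = pd.DataFrame({
    "word": ["replay", "replayed", "playable", "thinker", "think", "thoughtful"],
    "root": ["play", "play", "play", "think", "think", "think"],
})

# Build one regex per distinct root and extract from that group's words.
# expand=False makes str.extract return a Series for a single capture group.
parts = [
    words.str.extract(f"^(.*{re.escape(root)})", expand=False)
    for root, words in df.groupby("root")["word"]
]

# Each piece keeps its original row labels, so the assignment aligns by index.
df["match"] = pd.concat(parts)
print(df)
```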

Iterate over column values matched based on another column in a pandas DataFrame

You can get all matching values by omitting .iloc[0] from the df.loc expression, as follows:

df.loc[df['B'] == 3, 'A']

Output:

2    p3
4    p4
Name: A, dtype: object

The 2 and 4 on the left of the output are the original row indices. You can use them to see which rows the two extracted values came from.
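
To actually iterate over the matches together with their row indices, something like the following should work. The frame is an assumption: it is the earlier four-row example plus a fifth row, so that B == 3 matches twice as in the output above.

```python
import pandas as pd

# Hypothetical frame: the earlier example plus one extra row with B == 3.
df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2", "p4"], "B": [1, 2, 3, 4, 3]})

matches = df.loc[df["B"] == 3, "A"]

# Series.items() yields (index label, value) pairs.
pairs = [(idx, value) for idx, value in matches.items()]
for idx, value in pairs:
    print(f"row {idx}: {value}")
```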

Extract the top values from one column based on another column

I think this is what you are looking for:

Data:

ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456

Input:

df = df.groupby(['ID','genre'])['plays'].sum().reset_index()
df.sort_values(by=['plays'], ascending=False)

Output:

      ID  genre  plays
1  12345    pop   1055
2  12345  world     45
0  12345  dance     41
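
A runnable version of the steps above, reading the sample data from an in-memory string. The final step, keeping the top two genres per ID with head(2), is an addition for illustration; the cutoff of 2 is arbitrary.

```python
import io

import pandas as pd

csv_text = """ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456
"""
df = pd.read_csv(io.StringIO(csv_text))

# Total plays per (ID, genre), then order from most to least played.
totals = df.groupby(["ID", "genre"])["plays"].sum().reset_index()
ranked = totals.sort_values(by="plays", ascending=False)

# Because ranked is already sorted, head(2) keeps the two largest rows per ID.
top2 = ranked.groupby("ID").head(2)
print(top2)
```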

Change column value based on another column's first characters in pandas

Use np.where to take the year from start_date wherever end_date starts with '9999':

import numpy as np
df['end_date'] = np.where(df.end_date.str[:4] == '9999', df.start_date.str[:4] + df.end_date.str[4:], df.end_date)

df
   start_date    end_date
0  2020-12-25  2020-12-28
1  2021-02-02  2021-02-09
2  2019-02-13  2019-02-15
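
The input frame is not shown, so the sketch below assumes one in which some end_date values carry a placeholder year of 9999. With that assumed input, the np.where call reproduces the output above.

```python
import numpy as np
import pandas as pd

# Assumed input: rows 0 and 2 use the placeholder year "9999" in end_date.
df = pd.DataFrame({
    "start_date": ["2020-12-25", "2021-02-02", "2019-02-13"],
    "end_date": ["9999-12-28", "2021-02-09", "9999-02-15"],
})

starts_with_9999 = df["end_date"].str[:4] == "9999"

# Where the condition holds, splice the start year onto the rest of end_date;
# otherwise keep end_date unchanged.
df["end_date"] = np.where(
    starts_with_9999,
    df["start_date"].str[:4] + df["end_date"].str[4:],
    df["end_date"],
)
print(df)
```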

Get values in one column based on an AND condition in another column in python

You can groupby.apply a lambda that checks if the unique "item_id"s include both "A" and "B" for each "Order_number"; then filter the ones that do:

out = df.groupby('Order_number')['item_id'].apply(lambda x: {'A','B'}.issubset(x.unique().tolist())).pipe(lambda x: x.index[x]).tolist()

Another option is to use groupby.any twice, once for "A" and once for "B", to create boolean Series that are True if that item_id exists for an Order_number. Since both must exist, combine them with & and keep the matching "Order_number"s:

out = (df['item_id'].eq('A').groupby(df['Order_number']).any() & df['item_id'].eq('B').groupby(df['Order_number']).any()).pipe(lambda x: x.index[x].tolist())

Output:

[12345, 84573]
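
A third way to express the same check, shown here on made-up data: only the order numbers 12345 and 84573 come from the output above, and the remaining rows are assumptions. It collects each order's items into a set and tests for the required subset.

```python
import pandas as pd

# Hypothetical orders; 99999 and 55555 each contain only one of the two items.
df = pd.DataFrame({
    "Order_number": [12345, 12345, 84573, 84573, 84573, 99999, 55555],
    "item_id": ["A", "B", "B", "A", "C", "A", "B"],
})

required = {"A", "B"}

# One set of item_ids per order.
items_per_order = df.groupby("Order_number")["item_id"].apply(set)

# Keep the orders whose item set contains every required item.
has_all = items_per_order.map(required.issubset)
out = items_per_order[has_all].index.tolist()
print(out)
```

Keeping the required items in a set makes it easy to extend the condition to three or more items without adding another groupby.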

