Extract column value based on another column in Pandas
You could use loc to get a Series of the values that satisfy your condition, and then iloc to get the first element:
In [2]: df
Out[2]:
A B
0 p1 1
1 p1 2
2 p3 3
3 p2 4
In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2 p3
Name: A, dtype: object
In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
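A minimal self-contained sketch of the session above, rebuilding the same four-row frame. One thing to keep in mind: .iloc[0] raises IndexError when no row matches, so the empty-check shown here is one possible guard (an addition, not part of the original answer).

```python
import pandas as pd

# Rebuild the example frame from the session above
df = pd.DataFrame({'A': ['p1', 'p1', 'p3', 'p2'], 'B': [1, 2, 3, 4]})

# Boolean mask on column B selects the rows; 'A' selects the column to return
matches = df.loc[df['B'] == 3, 'A']
first = matches.iloc[0]

# .iloc[0] on an empty Series raises IndexError, so check before taking it
no_match = df.loc[df['B'] == 99, 'A']
value = no_match.iloc[0] if not no_match.empty else None

print(first, value)  # prints: p3 None
```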
How to access another column's value from a given id number in pandas DataFrame?
Like this:
In [21]: df.set_index('id').loc['b', 'label']
Out[21]: 'sal'
Or, use df.query:
In [28]: df.query('id == "b"')['label']
Out[28]:
1 sal
Name: label, dtype: object
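The frame behind this answer is not shown, so the sketch below uses a hypothetical one: only the pair id 'b' / label 'sal' (at row 1) comes from the output above; the other ids and labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: only the 'b' -> 'sal' row is taken from the answer
df = pd.DataFrame({'id': ['a', 'b', 'c'], 'label': ['foo', 'sal', 'bar']})

# Option 1: make 'id' the index, then look the value up by label
by_index = df.set_index('id').loc['b', 'label']

# Option 2: filter with query; this returns a Series (one entry per matching row)
by_query = df.query('id == "b"')['label']

print(by_index)           # sal
print(by_query.tolist())  # ['sal']
```

Note that the set_index version returns a scalar only while 'b' appears once in the id column; with duplicate ids it returns a Series, just like the query version.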
Extract pattern from a column based on another column's value
You can use a regex with str.extract in a groupby + apply:
import re
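# group_keys=False keeps the original row index so the result aligns with df;
# expand=False makes str.extract return a Series instead of a one-column DataFrame
df['match'] = (df.groupby('root', group_keys=False)['word']
                 .apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})', expand=False))
              )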
Or, if you expect few repeated "root" values, apply the regex row by row (the := operator requires Python 3.8+):
import re
df['match'] = df.apply(lambda r: m.group()
if (m:=re.match(f'.*{re.escape(r["root"])}', r['word']))
else None, axis=1)
Output:
word root match
0 replay play replay
1 replayed play replay
2 playable play play
3 thinker think think
4 think think think
5 thoughtful think NaN
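A minimal runnable sketch of both approaches, with the sample frame rebuilt from the output table above. The group_keys=False and expand=False arguments are assumptions on my part so that the groupby result lines up with the original rows in recent pandas versions; the row-wise result is stored in a separate, hypothetical match2 column only so the two can be compared.

```python
import re
import pandas as pd

# Sample data rebuilt from the output table above
df = pd.DataFrame({
    'word': ['replay', 'replayed', 'playable', 'thinker', 'think', 'thoughtful'],
    'root': ['play', 'play', 'play', 'think', 'think', 'think'],
})

# groupby + apply: g.name is the current group's root, so one regex is built per root;
# the greedy .* keeps everything up to the last occurrence of the root
df['match'] = (df.groupby('root', group_keys=False)['word']
                 .apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})', expand=False)))

# Row-wise alternative (Python 3.8+); rows with no match get a missing value
df['match2'] = df.apply(
    lambda r: m.group() if (m := re.match(f'.*{re.escape(r["root"])}', r['word'])) else None,
    axis=1,
)
print(df)
```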
Iterate over column values matched value based on another column pandas dataframe
You can get all matching values by leaving .iloc[0] off the df.loc code, as follows:
df.loc[df['B'] == 3, 'A']
Output:
2 p3
4 p4
Name: A, dtype: object
The 2 and 4 on the left of the output are the original row indices. You can use them if you want to know which rows the two extracted values came from.
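Since the question is about iterating over the matches, here is a minimal sketch using Series.items(), which yields (row index, value) pairs. The frame is hypothetical: rows 0 to 3 are borrowed from the first example above, and row 4 ('p4', 3) is assumed so that the output matches the one shown.

```python
import pandas as pd

# Hypothetical frame: rows 2 and 4 reproduce the matches shown above
df = pd.DataFrame({'A': ['p1', 'p1', 'p3', 'p2', 'p4'], 'B': [1, 2, 3, 4, 3]})

matches = df.loc[df['B'] == 3, 'A']

# Series.items() yields (original row index, value) pairs
for idx, value in matches.items():
    print(idx, value)  # prints "2 p3", then "4 p4"

# Or collect the values and their row indices as plain lists
values = matches.tolist()
rows = matches.index.tolist()
```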
extract the top values from one column based on another column
I think this is what you are looking for:
Data:
ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456
Input:
df = df.groupby(['ID','genre'])['plays'].sum().reset_index()
df.sort_values(by=['plays'], ascending=False)
Output:
ID genre plays
1 12345 pop 1055
2 12345 world 45
0 12345 dance 41
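A runnable sketch of the same aggregation, reading the sample data from an in-memory string. Swapping in DataFrame.nlargest is my own suggestion for when you only want the top n rows (n=2 here is arbitrary); the sort + groupby('ID').head variant is one way to get the top n per ID if the data holds several IDs.

```python
import io
import pandas as pd

# The sample data from above, read from an in-memory CSV string
data = """ID,genre,plays
12345,pop,23
12345,pop,576
12345,dance,18
12345,world,45
12345,dance,23
12345,pop,456
"""
df = pd.read_csv(io.StringIO(data))

# Total plays per (ID, genre), as in the answer
totals = df.groupby(['ID', 'genre'])['plays'].sum().reset_index()

# Top 2 rows overall by total plays
top2 = totals.nlargest(2, 'plays')

# Top 2 genres per ID: sort descending, then keep the first 2 rows of each ID
top_per_id = totals.sort_values('plays', ascending=False).groupby('ID').head(2)
print(top2)
```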
Change column value based on another column's first characters in pandas
Or use np.where:
import numpy as np

df['end_date'] = np.where(df.end_date.str[:4] == '9999', df.start_date.str[:4] + df.end_date.str[4:], df.end_date)
df
start_date end_date
0 2020-12-25 2020-12-28
1 2021-02-02 2021-02-09
2 2019-02-13 2019-02-15
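The input frame for this answer is not shown, so the sketch below assumes one: end dates whose year is the placeholder '9999' (the '-MM-DD' parts are chosen so the result matches the output above). np.where picks element-wise between the patched string and the original end_date.

```python
import numpy as np
import pandas as pd

# Hypothetical input: '9999' is a placeholder year to be replaced by start_date's year
df = pd.DataFrame({
    'start_date': ['2020-12-25', '2021-02-02', '2019-02-13'],
    'end_date':   ['9999-12-28', '2021-02-09', '9999-02-15'],
})

# Where end_date starts with '9999', splice start_date's year onto end_date's '-MM-DD' tail
df['end_date'] = np.where(
    df['end_date'].str[:4] == '9999',
    df['start_date'].str[:4] + df['end_date'].str[4:],
    df['end_date'],
)
print(df)
```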
Get values in one column based on an AND condition in another column in python
You can use groupby.apply with a lambda that checks whether the unique "item_id"s of each "Order_number" include both "A" and "B", then keep the orders that do:
out = df.groupby('Order_number')['item_id'].apply(lambda x: {'A','B'}.issubset(x.unique().tolist())).pipe(lambda x: x.index[x]).tolist()
Another option is to use groupby.any twice, once for "A" and once for "B", to build boolean Series that are True when that item_id exists for an Order_number. Since we want both to exist, combine them with & and filter the "Order_number"s:
out = (df['item_id'].eq('A').groupby(df['Order_number']).any() & df['item_id'].eq('B').groupby(df['Order_number']).any()).pipe(lambda x: x.index[x].tolist())
Output:
[12345, 84573]
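A self-contained sketch of both options. The order data is hypothetical (the original frame is not shown): orders 12345 and 84573 contain both A and B, and a made-up order 55555 contains only A, so the expected result matches the output above. For filtering, has_both[has_both].index is used as an equivalent of the pipe in the original one-liners.

```python
import pandas as pd

# Hypothetical orders: 12345 and 84573 contain both A and B, 55555 only A
df = pd.DataFrame({
    'Order_number': [12345, 12345, 84573, 84573, 84573, 55555],
    'item_id':      ['A',   'B',   'B',   'C',   'A',   'A'],
})

# Option 1: per order, test whether {'A', 'B'} is a subset of its item_ids
has_both = df.groupby('Order_number')['item_id'].apply(lambda x: {'A', 'B'}.issubset(x))
out1 = has_both[has_both].index.tolist()

# Option 2: one boolean "any" per item and order, combined with &
has_a = df['item_id'].eq('A').groupby(df['Order_number']).any()
has_b = df['item_id'].eq('B').groupby(df['Order_number']).any()
both = has_a & has_b
out2 = both[both].index.tolist()

print(out1, out2)  # prints: [12345, 84573] [12345, 84573]
```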