vlookup in Pandas using join
Perform a left merge, which uses the sku column as the join key:
In [26]:
df.merge(df1, on='sku', how='left')
Out[26]:
sku loc flag dept
0 122 61 True b
1 122 62 True b
2 122 63 False b
3 123 61 True b
4 123 62 False b
5 113 62 True a
6 301 63 True c
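To see which rows actually found a match (the pandas counterpart of VLOOKUP returning #N/A), the same left merge can be run with indicator=True. This is a minimal sketch; the frames below are hypothetical stand-ins shaped like the ones in the question, and the sku 999 row is invented to show an unmatched key.

```python
import pandas as pd

# Hypothetical data shaped like the frames in the answer above.
df = pd.DataFrame({
    "sku": [122, 123, 113, 122, 999],   # 999 is deliberately absent from df1
    "loc": [61, 61, 62, 62, 63],
    "flag": [True, True, True, True, False],
})
df1 = pd.DataFrame({"sku": [113, 122, 123, 301], "dept": ["a", "b", "b", "c"]})

# how='left' keeps every row of df; indicator=True adds a '_merge' column
# that is 'both' for matched rows and 'left_only' for unmatched ones.
result = df.merge(df1, on="sku", how="left", indicator=True)
print(result)
```

Rows marked left_only have NaN in dept, which is usually what you want to inspect after a lookup.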
If sku is in fact your index, then do this:
In [28]:
df.merge(df1, left_index=True, right_index=True, how='left')
Out[28]:
loc flag dept
sku
113 62 True a
122 61 True b
122 62 True b
122 63 False b
123 61 True b
123 62 False b
301 63 True c
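When both frames are already indexed by sku, DataFrame.join is a shorter way to write the same index-on-index left merge, since it aligns on the index and defaults to how='left'. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical frames, both indexed by sku.
df = pd.DataFrame(
    {"loc": [62, 61, 62], "flag": [True, True, False]},
    index=pd.Index([113, 122, 123], name="sku"),
)
df1 = pd.DataFrame(
    {"dept": ["a", "b", "b", "c"]},
    index=pd.Index([113, 122, 123, 301], name="sku"),
)

# join aligns on the index and performs a left join by default.
joined = df.join(df1)

# The explicit merge from the answer, for comparison.
merged = df.merge(df1, left_index=True, right_index=True, how="left")
print(joined)
```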
Another method is to use map. If you set sku as the index on your second df, so that in effect it becomes a Series, the code simplifies to this:
In [19]:
df['dept']=df.sku.map(df1.dept)
df
Out[19]:
sku loc flag dept
0 122 61 True b
1 123 61 True b
2 113 62 True a
3 122 62 True b
4 123 62 False b
5 122 63 False b
6 301 63 True c
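As a self-contained sketch of the map approach, the lookup Series can be built explicitly with set_index. The data is hypothetical (sku 555 is invented to show a key with no entry in the lookup), and the "unknown" placeholder is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"sku": [122, 123, 113, 555], "loc": [61, 61, 62, 63]})
df1 = pd.DataFrame({"sku": [113, 122, 123, 301], "dept": ["a", "b", "b", "c"]})

# Turn the lookup table into a Series indexed by the key.
lookup = df1.set_index("sku")["dept"]

# map looks each sku up in the Series index; skus with no entry become NaN.
df["dept"] = df["sku"].map(lookup)

# Optional: replace missing lookups with a placeholder label.
df["dept_filled"] = df["dept"].fillna("unknown")
print(df)
```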
Vlookup in Pandas - Join or Merge?
Doesn't a merge get you what you want? I'm unsure why you have the null column for Role and everything under User, but you can rename columns.
print('df')
print(df)
print('df2')
print(df2)
print('out_df')
print(out_df)
df.merge(df2[['By', 'Role']], on='By')
df
CA# Created By $
0 9xxx12 User 1 10
1 9xxx13 User 2 20
2 9xxx14 User 3 25
df2
Created By Role
0 User 1 Sales
1 User 2 Maintenance
2 User 3 Operations
out_df
CA# Created By $ User Role
0 9xxx12 User 1 10 Sales NaN
1 9xxx13 User 2 20 Maintenance NaN
2 9xxx14 User 3 25 Operations NaN
Out[40]:
CA# Created By $ Role
0 9xxx12 User 1 10 Sales
1 9xxx13 User 2 20 Maintenance
2 9xxx14 User 3 25 Operations
Edit: Sorry, part of the issue is how the clipboard data was parsed, but the logic still applies. If you're still having issues, can you provide examples of "lines" that are not joining properly?
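For reference, here is a sketch of the same merge on frames built by hand, assuming the key column is really named "Created By" in both tables (the printed output suggests this, but it is an assumption). The final rename to "User Role" is optional and only illustrates renaming.

```python
import pandas as pd

df = pd.DataFrame({
    "CA#": ["9xxx12", "9xxx13", "9xxx14"],
    "Created By": ["User 1", "User 2", "User 3"],
    "$": [10, 20, 25],
})
df2 = pd.DataFrame({
    "Created By": ["User 1", "User 2", "User 3"],
    "Role": ["Sales", "Maintenance", "Operations"],
})

# Left merge on the shared key; only the key and the looked-up column
# are taken from df2.
out_df = df.merge(df2[["Created By", "Role"]], on="Created By", how="left")

# Optional: rename the looked-up column (the new name is an assumption).
out_df = out_df.rename(columns={"Role": "User Role"})
print(out_df)
```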
How to do a VLOOKUP with pandas but only get the first matches?
Maybe this is what you need. Also, avoid for loops in pandas when a vectorized operation such as merge does the job. Read more on pandas merge.
import pandas as pd

# Merge on the "hotel name" key (file_path is the workbook path from the question)
main_table = pd.read_excel(file_path, sheet_name='market_segment', header=0)
ref_table = pd.read_excel(file_path, sheet_name='2018', header=0)
df = pd.merge(main_table, ref_table, on="hotel name", how="left")
# Keep only the first row for each (Discount, hotel name) pair
df = df.drop_duplicates(subset=["Discount", "hotel name"], keep="first")
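The merge-then-deduplicate pattern can be tried without the Excel file. The sketch below uses invented hotel data. Note one assumption that differs from the snippet above: it deduplicates on the key column alone, which keeps exactly one row per hotel, whereas a subset that also includes Discount only removes rows whose discount is repeated.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets.
main_table = pd.DataFrame({
    "hotel name": ["Alpha", "Beta", "Gamma"],
    "segment": ["leisure", "business", "leisure"],
})
ref_table = pd.DataFrame({
    "hotel name": ["Alpha", "Alpha", "Beta"],
    "Discount": [0.10, 0.15, 0.20],
})

# The left merge duplicates "Alpha" because it matches two reference rows.
merged = pd.merge(main_table, ref_table, on="hotel name", how="left")

# Keep only the first match per hotel, as VLOOKUP would.
first_only = merged.drop_duplicates(subset=["hotel name"], keep="first")
print(first_only)
```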
Pandas: Join with partial match (like VLOOKUP) but in a certain order
You can craft a regex to extract the country Abb, then use this as a merging key:
# we need to sort the Abb by decreasing length to ensure
# specific Abb match before more generic (e.g. Gou/GRE match before G)
regex = '|'.join(df1['Abb'].sort_values(key=lambda s: s.str.len(),
                                        ascending=False))
# 'GRE|Gou|G|B'
out = df2.merge(df1, right_on='Abb',
                left_on=df2['AreaName'].str.extract(f'^({regex})', expand=False))
If case does not matter:
key = df1['Abb'].str.lower()
regex = '|'.join(key.sort_values(key=lambda s: s.str.len(), ascending=False))
# 'gre|gou|g|b'
out = df2.merge(df1, right_on=key,
                left_on=df2['AreaName'].str.lower()
                                       .str.extract(f'^({regex})', expand=False)
                ).drop(columns='key_0')
output:
OrderNo AreaName Abb FullName
0 INV20561 GRE65335 GRE GreenLand
1 INV20562 Gou6D654 Gou Gouna
2 INV20563 Gddd654 G Gouna
3 INV20564 B65465 B Bahr
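A runnable sketch of the case-sensitive variant follows. The input frames are reconstructed from the output above, so their exact contents and row order are assumptions. It also passes each abbreviation through re.escape, which is not in the original answer but guards against abbreviations that contain regex metacharacters, and it stores the extracted key in a column before merging rather than passing a Series to left_on.

```python
import re
import pandas as pd

# Frames reconstructed from the printed output (assumed contents).
df1 = pd.DataFrame({
    "Abb": ["G", "GRE", "Gou", "B"],
    "FullName": ["Gouna", "GreenLand", "Gouna", "Bahr"],
})
df2 = pd.DataFrame({
    "OrderNo": ["INV20561", "INV20562", "INV20563", "INV20564"],
    "AreaName": ["GRE65335", "Gou6D654", "Gddd654", "B65465"],
})

# Longest abbreviations first, so 'GRE' and 'Gou' are tried before 'G'.
abbs = sorted(df1["Abb"], key=len, reverse=True)
regex = "|".join(re.escape(a) for a in abbs)

# Extract the matching prefix of each AreaName and use it as the merge key.
key = df2["AreaName"].str.extract(f"^({regex})", expand=False)
out = df2.assign(Abb=key).merge(df1, on="Abb", how="left")
print(out)
```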
Pandas VLOOKUP for two dataframes with NaN values
What you are looking for is Series.map:
df["download_date"] = df["mobile_no"].map(df2.set_index("mobile_no")["download_date"])
print(df)
name mobile_no download_date
0 Hector ABC 123.0 2021-05-30
1 Hector ABC 287.0 2020-09-28
2 Jose JKD 567.0 NaN
3 Luis AH NaN NaN
4 Billy DH NaN NaN
5 Harry AC 569.0 2020-01-15
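One caveat worth sketching: mapping through a Series whose index contains duplicate keys raises an error, and a NaN key in the lookup table would match NaN values in the column being mapped. The example below, built on hypothetical data modelled on the output above (the duplicated 569.0 row is invented), cleans the lookup table first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Hector ABC", "Hector ABC", "Jose JKD", "Luis AH", "Billy DH", "Harry AC"],
    "mobile_no": [123.0, 287.0, 567.0, np.nan, np.nan, 569.0],
})
# Hypothetical lookup table with one duplicated key (569.0).
df2 = pd.DataFrame({
    "mobile_no": [123.0, 287.0, 569.0, 569.0],
    "download_date": ["2021-05-30", "2020-09-28", "2020-01-15", "2022-02-02"],
})

# Drop NaN keys and keep the first row per key so the index is unique.
lookup = (
    df2.dropna(subset=["mobile_no"])
       .drop_duplicates(subset="mobile_no", keep="first")
       .set_index("mobile_no")["download_date"]
)

# Keys missing from the lookup (including NaN) map to NaN.
df["download_date"] = df["mobile_no"].map(lookup)
print(df)
```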
Pandas merge stop at first match like vlookup instead of duplicating
pandas' merge behaves (mostly) like a SQL join and returns all combinations of matching keys. If you only want the first match, simply remove the duplicate keys from the data you feed to merge.
Use drop_duplicates on mat_grp:
merged = pd.merge(pos, mat_grp.drop_duplicates('matgrp'), how='left', on='matgrp')
output:
PO matgrp commodity
0 123456 1001 foo - 10001
1 654321 803A spam - 100003
2 971358 803B eggs - 10003
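The difference is easy to see on a small example. The frames below are hypothetical (the extra commodity rows are invented to create duplicate keys). The validate="many_to_one" argument is an optional addition, not part of the original answer: it makes merge raise if the right-hand keys are not unique, which confirms the deduplication worked.

```python
import pandas as pd

pos = pd.DataFrame({
    "PO": [123456, 654321, 971358],
    "matgrp": ["1001", "803A", "803B"],
})
# Hypothetical lookup table with repeated matgrp keys.
mat_grp = pd.DataFrame({
    "matgrp": ["1001", "1001", "803A", "803B", "803B"],
    "commodity": ["foo - 10001", "foo - 10002", "spam - 100003",
                  "eggs - 10003", "eggs - 10004"],
})

# Plain left merge: every matching row is returned, so POs are duplicated.
all_matches = pd.merge(pos, mat_grp, how="left", on="matgrp")

# Deduplicate the lookup table first to keep only the first match per key.
first_match = pd.merge(
    pos,
    mat_grp.drop_duplicates("matgrp"),
    how="left",
    on="matgrp",
    validate="many_to_one",  # raises if right-hand keys are not unique
)
print(first_match)
```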