Add a new column to a dataframe using matching values of another dataframe
merge(table1, table2[, c("pid", "val2")], by="pid")
Add in the all.x=TRUE
argument in order to keep all of the pids in table1 that don't have matches in table2...
You were on the right track. Here's a way using match...
table1$val2 <- table2$val2[match(table1$pid, table2$pid)]
Create new column in dataframe with match values from other dataframe
Need map
by Series
first for new column:
df1['Total2'] = df1['Name'].map(df2.set_index('Name')['Total2'])
print (df1)
Name Total Total2
0 Accounting 3 120
1 Reporting 1 100
2 Finance 1 400
3 Audit 1 500
4 Template 2 1200
And then set_index
with DataFrame.plot.bar
:
df1.set_index('Name').plot.bar()
Append a new column on a dataframe based on other dataframe with matching rows and fill the non-matching ones with value from the existing column
Use numpy.where
:
df1['new'] = np.where(df1['UserID'].isin(df2['UserID']), 'test-Red',
np.where(df1['group'] == 'test','test-NonRed',df1['group']))
print (df1)
UserID group day sp PU new
0 213 test 12/11/14 3 311 test-Red
1 314 control 13/11/14 4 345 control
2 354 test 13/08/14 5 376 test-NonRed
Or numpy.select
:
m1 = df1['UserID'].isin(df2['UserID'])
m2 = df1['group'] == 'test'
df1['new'] = np.select([m1,m2], ['test-Red', 'test-NonRed'],default=df1['group'])
print (df1)
UserID group day sp PU new
0 213 test 12/11/14 3 311 test-Red
1 314 control 13/11/14 4 345 control
2 354 test 13/08/14 5 376 test-NonRed
More general solution:
print (df1)
UserID group day sp PU
0 213 test 12/11/14 3 311
1 314 control 13/11/14 4 345
2 354 test 13/08/14 5 376
3 2131 test1 12/11/14 3 311
4 314 control1 13/11/14 4 345
5 354 test1 13/08/14 5 376
df2 = pd.DataFrame({'UserID':[213, 2131]})
m1 = df1['UserID'].isin(df2['UserID'])
m2 = df1['group'].isin(df1.loc[m1, 'group'])
df1['new'] = np.select([m1,m2],
[df1['group'] + '-Red', df1['group'] + '-NonRed'],
default=df1['group'])
print (df1)
UserID group day sp PU new
0 213 test 12/11/14 3 311 test-Red
1 314 control 13/11/14 4 345 control
2 354 test 13/08/14 5 376 test-NonRed
3 2131 test1 12/11/14 3 311 test1-Red
4 314 control1 13/11/14 4 345 control1
5 354 test1 13/08/14 5 376 test1-NonRed
New column based on matching values from another dataframe pandas
Check with stack
df1's list columns after re-create with DataFrame
then map
the value from df2
Also since you asking not using for loop I am using sum
, and sum
for this case is much slower than *for loop*
or itertools
s=pd.DataFrame(df1.column2.tolist()).stack()
df1['New']=s.map(df2.set_index('column3').column4).sum(level=0).apply(set)
df1
Out[36]:
column1 column2 New
0 a1 [A, B] {2, 4, 3, 1}
1 a2 [A, B, C] {3, 5, 4, 2, 1}
2 a3 [B, C] {4, 3, 1, 5}
As I mentioned and most of us suggested , also you can check with For loops with pandas - When should I care?
import itertools
d=dict(zip(df2.column3,df2.column4))
l=[set(itertools.chain(*[d[y] for y in x ])) for x in df1.column2.tolist()]
df1['New']=l
How to add new column from another dataframe based on values in column of first dataframe?
This is done via a join
operation which in pandas can be done with .merge()
.
Kindly try using the following:
df = df.merge(population,how='left',on='Province')
Also please consider reading the following answer for a detailed guide on joins and merges
Pandas populate new dataframe column based on matching columns in another dataframe
Consider the following dataframes df
and df2
df = pd.DataFrame(dict(
AUTHOR_NAME=list('AAABBCCCCDEEFGG'),
title= list('zyxwvutsrqponml')
))
df2 = pd.DataFrame(dict(
AUTHOR_NAME=list('AABCCEGG'),
title =list('zwvtrpml'),
CATEGORY =list('11223344')
))
option 1merge
df.merge(df2, how='left')
option 2join
cols = ['AUTHOR_NAME', 'title']
df.join(df2.set_index(cols), on=cols)
both options yield
R: How do I add a column to a dataframe based on paired values in another dataframe with different length?
Using left_join from dplyr package:
library(dplyr)
data <- left_join(df1,df2, by="id")
Adding value to new column based on if two other columns match
I tried with this little dataset and it's working, let me know if you have some problems
df1 = pd.DataFrame({"title" : ["Biocryst ($BCRX) continues to remain undervalued", "AAPL is good, buy it"], 'score' : [120,420] , 'Stock_name' : ["",""] })
df2 = pd.DataFrame({'Company name' : ['BioCryst Pharmaceuticals, Inc.','GameStop','Apple Inc.'], 'Symbol' : ["BCRX","GME","AAPL"]})
df1
title score Stock_name
0 Biocryst ($BCRX) continues to remain undervalued 120
1 AAPL is good, buy it 420
df2
Company name Symbol
0 BioCryst Pharmaceuticals, Inc. BCRX
1 GameStop GME
2 Apple Inc. AAPL
for j in range(0,len(df2)):
for i in range(0,len(df1)):
if df2['Symbol'][j] in df1['title'][i]:
df1['Stock_name'][i] = df2['Symbol'][j]
df1
title score Stock_name
0 Biocryst ($BCRX) continues to remain undervalued 120 BCRX
1 AAPL is good, buy it 420 AAPL
Related Topics
Ggplot Replace Count with Percentage in Geom_Bar
R Grep: Is There an and Operator
Protect/Encrypt R Package Code for Distribution
What Does "Error: Object '<Myvariable>' Not Found" Mean
Use a Variable Within a Plotmath Expression
Make Readline Wait for Input in R
Create a Time Interval of 15 Minutes from Minutely Data in R
Plots Generated by 'Plot' and 'Ggplot' Side-By-Side
Multiple Time Series in One Plot
How to Get Top N Companies from a Data Frame in Decreasing Order
Call by Reference in R (Using Function to Modify an Object)
Finding the Index Inside a Vector Satisfying a Condition
Outputting Multiple Lines of Text with Rendertext() in R Shiny
How to Perform Multiple Left Joins Using Dplyr in R
Purrr Map Equivalent of Nested for Loop