Transfer values from one dataframe to another
Using data.table:
require(data.table)
dt1 <- data.table(df1, key="id")
dt2 <- data.table(df2)
# Keyed join on id; wrapping the ids in .() keeps numeric ids from being read as row numbers
dt1[.(dt2$id)]
# id value
# 1: 1 1.000000
# 2: 2 6.210526
# 3: 3 11.421053
# 4: 4 16.631579
# 5: 5 21.842105
# 6: 21 NA
# 7: 22 NA
# 8: 23 NA
Or use base merge, as @TheodoreLytras mentioned in the comments:
# you don't need to have `v2` column in df2
merge(df2, df1, by="id", all.x=TRUE, sort=FALSE)
# id v2 value
# 1 1 NA 1.000000
# 2 2 NA 6.210526
# 3 3 NA 11.421053
# 4 4 NA 16.631579
# 5 5 NA 21.842105
# 6 21 NA NA
# 7 22 NA NA
# 8 23 NA NA
Copying a column from one DataFrame to another gives NaN values?
The culprit is unalignable indexes
Your DataFrames' indexes are different (and so are the indexes of their columns). When you assign a column from one DataFrame to another, pandas tries to align the two indexes label by label, and inserts NaN wherever it finds no matching label.
Consider the following examples to understand what this means:
# Setup
A = pd.DataFrame(index=['a', 'b', 'c'])
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])
C = pd.DataFrame(index=[1, 2, 3])
# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index  B.index
      a
      b        b   (overlap)
      c        c   (overlap)
               d
               f
# Example of unalignable indexes - A & C (no overlap at all)
A.index  C.index
      a
      b
      c
               1
               2
               3
When there are no overlaps, pandas cannot match even a single value between the two DataFrames to put in the result of the assignment, so the output is a column full of NaNs.
If you're working in an IPython notebook, you can confirm that this is the root cause with:
df1.index.equals(df2.index)
# False
df1.index.intersection(df2.index).empty
# True
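The alignment behaviour and the two checks above can be reproduced with a small sketch. The frames and column names here (`df1`, `df2`, `date`, `x`, `v`) are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical frames: same length, but no index labels in common.
df1 = pd.DataFrame({"date": ["2020-01-01", "2020-01-02", "2020-01-03"]},
                   index=["a", "b", "c"])
df2 = pd.DataFrame({"x": [10, 20, 30]}, index=[1, 2, 3])

# The two diagnostics from the text.
same_index = df1.index.equals(df2.index)
no_overlap = df1.index.intersection(df2.index).empty

# Assignment aligns on index labels; nothing matches, so the column is all NaN.
df2["date"] = df1["date"]
all_missing = df2["date"].isna().all()

# Partial overlap: only the shared labels ('b' and 'c') receive values.
A = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
B = pd.DataFrame(index=["b", "c", "d", "f"])
B["v"] = A["v"]
print(same_index, no_overlap, all_missing)
print(B)
```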
You can use any of the following solutions to fix this issue.
Solution 1: Reset both DataFrames' indexes
You may prefer this option if you didn't mean to have different indices in the first place, or if you don't particularly care about preserving the index.
# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df1))
# Homogenize the index values,
df2.index = df1.index
# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]
If you want to keep the existing index, but as a column, you may use reset_index() instead.
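As a rough sketch of the reset_index() variant, assuming two frames of equal length whose row order already corresponds (the data below is invented for illustration):

```python
import pandas as pd

# Hypothetical frames with unrelated index labels.
df1 = pd.DataFrame({"date": ["d1", "d2", "d3"], "hour": [1, 2, 3]},
                   index=["a", "b", "c"])
df2 = pd.DataFrame({"x": [10, 20, 30]}, index=[7, 8, 9])

# drop=True discards the old labels; omit it to keep them as a regular column.
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Both frames now share the index 0, 1, 2, so alignment succeeds.
df2[["date", "hour"]] = df1[["date", "hour"]]
print(df2)
```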
Solution 2: Assign NumPy arrays (bypass index alignment)
This solution will only work if the lengths of the two DataFrames match.
# pandas >= 0.24
df2['date'] = df1['date'].to_numpy()
# pandas < 0.24
df2['date'] = df1['date'].values
To assign multiple columns at once:
df2[['date', 'hour']] = df1[['date', 'hour']].to_numpy()
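Because this approach copies by position, a length mismatch is the main failure mode. One possible way to guard against it is a small wrapper; `copy_columns_by_position` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def copy_columns_by_position(src, dst, cols):
    """Hypothetical helper: copy `cols` from src to dst row by row, ignoring the index."""
    if len(src) != len(dst):
        raise ValueError(f"length mismatch: {len(src)} vs {len(dst)}")
    out = dst.copy()
    for col in cols:
        # to_numpy() strips the index, so no label alignment takes place.
        out[col] = src[col].to_numpy()
    return out


src = pd.DataFrame({"date": ["d1", "d2"], "hour": [1, 2]}, index=["a", "b"])
dst = pd.DataFrame({"x": [10, 20]}, index=[5, 6])
copied = copy_columns_by_position(src, dst, ["date", "hour"])
print(copied)
```

The helper returns a new frame rather than modifying `dst`, which keeps the original available if the positional pairing turns out to be wrong.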
R: How can I transfer values from one data frame to another data frame depending on certain circumstances?
You need to join the two tables. There are many methods and packages for this, but I am a fan of the tidyverse, in this case the dplyr joins.
Without seeing your table specifics it will look something like this.
df_joined <- left_join(df1, df2, by = c("Country of Birth" = "Country", "Year of Birth" = "Year"))
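For readers working in pandas rather than R, a left join on differently named key columns might look roughly like this. The key column names follow the dplyr call above; the rows and the `Population` column are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the key column names come from the answer above.
df1 = pd.DataFrame({"Country of Birth": ["FR", "DE"],
                    "Year of Birth": [1990, 1985]})
df2 = pd.DataFrame({"Country": ["FR", "DE"],
                    "Year": [1990, 2000],
                    "Population": [58.0, 82.0]})

# how="left" keeps every row of df1; rows without a match get NaN.
df_joined = df1.merge(df2, how="left",
                      left_on=["Country of Birth", "Year of Birth"],
                      right_on=["Country", "Year"])
print(df_joined)
```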
How to move values from one dataframe to another in pandas?
You really don't need df2 here. You can compute the result directly from df using some simple reshaping functions: set_index, unstack and reindex. You just need the symbols list.
(df.assign(Shares=np.where(df.Order == 'BUY', df.Shares, -df.Shares))
.drop(columns='Order')  # the positional axis argument was removed in pandas 2.0
.set_index('Symbol', append=True)['Shares']
.unstack(1)
.reindex(df2.columns, axis=1)) # you can replace df2.columns with a list
            GOOG   AAPL  XOM    IBM  Cash
Date
2009-01-14   NaN  150.0  NaN    NaN   NaN
2009-01-21   NaN -150.0  NaN  400.0   NaN
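A self-contained version of the chain can be sketched as follows. The input frame is an assumption reconstructed from the output shown above (the question's actual data is not reproduced here), and `symbols` stands in for `df2.columns`.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output table; not the asker's real data.
df = pd.DataFrame({
    "Date": ["2009-01-14", "2009-01-21", "2009-01-21"],
    "Symbol": ["AAPL", "AAPL", "IBM"],
    "Order": ["BUY", "SELL", "BUY"],
    "Shares": [150, 150, 400],
}).set_index("Date")
symbols = ["GOOG", "AAPL", "XOM", "IBM", "Cash"]  # stands in for df2.columns

# BUY keeps the share count, anything else is negated.
signed = np.where(df["Order"] == "BUY", df["Shares"], -df["Shares"])

result = (df.assign(Shares=signed)
            .drop(columns="Order")
            .set_index("Symbol", append=True)["Shares"]
            .unstack("Symbol")            # one column per symbol
            .reindex(symbols, axis=1))    # add missing symbols, fix column order
print(result)
```

Note that unstack raises an error if the same (Date, Symbol) pair appears more than once, so repeated orders would need to be aggregated first.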
Copy value from one dataframe to another based on multiple column index
You can use DataFrame.merge after selecting the 2 key columns in df1. With no on parameter, merge joins on the intersection of the column names:
df = df1[['item','shop']].merge(df2)
This works the same as:
df = df1[['item','shop']].merge(df2, on=['item','shop'])
To make your own solution work, first use DataFrame.set_index on the 2 columns to create a MultiIndex in both DataFrames:
df11 = df1.set_index(['item','shop'])
df11.update(df2.set_index(['item','shop']))
df = df11.reset_index()
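To see what the update step does, here is a minimal sketch with invented rows; the `price` column is a hypothetical value column, while `item` and `shop` follow the snippet above.

```python
import pandas as pd

# Hypothetical data: df2 carries newer prices for two of the three (item, shop) pairs.
df1 = pd.DataFrame({"item": ["a", "a", "b"],
                    "shop": [1, 2, 1],
                    "price": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"item": ["a", "b"],
                    "shop": [2, 1],
                    "price": [20.0, 30.0]})

df11 = df1.set_index(["item", "shop"])
# update() works in place and overwrites only where df2 has a matching,
# non-NaN value; unmatched rows of df1 keep their original values.
df11.update(df2.set_index(["item", "shop"]))
df = df11.reset_index()
print(df)
```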
Copy contents from one Dataframe to another based on column values in Pandas
Building off of Rabinzel's answer:
output = df2.merge(df1, how='left', on='First Name', suffixes=[None, '_old'])
df3 = output[['First Name', 'Age', 'Gender', 'Weight', 'Height']]
cols = df1.columns[1:-1]
modval = pd.DataFrame()
for col in cols:
    modval = pd.concat([modval, output[['First Name', col + '_old']][output[col] != output[col + '_old']].dropna()])
    modval.rename(columns={col +'_old':col}, inplace=True)
newentries = df2[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1[~df1['First Name'].isin(df2['First Name'])]
print(df3, newentries, deletedentries, modval, sep='\n\n')
Output:
First Name Age Gender Weight Height
0 James 25 Male 155 5'10
1 John 27 Male 175 5'9
2 Patricia 23 Female 135 5'3
3 Mary 22 Female 125 5'4
4 Martin 30 Male 185 NaN
5 Margaret 29 Female 141 NaN
6 Kevin 22 Male 198 6'2
First Name Age Gender Weight
4 Martin 30 Male 185
5 Margaret 29 Female 141
First Name Age Gender Weight Height
2 Matthew 29 Male 183 6'0
5 Rachel 29 Male 123 5'3
6 Jose 20 Male 175 5'11
First Name Age Gender Weight
0 James NaN NaN 165.0
6 Kevin NaN NaN 192.0
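As an alternative to the two isin() filters, the new and deleted entries could also be found with an outer merge and indicator=True. This is a sketch on a small hypothetical subset of the names above, not a replacement for the full answer.

```python
import pandas as pd

# Hypothetical subset: df1 is the old table, df2 the new one.
df1 = pd.DataFrame({"First Name": ["James", "Matthew"], "Weight": [165, 183]})
df2 = pd.DataFrame({"First Name": ["James", "Martin"], "Weight": [155, 185]})

# indicator=True adds a _merge column: 'left_only', 'right_only' or 'both'.
flags = df1[["First Name"]].merge(df2[["First Name"]],
                                  on="First Name", how="outer", indicator=True)

deleted = flags.loc[flags["_merge"] == "left_only", "First Name"].tolist()
new = flags.loc[flags["_merge"] == "right_only", "First Name"].tolist()
print(deleted, new)
```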
How to transfer values from one dataframe to another?
Do you mean, join once on ID and X_A to get X_B, and afterwards ID and Y_A to get Y_B? Note that row 10 is different:
df2 %>%
left_join(select(df1, ID, X_A, X_B),
by = c("ID", "X_A")) %>%
left_join(select(df1, ID, Y_A, Y_B),
by = c("ID", "Y_A"))
# ID X_A Y_A X_B Y_B
# 1 A 1 1 1 1
# 2 A 2 2 2 2
# 3 A 3 3 3 NA
# 4 A 4 4 4 NA
# 5 A 5 5 5 NA
# 6 A 6 6 NA NA
# 7 A 7 7 NA NA
# 8 A 8 8 NA NA
# 9 A 9 9 NA NA
# 10 A 10 10 NA 10
# 11 B 1 1 NA NA
# 12 B 2 2 NA NA
# 13 B 3 3 NA NA
# 14 B 4 4 NA NA
# 15 B 5 5 NA NA
# 16 B 6 6 NA NA
# 17 B 7 7 NA NA
# 18 B 8 8 8 8
# 19 B 9 9 9 9
# 20 B 10 10 10 10
Base R:
want <- merge(df2, subset(df1, select = c(ID, X_A, X_B)), by = c("ID", "X_A"), all.x = TRUE)
(want <- merge(want, subset(df1, select = c(ID, Y_A, Y_B)), by = c("ID", "Y_A"), all.x = TRUE))
how to pass value from one dataframe to another dataframe?
Store the SQL result in a variable using mkString, and then use the variable in your where clause.
Example:
val df=Seq((1,"a"),(2,"b")).toDF("CID","n")
df.createOrReplaceTempView("AAA")
val df1=Seq((1,"a"),(2,"b")).toDF("C_ID","j")
df1.createOrReplaceTempView("NST")
val a=spark.sql("select max(CID) from AAA").collect()(0).mkString
spark.sql(s"select * from NST where C_ID=${a}").show()
// +----+---+
// |C_ID|  j|
// +----+---+
// |   2|  b|
// +----+---+
python - how do I transfer values from one df to another
df = pd.DataFrame({
'Rating' : ['A', 'AAA', 'AA', 'BBB', 'BB', 'B'],
'val' : [4560.0, 64.0, 456.0, 34.0, 534.0, 54.0]
})
df
###
Rating val
0 A 4560.0
1 AAA 64.0
2 AA 456.0
3 BBB 34.0
4 BB 534.0
5 B 54.0
Keep df1 as you have it, but don't call set_index() on it yet.
df1 = pd.DataFrame(['AA','AA','AA','AA','A','A'],columns=['Rating'])
df1
###
Rating
0 AA
1 AA
2 AA
3 AA
4 A
5 A
Doing the merge()
df1 = df1.merge(df,left_on='Rating', right_on='Rating')
df1
###
Rating val
0 AA 456.0
1 AA 456.0
2 AA 456.0
3 AA 456.0
4 A 4560.0
5 A 4560.0
Then set_index()
df1.set_index('Rating', inplace=True)
df1
###
val
Rating
AA 456.0
AA 456.0
AA 456.0
AA 456.0
A 4560.0
A 4560.0
With different df1
df1 = pd.DataFrame(['AA', 'A', 'A', 'A', 'AA', 'AA'], columns=['Rating'])
df1
###
Rating
0 AA
1 A
2 A
3 A
4 AA
5 AA
Doing the merge()
df1 = df1.merge(df,left_on='Rating', right_on='Rating', how='left')
df1
###
Rating val
0 AA 456.0
1 A 4560.0
2 A 4560.0
3 A 4560.0
4 AA 456.0
5 AA 456.0
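When only a single value column is being transferred, a lookup with Series.map is another option. The sketch below assumes the `Rating` values in `df` are unique (map raises an error on a lookup Series with duplicate index labels); the `ZZ` rating is a made-up unmatched value.

```python
import pandas as pd

df = pd.DataFrame({"Rating": ["A", "AAA", "AA"],
                   "val": [4560.0, 64.0, 456.0]})
# 'ZZ' is a hypothetical rating with no entry in df.
df1 = pd.DataFrame({"Rating": ["AA", "A", "A", "ZZ"]})

# Build a Rating -> val lookup, then map each rating to its value.
lookup = df.set_index("Rating")["val"]
df1["val"] = df1["Rating"].map(lookup)
print(df1)
```

Like the left merge, this preserves the row order of df1 and leaves NaN where no rating matches.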