Subtract Values in One Dataframe from Another

Use this:

within(merge(A,B,by="Link"), {
VU <- VU.x - VU.y
U <- U.x - U.y
P <- P.x - P.y
})[,c("Link","VU","U","P")]

EDIT: Bonus: if there are many paired columns (not just VU, U and P), you can subtract all of them at once by matching the .x and .y suffixes that merge adds:

M <- merge(A,B,by="Link")

S <- M[,grepl("\\.x$",names(M))] - M[,grepl("\\.y$",names(M))]

cbind(M[,1,drop=FALSE],S)

# Link VU.x U.x P.x
#1 DVH1 5 1 22
#2 DVH2 3 0 24
#3 DVH3 10 1 30
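
For readers working in pandas rather than R, the same merge-then-subtract idea can be sketched as follows. The input frames A and B are hypothetical (the original inputs are not shown), chosen so that the differences match the output above; the ".x"/".y" suffixes are passed explicitly because pandas uses "_x"/"_y" by default.

```python
import pandas as pd

# Hypothetical frames sharing a key column "Link" and value columns VU, U, P.
A = pd.DataFrame({"Link": ["DVH1", "DVH2", "DVH3"],
                  "VU": [7, 8, 12], "U": [2, 1, 1], "P": [30, 30, 35]})
B = pd.DataFrame({"Link": ["DVH1", "DVH2", "DVH3"],
                  "VU": [2, 5, 2], "U": [1, 1, 0], "P": [8, 6, 5]})

M = A.merge(B, on="Link", suffixes=(".x", ".y"))
x_cols = [c for c in M.columns if c.endswith(".x")]
y_cols = [c for c in M.columns if c.endswith(".y")]

# Subtract the raw arrays so the differing column labels do not trigger alignment.
S = pd.DataFrame(M[x_cols].to_numpy() - M[y_cols].to_numpy(),
                 columns=[c[:-2] for c in x_cols])
result = pd.concat([M[["Link"]], S], axis=1)
print(result)
```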

How to subtract values in one dataframe from the other based on multiple columns?

Use Series.sub with the fill_value=0 parameter, after converting the id1 and id2 columns to a MultiIndex so that the subtraction is aligned on these columns:

df = (df1.set_index(['id1','id2'])['Total']
.sub(df2.set_index(['id1','id2'])['Part1'], fill_value=0)
.reset_index(name='new'))
print (df)
id1 id2 new
0 625 AF 70.0
1 625 AG 65.0
2 625 AP 73.0
3 625 BA 112.0
4 625 BM -3.0
5 725 AF 25.0
6 725 AP 42.0
7 725 BA -2.0
8 725 BM 72.0
9 1130 AF 34.0
10 1130 AG -6.0
11 1130 BC 80.0
12 1130 BM 19.0
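
A minimal, self-contained sketch of the same pattern, using small hypothetical frames (the original df1 and df2 are not shown). It illustrates what fill_value=0 does when a key pair exists in only one of the two frames.

```python
import pandas as pd

# Hypothetical data: (625, 'BM') appears only in df2, (725, 'AF') only in df1.
df1 = pd.DataFrame({"id1": [625, 625, 725],
                    "id2": ["AF", "AG", "AF"],
                    "Total": [100, 80, 25]})
df2 = pd.DataFrame({"id1": [625, 625, 625],
                    "id2": ["AF", "AG", "BM"],
                    "Part1": [30, 15, 3]})

left = df1.set_index(["id1", "id2"])["Total"]
right = df2.set_index(["id1", "id2"])["Part1"]

# Rows are aligned on (id1, id2); a key missing on one side is treated as 0.
result = left.sub(right, fill_value=0).reset_index(name="new")
print(result)
```

Without fill_value=0, the keys present on only one side would come out as NaN instead of -3.0 and 25.0.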

In pandas: how can I subtract values in two DataFrames based on matching values in a shared column?

We can start by merging df_1 and df_2 on strike_price with a left merge, which keeps every row of df_1:

>>> df = pd.merge(df_1[['strike_price', 'close']],
... df_2[['strike_price', 'close']],
... how='left',
... left_on=['strike_price'],
... right_on=['strike_price'],
... suffixes=['_df_1',
... '_df_2'])
>>> df
strike_price close_df_1 close_df_2
0 30000 3131.20 3000.0
1 30300 2836.30 NaN
2 30400 2736.95 2744.0
3 30500 2630.00 2800.0
4 30600 2530.60 2650.6

Then we can build a diff column by subtracting close_df_2 from close_df_1 to get the expected result:

>>> df['diff'] = df['close_df_1'] - df['close_df_2']
>>> df
strike_price close_df_1 close_df_2 diff
0 30000 3131.20 3000.0 131.20
1 30300 2836.30 NaN NaN
2 30400 2736.95 2744.0 -7.05
3 30500 2630.00 2800.0 -170.00
4 30600 2530.60 2650.6 -120.00
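
The two steps can be reproduced end to end with the values from the printed output. The sketch below also shows an inner merge as one possible way to drop the unmatched strike price instead of keeping a NaN row; whether that is desirable depends on the use case and is an assumption here.

```python
import pandas as pd

# Values copied from the printed output above; 30300 has no match in df_2.
df_1 = pd.DataFrame({"strike_price": [30000, 30300, 30400, 30500, 30600],
                     "close": [3131.20, 2836.30, 2736.95, 2630.00, 2530.60]})
df_2 = pd.DataFrame({"strike_price": [30000, 30400, 30500, 30600],
                     "close": [3000.0, 2744.0, 2800.0, 2650.6]})

# Left merge: every row of df_1 is kept, unmatched rows get NaN.
df = df_1.merge(df_2, how="left", on="strike_price",
                suffixes=("_df_1", "_df_2"))
df["diff"] = df["close_df_1"] - df["close_df_2"]

# Inner merge: only strike prices present in both frames are kept.
matched = df_1.merge(df_2, how="inner", on="strike_price",
                     suffixes=("_df_1", "_df_2"))
matched["diff"] = matched["close_df_1"] - matched["close_df_2"]
print(df)
print(matched)
```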

How to subtract rows between two different dataframes and replace original value?

The first solution sets an index on df2 that matches a BankName in df1, so that the subtraction is aligned to the correct row:

df1.set_index('BankName').sub(df2.set_index([['Bank1']]), fill_value=0)

df1.set_index('BankName').sub(df2.set_index([['Bank2']]), fill_value=0)

Alternatively, add a BankName column to df2 and set BankName as the index in both DataFrames, so that only the matching row is subtracted:

df22 = df2.assign(BankName = 'Bank1').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
BankName Value1 Value2
0 Bank1 7.0 53.0
1 Bank2 15.0 65.0
2 Bank3 14.0 54.0

Subtract by Bank2:

df22 = df2.assign(BankName = 'Bank2').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)

BankName Value1 Value2
0 Bank1 10.0 55.0
1 Bank2 12.0 63.0
2 Bank3 14.0 54.0

Another solution filters df1 by BankName with a boolean mask:

m = df1['BankName']=='Bank1'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
print (df1)
BankName Value1 Value2
0 Bank1 7 53
1 Bank2 15 65
2 Bank3 14 54

m = df1['BankName']=='Bank2'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
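
The mask-based approach can be wrapped in a small function. This is only a sketch: the input frames are inferred from the printed results above (they are not shown in the original), and subtract_from_bank is a hypothetical helper name. It works on a copy so that df1 itself is left unchanged.

```python
import pandas as pd

# Inputs inferred from the printed results above; treat them as an assumption.
df1 = pd.DataFrame({"BankName": ["Bank1", "Bank2", "Bank3"],
                    "Value1": [10, 15, 14],
                    "Value2": [55, 65, 54]})
df2 = pd.DataFrame({"Value1": [3], "Value2": [2]})

def subtract_from_bank(df1, df2, bank):
    """Hypothetical helper: subtract the single row of df2 from one bank's row."""
    out = df1.copy()
    mask = out["BankName"] == bank
    # df2.iloc[0] is a Series indexed by column name, so it aligns on columns.
    out.loc[mask, df2.columns] = out.loc[mask, df2.columns].sub(df2.iloc[0])
    return out

print(subtract_from_bank(df1, df2, "Bank1"))
print(subtract_from_bank(df1, df2, "Bank2"))
```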

How to Subtract one column in pandas from another?

It looks like you want to create new rows. You can index the dataframe by Accounts, which also has the advantage that the remaining columns are exactly the ones you want to subtract. Then subtract the rows and assign the result as a new row.

>>> df = pd.DataFrame({'Accounts':['Cash','Build','Build Dep', 'Car', 'Car Dep'],
... 'Debits':[300,500,0,100,0],
... 'Credits':[0,0,250,0,50]})
>>>
>>> df = df.set_index('Accounts')
>>> df.loc['Build Delta'] = df.loc['Build Dep'] - df.loc['Build']
>>> df.loc['Car Delta'] = df.loc['Car'] - df.loc['Car Dep']
>>>
>>> print(df)
Debits Credits
Accounts
Cash 300 0
Build 500 0
Build Dep 0 250
Car 100 0
Car Dep 0 50
Build Delta -500 250
Car Delta 100 -50

If you want a column of deltas for all of the rows, just subtract the columns. This is the beauty of NumPy and pandas: you can apply operations to entire columns with a small amount of code and get better performance than you would in plain Python.

>>> df = pd.DataFrame({'Accounts':['Cash','Build','Build Dep', 'Car', 'Car Dep'],
... 'Debits':[300,500,0,100,0],
... 'Credits':[0,0,250,0,50]})
>>>
>>> df = df.set_index('Accounts')
>>>
>>> df['Delta'] = df['Credits'] - df['Debits']
>>> df
Debits Credits Delta
Accounts
Cash 300 0 -300
Build 500 0 -500
Build Dep 0 250 250
Car 100 0 -100
Car Dep 0 50 50

Subtracting one dataframe column from another dataframe column for multiple columns

You could use the subtract method to subtract matching columns in the two dataframes. We loop over the columns in df2 and, if a column is also found in df1, perform the subtraction for that column. Finally, we save the result in a separate column whose name ends with "Result".

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({"branch A(pkg XYZ)":[20,10,30,20,50], "branch A(pkg ABC)":[30,30,40,30,10], "branch B(pkg XYZ)":[50, 50, 50, 50, 50]})

In [3]: df1
Out[3]:
branch A(pkg XYZ) branch A(pkg ABC) branch B(pkg XYZ)
0 20 30 50
1 10 30 50
2 30 40 50
3 20 30 50
4 50 10 50

In [4]: df2 = pd.DataFrame({"branch A(pkg XYZ)":[3,2,3,1,4], "branch A(pkg ABC)":[5,6,7,2,0], "branch B(pkg XYZ)":[50,50,50,50,50]})

In [5]: df2
Out[5]:
branch A(pkg XYZ) branch A(pkg ABC) branch B(pkg XYZ)
0 3 5 50
1 2 6 50
2 3 7 50
3 1 2 50
4 4 0 50

In [25]: for i in df2.columns:
    ...:     if i in df1.columns:
    ...:         df2[i+"Result"] = df2[i].subtract(df1[i], fill_value=0)

In [29]: df2
Out[29]:
branch A(pkg XYZ) branch A(pkg ABC) branch B(pkg XYZ) \
0 3 5 50
1 2 6 50
2 3 7 50
3 1 2 50
4 4 0 50

branch A(pkg XYZ)Result branch A(pkg ABC)Result branch B(pkg XYZ)Result
0 -17 -25 0
1 -8 -24 0
2 -27 -33 0
3 -19 -28 0
4 -46 -10 0

The same loop on dataframes with 1000 columns and 100 rows also runs reasonably fast:

In [40]: import numpy as np
In [41]: df1 = pd.DataFrame(np.random.random((100, 1000)))
In [42]: df2 = pd.DataFrame(np.random.random((100, 1000)))
In [45]: %%timeit
    ...: for i in df2.columns:
    ...:     if i in df1.columns:
    ...:         df2[str(i)+"Result"] = df2[i].subtract(df1[i], fill_value=0)
    ...:
367 ms ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [49]: df2.head(5)
Out[49]:
0 1 2 3 4 5 6 \
0 0.327470 0.272503 0.549897 0.119997 0.985847 0.445402 0.582878
1 0.752375 0.606053 0.223085 0.062001 0.025440 0.638872 0.188112
2 0.174401 0.944870 0.630128 0.715326 0.298661 0.285740 0.360253
3 0.095649 0.355365 0.523830 0.114555 0.342535 0.393107 0.246344
4 0.250579 0.105054 0.761075 0.574047 0.733976 0.199406 0.658025

7 8 9 ... 990Result 991Result 992Result \
0 0.335388 0.613710 0.104878 ... -0.728738 0.147162 -0.841872
1 0.796243 0.709898 0.133040 ... -0.151361 -0.400989 0.012670
2 0.009304 0.472587 0.108229 ... -0.131590 -0.540945 -0.097455
3 0.798668 0.628953 0.701703 ... -0.461036 0.217387 -0.363704
4 0.387475 0.152143 0.825989 ... -0.021844 0.103296 -0.272207

993Result 994Result 995Result 996Result 997Result 998Result 999Result
0 0.389068 0.470042 0.556146 0.705036 -0.021659 0.250586 -0.662487
1 -0.456462 -0.206587 0.691951 -0.507585 -0.430838 -0.126303 -0.001411
2 -0.018339 0.226750 0.483076 -0.581611 -0.362906 0.796857 -0.367914
3 0.323971 -0.779884 -0.306404 -0.825982 -0.065974 -0.109321 -0.023654
4 0.178328 0.600110 0.222539 0.064416 -0.110039 -0.615137 -0.261765

[5 rows x 2000 columns]
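
As an alternative to the column-by-column loop, the shared columns can be subtracted in a single DataFrame operation. This is a sketch of a different technique from the one in the answer above, using the same small example frames; it builds the result in a new frame rather than adding columns to df2 one at a time.

```python
import pandas as pd

df1 = pd.DataFrame({"branch A(pkg XYZ)": [20, 10, 30, 20, 50],
                    "branch A(pkg ABC)": [30, 30, 40, 30, 10],
                    "branch B(pkg XYZ)": [50, 50, 50, 50, 50]})
df2 = pd.DataFrame({"branch A(pkg XYZ)": [3, 2, 3, 1, 4],
                    "branch A(pkg ABC)": [5, 6, 7, 2, 0],
                    "branch B(pkg XYZ)": [50, 50, 50, 50, 50]})

# Columns present in both frames, in df2's order.
common = [c for c in df2.columns if c in df1.columns]

# One aligned subtraction over all shared columns, then rename the results.
diff = df2[common].sub(df1[common], fill_value=0).add_suffix("Result")
result = pd.concat([df2, diff], axis=1)
print(result)
```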

How to subtract one Dataframe from another Dataframe of totals, based on a condition and until 0

I believe a .cumsum() and .idxmin() will help with this.

  1. Join your dataframes on label
  2. Create a new "Running Quantity" column that is a .cumsum() on the "Quantity" column (pandas documentation on .cumsum(); blog post on .cumsum())
  3. Create a new "Running Total" column that is "Total" - "Running Quantity"
  4. Filter to only the positive values in "Running Total" (StackOverflow answer about filtering out negative values)
  5. Filter to the minimum value of "Running Total" per label using .idxmin() (pandas documentation on .idxmin(); StackOverflow answer about .idxmin())

This should give you a three-column data frame with one row per label, the date when the running total was closest to but not lower than 0, and the amount (Total - Running Quantity at that date).
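
The steps above can be sketched as follows. The frame names (totals, moves), the "date" column and the sample values are hypothetical, since the question's data is not shown. Two interpretations are also assumed: the cumulative sum is taken per label via groupby, and a running total of exactly 0 is kept, in line with "not lower than 0".

```python
import pandas as pd

# Hypothetical data; column names follow the steps above.
totals = pd.DataFrame({"label": ["A", "B"], "Total": [10, 5]})
moves = pd.DataFrame({
    "label": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03",
                            "2021-01-01", "2021-01-02"]),
    "Quantity": [4, 5, 3, 2, 3],
})

# 1. Join on label (rows sorted by date within each label first).
df = moves.sort_values(["label", "date"]).merge(totals, on="label")
# 2. Running quantity within each label.
df["Running Quantity"] = df.groupby("label")["Quantity"].cumsum()
# 3. Remaining total after each row.
df["Running Total"] = df["Total"] - df["Running Quantity"]
# 4. Keep rows where the running total has not dropped below zero.
nonneg = df[df["Running Total"] >= 0]
# 5. Per label, pick the row with the smallest remaining total.
result = nonneg.loc[nonneg.groupby("label")["Running Total"].idxmin(),
                    ["label", "date", "Running Total"]]
print(result)
```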

Subtract values matching index in other dataframe

Use reindex() to align the discounts in B with the ItemId values in A, filling items that have no discount with fill_value=0:

A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)

# ItemId
# a1 9.8
# a1 14.8
# a2 7.5
# a3 7.0
# dtype: float64

Timings of the current answers:

[Figure: timings of map vs reindex vs merge]

map_ = lambda A, B: A.Price - A.ItemId.map(B.Discount).fillna(0)
reindex_ = lambda A, B: A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)
merge_ = lambda A, B: A.merge(B, on='ItemId', how='left').eval('Price - Discount.fillna(0)')
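
To check that the three approaches compute the same values, they can be run on a small example. The inputs below are hypothetical, chosen to reproduce the output shown earlier; B is assumed to carry ItemId as its index. The merge variant is written with plain column arithmetic instead of eval, which is a substitution made here for simplicity.

```python
import pandas as pd

# Hypothetical inputs chosen to reproduce the output shown above.
A = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                  "Price": [10.0, 15.0, 8.0, 7.0]})
B = pd.DataFrame({"Discount": [0.2, 0.5]},
                 index=pd.Index(["a1", "a2"], name="ItemId"))

# map: look up each ItemId in B.Discount, missing items become 0.
via_map = A.Price - A.ItemId.map(B.Discount).fillna(0)

# reindex: expand B.Discount to one entry per row of A.
via_reindex = (A.set_index("ItemId").Price
               - B.Discount.reindex(A.ItemId, fill_value=0))

# merge: left-join the discounts, then subtract.
merged = A.merge(B.reset_index(), on="ItemId", how="left")
via_merge = merged["Price"] - merged["Discount"].fillna(0)

print(via_map.tolist(), via_reindex.tolist(), via_merge.tolist())
```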

How to subtract one dataframe from another?

If you set the index of your klmn1 dataframe to column L, pandas will automatically align that index with the index of any series you subtract from it:

In [1]: klmn1.set_index('L')['M'] - m0
Out[1]:
L
a 0.777595
a -0.671791
b 0.779920
b -0.128690
Name: M
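
A small runnable sketch of this alignment, with hypothetical values standing in for klmn1 and m0 (the original data is not shown). It assumes m0 is a Series indexed by the distinct values of L.

```python
import pandas as pd

# Hypothetical data standing in for klmn1 and m0.
klmn1 = pd.DataFrame({"L": ["a", "a", "b", "b"],
                      "M": [1.5, 0.25, 2.0, 1.0]})
m0 = pd.Series({"a": 0.5, "b": 1.0})

# The left index has repeated labels; each row is matched to m0's value
# for its label before the subtraction is applied.
diff = klmn1.set_index("L")["M"] - m0
print(diff)
```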

