Subtract Values in One Dataframe from Another

Use this:

within(merge(A,B,by="Link"), {
    VU <- VU.x - VU.y
    U <- U.x - U.y
    P <- P.x - P.y
})[,c("Link","VU","U","P")]

EDIT: Bonus: if there are many paired columns (not just VU, U and P), you can subtract them all at once:

M <- merge(A,B,by="Link")

S <- M[,grepl("\\.x$",names(M))] - M[,grepl("\\.y$",names(M))]

cbind(M[,1,drop=FALSE],S)

#  Link VU.x U.x P.x
#1 DVH1    5   1  22
#2 DVH2    3   0  24
#3 DVH3   10   1  30
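
For readers working in pandas rather than R, the same merge-then-subtract idea can be sketched as follows. The input values are hypothetical (only the column names Link, VU, U and P come from the answer above), and the suffixes _x and _y are chosen here to mirror R's .x and .y:

```python
import pandas as pd

# Hypothetical inputs; only the column names come from the answer above.
A = pd.DataFrame({"Link": ["DVH1", "DVH2", "DVH3"],
                  "VU": [7, 5, 12], "U": [2, 1, 1], "P": [30, 30, 35]})
B = pd.DataFrame({"Link": ["DVH1", "DVH2", "DVH3"],
                  "VU": [2, 2, 2], "U": [1, 1, 0], "P": [8, 6, 5]})

# Inner merge on the key; overlapping columns receive the suffixes _x and _y.
M = A.merge(B, on="Link", suffixes=("_x", "_y"))

# Every non-key column of A is treated as a paired column.
value_cols = [c for c in A.columns if c != "Link"]

# Subtract the paired columns position by position. Converting to NumPy arrays
# avoids label alignment between the differently named _x and _y columns.
diff = pd.DataFrame(
    M[[c + "_x" for c in value_cols]].to_numpy()
    - M[[c + "_y" for c in value_cols]].to_numpy(),
    columns=value_cols,
)

result = pd.concat([M[["Link"]], diff], axis=1)
print(result)
```

Unlike the R version, this names the result columns VU, U and P instead of keeping the .x suffix.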

How to subtract values in one dataframe from the other based on multiple columns?

Use Series.sub with the fill_value=0 parameter. Converting the columns id1 and id2 to a MultiIndex on both sides makes the subtraction align on those columns:

df = (df1.set_index(['id1','id2'])['Total']
         .sub(df2.set_index(['id1','id2'])['Part1'], fill_value=0)
         .reset_index(name='new'))
print (df)
     id1 id2    new
0    625  AF   70.0
1    625  AG   65.0
2    625  AP   73.0
3    625  BA  112.0
4    625  BM   -3.0
5    725  AF   25.0
6    725  AP   42.0
7    725  BA   -2.0
8    725  BM   72.0
9   1130  AF   34.0
10  1130  AG   -6.0
11  1130  BC   80.0
12  1130  BM   19.0
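
The input frames are not shown in the answer, so here is a small self-contained sketch with made-up data (the values below are assumptions, only the column names id1, id2, Total and Part1 come from the answer). It shows what fill_value=0 does for keys that exist in only one of the two frames:

```python
import pandas as pd

# Hypothetical data: (625, AG) exists only in df1, (725, BA) only in df2.
df1 = pd.DataFrame({"id1": [625, 625, 725],
                    "id2": ["AF", "AG", "AF"],
                    "Total": [100, 65, 40]})
df2 = pd.DataFrame({"id1": [625, 725, 725],
                    "id2": ["AF", "AF", "BA"],
                    "Part1": [30, 15, 2]})

left = df1.set_index(["id1", "id2"])["Total"]
right = df2.set_index(["id1", "id2"])["Part1"]

# Keys missing on one side are treated as 0 instead of producing NaN.
out = left.sub(right, fill_value=0).reset_index(name="new")
print(out)
```

A key present only in df2 ends up negative (0 minus Part1), which is presumably where values such as -3.0 in the output above come from.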

In pandas: how can I subtract values in two DataFrames based on a shared column that contains the same values?

We can start by merging df_1 and df_2 on strike_price with a left merge, which keeps every row of df_1:

>>> df = pd.merge(df_1[['strike_price', 'close']],
...               df_2[['strike_price', 'close']],
...               how='left',
...               left_on=['strike_price'],
...               right_on=['strike_price'],
...               suffixes=['_df_1',
...                         '_df_2'])
>>> df
    strike_price    close_df_1  close_df_2
0   30000           3131.20     3000.0
1   30300           2836.30     NaN
2   30400           2736.95     2744.0
3   30500           2630.00     2800.0
4   30600           2530.60     2650.6

Then we can build a diff column by subtracting close_df_2 from close_df_1 to get the expected result. Rows without a match in df_2 stay NaN:

>>> df['diff'] = df['close_df_1'] - df['close_df_2']
>>> df
    strike_price    close_df_1  close_df_2  diff
0   30000           3131.20     3000.0      131.20
1   30300           2836.30     NaN         NaN
2   30400           2736.95     2744.0      -7.05
3   30500           2630.00     2800.0      -170.00
4   30600           2530.60     2650.6      -120.00
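
For a runnable version, the inputs can be reconstructed from the output shown above (the frames themselves are not given in the answer, so treat them as an assumption). The extra diff_filled column is an optional variation that is not part of the original answer: it treats a missing close_df_2 as 0, which may or may not be meaningful for your data:

```python
import pandas as pd

# Inputs reconstructed from the printed merge result above.
df_1 = pd.DataFrame({"strike_price": [30000, 30300, 30400, 30500, 30600],
                     "close": [3131.20, 2836.30, 2736.95, 2630.00, 2530.60]})
df_2 = pd.DataFrame({"strike_price": [30000, 30400, 30500, 30600],
                     "close": [3000.0, 2744.0, 2800.0, 2650.6]})

df = df_1.merge(df_2, how="left", on="strike_price",
                suffixes=("_df_1", "_df_2"))

# Plain subtraction: the unmatched strike price 30300 stays NaN.
df["diff"] = df["close_df_1"] - df["close_df_2"]

# Optional variation: count a missing close_df_2 as 0.
df["diff_filled"] = df["close_df_1"].sub(df["close_df_2"], fill_value=0)
print(df)
```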

How to subtract rows between two different dataframes and replace original value?

The first solution gives df2 an index containing the target BankName, so that it aligns with df1 and the correct row is subtracted:

df1.set_index('BankName').sub(df2.set_index([['Bank1']]), fill_value=0)

df1.set_index('BankName').sub(df2.set_index([['Bank2']]), fill_value=0)

Equivalently, add a BankName column to df2 and set BankName as the index in both DataFrames, so that only the matching row is subtracted:

df22 = df2.assign(BankName = 'Bank1').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
  BankName  Value1  Value2
0    Bank1     7.0    53.0
1    Bank2    15.0    65.0
2    Bank3    14.0    54.0

Subtract by Bank2:

df22 = df2.assign(BankName = 'Bank2').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)

  BankName  Value1  Value2
0    Bank1    10.0    55.0
1    Bank2    12.0    63.0
2    Bank3    14.0    54.0

Another solution filters df1 by BankName with a boolean mask:

m = df1['BankName']=='Bank1'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
print (df1)
  BankName  Value1  Value2
0    Bank1       7      53
1    Bank2      15      65
2    Bank3      14      54

m = df1['BankName']=='Bank2'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
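
The question's input frames are not shown here. The values below are reconstructed from the printed results (an assumption), and the helper name subtract_from_bank is hypothetical. A minimal sketch of the index-based variant:

```python
import pandas as pd

# Inputs reconstructed from the outputs above: df2 holds a single row to subtract.
df1 = pd.DataFrame({"BankName": ["Bank1", "Bank2", "Bank3"],
                    "Value1": [10, 15, 14],
                    "Value2": [55, 65, 54]})
df2 = pd.DataFrame({"Value1": [3], "Value2": [2]})


def subtract_from_bank(df1, df2, bank):
    """Subtract the single row of df2 from the row of df1 whose BankName is `bank`."""
    keyed = df2.assign(BankName=bank).set_index("BankName")
    # fill_value=0 leaves all other banks unchanged instead of turning them into NaN.
    return df1.set_index("BankName").sub(keyed, fill_value=0).reset_index()


out = subtract_from_bank(df1, df2, "Bank1")
print(out)
```

Note that this variant returns floats, while the mask-based assignment keeps the integer dtype, as the outputs above show.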

How to Subtract one column in pandas from another?

It looks like you want to create new rows. You can index the dataframe by Accounts, which has the added advantage that the remaining columns are exactly the ones you want to subtract. Then subtract the rows and add the result as a new row.

>>> df = pd.DataFrame({'Accounts':['Cash','Build','Build Dep', 'Car', 'Car Dep'],
...               'Debits':[300,500,0,100,0],
...               'Credits':[0,0,250,0,50]})
>>> 
>>> df = df.set_index('Accounts')
>>> df.loc['Build Delta'] = df.loc['Build Dep'] - df.loc['Build']
>>> df.loc['Car Delta'] = df.loc['Car'] - df.loc['Car Dep']
>>> 
>>> print(df)
             Debits  Credits
Accounts                    
Cash            300        0
Build           500        0
Build Dep         0      250
Car             100        0
Car Dep           0       50
Build Delta    -500      250
Car Delta       100      -50

If you want a column of deltas for all of the rows, just subtract the columns. This is the beauty of NumPy and pandas: you can apply operations to entire columns with very little code and get better performance than with a loop in plain Python.

>>> df = pd.DataFrame({'Accounts':['Cash','Build','Build Dep', 'Car', 'Car Dep'],
...               'Debits':[300,500,0,100,0],
...               'Credits':[0,0,250,0,50]})
>>> 
>>> df = df.set_index('Accounts')
>>> df['Delta'] = df['Credits'] - df['Debits']
>>> df
           Debits  Credits  Delta
Accounts                         
Cash          300        0   -300
Build         500        0   -500
Build Dep       0      250    250
Car           100        0   -100
Car Dep         0       50     50

Subtracting one dataframe column from another dataframe column for multiple columns

You could use Series.subtract to subtract matching columns of the two dataframes. We loop over the columns in df2 and, if a column is also found in df1, subtract the df1 column from the df2 column. Finally, we save the result in a separate column whose name ends with "Result".

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({"branch A(pkg XYZ)":[20,10,30,20,50], "branch A(pkg ABC)":[30,30,40,30,10], "branch B(pkg XYZ)":[50,50,50,50,50]})

In [3]: df1
Out[3]:
   branch A(pkg XYZ)  branch A(pkg ABC)  branch B(pkg XYZ)
0                 20                 30                 50
1                 10                 30                 50
2                 30                 40                 50
3                 20                 30                 50
4                 50                 10                 50

In [4]: df2 = pd.DataFrame({"branch A(pkg XYZ)":[3,2,3,1,4], "branch A(pkg ABC)":[5,6,7,2,0], "branch B(pkg XYZ)":[50,50,50,50,50]})

In [5]: df2
Out[5]:
   branch A(pkg XYZ)  branch A(pkg ABC)  branch B(pkg XYZ)
0                  3                  5                 50
1                  2                  6                 50
2                  3                  7                 50
3                  1                  2                 50
4                  4                  0                 50

In [25]: for i in df2.columns:
    ...:     if i in df1.columns:
    ...:         df2[i+"Result"] = df2[i].subtract(df1[i], fill_value=0)

In [29]: df2
Out[29]:
   branch A(pkg XYZ)  branch A(pkg ABC)  branch B(pkg XYZ)  \
0                  3                  5                 50
1                  2                  6                 50
2                  3                  7                 50
3                  1                  2                 50
4                  4                  0                 50

   branch A(pkg XYZ)Result  branch A(pkg ABC)Result  branch B(pkg XYZ)Result
0                      -17                      -25                        0
1                       -8                      -24                        0
2                      -27                      -33                        0
3                      -19                      -28                        0
4                      -46                      -10                        0

A test with 1000 columns and 100 rows also runs reasonably fast:

In [40]: import numpy as np
In [41]: df1 = pd.DataFrame(np.random.random((100, 1000)))
In [42]: df2 = pd.DataFrame(np.random.random((100, 1000)))
In [45]: %%timeit
    ...: for i in df2.columns:
    ...:     if i in df1.columns:
    ...:         df2[str(i)+"Result"] = df2[i].subtract(df1[i], fill_value=0)
    ...:
    ...:
367 ms ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [49]: df2.head(5)
Out[49]:
          0         1         2         3         4         5         6  \
0  0.327470  0.272503  0.549897  0.119997  0.985847  0.445402  0.582878
1  0.752375  0.606053  0.223085  0.062001  0.025440  0.638872  0.188112
2  0.174401  0.944870  0.630128  0.715326  0.298661  0.285740  0.360253
3  0.095649  0.355365  0.523830  0.114555  0.342535  0.393107  0.246344
4  0.250579  0.105054  0.761075  0.574047  0.733976  0.199406  0.658025

          7         8         9  ...  990Result  991Result  992Result  \
0  0.335388  0.613710  0.104878  ...  -0.728738   0.147162  -0.841872
1  0.796243  0.709898  0.133040  ...  -0.151361  -0.400989   0.012670
2  0.009304  0.472587  0.108229  ...  -0.131590  -0.540945  -0.097455
3  0.798668  0.628953  0.701703  ...  -0.461036   0.217387  -0.363704
4  0.387475  0.152143  0.825989  ...  -0.021844   0.103296  -0.272207

   993Result  994Result  995Result  996Result  997Result  998Result  999Result
0   0.389068   0.470042   0.556146   0.705036  -0.021659   0.250586  -0.662487
1  -0.456462  -0.206587   0.691951  -0.507585  -0.430838  -0.126303  -0.001411
2  -0.018339   0.226750   0.483076  -0.581611  -0.362906   0.796857  -0.367914
3   0.323971  -0.779884  -0.306404  -0.825982  -0.065974  -0.109321  -0.023654
4   0.178328   0.600110   0.222539   0.064416  -0.110039  -0.615137  -0.261765

[5 rows x 2000 columns]
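
As an alternative to the loop, the shared columns can be subtracted in a single DataFrame.sub call and the result concatenated once. This is a sketch, not a benchmarked replacement: no timing is claimed here, and the variable names common, diff and result are introduced only for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((100, 1000)))
df2 = pd.DataFrame(rng.random((100, 1000)))

# Columns present in both frames.
common = df2.columns.intersection(df1.columns)

# One vectorised subtraction (df2 - df1) over all shared columns.
diff = df2[common].sub(df1[common], fill_value=0)
diff.columns = [f"{c}Result" for c in common]

# Append all result columns at once instead of inserting them one by one.
result = pd.concat([df2, diff], axis=1)
print(result.shape)
```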

How to subtract from a Dataframe of totals using another Dataframe, based on a condition and until reaching 0

Welcome to StackOverflow!

I believe .cumsum() and .idxmin() will help with this.

1. Join your dataframes on label.
2. Create a new "Running Quantity" column that is a .cumsum() of the "Quantity" column, computed per label (see the pandas documentation on .cumsum()).
3. Create a new "Running Total" column that is "Total" - "Running Quantity".
4. Filter to only the non-negative values in "Running Total".
5. Filter to the minimum value of "Running Total" per label using .idxmin() (see the pandas documentation on .idxmin()).

This should give you a three-column data frame with one row per label, the date when the running total was closest to but not lower than 0, and the amount (Total - Running Quantity at that date).
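
The steps above can be sketched as follows. The question's data is not shown, so the frames totals and moves, their values, and the date column name are all assumptions made for illustration:

```python
import pandas as pd

# Hypothetical inputs: one total per label, and dated quantities to subtract.
totals = pd.DataFrame({"label": ["A", "B"], "Total": [10, 5]})
moves = pd.DataFrame({
    "label": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03",
                            "2021-01-01", "2021-01-02"]),
    "Quantity": [4, 5, 3, 2, 4],
})

# Step 1: join on label (sorted by date so the running sum is chronological).
df = moves.sort_values(["label", "date"]).merge(totals, on="label")

# Step 2: running quantity per label.
df["Running Quantity"] = df.groupby("label")["Quantity"].cumsum()

# Step 3: what is left of the total after each row.
df["Running Total"] = df["Total"] - df["Running Quantity"]

# Step 4: keep rows where the total has not gone below 0.
nonneg = df[df["Running Total"] >= 0]

# Step 5: per label, the row whose running total is closest to 0.
closest = nonneg.loc[nonneg.groupby("label")["Running Total"].idxmin(),
                     ["label", "date", "Running Total"]]
print(closest)
```

A label whose very first quantity already exceeds its total has no non-negative row and therefore drops out of the result.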

Subtract values matching index in other dataframe

Use reindex() to align the discounts with the ItemId values of the price dataframe, with fill_value=0 for items that have no discount:

A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)

# ItemId
# a1     9.8
# a1    14.8
# a2     7.5
# a3     7.0
# dtype: float64

Timings of the current answers (map vs reindex vs merge):

[Figure: timings of map vs reindex vs merge]

map_ = lambda A, B: A.Price - A.ItemId.map(B.Discount).fillna(0)
reindex_ = lambda A, B: A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)
merge_ = lambda A, B: A.merge(B, on='ItemId', how='left').eval('Price - Discount.fillna(0)')
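
To check that the three approaches agree, here is a small sketch on made-up data. The prices and discounts are assumptions chosen to be consistent with the output shown above, and B is assumed to be indexed by ItemId. The merge variant uses plain column arithmetic instead of eval:

```python
import pandas as pd

# Hypothetical data: a1 appears twice in A, and a3 has no discount in B.
A = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                  "Price": [10.0, 15.0, 8.0, 7.0]})
B = pd.DataFrame({"Discount": [0.2, 0.5]},
                 index=pd.Index(["a1", "a2"], name="ItemId"))

# map: look up each ItemId in B.Discount; unmatched ids become NaN, then 0.
by_map = A.Price - A.ItemId.map(B.Discount).fillna(0)

# reindex: expand the discounts to A's ItemId order, filling gaps with 0.
by_reindex = A.set_index("ItemId").Price - B.Discount.reindex(A.ItemId, fill_value=0)

# merge: left join on ItemId, then subtract the filled discount column.
merged = A.merge(B, on="ItemId", how="left")
by_merge = merged["Price"] - merged["Discount"].fillna(0)

print(by_reindex)
```

The three results hold the same values. Only the reindex variant is indexed by ItemId, while the other two keep a positional index.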

How to subtract one dataframe from another?

If you set the index of your klmn1 dataframe to the column L, pandas will automatically align the indices with any series you subtract from it:

In [1]: klmn1.set_index('L')['M'] - m0
Out[1]:
L
a    0.777595
a   -0.671791
b    0.779920
b   -0.128690
Name: M
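
A self-contained sketch of this alignment, using made-up values (the contents of klmn1 and m0 are not shown in the answer, so the data below is an assumption):

```python
import pandas as pd

# Hypothetical data: L has repeated labels, m0 has one value per label.
klmn1 = pd.DataFrame({"L": ["a", "a", "b", "b"],
                      "M": [1.0, 2.0, 3.0, 4.0]})
m0 = pd.Series({"a": 0.5, "b": 1.0})

# With L as the index, each row of M is matched to the m0 entry with the same label.
result = klmn1.set_index("L")["M"] - m0
print(result)
```

Each row labelled a has 0.5 subtracted and each row labelled b has 1.0 subtracted, without an explicit merge.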
