Subtract values in one dataframe from another
Use this:
within(merge(A,B,by="Link"), {
VU <- VU.x - VU.y
U <- U.x - U.y
P <- P.x - P.y
})[,c("Link","VU","U","P")]
EDIT: Bonus: if there are too many paired columns (not just VU, U and P) you can use this:
M <- merge(A,B,by="Link")
S <- M[,grepl("*\\.x$",names(M))] - M[,grepl("*\\.y$",names(M))]
cbind(M[,1,drop=FALSE],S)
# Link VU.x U.x P.x
#1 DVH1 5 1 22
#2 DVH2 3 0 24
#3 DVH3 10 1 30
How to subtract values in one dataframe from the other based on multiple columns?
Use Series.sub
with fill_value=0
parameter for subtraction with convert columns id1, id2
for MultiIndex
, so subtract is based by these columns:
df = (df1.set_index(['id1','id2'])['Total']
.sub(df2.set_index(['id1','id2'])['Part1'], fill_value=0)
.reset_index(name='new'))
print (df)
id1 id2 new
0 625 AF 70.0
1 625 AG 65.0
2 625 AP 73.0
3 625 BA 112.0
4 625 BM -3.0
5 725 AF 25.0
6 725 AP 42.0
7 725 BA -2.0
8 725 BM 72.0
9 1130 AF 34.0
10 1130 AG -6.0
11 1130 BC 80.0
12 1130 BM 19.0
In Pandas : How can i subtract two dataframes values based on other two dataframe same Column which contain same Values
We can start by merging both df_1
and df_2
on matching values of df_1
on strike_price
using a left
merge
:
>>> df = pd.merge(df_1[['strike_price', 'close']],
... df_2[['strike_price', 'close']],
... how='left',
... left_on=['strike_price'],
... right_on=['strike_price'],
... suffixes=['_df_1',
... '_df_2'])
>>> df
strike_price close_df_1 close_df_2
0 30000 3131.20 3000.0
1 30300 2836.30 NaN
2 30400 2736.95 2744.0
3 30500 2630.00 2800.0
4 30600 2530.60 2650.6
Then, we can build a column diff
subtracting columns close_df_1
and close_df_2
to get the expected result :
>>> df['diff'] = df['close_df_1'] - df['close_df_2']
>>> df
strike_price close_df_1 close_df_2 diff
0 30000 3131.20 3000.0 131.20
1 30300 2836.30 NaN NaN
2 30400 2736.95 2744.0 -7.05
3 30500 2630.00 2800.0 -170.00
4 30600 2530.60 2650.6 -120.00
How to subtract rows between two different dataframes and replace original value?
First solution is create index
in df22
by Bankname
for align by df1
for correct row subracting:
df.set_index('BankName').sub(df2.set_index([['Bank1']]), fill_value=0)
df.set_index('BankName').sub(df2.set_index([['Bank2']]), fill_value=0)
You need create new column to df2
with BankName
, convert BankName
to index
in both DataFrame
s, so possible subtract by this row:
df22 = df2.assign(BankName = 'Bank1').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
BankName Value1 Value2
0 Bank1 7.0 53.0
1 Bank2 15.0 65.0
2 Bank3 14.0 54.0
Subtract by Bank2
:
df22 = df2.assign(BankName = 'Bank2').set_index('BankName')
df = df1.set_index('BankName').sub(df22, fill_value=0).reset_index()
print (df)
BankName Value1 Value2
0 Bank1 10.0 55.0
1 Bank2 12.0 63.0
2 Bank3 14.0 54.0
Another solution with filter by BankName
:
m = df1['BankName']=='Bank1'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
print (df1)
BankName Value1 Value2
0 Bank1 7 53
1 Bank2 15 65
2 Bank3 14 54
m = df1['BankName']=='Bank2'
df1.loc[m, df2.columns] = df1.loc[m, df2.columns].sub(df2.iloc[0])
How to Subtract one column in pandas from another?
It looks like you want to create new rows. You can index the dataframe by Account
which also has the advantage that the remaining columns are the things you want to subtract. Then subtract and add a new row.
>>> df = pd.DataFrame({'Accounts':['Cash','Build','Build Dep', 'Car', 'Car Dep'],
... 'Debits':[300,500,0,100,0],
... 'Credits':[0,0,250,0,50]})
>>>
>>> df = df.set_index('Accounts')
>>> df.loc['Build Delta'] = df.loc['Build Dep'] - df.loc['Build']
>>> df.loc['Car Delta'] = df.loc['Car'] - df.loc['Car Dep']
>>>
>>> print(df)
Debits Credits
Accounts
Cash 300 0
Build 500 0
Build Dep 0 250
Car 100 0
Car Dep 0 50
Build Delta -500 250
Car Delta 100 -50
If you want to have a column of deltas for all of the rows, just subtract the columns. This is the beauty of numpy and pandas. You can apply operations to entire columns with small amounts of code and get better performance than if you did it in vanilla python.
>>> df = pd.DataFrame({'Accounts':['Cash','Build','Build Dep', 'Car', 'Car Dep'],
... 'Debits':[300,500,0,100,0],
... 'Credits':[0,0,250,0,50]})
>>>
>>> df = df.set_index('Accounts')
>>>
>>>
>>>
>>> df['Delta'] = df['Credits'] - df['Debits']
>>> df
Debits Credits Delta
Accounts
Cash 300 0 -300
Build 500 0 -500
Build Dep 0 250 250
Car 100 0 -100
Car Dep 0 50 50
Subtracting one dataframe column from another dataframe column for multiple columns
You could use Dataframe.subtract to subtract columns in the two dataframes. We loop over columns in df2
and if that column is found in df1
, we perform the subtraction in that column. finally we save the result in a separate column whose name ends with "Result".
In [1]: import pandas as pd
In [2]: df1 = pd.DataFrame({"branch A(pkg XYZ)":[20,10,30,20,50], "branch A(pkg ABC)":[30,30,40,30,10], "branch B(pkg X
...: YZ)":[50, 50, 50, 50, 50]})
In [3]: df1
Out[3]:
branch A(pkg XYZ) branch A(pkg ABC) branch B(pkg XYZ)
0 20 30 50
1 10 30 50
2 30 40 50
3 20 30 50
4 50 10 50
In [4]: df2 = pd.DataFrame({"branch A(pkg XYZ)":[3,2,3,1,4], "branch A(pkg ABC)":[5,6,7,2,0], "branch B(pkg XYZ)":[50,5
...: 0,50,50,50]})
In [5]: df2
Out[5]:
branch A(pkg XYZ) branch A(pkg ABC) branch B(pkg XYZ)
0 3 5 50
1 2 6 50
2 3 7 50
3 1 2 50
4 4 0 50
In [25]: for i in df2.columns:
...: if i in df1.columns:
...: df2[i+"Result"] = df2[i].subtract(df1[i], fill_value=0)
In [29]: df2
Out[29]:
branch A(pkg XYZ) branch A(pkg ABC) branch B(pkg XYZ) \
0 3 5 50
1 2 6 50
2 3 7 50
3 1 2 50
4 4 0 50
branch A(pkg XYZ)Result branch A(pkg ABC)Result branch B(pkg XYZ)Result
0 -17 -25 0
1 -8 -24 0
2 -27 -33 0
3 -19 -28 0
4 -46 -10 0
An attempt with 1000 columns and 100 rows is quite efficient too:
In [40]: import numpy as np
In [41]: df1 = pd.DataFrame(np.random.random((100, 1000)))
In [42]: df2 = pd.DataFrame(np.random.random((100, 1000)))
In [45]: %%timeit
...: for i in df2.columns:
...: if i in df1.columns:
...: df2[str(i)+"Result"] = df2[i].subtract(df1[i], fill_value=0)
...:
...:
367 ms ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [49]: df2.head(5)
Out[49]:
0 1 2 3 4 5 6 \
0 0.327470 0.272503 0.549897 0.119997 0.985847 0.445402 0.582878
1 0.752375 0.606053 0.223085 0.062001 0.025440 0.638872 0.188112
2 0.174401 0.944870 0.630128 0.715326 0.298661 0.285740 0.360253
3 0.095649 0.355365 0.523830 0.114555 0.342535 0.393107 0.246344
4 0.250579 0.105054 0.761075 0.574047 0.733976 0.199406 0.658025
7 8 9 ... 990Result 991Result 992Result \
0 0.335388 0.613710 0.104878 ... -0.728738 0.147162 -0.841872
1 0.796243 0.709898 0.133040 ... -0.151361 -0.400989 0.012670
2 0.009304 0.472587 0.108229 ... -0.131590 -0.540945 -0.097455
3 0.798668 0.628953 0.701703 ... -0.461036 0.217387 -0.363704
4 0.387475 0.152143 0.825989 ... -0.021844 0.103296 -0.272207
993Result 994Result 995Result 996Result 997Result 998Result 999Result
0 0.389068 0.470042 0.556146 0.705036 -0.021659 0.250586 -0.662487
1 -0.456462 -0.206587 0.691951 -0.507585 -0.430838 -0.126303 -0.001411
2 -0.018339 0.226750 0.483076 -0.581611 -0.362906 0.796857 -0.367914
3 0.323971 -0.779884 -0.306404 -0.825982 -0.065974 -0.109321 -0.023654
4 0.178328 0.600110 0.222539 0.064416 -0.110039 -0.615137 -0.261765
[5 rows x 2000 columns]
How subtract a Dataframe with totals another Dataframe based on condition and until 0
Welcome to StackOverflow!
I believe a .cumsum() and .idxmin() will help with this.
- Join your dataframes on label
- Create a new "Running Quantity" column that is a .cumsum() on the "Quantity" column (pandas documenation on .cumsum(); blog post on .cumsum())
- Create a new "Running Total" column that is "Total" - "Running Quantity"
- Filter to only the positive values in "Running Total" (StackOverflow answer about filtering out negative values)
- Filter to the minimum value of "Running Total" per label using .idxmin() (pandas documentation on .idxmin(); StackOverflow answer about .idxmin())
This should give you a three-column data frame with one row per label, the date when the running total was closest to but not lower than 0, and the amount (Total - Running Quantity at that date).
Subtract values matching index in other dataframe
reindex()
the discounts using the price df with fill_value=0
:
A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)
# ItemId
# a1 9.8
# a1 14.8
# a2 7.5
# a3 7.0
# dtype: float64
Timings of the current answers:
map_ = lambda A, B: A.Price - A.ItemId.map(B.Discount).fillna(0)
reindex_ = lambda A, B: A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)
merge_ = lambda A, B: A.merge(B, on='ItemId', how='left').eval('Price - Discount.fillna(0)')
How to subtract one dataframe from another?
If you reset the index of your klmn1 dataframe to be that of the column L, then your dataframe will automatically align the indices with any series you subtract from it:
In [1]: klmn1.set_index('L')['M'] - m0
Out[1]:
L
a 0.777595
a -0.671791
b 0.779920
b -0.128690
Name: M
Related Topics
Extract Time (Hms) from Lubridate Date Time Object
Inserting Stargazer or Xable Table into Knitr Document
Understanding Lm and Environment
Multiple Groups in Geom_Density() Plot
Sum Amount Last 6 Month Prior to the Date of Transaction
Split a File Path into Folder Names Vector
Roracle Not Working in R Studio
Inline Function Code Doesn't Compile
How to Prevent User from Setting the End Date Before the Start Date Using the Shiny Daterangeinput
Using Data.Table to Create a Column of Regression Coefficients
Usemethod("Predict"):No Applicable Method for 'Predict' Applied to an Object of Class "Train"
Get(X) Does Not Work in R Data.Table When X Is Also a Column in the Data Table
How to Use R Package "Formattable" in Shiny Dashboard
Refer to Range of Columns by Name in R
How to Apply Separate Coord_Cartesian() to "Zoom In" into Individual Panels of a Facet_Grid()