Sum cells of certain columns for each row
You could do something like this:
summed <- rowSums(zscore[, c(1, 2, 3, 5)])
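If the same data were in a pandas DataFrame, a rough equivalent of rowSums over columns 1, 2, 3 and 5 is a positional iloc selection followed by sum(axis=1). This is a sketch only: the frame named zscore and its values are hypothetical. Note that pandas positions start at 0, so R's c(1, 2, 3, 5) becomes [0, 1, 2, 4].

```python
import pandas as pd

# Hypothetical stand-in for the R data frame 'zscore' (all columns numeric)
zscore = pd.DataFrame({
    "z1": [0.5, -1.0],
    "z2": [1.5, 0.0],
    "z3": [-0.5, 2.0],
    "z4": [9.9, 9.9],  # skipped, like column 4 in the R code
    "z5": [1.0, 1.0],
})

# R's c(1, 2, 3, 5) is 1-based; iloc positions are 0-based
summed = zscore.iloc[:, [0, 1, 2, 4]].sum(axis=1)
print(summed.tolist())
```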
Summing rows of specific columns, then dividing by the sum
Try specifying the columns by subsetting Df, and then set the margin to 1 so that apply() works row by row (the \(x) lambda syntax requires R 4.1 or later):
# MARGIN = 1 applies over rows; apply returns one column per row, so t() flips it back
Df_new <- t(apply(Df[, 1:3], 1, \(x) x / sum(x)))
lose draw win
[1,] 0.5000 0.1428571 0.3571429
[2,] 0.0625 0.1250000 0.8125000
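For anyone doing the same in pandas, a minimal sketch: select the first three columns, then divide each row by its own total with DataFrame.div and axis=0 so the row sums line up with the index. The team column and the counts are made up; they were picked so the proportions match the output above.

```python
import pandas as pd

# Hypothetical lose/draw/win counts; 'team' is not part of the sum
Df = pd.DataFrame({
    "lose": [7, 1],
    "draw": [2, 2],
    "win": [5, 13],
    "team": ["A", "B"],
})

counts = Df.iloc[:, :3]
# Divide each row by its own total; axis=0 aligns the row sums with the row index
Df_new = counts.div(counts.sum(axis=1), axis=0)
print(Df_new)
```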
Sum values across a row if they are in certain columns, and if adjacent cells have a specific value
If the blocks are all right next to each other as shown in your image, the value you want to sum is always one cell to the right of the time value, and only the time values can be >= 1.5 (all of your MedX values are < 1), then this formula will work for you:
=SUMIF(D3:O3,">=1.5",E3:P3)
If it's possible for the MedX values to be >= 1.5, then this more explicit formula should work for you:
=SUMPRODUCT(--($D$2:$O$2="Time"),--(D3:O3>=1.5),--($E$2:$P$2="Med1"),E3:P3)
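To see why the offset ranges work, here is a small Python sketch of what the two formulas compute. Each cell in D:O is paired with the cell one column to its right in E:P. SUMIF only tests the value, while SUMPRODUCT also requires the header to be "Time" and the header to its right to be "Med1". The function names, header lists and values are hypothetical.

```python
def sumif_like(values, threshold=1.5):
    # =SUMIF(D3:O3, ">=1.5", E3:P3): test each cell, sum its right-hand neighbour
    return sum(values[i + 1] for i in range(len(values) - 1) if values[i] >= threshold)


def sumproduct_like(headers, values, threshold=1.5, time_label="Time", med_label="Med1"):
    # =SUMPRODUCT(--(D2:O2="Time"), --(D3:O3>=1.5), --(E2:P2="Med1"), E3:P3)
    # Position i plays the role of column D+i; i + 1 is the cell to its right.
    total = 0.0
    for i in range(len(values) - 1):
        is_time = headers[i] == time_label
        over = values[i] >= threshold
        next_is_med = headers[i + 1] == med_label
        # Each condition is 0 or 1 (like --), so the product keeps or drops the value
        total += is_time * over * next_is_med * values[i + 1]
    return total


# Hypothetical row where every Med1 value is below 1.5: both formulas agree
headers = ["Time", "Med1", "Time", "Med1", "Time", "Med1"]
values = [2.0, 0.5, 1.0, 0.75, 1.5, 0.25]
print(sumif_like(values), sumproduct_like(headers, values))

# Hypothetical row with a Med1 value >= 1.5: SUMIF also adds the next Time value
headers2 = ["Time", "Med1", "Time", "Med1"]
values2 = [2.0, 1.75, 1.0, 0.5]
print(sumif_like(values2), sumproduct_like(headers2, values2))
```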
Sum if column name is higher than row value
We can check with np.greater_equal.outer, which compares every column date against every row date, then use that boolean output with where() to mask the unwanted cells as NaN before summing:
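import numpy as np
import pandas as pd
s = pd.to_datetime(df.date).values
m = np.greater_equal.outer(pd.to_datetime(df.columns[:-1]).values, s).T  # True where column date >= row date
# DataFrame.append was removed in pandas 2.0; pd.concat adds the 'Total' row instead
df = pd.concat([df, df.iloc[:, :-1].where(m).sum().to_frame('Total').T])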
df
Out[381]:
01-01-2020 01-01-2021 01-01-2022 date
1 1.0 3.0 6.0 01-01-2020
2 4.0 4.0 2.0 01-10-2021
3 5.0 1.0 9.0 01-12-2021
Total 1.0 3.0 17.0 NaN
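A self-contained version of this example, rebuilding df from the printed output so the mask can be inspected. The month-day-year format passed to pd.to_datetime is an assumption; reading the dates day-first instead gives the same totals here, because every later row date still falls between the 2021 and 2022 columns.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the printed output above
df = pd.DataFrame(
    {
        "01-01-2020": [1, 4, 5],
        "01-01-2021": [3, 4, 1],
        "01-01-2022": [6, 2, 9],
        "date": ["01-01-2020", "01-10-2021", "01-12-2021"],
    },
    index=[1, 2, 3],
)

# Assumed format: month-day-year
row_dates = pd.to_datetime(df["date"], format="%m-%d-%Y").values
col_dates = pd.to_datetime(df.columns[:-1], format="%m-%d-%Y").values

# outer(a, b)[j, i] is a[j] >= b[i]; transpose so mask rows match df rows
mask = np.greater_equal.outer(col_dates, row_dates).T

# Cells where the mask is False become NaN and are skipped by sum()
total = df.iloc[:, :-1].where(mask).sum().to_frame("Total").T
df = pd.concat([df, total])  # pd.concat appends the Total row
print(df)
```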
Pandas: sum DataFrame rows for given columns
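You can just call sum with axis=1 to sum across each row. Pass numeric_only=True so that non-numeric columns are skipped: older pandas versions dropped them silently, but since pandas 2.0 the default is numeric_only=False and a string column like c raises a TypeError:
In [91]:
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})
df['e'] = df.sum(axis=1, numeric_only=True)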
df
Out[91]:
a b c d e
0 1 2 dd 5 8
1 2 3 ee 9 14
2 3 4 ff 1 8
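Another way that does not depend on how sum() treats text columns is to pick the numeric columns first with select_dtypes. A minimal sketch using the same frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# Keep only numeric columns (a, b, d), then sum across each row
df['e'] = df.select_dtypes(include="number").sum(axis=1)
print(df)
```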
If you only want to sum specific columns, create a list of the columns and remove the ones you are not interested in:
In [98]:
col_list = list(df)
col_list.remove('d')
col_list.remove('e')  # list(df) also picks up the 'e' column created above
col_list
Out[98]:
['a', 'b', 'c']
In [99]:
df['e'] = df[col_list].sum(axis=1, numeric_only=True)
df
Out[99]:
a b c d e
0 1 2 dd 5 3
1 2 3 ee 9 5
2 3 4 ff 1 7
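The same selection can also be written in one step with DataFrame.drop(columns=...), an alternative to building and editing a list. This sketch drops the string column c as well, so sum() never sees text:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# Name the columns to leave out instead of editing a list of the ones to keep
df['e'] = df.drop(columns=['c', 'd']).sum(axis=1)
print(df)
```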
How to set the last column of a pandas dataframe as the sum of certain columns?
One of the advantages of pandas is that you can often use vectorized operations instead of loops. Thus in your case it's possible to sum over a 2-dimensional slice of the dataframe like this:
df['your_score'] = df.loc[:, 'Week1':'Week19'].sum(axis=1)
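The loc indexer allows indexing and slicing by labels. The : selects all the rows, and 'Week1':'Week19' selects every column from Week1 through Week19 (label slices with loc include both endpoints), so we get a sub-DataFrame. The df.sum() method is modeled on the NumPy function of the same name, so you can choose the dimension to sum over with the axis argument; axis=1 sums across each row.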
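A small sketch of the same idea on a hypothetical gradebook, shortened to Week1 through Week3. It shows that the end label of a loc slice is included in the sum and that columns outside the slice (name, your_score) are left out:

```python
import pandas as pd

# Hypothetical gradebook: a name column, weekly scores, and a placeholder last column
df = pd.DataFrame({
    "name": ["Ann", "Bo"],
    "Week1": [10, 7],
    "Week2": [8, 9],
    "Week3": [6, 5],
    "your_score": [0, 0],
})

# Label slices with .loc include both endpoints, so Week3 is summed too
df["your_score"] = df.loc[:, "Week1":"Week3"].sum(axis=1)
print(df)
```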