Calculate row-wise proportions
And another alternative (though this is mostly a pretty version of sweep)... prop.table:
> cbind(x[1], prop.table(as.matrix(x[-1]), margin = 1))
id val0 val1 val2
1 a 0.08333333 0.3333333 0.5833333
2 b 0.13333333 0.3333333 0.5333333
3 c 0.16666667 0.3333333 0.5000000
From the "description" section of the help file at ?prop.table:
This is really sweep(x, margin, margin.table(x, margin), "/") for newbies, except that if margin has length zero, then one gets x/sum(x).
So, you can see that underneath, this is really quite similar to @Jilber's solution.
And... it's nice for the R developers to be considerate of us newbies, isn't it? :)
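For readers following the pandas answers further down, the same sweep idea can be sketched in NumPy. This is only an illustration: the input counts below are an assumption, chosen so that the row proportions match the output shown above, and the variable names are made up for the example.

```python
import numpy as np

# Assumed input counts (not given in the answer); they reproduce the
# proportions printed above, e.g. 1 / (1 + 4 + 7) = 0.0833...
x = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]], dtype=float)

# Row totals, kept as a column vector so they broadcast across each row.
# This plays the role of margin.table(x, 1).
row_totals = x.sum(axis=1, keepdims=True)

# Divide every entry by its row total, the role of sweep(..., "/").
row_props = x / row_totals

# With no margin, everything is divided by the grand total: x / sum(x).
overall_props = x / x.sum()

print(np.round(row_props, 4))
```

Each row of row_props sums to 1, while overall_props sums to 1 across the whole array.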
Calculate row wise percentage in pandas
We can do the following:
- We slice the correct columns with iloc
- Use apply with axis=1 to apply each calculation row wise
- We use div, sum and mul to divide each value by the row's sum and multiply it by 100 to get the percentages as whole numbers, not decimals
- We convert our floats back to int with astype
df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: x.div(x.sum()).mul(100), axis=1).astype(int)
Output
id val1 val2 val3
0 a 50 30 20
1 b 40 36 24
2 c 40 35 25
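One thing worth checking with this approach is that astype(int) drops the fractional part instead of rounding. A minimal sketch, using made-up data (not the asker's original frame) and a hypothetical value_cols list, shows the effect on the row totals:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "val1": [1, 3, 1],
    "val2": [1, 1, 3],
    "val3": [2, 4, 4],
})

value_cols = ["val1", "val2", "val3"]

# Row-wise percentages as floats: each value divided by its row sum, times 100.
pct = df[value_cols].apply(lambda row: row.div(row.sum()).mul(100), axis=1)

# astype(int) truncates: 37.5 becomes 37 and 12.5 becomes 12.
truncated = pct.astype(int)

# Rounding before the conversion is one alternative.
rounded = pct.round().astype(int)

print(truncated.sum(axis=1).tolist())  # [100, 99, 99]
```

So after truncation a row does not necessarily add up to 100. Whether that matters depends on what the percentages are used for.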
Or a vectorized solution, accessing the numpy arrays underneath our dataframe.
Note: this method should perform better in terms of speed.
df.iloc[:, 1:] = (df.iloc[:, 1:].to_numpy() / df.iloc[:, 1:].sum(axis=1).to_numpy()[:, None] * 100).astype(int)
Or similar, but using the pandas DataFrame.div method (proposed by Jon Clements):
df.iloc[:, 1:] = df.iloc[:, 1:].div(df.iloc[:, 1:].sum(1), axis=0).mul(100)
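To see that the DataFrame.div version and the NumPy broadcasting version compute the same thing, here is a small sketch on made-up data. The frame and the variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "id": ["a", "b"],
    "val1": [1, 3],
    "val2": [1, 1],
    "val3": [2, 4],
})

values = df.iloc[:, 1:]        # numeric columns only
row_sums = values.sum(axis=1)  # one total per row

# axis=0 matches row_sums to the rows by index label,
# so each row is divided by its own total.
pct_pandas = values.div(row_sums, axis=0).mul(100)

# Same computation on the underlying arrays: [:, None] turns the totals
# into a column vector so they broadcast across the columns.
pct_numpy = values.to_numpy() / row_sums.to_numpy()[:, None] * 100

print(pct_pandas)
```

Both results hold the same numbers. The pandas version keeps the index and column labels, the NumPy version returns a plain array.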
Apply row-wise transformation in R so that total percentage for each row will be 100%
You can use proportions to get percentages.
proportions(as.matrix(df[1:4,-1]), 1) * 100
# A B C D
#1 38.33780 32.84182 16.21984 12.60054
#2 39.73214 33.18452 16.66667 10.41667
#3 39.27126 32.25371 16.32928 12.14575
#4 36.26374 31.71115 18.36735 13.65777
Calculate multi-group row-wise proportion in R
This is how I ended up doing it. It's a combination of splitting the data and, within each folder level, doing some filtering and a little name change to be able to rejoin later.
Once each z_stack_id has the proper value for each channel (FITC_blobs, TRITC_blobs, Cy5_blobs), we can bind_rows and do the proportions. This method still gives spurious proportions, but they can be filtered out somewhat easily.
I had to do some column renaming because my real data had different columns than the ones posted in the simplified question. I condensed it into a function.
library(dplyr)

calculate_blob_proportions <- function(dataframe){
  dataframe <- dataframe %>% ungroup()
  # prepare a list to collect one result per folder
  li <- list()
  for (i in unique(dataframe$folder)){
    # Get each folder and merge the two filename columns into one
    my_df <- dataframe %>% filter(folder == i) %>%
      mutate(filename_cells = ifelse(is.na(filename_cells),
                                     filename_coloc,
                                     filename_cells)) %>%
      rename(filename = filename_cells) %>%
      select(-filename_coloc)
    # For each channel, keep its blob count under a channel-specific name
    # and join it back onto the folder's rows
    Cy5 <- filter(my_df, binary_layer=="Cy5") %>%
      rename(Cy5_blobs = n_blobs) %>%
      select(-binary_layer, -filename) %>%
      left_join(my_df)
    TRITC <- filter(my_df, binary_layer=="TRITC") %>%
      rename(TRITC_blobs = n_blobs) %>%
      select(-binary_layer, -filename) %>%
      left_join(my_df)
    FITC <- filter(my_df, binary_layer=="FITC") %>%
      rename(FITC_blobs = n_blobs) %>%
      select(-binary_layer, -filename) %>%
      left_join(my_df)
    li[[i]] <- left_join(Cy5,left_join(TRITC,FITC)) %>%
      select(RatID, folder, filename, z_stack_id,
             binary_layer, n_blobs,
             FITC_blobs, TRITC_blobs, Cy5_blobs)
  }
  # Stack all folders and compute the proportion against each channel
  df_out <- bind_rows(li) %>%
    mutate(FITC_prop = n_blobs/FITC_blobs,
           TRITC_prop = n_blobs/TRITC_blobs,
           Cy5_prop = n_blobs/Cy5_blobs)
  return(df_out)
}
Calculate proportions of rows in dataframe
I think you need to divide each value by its row sum (scaled here so that every row totals 10), excluding the first 2 columns, with the remaining columns selected by DataFrame.iloc:
df.iloc[:, 2:] = df.iloc[:, 2:].div(df.iloc[:, 2:].sum(axis=1).div(10), axis=0)
print (df)
education experience ExpenseA ExpenseB ExpenseC
0 uni yes 3.000000 2.000000 5.000000
1 uni no 3.333333 2.857143 3.809524
2 middle yes 2.000000 0.000000 8.000000
3 high no 4.800000 2.000000 3.200000
4 uni yes 2.000000 4.666667 3.333333
Or select the columns whose names contain the substring Expense with DataFrame.filter and sum those:
df1 = df.filter(like='Expense')
df[df1.columns] = df1.div(df1.sum(axis=1).div(10), axis=0)
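As a quick check of the filter variant, here is a minimal sketch on made-up data. Only the column names follow the answer; the values and the scale variable are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names follow the answer above.
df = pd.DataFrame({
    "education": ["uni", "middle"],
    "experience": ["yes", "yes"],
    "ExpenseA": [30, 20],
    "ExpenseB": [20, 0],
    "ExpenseC": [50, 80],
})

# Pick every column whose name contains "Expense".
expenses = df.filter(like="Expense")

# Dividing the row sums by `scale` first means each rescaled row totals `scale`.
scale = 10
df[expenses.columns] = expenses.div(expenses.sum(axis=1).div(scale), axis=0)

print(df)
```

The non-expense columns are left untouched, and each row of the Expense columns now adds up to 10.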
Sum a row of specific columns, then find ratio of row, column entry to sum of row
We can use prop.table with margin = 1 to calculate row-wise proportions.
cbind(df[1], prop.table(as.matrix(df[-1]), 1))
# a b c d
#1 8 0.333 0.370 0.296
#2 6 0.368 0.316 0.316
#3 4 0.400 0.300 0.300
#4 2 0.375 0.375 0.250
To make the selection of columns more explicit:
cols <- c("b", "c", "d")
cbind(df[setdiff(names(df), cols)], prop.table(as.matrix(df[cols]), 1))
Tidy way to convert numeric columns from counts to proportions
Rephrase to the following:
df %>%
mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))
Output:
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's one other (hacky) solution:
df %>%
group_by(id) %>%
mutate(sum = as.character(rowSums(select(cur_data(), is.numeric)))) %>%
summarise_if(is.numeric, ~ . / as.numeric(sum))
The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.