Calculate Row-Wise Proportions

Another alternative (though it is mostly a prettier version of sweep) is prop.table:

> cbind(x[1], prop.table(as.matrix(x[-1]), margin = 1))
  id       val0      val1      val2
1  a 0.08333333 0.3333333 0.5833333
2  b 0.13333333 0.3333333 0.5333333
3  c 0.16666667 0.3333333 0.5000000

From the "description" section of the help file at ?prop.table:

This is really sweep(x, margin, margin.table(x, margin), "/") for newbies, except that if margin has length zero, then one gets x/sum(x).

So, you can see that underneath, this is really quite similar to @Jilber's solution.

And... it's nice for the R developers to be considerate of us newbies, isn't it? :)
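For comparison, here is a rough NumPy sketch of what the help file describes: divide by the margin sums along one axis, or by the grand total when no margin is given. The function name prop_table and its R-style margin codes (1 = rows, 2 = columns) are assumptions made for illustration, not part of any library, and the counts are hypothetical values chosen to reproduce the output above.

```python
import numpy as np


def prop_table(x, margin=None):
    """Hypothetical NumPy analogue of R's prop.table() for a 2-D array.

    margin=None -> divide by the grand total, i.e. x / sum(x)
    margin=1    -> divide each row by its row sum
    margin=2    -> divide each column by its column sum
    """
    x = np.asarray(x, dtype=float)
    if margin is None:
        return x / x.sum()
    if margin == 1:
        # keepdims=True leaves shape (n, 1), so the division broadcasts across each row
        return x / x.sum(axis=1, keepdims=True)
    if margin == 2:
        # shape (1, m): each column is divided by its own total
        return x / x.sum(axis=0, keepdims=True)
    raise ValueError("margin must be None, 1 (rows) or 2 (columns)")


# Hypothetical counts: rows are ids a, b, c; columns are val0, val1, val2
counts = np.array([[1, 4, 7],
                   [2, 5, 8],
                   [3, 6, 9]])

print(prop_table(counts, margin=1).round(7))
```

As with sweep, the key step is dividing by a margin total that has been reshaped so it lines up with the rows (or columns) being scaled.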

Calculate row-wise percentage in pandas

We can do the following:

  1. We select the value columns (everything except id) with iloc
  2. Use apply with axis=1 to apply the calculation row-wise
  3. We use div, sum and mul to divide each value by its row's sum and multiply by 100, giving percentages (50) rather than fractions (0.5)
  4. We convert the floats back to int with astype (note that astype(int) truncates rather than rounds)
df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: x.div(x.sum()).mul(100), axis=1).astype(int)

Output

  id  val1  val2  val3
0  a    50    30    20
1  b    40    36    24
2  c    40    35    25
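A self-contained version of the four steps, using hypothetical data chosen so the result matches the output above. One deliberate deviation from the answer: it rounds before casting, because astype(int) truncates toward zero, so a value stored as 35.999... would otherwise become 35.

```python
import pandas as pd

# Hypothetical data, chosen so the percentages match the output above
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "val1": [5, 10, 8],
    "val2": [3, 9, 7],
    "val3": [2, 6, 5],
})

value_cols = df.columns[1:]  # every column except "id"

# Step 2 and 3: divide each row by its own sum, then scale to percent
pct = df[value_cols].apply(lambda row: row.div(row.sum()).mul(100), axis=1)

# Step 4: round first so floating-point noise cannot truncate 36 down to 35
df[value_cols] = pct.round().astype(int)
print(df)
```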

Or a vectorized solution, accessing the numpy arrays underneath our dataframe.

Note: this method should be faster than apply, since it avoids calling a Python function once per row.

vals = df.iloc[:, 1:].to_numpy()
df.iloc[:, 1:] = (vals / vals.sum(axis=1)[:, None] * 100).astype(int)
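The [:, None] indexing is what makes the vectorized version work: it turns the 1-D array of row sums (shape (n,)) into a column (shape (n, 1)), so NumPy broadcasting divides each row by its own total. A small sketch with hypothetical values:

```python
import numpy as np

# Hypothetical counts: one row per id, one column per value column
values = np.array([[5, 3, 2],
                   [10, 9, 6],
                   [8, 7, 5]], dtype=float)

row_sums = values.sum(axis=1)      # shape (3,): one total per row
column_vector = row_sums[:, None]  # shape (3, 1): same totals as a column

# (3, 3) / (3, 1) broadcasts each row's total across that row
pct = values / column_vector * 100

# keepdims=True is an equivalent way to get the (3, 1) shape
same = values / values.sum(axis=1, keepdims=True) * 100

# (3, 3) / (3,) would instead divide column j by row_sums[j], which is wrong here
wrong = values / row_sums * 100
```

Without the extra axis the division still runs without error, which is why the shape mistake is easy to miss.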

Or, similarly, using the pandas DataFrame.div method (proposed by Jon Clements):

df.iloc[:, 1:] = df.iloc[:, 1:].div(df.iloc[:, 1:].sum(1), axis=0).mul(100)
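Why axis=0 matters in that div call: by default DataFrame.div aligns a Series against the column labels, so the row sums have to be matched against the row index instead. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical value columns only (no id column)
vals = pd.DataFrame({"val1": [5, 10], "val2": [3, 9], "val3": [2, 6]})
row_sums = vals.sum(axis=1)  # Series indexed by row labels 0 and 1

# axis=0 matches the Series index against the DataFrame's row index
good = vals.div(row_sums, axis=0).mul(100)

# The default axis="columns" matches it against column labels instead;
# 0 and 1 are not column names, so every cell comes out NaN
bad = vals.div(row_sums)
```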

Apply row-wise transformation in R so that total percentage for each row will be 100%

You can use proportions (added in R 4.0.0 as a new name for prop.table) to get percentages.

proportions(as.matrix(df[1:4,-1]), 1) * 100
#         A        B        C        D
#1 38.33780 32.84182 16.21984 12.60054
#2 39.73214 33.18452 16.66667 10.41667
#3 39.27126 32.25371 16.32928 12.14575
#4 36.26374 31.71115 18.36735 13.65777
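A rough pandas translation of the same call, for readers working in Python, with a check that every row really totals 100%. The data and the site column name are hypothetical; the main point is the index translation, since R's df[1:4, -1] uses 1-based positions and negative indices to drop columns.

```python
import pandas as pd

# Hypothetical data: a label column followed by four count columns A-D
df = pd.DataFrame({
    "site": ["s1", "s2", "s3", "s4", "s5"],
    "A": [12, 30, 7, 9, 4],
    "B": [10, 25, 6, 8, 3],
    "C": [5, 13, 3, 4, 2],
    "D": [4, 8, 2, 3, 1],
})

# R's df[1:4, -1] (rows 1-4, every column but the first) is iloc[:4, 1:]
# because pandas positions are 0-based and slice ends are exclusive
subset = df.iloc[:4, 1:]
pct = subset.div(subset.sum(axis=1), axis=0) * 100

# Each row should total 100, up to floating-point error
assert ((pct.sum(axis=1) - 100).abs() < 1e-9).all()
```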

Calculate multi-group row-wise proportion in R

This is how I ended up doing it. It combines splitting the data by folder, some filtering within each folder, and a little renaming so the pieces can be rejoined later.
Once each z_stack_id has the proper value for each channel (FITC_blobs, TRITC_blobs, Cy5_blobs), we can bind_rows and compute the proportions. This method still gives spurious proportions, but they can be filtered out fairly easily.

I had to do some column renaming because my real data had different columns than the ones posted in the simplified question. I condensed it into a function.

library(dplyr)

calculate_blob_proportions <- function(dataframe){
  dataframe <- dataframe %>% ungroup()

  # prepare a list to hold one data frame per folder
  li <- list()

  for (i in unique(dataframe$folder)){
    # Get each folder and merge the two filename columns into one
    my_df <- dataframe %>%
      filter(folder == i) %>%
      mutate(filename_cells = ifelse(is.na(filename_cells),
                                     filename_coloc,
                                     filename_cells)) %>%
      rename(filename = filename_cells) %>%
      select(-filename_coloc)

    # For each channel, rename its blob count and join it back
    # onto the folder's rows by the shared columns
    Cy5 <- filter(my_df, binary_layer == "Cy5") %>%
      rename(Cy5_blobs = n_blobs) %>%
      select(-binary_layer, -filename) %>%
      left_join(my_df)

    TRITC <- filter(my_df, binary_layer == "TRITC") %>%
      rename(TRITC_blobs = n_blobs) %>%
      select(-binary_layer, -filename) %>%
      left_join(my_df)

    FITC <- filter(my_df, binary_layer == "FITC") %>%
      rename(FITC_blobs = n_blobs) %>%
      select(-binary_layer, -filename) %>%
      left_join(my_df)

    li[[i]] <- left_join(Cy5, left_join(TRITC, FITC)) %>%
      select(RatID, folder, filename, z_stack_id,
             binary_layer, n_blobs,
             FITC_blobs, TRITC_blobs, Cy5_blobs)
  }

  df_out <- bind_rows(li) %>%
    mutate(FITC_prop = n_blobs / FITC_blobs,
           TRITC_prop = n_blobs / TRITC_blobs,
           Cy5_prop = n_blobs / Cy5_blobs)

  return(df_out)
}
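For comparison, here is a pandas sketch of the same goal using a different technique from the loop above: pivot the counts to one column per channel, merge them back onto the long data, and divide. This is not the author's method. The sample data is hypothetical and only assumes the column names that appear in the R function (RatID, folder, z_stack_id, binary_layer, n_blobs). Rows compared with their own channel (for example a FITC row's FITC_prop) always come out as 1 and may need filtering.

```python
import pandas as pd

# Hypothetical long-format data using the column names from the R function
blobs = pd.DataFrame({
    "RatID": ["r1"] * 6,
    "folder": ["f1"] * 6,
    "z_stack_id": [1, 1, 1, 2, 2, 2],
    "binary_layer": ["FITC", "TRITC", "Cy5"] * 2,
    "n_blobs": [10, 5, 20, 8, 4, 2],
})

keys = ["RatID", "folder", "z_stack_id"]

# One row per z-stack, one column per channel: Cy5_blobs, FITC_blobs, TRITC_blobs
wide = (
    blobs.pivot_table(index=keys, columns="binary_layer",
                      values="n_blobs", aggfunc="sum")
    .add_suffix("_blobs")
    .reset_index()
)
wide.columns.name = None

# Attach each z-stack's per-channel counts back onto its long-format rows
out = blobs.merge(wide, on=keys, how="left")
for channel in ["FITC", "TRITC", "Cy5"]:
    out[f"{channel}_prop"] = out["n_blobs"] / out[f"{channel}_blobs"]
```

Because the pivot groups by the key columns directly, there is no per-folder loop and no renaming needed before rejoining.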

Calculate proportions of rows in dataframe

I think you need to divide by the row sums of all columns except the first two (selected with DataFrame.iloc), divided by 10 so that each row sums to 10:

df.iloc[:, 2:] = df.iloc[:, 2:].div(df.iloc[:, 2:].sum(axis=1).div(10), axis=0)
print(df)
  education experience  ExpenseA  ExpenseB  ExpenseC
0       uni        yes  3.000000  2.000000  5.000000
1       uni         no  3.333333  2.857143  3.809524
2    middle        yes  2.000000  0.000000  8.000000
3      high         no  4.800000  2.000000  3.200000
4       uni        yes  2.000000  4.666667  3.333333

Or select the columns whose names contain Expense with DataFrame.filter and divide them the same way:

df1 = df.filter(like='Expense')

df[df1.columns] = df1.div(df1.sum(axis=1).div(10), axis=0)
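A small check of the filter version, using hypothetical values picked so that two rows match the output above: dividing each row by its total over 10 rescales the Expense columns so that every row sums to 10.

```python
import pandas as pd

# Hypothetical expense data; the first two columns are labels
df = pd.DataFrame({
    "education": ["uni", "middle"],
    "experience": ["yes", "yes"],
    "ExpenseA": [30, 1],
    "ExpenseB": [20, 0],
    "ExpenseC": [50, 4],
})

expense = df.filter(like="Expense")               # ExpenseA, ExpenseB, ExpenseC
scale = expense.sum(axis=1).div(10)               # each row's total divided by 10
df[expense.columns] = expense.div(scale, axis=0)  # each row now sums to 10
print(df)
```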

Sum a row of specific columns, then find ratio of row, column entry to sum of row

We can use prop.table with margin = 1 to calculate row-wise proportions.

cbind(df[1], prop.table(as.matrix(df[-1]), 1))

#  a     b     c     d
#1 8 0.333 0.370 0.296
#2 6 0.368 0.316 0.316
#3 4 0.400 0.300 0.300
#4 2 0.375 0.375 0.250

To make the selection of columns more explicit:

cols <- c("b", "c", "d")
cbind(df[setdiff(names(df), cols)], prop.table(as.matrix(df[cols]), 1))
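A pandas analogue of the explicit-column version, for readers working in Python. The values are hypothetical but consistent with the first two rows of the output above. One difference from R: pandas Index.difference sorts its result by default, whereas setdiff keeps the original order, so this sketch uses a list comprehension to keep the other columns in place.

```python
import pandas as pd

# Hypothetical data consistent with the first two rows of the output above
df = pd.DataFrame({"a": [8, 6], "b": [9, 7], "c": [10, 6], "d": [8, 6]})

cols = ["b", "c", "d"]
# Order-preserving equivalent of setdiff(names(df), cols)
others = [c for c in df.columns if c not in cols]

# cbind(...) corresponds to concatenating along axis=1
props = df[cols].div(df[cols].sum(axis=1), axis=0)
out = pd.concat([df[others], props], axis=1)
print(out.round(3))
```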

Tidy way to convert numeric columns from counts to proportions

Rephrase it as follows:

df %>%
mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))

Output:

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667

Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's another (hacky) solution:

df %>%
group_by(id) %>%
mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
summarise_if(is.numeric, ~ . / as.numeric(sum))

The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.
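In pandas, the closest counterpart to where(is.numeric) is DataFrame.select_dtypes. A minimal sketch, assuming hypothetical data where y is twice x in every row (which gives the same output as above):

```python
import pandas as pd

# Hypothetical counts: y is twice x, so every row splits 1/3 vs 2/3
df = pd.DataFrame({"id": ["A", "B", "C", "D"], "x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})

num_cols = df.select_dtypes(include="number").columns  # x and y, not id

# Row sums are computed from the original counts before any column is replaced
df[num_cols] = df[num_cols].div(df[num_cols].sum(axis=1), axis=0)
print(df)
```

Computing the row sums once, before assigning, avoids the situation where later columns are divided by totals that already include rescaled values.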


