Equivalent to rowMeans() for min()

You could use pmin, but you would have to get each column of your matrix into a separate vector. One way to do that is to convert it to a data.frame then call pmin via do.call (since data.frames are lists).

system.time(do.call(pmin, as.data.frame(m)))
#    user  system elapsed 
#   0.940   0.000   0.949 
system.time(apply(m,1,min))
#    user  system elapsed 
#   16.84    0.00   16.95
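
As a quick sanity check, here is a minimal sketch on a made-up 3x3 matrix (the original m is not shown, so the values below are an assumption). It only confirms that both approaches return the same row minima; it says nothing about timing.

```r
# Made-up matrix for illustration; the original `m` is not shown.
m <- matrix(c(3, 1, 2,
              9, 7, 8,
              5, 6, 4), nrow = 3, byrow = TRUE)

# pmin receives each column as a separate argument
row_min_pmin  <- do.call(pmin, as.data.frame(m))

# apply calls min once per row
row_min_apply <- apply(m, 1, min)

print(row_min_pmin)
print(row_min_apply)
```

Both vectors hold one minimum per row, so either can be assigned straight to a new column.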

How do I add a column to a data frame consisting of minimum values from other columns?

You can use the apply() function to do this:

df$C <- apply(df, 1, min)

The second argument selects the dimension over which min is applied. Here it is 1, so min is applied across the columns of each row separately (2 would apply it to each column instead).

You can restrict the calculation to specific columns of the data frame, as follows:

df$newCol <- apply(df[c('A','B')], 1, min)
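
One thing to watch for: apply() converts the data frame to a matrix first, so a single non-numeric column turns everything into character. A small sketch with a hypothetical data frame (the id, A and B columns are made up for illustration):

```r
# Hypothetical data frame with one character column
df <- data.frame(id = c("x", "y"), A = c(10, 2), B = c(3, 5))

# All columns: the character column forces a character matrix,
# so min compares strings rather than numbers
all_cols <- apply(df, 1, min)

# Numeric columns only: the result stays numeric
num_cols <- apply(df[c("A", "B")], 1, min)

print(class(all_cols))
print(num_cols)
```

Selecting the numeric columns explicitly, as in the second call, avoids the problem.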

How to find if we have two or more equal minimum values in specific columns

If you have a large dataset, the following may be faster. It uses the rowMins function from the matrixStats package.

icol <- grepl("^a", names(mydata))
min_row <- matrixStats::rowMins(as.matrix(mydata[icol]))

mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200
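
To see what the flag does, here is a minimal sketch on made-up data (column names a1, a2, a3 and b are assumptions). It swaps in base R's do.call(pmin, ...) for matrixStats::rowMins so that it runs without extra packages; the comparison logic is otherwise the same.

```r
# Made-up data: three "a" columns plus one unrelated column
mydata <- data.frame(a1 = c(1, 5, 300),
                     a2 = c(1, 7, 300),
                     a3 = c(4, 9, 400),
                     b  = c(0, 0, 0))

icol <- grepl("^a", names(mydata))

# Row-wise minimum over the selected columns (base R stand-in for rowMins)
min_row <- do.call(pmin, mydata[icol])

# TRUE when the minimum occurs more than once in the row and is below 200
mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200

print(mydata)
```

Row 1 has a tied minimum below 200, row 2 has a unique minimum, and row 3 has a tie at 300, so only row 1 is flagged.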

Calculating row means without having to provide column names and selectively removing columns based on each column's sum

With select_if we can select the numeric columns

library(dplyr)
library(matrixStats)
df1 %>%
    mutate(Median = select_if(., is.numeric) %>%
                        as.matrix %>%
                        rowMedians,
           Mean = select_if(., is.numeric) %>%
                      rowMeans)

Or convert to 'long' format and then summarise by row

library(dplyr)
library(tidyr)
df1 %>% 
   select_if(is.numeric) %>%
   mutate(rn = row_number()) %>%
   pivot_longer(cols = -rn) %>%
   group_by(rn) %>%
   summarise(Median = median(value), Mean = mean(value), Min = min(value)) %>%
   select(-rn) %>% 
   bind_cols(df1, .)
#      ID loc1 loc2 loc3 loc4 Median   Mean Min
#1  daisy   10  100    0 1000     55  277.5   0
#2   lily   20  200    0 2000    110  555.0   0
#3   rose   30  300    0 3000    165  832.5   0
#4  tulip   40  400    0 4000    220 1110.0   0
#5  poppy   50  500    0 5000    275 1387.5   0
#6   iris   60  600    0 6000    330 1665.0   0
#7 orchid   70  700    0 7000    385 1942.5   0
#8  lotus   80  800    0 8000    440 2220.0   0
#9 crocus   90  900    0 9000    495 2497.5   0
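
For comparison, a base R sketch that produces the same three columns without dplyr or tidyr. The df1 below is rebuilt from the printed output above, so treat its exact construction as an assumption.

```r
# df1 reconstructed from the printed output (an assumption about the original)
df1 <- data.frame(
  ID   = c("daisy", "lily", "rose", "tulip", "poppy",
           "iris", "orchid", "lotus", "crocus"),
  loc1 = seq(10, 90, by = 10),
  loc2 = seq(100, 900, by = 100),
  loc3 = 0,
  loc4 = seq(1000, 9000, by = 1000)
)

# Pick the numeric columns once, before any new columns are added
num  <- vapply(df1, is.numeric, logical(1))
vals <- as.matrix(df1[num])

# Compute every statistic from the same matrix, then attach the results
row_median <- apply(vals, 1, median)
row_mean   <- rowMeans(vals)
row_min    <- apply(vals, 1, min)

df1$Median <- row_median
df1$Mean   <- row_mean
df1$Min    <- row_min

print(df1)
```

Computing from vals before attaching anything keeps the new columns from leaking into later calculations.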

To get the sums of the numeric columns, keeping only those columns whose sum is greater than 0

df1 %>% 
     summarise_if(~is.numeric(.) && sum(.) > 0, sum)
#  loc1 loc2  loc4
#1  450 4500 45000

Or using base R

Filter(function(s) s > 0, colSums(df1[-1]))
#  loc1  loc2  loc4 
#   450  4500 45000

If the intention is to select the numeric columns whose sum is greater than 0, then use select_if

df1 %>% 
   select_if(~ is.numeric(.) && sum(.) > 0)
#  loc1 loc2 loc4
#1   10  100 1000
#2   20  200 2000
#3   30  300 3000
#4   40  400 4000
#5   50  500 5000
#6   60  600 6000
#7   70  700 7000
#8   80  800 8000
#9   90  900 9000

Or, to include the first (factor) column as well

df1 %>% 
    select_if(~ is.factor(.)|(is.numeric(.) && sum(.) > 0))
#      ID loc1 loc2 loc4
#1  daisy   10  100 1000
#2   lily   20  200 2000
#3   rose   30  300 3000
#4  tulip   40  400 4000
#5  poppy   50  500 5000
#6   iris   60  600 6000
#7 orchid   70  700 7000
#8  lotus   80  800 8000
#9 crocus   90  900 9000

Or, using the OP's code, add 1 to the indices, because cs was computed after dropping the first column

df1 %>% 
      select(which(cs > 0)+1)

Including the first column

df1 %>% 
     select(1, which(cs > 0)+1)

Or remove the first column from 'df1' and then use the code from the OP's post

df1 %>%
  select(-1) %>%
  select( which(cs > 0))
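
The index shift is easier to see in base R. This sketch assumes cs was defined as colSums(df1[-1]), which is a guess at the OP's code, and uses a shortened three-row version of df1.

```r
# Shortened df1 for illustration (first three rows of the data above)
df1 <- data.frame(ID   = c("daisy", "lily", "rose"),
                  loc1 = c(10, 20, 30),
                  loc2 = c(100, 200, 300),
                  loc3 = c(0, 0, 0),
                  loc4 = c(1000, 2000, 3000))

# Assumed definition: column sums computed without the ID column
cs <- colSums(df1[-1])

# Positions are relative to df1[-1]: loc1 = 1, loc2 = 2, loc4 = 4
keep <- which(cs > 0)

# Shift by one to get positions in df1, and keep the ID column as well
res <- df1[c(1, keep + 1)]

print(names(res))
```

Without the + 1, the positions would point one column to the left and pick up ID, loc1 and loc3 instead.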

How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?

Here is a way to do it in a single dplyr chain, using your supplied data frame.

First create a vector of all column names of interest:

name_col <- colnames(mental)[2:16]

And now use dplyr

library(dplyr)

mental %>% 
  # First create the column of row means
  mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>% 
  # Now calculate the proportion of NAs
  mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>% 
  # Create this column for filtering out later
  mutate(somatic_usable = ifelse(somatic_na < 0.2,
                                 "yes", "no")) %>% 
  # Make the following replacement on a row basis 
  rowwise() %>%
  mutate_at(vars(name_col), # Designate eligible columns to check for NAs
            funs(replace(., 
                         is.na(.) & somatic_na < 0.2, # Both conditions need to be met
                         somatic_mean))) %>% # What we are subbing the NAs with
  ungroup() # Now ungroup the 'rowwise' in case you need to modify further

Now, if you wanted to only select the entries that have less than 20% NAs, you can pipe the above into the following:

filter(somatic_usable == "yes")

Also of note, if you wanted to instead make the condition less than or equal to 20%, you would need to replace the two somatic_na < 0.2 with somatic_na <= 0.2.
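
If you would rather avoid rowwise(), the same idea can be sketched in base R with matrix indexing. The toy data frame and its column names (id, q1 to q6) are made up, since the mental data frame is not reproduced here.

```r
# Made-up stand-in for the real data: six items per respondent
toy <- data.frame(id = 1:3,
                  q1 = c(1, NA, NA),
                  q2 = c(2, 2, NA),
                  q3 = c(3, 3, 3),
                  q4 = c(4, 4, 4),
                  q5 = c(5, 5, 5),
                  q6 = c(6, 6, 6))

cols <- paste0("q", 1:6)
vals <- as.matrix(toy[cols])

row_mean <- rowMeans(vals, na.rm = TRUE)   # mean of the observed items
na_prop  <- rowMeans(is.na(vals))          # share of missing items per row

# Cells to fill: missing, and in a row with fewer than 20% NAs
fill <- is.na(vals) & na_prop < 0.2

# Replace each selected cell with the mean of its own row
vals[fill] <- row_mean[row(vals)[fill]]
toy[cols] <- as.data.frame(vals)

print(toy)
```

Row 2 has one NA out of six (about 17%), so it is filled with that row's mean of 4; row 3 has two NAs (about 33%) and is left untouched.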

Hope this helps!

rowmean and standard deviation using data.table

The issue lies in sd(), which does not operate row-wise, so it has to be applied to each row with apply().

x[,
  c("meanY",'sdY',"nY") := 
    .(rowMeans(.SD, na.rm = TRUE), 
      apply(.SD, 1, sd, na.rm = TRUE), 
      rowSums(!is.na(.SD))), 
  .SDcols = 2:10]
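
If apply() turns out to be slow on many rows, one alternative (not what the code above does) is to compute the row standard deviation directly from rowMeans and rowSums. A sketch on a made-up matrix; inside data.table you would presumably pass as.matrix(.SD) instead of x, which is an assumption on my part.

```r
# Made-up matrix with a few missing values
x <- matrix(c(1, 2, 3, NA,
              4, 4, 4, 4,
              2, 8, NA, 5), nrow = 3, byrow = TRUE)

n  <- rowSums(!is.na(x))            # non-missing values per row
mu <- rowMeans(x, na.rm = TRUE)     # row means

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)), row by row
sd_fast <- sqrt(rowSums((x - mu)^2, na.rm = TRUE) / (n - 1))

# Reference result from apply(), as in the answer above
sd_apply <- apply(x, 1, sd, na.rm = TRUE)

print(sd_fast)
print(sd_apply)
```

Both give the same values here; the vectorised form just avoids calling sd once per row.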

Min and Median of Multiple Columns of a DF by Row in R

You can use apply like this (the 1 means calculate by row, 2 would calculate by column):

the_min <- apply(df, 1, min)   
the_median <- apply(df, 1, median)
df$Min <- the_min
df$Median <- the_median
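
Note that both statistics are computed before either column is added. A small sketch on made-up data shows why that matters; naming the input columns explicitly (the cols vector here is hypothetical) is another way to stay safe.

```r
# Made-up data frame
df <- data.frame(A = c(1, 4), B = c(3, 8), C = c(5, 6))

# Fix the input columns up front so new columns are never included
cols <- c("A", "B", "C")

df$Min    <- apply(df[cols], 1, min)
df$Median <- apply(df[cols], 1, median)

# For contrast: taking the median over everything now also picks up
# the Min and Median columns added above
median_all <- apply(df, 1, median)

print(df)
print(median_all)
```

With the columns fixed, Median is 3 and 6; computed over the whole data frame after the new columns exist, the first row's value changes.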
