Equivalent to rowMeans() for min()

You could use pmin, but you would have to get each column of your matrix into a separate vector. One way to do that is to convert it to a data.frame then call pmin via do.call (since data.frames are lists).

system.time(do.call(pmin, as.data.frame(m)))
# user system elapsed
# 0.940 0.000 0.949
system.time(apply(m,1,min))
# user system elapsed
# 16.84 0.00 16.95
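To check that the two approaches return the same values, here is a minimal sketch on a small made-up matrix. The names m_small and m_na are illustrative assumptions, not the matrix timed above.

```r
# Small made-up matrix, for illustration only
set.seed(1)
m_small <- matrix(rnorm(20), nrow = 5, ncol = 4)

# as.data.frame() turns each column into a list element, so do.call()
# hands the columns to pmin() as separate arguments
row_min_pmin  <- do.call(pmin, as.data.frame(m_small))
row_min_apply <- apply(m_small, 1, min)
agree <- isTRUE(all.equal(row_min_pmin, row_min_apply))

# With missing values, na.rm = TRUE can be appended to the argument list
m_na <- m_small
m_na[2, 3] <- NA
row_min_na <- do.call(pmin, c(as.data.frame(m_na), na.rm = TRUE))
```

Both calls compute the row minima; pmin does it in a few vectorised passes over whole columns, whereas apply calls min once per row.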

How do I add a column to a data frame consisting of minimum values from other columns?

You can use the apply() function to do this:

df$C <- apply(df, 1, min)

The second argument chooses the dimension over which min is applied. Here it is 1, so min is applied to each row separately, across all columns.

You can choose specific columns from the dataframe, as follows:

df$newCol <- apply(df[c('A','B')], 1, min)
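As a small worked example, assuming a made-up data frame with numeric columns A and B (the values are illustrative only), the apply() call and a vectorised pmin() call give the same result. Note that apply() first converts the data frame to a matrix, so if a non-numeric column is included the values are compared as strings; selecting the numeric columns explicitly avoids that.

```r
# Made-up data, for illustration only
df <- data.frame(A = c(3, 1, 8), B = c(2, 5, 7))

# Row-wise minimum over the chosen columns
df$newCol <- apply(df[c("A", "B")], 1, min)

# Vectorised alternative when the columns are known by name
df$newCol_pmin <- pmin(df$A, df$B)

same <- isTRUE(all.equal(unname(df$newCol), df$newCol_pmin))
```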

How to find if we have two or more equal minimum values in specific columns

If you have a large dataset, the following might be fast. It uses the rowMins function from the matrixStats package.

icol <- grepl("^a", names(mydata))
min_row <- matrixStats::rowMins(as.matrix(mydata[icol]))

mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200
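To see what the flag captures, here is a sketch on made-up data. The column names a1 to a3 and b and all values are assumptions; the 200 cutoff is taken from the snippet above. To keep the example free of package dependencies it swaps in do.call(pmin, ...) for matrixStats::rowMins, which should give the same row minima for numeric columns.

```r
# Made-up data, for illustration only
mydata <- data.frame(
  a1 = c(5, 150, 300, 7),
  a2 = c(5, 160, 300, 9),
  a3 = c(9, 150, 400, 8),
  b  = c(1, 2, 3, 4)
)

# Columns whose names start with "a"
icol <- grepl("^a", names(mydata))

# Base R stand-in for matrixStats::rowMins()
min_row <- do.call(pmin, mydata[icol])

# TRUE when the row minimum occurs more than once and is below 200
mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200
```

Rows 1 and 2 are flagged because their minimum appears twice and is below 200. Row 3 has a tied minimum of 300, which fails the cutoff, and row 4 has no tie.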

Calculating row means without having to provide column names and selectively removing columns based on each column's sum

With select_if we can select the numeric columns:

library(dplyr)
library(matrixStats)
df1 %>%
  mutate(Median = select_if(., is.numeric) %>%
           as.matrix %>%
           rowMedians,
         Mean = select_if(., is.numeric) %>%
           rowMeans)

Or convert to 'long' format and then group by row:

library(dplyr)
library(tidyr)
df1 %>%
  select_if(is.numeric) %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = -rn) %>%
  group_by(rn) %>%
  summarise(Median = median(value), Mean = mean(value), Min = min(value)) %>%
  select(-rn) %>%
  bind_cols(df1, .)
# ID loc1 loc2 loc3 loc4 Median Mean Min
#1 daisy 10 100 0 1000 55 277.5 0
#2 lily 20 200 0 2000 110 555.0 0
#3 rose 30 300 0 3000 165 832.5 0
#4 tulip 40 400 0 4000 220 1110.0 0
#5 poppy 50 500 0 5000 275 1387.5 0
#6 iris 60 600 0 6000 330 1665.0 0
#7 orchid 70 700 0 7000 385 1942.5 0
#8 lotus 80 800 0 8000 440 2220.0 0
#9 crocus 90 900 0 9000 495 2497.5 0
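The same row statistics can also be sketched in base R without naming any columns. The data frame below copies only the first two rows of the output above, and the helper name num is an assumption for illustration.

```r
# First two rows of the data shown above
df1 <- data.frame(
  ID   = c("daisy", "lily"),
  loc1 = c(10, 20),
  loc2 = c(100, 200),
  loc3 = c(0, 0),
  loc4 = c(1000, 2000)
)

# Keep only the numeric columns, as a matrix
num <- as.matrix(df1[sapply(df1, is.numeric)])

df1$Median <- apply(num, 1, median)               # no vectorised row median in base R
df1$Mean   <- rowMeans(num)
df1$Min    <- do.call(pmin, as.data.frame(num))   # vectorised row minimum
```

The matrix is extracted before the new columns are added, so the added columns do not feed back into the later calculations.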

To sum the numeric columns, keeping only those whose column sum is greater than 0:

df1 %>%
  summarise_if(~ is.numeric(.) && sum(.) > 0, sum)
# loc1 loc2 loc4
#1 450 4500 45000

Or using base R:

# keep only the column sums that are greater than 0
Filter(function(s) s > 0, colSums(df1[-1]))
# loc1 loc2 loc4
# 450 4500 45000

If the intention is to select the numeric columns whose sum is greater than 0, use select_if:

df1 %>%
  select_if(~ is.numeric(.) && sum(.) > 0)
# loc1 loc2 loc4
#1 10 100 1000
#2 20 200 2000
#3 30 300 3000
#4 40 400 4000
#5 50 500 5000
#6 60 600 6000
#7 70 700 7000
#8 80 800 8000
#9 90 900 9000

Or, to include the first (factor) column as well:

df1 %>%
  select_if(~ is.factor(.) | (is.numeric(.) && sum(.) > 0))
# ID loc1 loc2 loc4
#1 daisy 10 100 1000
#2 lily 20 200 2000
#3 rose 30 300 3000
#4 tulip 40 400 4000
#5 poppy 50 500 5000
#6 iris 60 600 6000
#7 orchid 70 700 7000
#8 lotus 80 800 8000
#9 crocus 90 900 9000

Or, using the OP's code, add 1 to the indices, because cs was created after removing the first column:

df1 %>%
  select(which(cs > 0) + 1)

Including the first column:

df1 %>%
  select(1, which(cs > 0) + 1)

Or remove the first column from 'df1' and then use the code from the OP's post:

df1 %>%
  select(-1) %>%
  select(which(cs > 0))
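The index shift can be seen in a small base R sketch. The data frame copies only the first three rows of the data shown earlier, and cs is assumed here to be the column sums computed without the first column, as described above.

```r
# First three rows of the data shown earlier
df1 <- data.frame(
  ID   = c("daisy", "lily", "rose"),
  loc1 = c(10, 20, 30),
  loc2 = c(100, 200, 300),
  loc3 = c(0, 0, 0),
  loc4 = c(1000, 2000, 3000)
)

# Assumed definition of cs: column sums without the ID column
cs <- colSums(df1[-1])

# Positions within cs are off by one relative to df1, hence the + 1
idx <- which(cs > 0) + 1

# Base R equivalent of select(1, which(cs > 0) + 1)
keep <- df1[c(1, idx)]
```

Here idx points at columns 2, 3 and 5 of df1, so keep contains ID, loc1, loc2 and loc4.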

How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?

Here is a way to do it all in one dplyr chain, using your supplied data frame.

First create a vector of all column names of interest:

name_col <- colnames(mental)[2:16]

And now use dplyr:

library(dplyr)

mental %>%
  # First create the column of row means
  mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>%
  # Now calculate the proportion of NAs
  mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>%
  # Create this column for filtering out later
  mutate(somatic_usable = ifelse(somatic_na < 0.2, "yes", "no")) %>%
  # Make the following replacement on a row basis
  rowwise() %>%
  mutate_at(vars(all_of(name_col)), # Designate eligible columns to check for NAs
            # funs() is deprecated since dplyr 0.8.0, so use a formula lambda
            ~ replace(.,
                      is.na(.) & somatic_na < 0.2, # Both conditions need to be met
                      somatic_mean)) %>% # What we are subbing the NAs with
  ungroup() # Now ungroup the 'rowwise' in case you need to modify further

Now, if you wanted to only select the entries that have less than 20% NAs, you can pipe the above into the following:

filter(somatic_usable == "yes")

Also of note, if you wanted to instead make the condition less than or equal to 20%, you would need to replace the two somatic_na < 0.2 with somatic_na <= 0.2.
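For comparison, the same replacement rule can be sketched in base R with matrix indexing. Everything here is made up for illustration: the data frame d, the item columns q1 to q6, and the values. Only the 0.2 threshold comes from the answer above.

```r
# Made-up data: six item columns, for illustration only
d <- data.frame(
  id = 1:3,
  q1 = c(1, NA, 2),
  q2 = c(NA, NA, 2),
  q3 = c(3, NA, 2),
  q4 = c(4, 5, 2),
  q5 = c(5, 5, 2),
  q6 = c(2, 5, 2)
)
items <- paste0("q", 1:6)

x        <- as.matrix(d[items])
row_mean <- rowMeans(x, na.rm = TRUE)   # mean of the non-missing items per row
na_prop  <- rowMeans(is.na(x))          # proportion of missing items per row

# Cells to fill: missing AND in a row with fewer than 20% NAs.
# na_prop is recycled down the columns, so it lines up with the rows.
fill <- is.na(x) & (na_prop < 0.2)

# row(x)[fill] gives the row index of each cell being filled
x[fill] <- row_mean[row(x)[fill]]
d[items] <- as.data.frame(x)
```

Row 1 has one NA out of six items (about 17%), so q2 is filled with the row mean of 3. Row 2 has half of its items missing and is left untouched.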

Hope this helps!

rowmean and standard deviation using data.table

The issue lies in sd(), which does not work row-wise: it treats its whole input as a single vector. Use apply() to call it once per row:

x[,
c("meanY",'sdY',"nY") :=
.(rowMeans(.SD, na.rm = TRUE),
apply(.SD, 1, sd, na.rm = TRUE),
rowSums(!is.na(.SD))),
.SDcols = 2:10]
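Because apply() calls sd() once per row, it can be slow on tables with many rows. A vectorised alternative can be sketched in base R from the definition of the sample standard deviation. The helper row_sds below is a hypothetical name, not part of data.table or base R; it accepts anything as.matrix() can convert.

```r
# Hypothetical helper: row-wise sample standard deviation, ignoring NAs
row_sds <- function(x) {
  x  <- as.matrix(x)
  n  <- rowSums(!is.na(x))                  # non-missing count per row
  mu <- rowMeans(x, na.rm = TRUE)           # row means
  # mu is recycled down the columns, so each row is centred on its own mean
  ss <- rowSums((x - mu)^2, na.rm = TRUE)   # sum of squared deviations
  sqrt(ss / (n - 1))
}

# Made-up matrix with one missing value, for illustration only
m <- matrix(c(1, 2, 3,
              4, NA, 6,
              7, 8, 10), nrow = 3)

sd_fast  <- row_sds(m)
sd_apply <- apply(m, 1, sd, na.rm = TRUE)
```

For rows with at least two non-missing values the two results should match. Rows with fewer yield NaN from the helper, where sd() returns NA.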

Min and Median of Multiple Columns of a DF by Row in R

You can use apply like this (the 1 means calculate by row, 2 would calculate by column):

the_min <- apply(df, 1, min)   
the_median <- apply(df, 1, median)
df$Min <- the_min
df$Median <- the_median

