Equivalent to rowMeans() for min()
You could use pmin, but you would have to get each column of your matrix into a separate vector. One way to do that is to convert the matrix to a data.frame and then call pmin via do.call (since data.frames are lists).
system.time(do.call(pmin, as.data.frame(m)))
# user system elapsed
# 0.940 0.000 0.949
system.time(apply(m,1,min))
# user system elapsed
# 16.84 0.00 16.95
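To see why the do.call trick works, here is a minimal sketch on a small hypothetical matrix m (the values are made up). as.data.frame(m) turns each column into a list element, and do.call spreads those elements into separate arguments, so the call is the same as writing pmin(m[, 1], m[, 2], m[, 3]) by hand.

```r
# Hypothetical 4 x 3 matrix, filled column by column
m <- matrix(c(3, 1, 4, 1,
              5, 9, 2, 6,
              5, 3, 5, 8), nrow = 4)

# do.call passes each data.frame column as its own argument to pmin
via_do_call <- do.call(pmin, as.data.frame(m))

# The same call written out by hand
via_explicit <- pmin(m[, 1], m[, 2], m[, 3])

# The slower reference: min() applied to each row
via_apply <- apply(m, 1, min)

print(via_do_call)
```

All three give the same row minima; the pmin versions just avoid calling min() once per row.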
How do I add a column to a data frame consisting of minimum values from other columns?
You can use the apply() function to do this:
df$C <- apply(df, 1, min)
The second argument (MARGIN) chooses the dimension over which min is applied; here 1 applies min to all the columns of each row separately.
You can also restrict it to specific columns of the data frame, as follows:
df$newCol <- apply(df[c('A','B')], 1, min)
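As a sketch on a small hypothetical data frame (columns A, B and C and their values are assumptions), the row-wise minimum of just A and B can be computed with apply() as above, or with the vectorised pmin(), which compares the two columns element by element instead of looping over rows:

```r
df <- data.frame(A = c(5, 2, 8), B = c(3, 7, 1), C = c(9, 0, 4))

# Row-wise minimum of columns A and B only (C is ignored)
df$newCol <- apply(df[c("A", "B")], 1, min)

# Vectorised alternative: element-wise minimum across the two columns
df$newCol2 <- pmin(df$A, df$B)

print(df)
```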
How to find if we have two or more equal minimum values in specific columns
If you have a large dataset, the following might be fast. It uses the rowMins function from the matrixStats package. See this answer.
icol <- grepl("^a", names(mydata))
min_row <- matrixStats::rowMins(as.matrix(mydata[icol]))
mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200
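Here is a sketch of the same flag logic on a small hypothetical mydata (the "^a" column pattern and the threshold 200 come from the code above; the id column and all values are made up). To keep it runnable without matrixStats, it swaps in the base R do.call(pmin, ...) idiom for matrixStats::rowMins; the answer suggests matrixStats for large data.

```r
mydata <- data.frame(
  id = 1:4,
  a1 = c(100, 50, 300, 10),
  a2 = c(100, 60, 300, 20),
  a3 = c(150, 50, 400, 30)
)

# Columns whose names start with "a" (a1, a2, a3; "id" is excluded)
icol <- grepl("^a", names(mydata))

# Row minimum; base R stand-in for matrixStats::rowMins(as.matrix(mydata[icol]))
min_row <- do.call(pmin, mydata[icol])

# TRUE when at least two columns tie for the row minimum and that minimum is < 200
mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200

print(mydata)
```

Row 3 has a tie (300 and 300) but is not flagged because its minimum is not below 200.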
Calculating row means without having to provide column names and selectively removing columns based on each columns sum
With select_if we can select the numeric columns:
library(dplyr)
library(matrixStats)
df1 %>%
  mutate(Median = select_if(., is.numeric) %>%
           as.matrix %>%
           rowMedians,
         Mean = select_if(., is.numeric) %>%
           rowMeans)
Or convert to 'long' format and then group by row:
library(dplyr)
library(tidyr)
df1 %>%
select_if(is.numeric) %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn) %>%
group_by(rn) %>%
summarise(Median = median(value), Mean = mean(value), Min = min(value)) %>%
select(-rn) %>%
bind_cols(df1, .)
# ID loc1 loc2 loc3 loc4 Median Mean Min
#1 daisy 10 100 0 1000 55 277.5 0
#2 lily 20 200 0 2000 110 555.0 0
#3 rose 30 300 0 3000 165 832.5 0
#4 tulip 40 400 0 4000 220 1110.0 0
#5 poppy 50 500 0 5000 275 1387.5 0
#6 iris 60 600 0 6000 330 1665.0 0
#7 orchid 70 700 0 7000 385 1942.5 0
#8 lotus 80 800 0 8000 440 2220.0 0
#9 crocus 90 900 0 9000 495 2497.5 0
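The df1 used above is not shown in the question, but it can be reconstructed from the printed output (ID plus loc1 to loc4). Here is a minimal base R sketch that computes the same three row summaries without dplyr, tidyr or matrixStats. Capturing the numeric columns once, before adding any new columns, keeps Median and Mean out of the later calculations.

```r
# df1 reconstructed from the printed output above
df1 <- data.frame(
  ID = factor(c("daisy", "lily", "rose", "tulip", "poppy",
                "iris", "orchid", "lotus", "crocus")),
  loc1 = seq(10, 90, by = 10),
  loc2 = seq(100, 900, by = 100),
  loc3 = 0,
  loc4 = seq(1000, 9000, by = 1000)
)

# Numeric columns only, captured before any summary columns are added
num <- df1[sapply(df1, is.numeric)]

df1$Median <- apply(num, 1, median)  # row-wise median
df1$Mean   <- rowMeans(num)          # row-wise mean
df1$Min    <- do.call(pmin, num)     # row-wise minimum

print(df1)
```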
To get the sums of the numeric columns, keeping only those whose column sum is greater than 0:
df1 %>%
summarise_if(~is.numeric(.) && sum(.) > 0, sum)
# loc1 loc2 loc4
#1 450 4500 45000
Or using base R (the first column is the non-numeric ID, so it is dropped):
Filter(function(s) s > 0, colSums(df1[-1]))
# loc1 loc2 loc4
# 450 4500 45000
If the intention is to select the numeric columns whose sum is > 0, then use select_if:
df1 %>%
select_if(~ is.numeric(.) && sum(.) > 0)
# loc1 loc2 loc4
#1 10 100 1000
#2 20 200 2000
#3 30 300 3000
#4 40 400 4000
#5 50 500 5000
#6 60 600 6000
#7 70 700 7000
#8 80 800 8000
#9 90 900 9000
Or, to include the first (factor) column as well:
df1 %>%
select_if(~ is.factor(.)|(is.numeric(.) && sum(.) > 0))
# ID loc1 loc2 loc4
#1 daisy 10 100 1000
#2 lily 20 200 2000
#3 rose 30 300 3000
#4 tulip 40 400 4000
#5 poppy 50 500 5000
#6 iris 60 600 6000
#7 orchid 70 700 7000
#8 lotus 80 800 8000
#9 crocus 90 900 9000
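For comparison, here is a minimal base R sketch of the same column filters, again on df1 reconstructed from the printed output. The predicate is evaluated once per column with sapply(), and the resulting logical vector selects the columns; && short-circuits, so sum() is never called on the factor column.

```r
df1 <- data.frame(
  ID = factor(c("daisy", "lily", "rose", "tulip", "poppy",
                "iris", "orchid", "lotus", "crocus")),
  loc1 = seq(10, 90, by = 10),
  loc2 = seq(100, 900, by = 100),
  loc3 = 0,
  loc4 = seq(1000, 9000, by = 1000)
)

# TRUE for numeric columns whose total is greater than 0
keep <- sapply(df1, function(col) is.numeric(col) && sum(col) > 0)

pos_cols <- df1[keep]                          # loc1, loc2, loc4
with_id  <- df1[sapply(df1, is.factor) | keep] # ID, loc1, loc2, loc4

print(colSums(pos_cols))
```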
Or, using the OP's code, we add 1 to the indices because cs was created after removing the first column:
df1 %>%
select(which(cs > 0)+1)
Including the first column
df1 %>%
select(1, which(cs > 0)+1)
Or remove the first column from 'df1' and then use the code from the OP's post
df1 %>%
select(-1) %>%
select( which(cs > 0))
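The + 1 offset is easy to get wrong, so here is a small sketch on a hypothetical df1 (made-up values). It assumes the OP's cs was computed as colSums(df1[-1]), i.e. without the first column, which is what the + 1 above implies.

```r
# Hypothetical data: first column is an ID, loc2 sums to 0
df1 <- data.frame(ID = c("a", "b"), loc1 = c(1, 2), loc2 = c(0, 0), loc3 = c(3, 4))

# Assumed definition of the OP's cs: column sums without the first column
cs <- colSums(df1[-1])

idx <- which(cs > 0)    # positions within df1[-1]: 1 and 3
shifted <- df1[idx + 1] # + 1 maps them back to positions within df1: 2 and 4
dropped <- df1[-1][idx] # or index the reduced data frame directly

print(names(shifted))
```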
How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?
Here is a way to do it all in one dplyr chain, using your supplied data frame.
First create a vector of all column names of interest:
name_col <- colnames(mental)[2:16]
And now use dplyr
library(dplyr)
mental %>%
# First create the column of row means
mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>%
# Now calculate the proportion of NAs
mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>%
# Create this column for filtering out later
mutate(somatic_usable = ifelse(somatic_na < 0.2,
"yes", "no")) %>%
# Make the following replacement on a row basis
rowwise() %>%
mutate_at(name_col, # Designate eligible columns to check for NAs (funs() is deprecated, so use a ~ lambda)
  ~ replace(.,
            is.na(.) & somatic_na < 0.2, # Both conditions need to be met
            somatic_mean)) %>% # What we are subbing the NAs with
ungroup() # Now ungroup the 'rowwise' in case you need to modify further
Now, if you wanted to only select the entries that have less than 20% NAs, you can pipe the above into the following:
filter(somatic_usable == "yes")
Also of note, if you wanted to instead make the condition less than or equal to 20%, you would need to replace the two occurrences of somatic_na < 0.2 with somatic_na <= 0.2.
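To check the replacement rule on something small, here is a base R sketch with a hypothetical mental data frame of six item columns (q1 to q6 are made-up names standing in for columns 2:16). With six items, one missing value is about 17% and gets filled, while two missing values (about 33%) do not. It is not the dplyr chain above, just the same rule written with a logical matrix instead of rowwise().

```r
mental <- data.frame(
  id = 1:3,
  q1 = c(1, NA, 2),
  q2 = c(2, 4, NA),
  q3 = c(3, 4, NA),
  q4 = c(4, 4, 4),
  q5 = c(5, 4, 4),
  q6 = c(6, 4, 4)
)
name_col <- paste0("q", 1:6)

items <- as.matrix(mental[name_col])
somatic_mean <- rowMeans(items, na.rm = TRUE)  # mean of the observed items per row
somatic_na   <- rowMeans(is.na(items))         # proportion of missing items per row

# Cells that are missing AND sit in a row with fewer than 20% missing items;
# somatic_na is recycled down each column, so row i is compared with somatic_na[i]
to_fill <- is.na(items) & somatic_na < 0.2

# Replace each selected cell with its own row's mean
items[to_fill] <- somatic_mean[row(items)[to_fill]]

mental[name_col] <- as.data.frame(items)
mental$somatic_mean <- somatic_mean
mental$somatic_usable <- ifelse(somatic_na < 0.2, "yes", "no")
print(mental)
```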
Hope this helps!
rowmean and standard deviation using data.table
The issue lies in sd(), which doesn't work row-wise (unlike rowMeans() and rowSums()), so it has to be applied to each row with apply():
x[,
c("meanY",'sdY',"nY") :=
.(rowMeans(.SD, na.rm = TRUE),
apply(.SD, 1, sd, na.rm = TRUE),
rowSums(!is.na(.SD))),
.SDcols = 2:10]
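As a sketch of what the apply() call computes, the row-wise sample standard deviation can also be written in vectorised base R from its definition, sqrt(sum((x - mean)^2) / (n - 1)) over the non-missing values. The small matrix is hypothetical, and the code compares the two approaches. One difference to keep in mind: a row with a single non-missing value gives NaN here, whereas sd() returns NA.

```r
m <- matrix(c(1,  2, 3,
              2, NA, 6), nrow = 2, byrow = TRUE)

n  <- rowSums(!is.na(m))          # non-missing count per row (like nY)
mu <- rowMeans(m, na.rm = TRUE)   # row means (like meanY)

# m - mu recycles mu down the columns, so each row is centred on its own mean
sd_vec   <- sqrt(rowSums((m - mu)^2, na.rm = TRUE) / (n - 1))
sd_apply <- apply(m, 1, sd, na.rm = TRUE)  # the approach used above

print(sd_vec)
```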
Min and Median of Multiple Columns of a DF by Row in R
You can use apply like this (the 1 means calculate by row; 2 would calculate by column):
the_min <- apply(df, 1, min)
the_median <- apply(df, 1, median)
df$Min <- the_min
df$Median <- the_median
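One caveat worth a sketch: apply() first converts the data frame to a matrix, so if df has any non-numeric column (a name or ID, say), the whole matrix becomes character and min() compares strings. Selecting the numeric columns first avoids that. The df below is hypothetical.

```r
df <- data.frame(name = c("a", "b"), x = c(10, 2), y = c(9, 30), z = c(1, 5))

# Keep only numeric columns so apply() works on a numeric matrix
num <- df[sapply(df, is.numeric)]

df$Min    <- apply(num, 1, min)     # 1 = by row
df$Median <- apply(num, 1, median)

print(df)
```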