Delete rows in data frame if entry appears fewer than x times
You can use ave like this:
df[as.numeric(ave(df$Name, df$Name, FUN=length)) >= 2, ]
# Name Age ZipCode
# 1 Joe 16 60559
# 3 Bob 64 94127
# 4 Joe 23 94122
# 5 Bob 45 25462
This answer assumes that df$Name is a character vector, not a factor. If it is a factor, pass as.character(df$Name) as the first argument to ave instead.
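A minimal sketch of a variant that avoids the factor issue, using a small hypothetical df. Passing an integer vector (seq_along(df$Name)) as the first argument of ave means the group sizes come back as integers. No as.numeric() is needed, and Name may be either a factor or a character vector.

```r
# Hypothetical example data; Name is deliberately a factor here
df <- data.frame(
  Name = factor(c("Joe", "Ann", "Bob", "Joe", "Bob")),
  Age = c(16, 30, 64, 23, 45)
)

# ave() replaces each element with the length of its Name group,
# so every row carries the count of its own Name
counts <- ave(seq_along(df$Name), df$Name, FUN = length)

# Keep rows whose Name appears at least 2 times ("Ann" appears once)
kept <- df[counts >= 2, ]
print(kept)
```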
You can also use table as follows:
x <- table(df$Name)
df[df$Name %in% names(x[x >= 2]), ]
# Name Age ZipCode
# 1 Joe 16 60559
# 3 Bob 64 94127
# 4 Joe 23 94122
# 5 Bob 45 25462
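The table approach generalizes to any threshold x. Below is a small sketch where drop_rare is a hypothetical helper name and the data are made up for illustration:

```r
# Hypothetical helper: keep rows whose value in column `col` occurs at least `min_n` times
drop_rare <- function(df, col, min_n) {
  counts <- table(df[[col]])                  # frequency of each distinct value
  frequent <- names(counts)[counts >= min_n]  # values that pass the threshold
  # as.character() lets this work for factor and character columns alike
  df[as.character(df[[col]]) %in% frequent, , drop = FALSE]
}

df <- data.frame(
  Name = c("Joe", "Ann", "Bob", "Joe", "Bob"),
  Age = c(16, 30, 64, 23, 45),
  stringsAsFactors = FALSE
)
res <- drop_rare(df, "Name", 2)
print(res)
```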
Delete columns that have more than 30% repeated values or more than 1% of values outside the range mean ± 2.5 SD in R
Write a function which incorporates all the rules you want to use to delete a column.
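remove_col <- function(x) {
  # Returns TRUE for columns to keep, FALSE for columns to drop
  tab <- table(x)
  s <- sd(x)    # named s to avoid masking the sd() function
  mn <- mean(x)
  # Share of entries whose value appears more than once in the column
  repeated_share <- mean(x %in% names(tab[tab > 1]))
  # Number of entries outside mean +/- 2.5 SD
  n_outside <- sum(x > mn + 2.5 * s | x < mn - 2.5 * s)
  !(repeated_share > 0.3 || n_outside > 0.01 * length(x))
}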
Use it with Filter, which keeps only the columns for which remove_col returns TRUE:
Filter(remove_col, DF)
# x3
#1 4.0
#2 2.0
#3 3.0
#4 4.0
#5 5.0
#6 4.2
#7 4.6
#8 2.2
#9 2.7
#10 2.8
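When tuning the thresholds it can help to see both measures for every column before filtering. Below is a sketch that applies the same two rules (more than 30% repeated values, more than 1% of values outside mean ± 2.5 SD). It uses a hypothetical col_stats helper and made-up data in which x1 is dominated by one repeated value and x2 contains one extreme value.

```r
# Hypothetical data: x1 has the value 1 four times, x2 has one extreme value (100)
DF <- data.frame(
  x1 = c(1, 1, 1, 1, 2, 3, 4, 5, 6, 7),
  x2 = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 100),
  x3 = c(4.0, 2.0, 3.0, 4.5, 5.0, 4.2, 4.6, 2.2, 2.7, 2.8)
)

# Hypothetical helper: the two quantities the rules are based on
col_stats <- function(x) {
  tab <- table(x)
  s <- sd(x)
  mn <- mean(x)
  c(repeated_share = mean(x %in% names(tab[tab > 1])),          # entries with a repeated value
    outside_share = mean(x > mn + 2.5 * s | x < mn - 2.5 * s))  # entries beyond mean +/- 2.5 SD
}

# sapply() gives a 2 x ncol(DF) matrix: one row per measure, one column per DF column
stats <- sapply(DF, col_stats)
print(round(stats, 2))

# Keep a column only if it passes both rules
keep <- stats["repeated_share", ] <= 0.3 & stats["outside_share", ] <= 0.01
DF_clean <- DF[, keep, drop = FALSE]
print(DF_clean)
```

Here x1 is dropped by the repeated-values rule and x2 by the outlier rule, so only x3 remains.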
How to delete groups containing fewer than 3 rows of data in R?
One way to do it is to use the magic n() function within filter:
library(dplyr)
my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))
my_data %>%
group_by(Year, Site, Brood) %>%
filter(n() >= 3)
The n() function gives the number of rows in the current group (or the total number of rows if there is no grouping).
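If dplyr is not available, here is a base R sketch of the same filter on the same toy data. It swaps in ave() for dplyr's n(), using ave() to attach each row's group size before subsetting:

```r
my_data <- data.frame(Year = 1996, Site = "A", Brood = c(1, 1, 2, 2, 2))

# Size of each row's (Year, Site, Brood) group, repeated on every row of that group
group_size <- ave(seq_len(nrow(my_data)),
                  my_data$Year, my_data$Site, my_data$Brood,
                  FUN = length)

# Keep only rows belonging to groups with at least 3 rows
result <- my_data[group_size >= 3, ]
print(result)
```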
Remove rows which have fewer than a given number of strings in a specific column
If you want to use tidyverse packages you could use:
library(dplyr)
library(stringr)
dd %>% filter(str_count(text, " ") >= 3)
Here we assume that "less than 4 strings" means fewer than 3 spaces, i.e. words separated by single spaces. By counting spaces, you get a much more efficient solution than splitting each string and allocating memory for the separate pieces when you don't actually need them.
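The same counting idea also works in base R without splitting the strings. The sketch below assumes a hypothetical dd with a character column text whose words are separated by single spaces. The number of spaces is the string length minus the length after all spaces are deleted.

```r
# Hypothetical data frame with a character column `text`
dd <- data.frame(
  text = c("one two three four", "just two", "a b c d e", "three word line"),
  stringsAsFactors = FALSE
)

# Spaces per string: total characters minus characters left after removing spaces
n_spaces <- nchar(dd$text) - nchar(gsub(" ", "", dd$text, fixed = TRUE))

# Keep rows with at least 3 spaces, i.e. at least 4 words
kept <- dd[n_spaces >= 3, , drop = FALSE]
print(kept)
```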
Delete rows in R if a cell contains a value larger than x
rowSums of the logical matrix df > 7 gives the number of TRUE values in each row, so a row with no TRUE gets 0. Negating the result turns 0 into TRUE and every nonzero count into FALSE, which can be used directly for subsetting.
df[!rowSums(df >7),]
# a b c
#2 6 6 5
#4 7 4 7
For 'V2', we use the same principle, except that we compute the logical matrix on a subset of df: df[-1] drops the first column, leaving only the second and third columns.
df[!rowSums(df[-1] >7),]
# a b c
#2 6 6 5
#3 99 3 6
#4 7 4 7
#6 9 6 3
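Columns can also be chosen by name rather than by position. Note that rowSums() returns NA for a row whose checked cells contain NA, and indexing with NA produces an all-NA row in the result. Below is a sketch with a hypothetical df, using na.rm = TRUE so that NA counts as "not larger than 7":

```r
# Hypothetical data frame with columns a, b, c (one NA in column c)
df <- data.frame(a = c(1, 6, 99, 7), b = c(8, 6, 3, 4), c = c(2, 5, 6, NA))

# Check only columns b and c, selected by name; NA cells are ignored in the count
too_big <- rowSums(df[c("b", "c")] > 7, na.rm = TRUE)

# Keep rows where no checked cell is larger than 7 (a = 99 is not checked)
res <- df[too_big == 0, ]
print(res)
```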