Remove rows conditionally from a data.table in R
In this scenario it is not so different from a data.frame:
data <- data[ menuitem != 'coffee' | amount > 0]
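For comparison, here is a minimal sketch of the same filter written for a plain data.frame. The toy values are made up for illustration; only the column names menuitem and amount come from the line above.

```r
# Toy data (hypothetical values, column names taken from the answer)
data <- data.frame(
  menuitem = c("coffee", "coffee", "tea", "cake"),
  amount   = c(0, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Keep a row if it is not coffee, or if its amount is positive.
# In a data.frame the columns must be qualified with data$ and the
# trailing comma (select all columns) is required.
kept <- data[data$menuitem != "coffee" | data$amount > 0, ]
print(kept)
```

Only the coffee row with amount 0 is dropped. The data.table form is the same condition without the data$ prefixes and without the trailing comma.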
Deleting or adding rows by reference is not yet implemented. You can find more information in this question.
Regarding speed:
1. You can benefit from keys by doing something like:
setkey(data, menuitem)
data <- data[!"coffee"]
which will be faster than data <- data[menuitem != 'coffee']. However, to apply the same filters you asked about in the question you'll need a rolling join (I've finished my lunch break, so I can add something later :-)).
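A small sketch of the keyed "not-join" mentioned in point 1, assuming the data.table package is installed. The toy values are made up; the result should match the unkeyed filter.

```r
library(data.table)

# Toy data (hypothetical values)
data <- data.table(
  menuitem = c("coffee", "tea", "coffee", "cake"),
  amount   = c(0, 2, 3, 1)
)

# Setting the key sorts the table by menuitem and enables keyed lookups
setkey(data, menuitem)

# Not-join: keep every row whose key value is NOT "coffee"
kept_keyed <- data[!"coffee"]

# The ordinary vector-scan filter, for comparison
kept_scan <- data[menuitem != "coffee"]

print(kept_keyed)
```

On a table this small there is no speed difference to see; the point is only that both forms keep the same rows.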
2. Even without a key, data.table is much faster on relatively big tables (the speed is similar for a handful of rows):
library(data.table)
library(microbenchmark)
dt <- data.table(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
df <- data.frame(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
> microbenchmark(dt[id == "a"], df[df$id == "a", ])
Unit: milliseconds
               expr       min        lq    median        uq       max neval
      dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100
 df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100
Delete row from data.frame based on condition
Thanks @Simon for the suggestions. One criterion I wanted was that the code made sense as I "read" it. As I thought more, another criterion was that I wanted to be deliberate about what changes to make. So I incorporated Simon's recommendation to make a separate column and then use dplyr::filter() to exclude those rows. Here's what an example segment of code looked like:
library(dplyr)
# Change pre/post entries
data[data$UserID == 52118254, "Prepost"][2] <- 2
# Mark rows to delete
data$toDelete <- NA  # Makes new empty column for marking deletions
data[data$UserID == 52118284, ][2, "toDelete"] <- 1  # Marks row for deletion
# Filter to exclude rows (assign the result, otherwise data is unchanged)
data <- data %>% filter(is.na(toDelete))
# Optionally add "%>% select(-toDelete)" to remove the extra column
In my context, advantages here are that everything is deliberate rather than automatic and changes are anchored to data rather than row numbers that might change. I'd still welcome any feedback or other ways of achieving this (maybe in a single step).
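For anyone who wants to try the mark-then-filter pattern without dplyr, here is a minimal base R sketch. The UserID values and the toy data are made up, and which() is used to pick the second matching row, which is a slightly different way of writing the marking step than the code above.

```r
# Toy data (hypothetical IDs and values)
data <- data.frame(
  UserID  = c(101, 101, 202, 202),
  Prepost = c(1, 2, 1, 1)
)

# New empty column for marking deletions
data$toDelete <- NA

# Mark the second row that belongs to user 202
rows_202 <- which(data$UserID == 202)
data$toDelete[rows_202[2]] <- 1

# Keep only unmarked rows and drop the helper column in the same step
cleaned <- data[is.na(data$toDelete), names(data) != "toDelete"]
print(cleaned)
```

The marking is still anchored to the data (the user ID) rather than to a hard-coded row number, which was the stated goal.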
Remove Rows From Data Frame where a Row matches a String
Just use == with the negation symbol (!). If dtfm is the name of your data.frame:
dtfm[!dtfm$C == "Foo", ]
Or, to move the negation in the comparison:
dtfm[dtfm$C != "Foo", ]
Or, even shorter, using subset():
subset(dtfm, C!="Foo")
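One difference between these forms is worth knowing: if column C contains NA, logical indexing with [ returns an all-NA row for each missing value, while subset() drops those rows. A small sketch with made-up data (only the names dtfm and C come from the answer):

```r
# Toy data with one missing value in C (hypothetical values)
dtfm <- data.frame(
  A = 1:4,
  C = c("Foo", "Bar", NA, "Baz"),
  stringsAsFactors = FALSE
)

# Both bracket forms give the same result, including an NA row
by_not_eq <- dtfm[!dtfm$C == "Foo", ]
by_neq    <- dtfm[dtfm$C != "Foo", ]

# subset() treats an NA condition as FALSE and drops that row
by_subset <- subset(dtfm, C != "Foo")

# which() gives the bracket form the same NA behaviour as subset()
by_which <- dtfm[which(dtfm$C != "Foo"), ]

print(by_neq)
print(by_subset)
```

With no missing values in C, all four results contain the same rows.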
Removing rows from a data frame until a condition is met
Your while-loop doesn't redefine block_2_df. This should work:
while (dim(block_2_df)[1]>1) {
block_2_df <- remove_fun(block_2_df)
}
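A runnable sketch of that loop. The question's remove_fun is not shown here, so the version below is a hypothetical stand-in that simply drops the first row.

```r
# Hypothetical stand-in for the question's remove_fun: drop the first row.
# drop = FALSE keeps the result a data frame even with a single column.
remove_fun <- function(df) {
  df[-1, , drop = FALSE]
}

block_2_df <- data.frame(x = 1:5)

# The result must be assigned back, otherwise the condition never changes
# and the loop runs forever.
while (nrow(block_2_df) > 1) {
  block_2_df <- remove_fun(block_2_df)
}

print(block_2_df)
```

nrow(block_2_df) is equivalent to dim(block_2_df)[1] and reads a little more directly.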
Delete rows that exist in another data frame?
You need the %in% operator. So,
df1[!(df1$name %in% df2$name),]
should give you what you want.
- df1$name %in% df2$name tests whether the values in df1$name are in df2$name
- The ! operator reverses the result.
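Here is the same thing step by step on two small made-up data frames (only the names df1, df2 and name come from the answer):

```r
# Toy data (hypothetical values)
df1 <- data.frame(name = c("a", "b", "c", "d"), value = 1:4,
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b", "d"), stringsAsFactors = FALSE)

# TRUE where a name from df1 also appears in df2
in_df2 <- df1$name %in% df2$name

# Negate the test and keep only the rows that are NOT in df2
result <- df1[!in_df2, ]

print(in_df2)
print(result)
```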
R: Deleting rows based on a value in a column from a large data set in R
I suggest you learn how to use dplyr and other packages in the tidyverse. I find them to be indispensable tools for cleaning data.
Here's how I would use dplyr to filter out both Texas and New York in your data set:
library(dplyr)
customers = filter(customers, State != "TX" & State != "NY")
Alternatively,
customers = filter(customers, !(State %in% c("TX", "NY")))
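To check that the two conditions select the same rows, here is a small sketch in base R (so it runs without dplyr). The customers data is made up, and the equivalence shown assumes State has no missing values.

```r
# Toy data (hypothetical values)
customers <- data.frame(
  id    = 1:5,
  State = c("TX", "CA", "NY", "WA", "TX"),
  stringsAsFactors = FALSE
)

# Condition 1: not TX and not NY
keep_and <- customers$State != "TX" & customers$State != "NY"

# Condition 2: not a member of the set {TX, NY}
keep_in <- !(customers$State %in% c("TX", "NY"))

# Same logical vector either way, so the same rows are kept
kept <- customers[keep_in, ]
print(kept)
```

The %in% form scales better when the list of states to exclude grows, since only the vector needs to change.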
How to remove row if it has a NA value in one certain column
The easiest solution is to use is.na():
df[!is.na(df$B), ]
which gives you:
   A B  C
1 NA 2 NA
2  1 2  3
4  1 2  3
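A self-contained sketch that should reproduce the output above. The input data frame is an assumption reconstructed from that output; in particular, the A and C values of the removed third row are made up.

```r
# Assumed input: row 3 has NA in column B (its other values are invented)
df <- data.frame(
  A = c(NA, 1, 1, 1),
  B = c(2, 2, NA, 2),
  C = c(NA, 3, 3, 3)
)

# Keep rows where B is not missing; NAs in other columns are left alone
result <- df[!is.na(df$B), ]
print(result)
```

Row 1 stays even though A and C are NA, because only column B is tested.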
R- Remove several rows based on a value
by(df, df$Year, function(x) x[!colSums(is.na(x))])
df$Year: 1980
   Year Month stn1
1  1980     1    8
2  1980     2    4
3  1980     3    6
4  1980     4    3
5  1980     5    0
6  1980     6    1
7  1980     7    3
8  1980     8    6
9  1980     9    1
10 1980    10    2
11 1980    11    1
12 1980    12    4
------------------------------------------------------------------
df$Year: 1981
   Year Month stn2
13 1981     1    4
14 1981     2    7
15 1981     3    9
16 1981     4    1
17 1981     5    2
18 1981     6    6
19 1981     7    9
20 1981     8    8
21 1981     9    5
22 1981    10    1
23 1981    11    3
24 1981    12    2
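To see what that one-liner is doing, here is a smaller sketch. The input is an assumption: a data frame where stn2 is missing for 1980 and stn1 is missing for 1981, with only three months per year and made-up readings.

```r
# Assumed shape of the input (hypothetical, shortened to 3 months per year)
df <- data.frame(
  Year  = rep(c(1980, 1981), each = 3),
  Month = rep(1:3, times = 2),
  stn1  = c(8, 4, 6, NA, NA, NA),
  stn2  = c(NA, NA, NA, 4, 7, 9)
)

# For each year: count the NAs per column, then keep only the columns
# whose count is zero (!0 is TRUE, !n is FALSE for any n > 0).
res <- by(df, df$Year, function(x) x[!colSums(is.na(x))])

first_year  <- res[[1]]  # 1980 group
second_year <- res[[2]]  # 1981 group

print(names(first_year))
print(names(second_year))
```

Note that within each group this drops columns that contain any NA; the rows themselves are only split by year, not removed.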