R: How to remove rows in a data frame based on the first character of a column
We can use grep. The regex ^ indicates the beginning of the string, so ^[0-9] matches a digit at the beginning of each value in the 'y' column. grep returns the numeric indices of the matching elements, which we use to subset the rows of 'abc'.
abc[grep('^[0-9]', abc$y),]
# y z
#1 34TA912 23.12.2015
#4 34CC515 25.12.2015
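A self-contained version of the example above, as a sketch. Only rows 1 and 4 come from the output shown; rows 2 and 3 are hypothetical values that start with a letter. It also shows grepl(), which returns a logical vector instead of indices and can be negated with ! to drop the matching rows instead of keeping them.

```r
# Hypothetical data: rows 1 and 4 match the output above,
# rows 2 and 3 are made-up values that start with a letter
abc <- data.frame(
  y = c("34TA912", "TA34912", "CC34515", "34CC515"),
  z = c("23.12.2015", "24.12.2015", "24.12.2015", "25.12.2015"),
  stringsAsFactors = FALSE
)

# grep() returns the positions of the values that start with a digit
idx <- grep("^[0-9]", abc$y)
print(idx)  # 1 4

# Index-based subset, as in the answer above
kept <- abc[idx, ]

# Logical alternative with grepl(): keep or drop the same rows
kept_lgl <- abc[grepl("^[0-9]", abc$y), ]
dropped  <- abc[!grepl("^[0-9]", abc$y), ]
print(kept)
```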
How can I remove rows by condition (initial letters) in R?
You can also convert the data to a data frame and use filter() from the dplyr package. In addition to your code, a solution could look like this:
library(dplyr)
library(data.table) # %like% comes from data.table, it is not part of dplyr
Data <- data.frame(Data)
Then assign the filtered data to another data frame.
DataFiltered <- Data %>% filter(Abbreviation %like% "SR_")
Similarly, you can filter with the str_detect() function from the stringr package. This is more precise if Abbreviation may contain 'SR_' somewhere other than the first three characters: %like% matches the pattern anywhere in the string, whereas the regex anchor ^ requires every kept value to start with 'SR_'.
library(stringr)
DataFiltered <- Data %>% filter(str_detect(Abbreviation, pattern = "^SR_"))
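To see why the ^ anchor matters, here is a base-R sketch (no dplyr, data.table or stringr needed) on hypothetical Abbreviation values. grepl() stands in for %like% and str_detect(), which also perform a regex match; startsWith() is a base alternative when the prefix is a plain literal string.

```r
# Hypothetical data: "XSR_02" contains "SR_" but does not start with it
Data <- data.frame(
  Abbreviation = c("SR_01", "XSR_02", "SR_03", "AB_04"),
  Value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Unanchored pattern: matches "SR_" anywhere in the string
anywhere <- Data[grepl("SR_", Data$Abbreviation), ]

# Anchored pattern: "SR_" must be the first three characters
anchored <- Data[grepl("^SR_", Data$Abbreviation), ]

# Literal prefix test without any regex
prefixed <- Data[startsWith(Data$Abbreviation, "SR_"), ]

print(anywhere$Abbreviation)
print(anchored$Abbreviation)
```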
Remove rows occurring after a string in an R data frame
We can subset with row_number() and which():
library(dplyr)
df %>% filter(row_number() < which(A == 'total')[1]) # [1]: cut at the first 'total' if it occurs more than once
A B
1 Bob Smith 01005
2 Carl Jones 01008
3 Syndey Lewis 01185
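which() returns every matching position, and integer(0) when there is no match. A base-R sketch that instead uses match(), which returns only the first position (or NA), and keeps every row when 'total' is absent. The first three rows match the output above; the 'total' row and the row after it are hypothetical.

```r
# Hypothetical data: rows 1-3 match the output above, rows 4-5 are made up
df <- data.frame(
  A = c("Bob Smith", "Carl Jones", "Syndey Lewis", "total", "Notes"),
  B = c("01005", "01008", "01185", "03198", "00000"),
  stringsAsFactors = FALSE
)

# match() gives the position of the first "total", or NA if there is none
cut_at <- match("total", df$A)

# Keep the rows strictly before it; keep everything when "total" is missing
kept <- if (is.na(cut_at)) df else df[seq_len(cut_at - 1), ]

# Same result: rows before the first "total" have a running count of 0
kept2 <- df[cumsum(df$A == "total") == 0, ]
print(kept)
```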
Remove first X characters in a data.frame column
- Use df$c1 <- gsub("Size=", "", df$c1) on both columns to remove the characters.
- Use df$c1 <- as.numeric(as.character(df$c1)) on both columns to convert them to numeric.
- Then df$c3 <- df$c1 * df$c2 multiplies the columns and stores the result in a new column.
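A runnable sketch of these steps on hypothetical c1/c2 columns. Since gsub() already returns a character vector, the as.character() step is only needed if you convert a factor column without calling gsub() first; fixed = TRUE makes gsub() treat "Size=" as a literal string rather than a regex.

```r
# Hypothetical data: both columns carry a "Size=" prefix
df <- data.frame(
  c1 = c("Size=2", "Size=5", "Size=10"),
  c2 = c("Size=3", "Size=4", "Size=1"),
  stringsAsFactors = FALSE
)

# Strip the literal prefix and convert to numeric, one column at a time
for (col in c("c1", "c2")) {
  df[[col]] <- as.numeric(gsub("Size=", "", df[[col]], fixed = TRUE))
}

# Element-wise product stored in a new column
df$c3 <- df$c1 * df$c2
print(df)
```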
How to remove the first three characters from every row in a column in R
You can do it with the gsub function and a simple regex. Here is the code:
# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)
# Replace the first 3 characters with an empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
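Because the pattern is anchored with ^, sub(), which replaces only the first match, gives the same result here. A position-based alternative is substring(x, 4), which keeps everything from the fourth character on. A short sketch on a hypothetical vector, including a value shorter than three characters, which both versions turn into "":

```r
x <- c("abcd", "abcde", "abcdef", "ab")

# Regex version: .{0,3} also consumes strings shorter than 3 characters
via_regex <- sub("^.{0,3}", "", x)

# Position-based version: keep characters 4 onwards
via_substring <- substring(x, 4)

print(via_regex)
print(via_substring)
```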
R: delete rows that have two columns with the same first digit
You are using str_sub() with the default end parameter, which takes you to the end of the string. Set end to 1 so that only the first character is compared:
library(dplyr)
library(stringr)
x.dist %>% filter(str_sub(county1, 1, 1) != str_sub(county2, 1, 1))
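The same filter in base R, as a sketch: substr() takes an explicit stop position, so substr(x, 1, 1) is the first character only. The county codes below are hypothetical and stored as character strings, which also preserves any leading zeros.

```r
# Hypothetical county codes stored as character strings
x.dist <- data.frame(
  county1 = c("12001", "23003", "45007"),
  county2 = c("13005", "12009", "57011"),
  stringsAsFactors = FALSE
)

# Compare only the first character of each code (start = 1, stop = 1)
same_first <- substr(x.dist$county1, 1, 1) == substr(x.dist$county2, 1, 1)

# Drop the rows whose first digits agree
kept <- x.dist[!same_first, ]
print(kept)
```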
Removing rows from a data frame that contain a string in a particular column
There are multiple ways you can do this:
Convert to numeric and remove the NA values
subset(df, !is.na(as.numeric(Score)))
# ID Score
#1 1001 4
#2 1002 20
#5 1005 30
Or use grepl to find values that contain any non-numeric character and remove them:
subset(df, !grepl('\\D', Score))
This can be done with grep as well:
df[grep('\\D', df$Score, invert = TRUE), ]
data
df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v",
"30")), class = "data.frame", row.names = c(NA, -5L))
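The two ideas are not interchangeable. A short check on a hypothetical vector s: the as.numeric() route keeps decimals and negative numbers, while the \\D pattern keeps only strings made entirely of digits. suppressWarnings() hides the "NAs introduced by coercion" warning that as.numeric() otherwise emits.

```r
# Same data as above
df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v", "30")),
                class = "data.frame", row.names = c(NA, -5L))

# Numeric-conversion approach, with the coercion warning silenced
numeric_rows <- subset(df, !is.na(suppressWarnings(as.numeric(Score))))

# Hypothetical values where the approaches disagree
s <- c("4", "2.5", "-1", "h")
keep_numeric <- !is.na(suppressWarnings(as.numeric(s)))  # parses as a number
keep_digits  <- !grepl("\\D", s)                          # digits only
print(keep_numeric)
print(keep_digits)
```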
Remove Rows From Data Frame where a Row matches a String
Just use == with the negation operator (!). If dtfm is the name of your data.frame:
dtfm[!dtfm$C == "Foo", ]
Or, moving the negation into the comparison:
dtfm[dtfm$C != "Foo", ]
Or, even shorter, using subset():
subset(dtfm, C!="Foo")
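These three forms only agree when column C has no missing values. A sketch on hypothetical data with an NA in C: bracket indexing turns the NA comparison into an extra all-NA row, subset() treats an NA condition as FALSE, and %in% (which never returns NA) keeps the NA row as a real row.

```r
# Hypothetical data with a missing value in column C
dtfm <- data.frame(A = 1:4, C = c("Foo", "Bar", NA, "Baz"),
                   stringsAsFactors = FALSE)

# NA != "Foo" is NA, which yields a row filled with NAs
by_bracket <- dtfm[dtfm$C != "Foo", ]

# subset() drops rows where the condition is NA
by_subset <- subset(dtfm, C != "Foo")

# %in% returns FALSE for NA, so the row with C = NA is kept intact
by_in <- dtfm[!(dtfm$C %in% "Foo"), ]

print(by_bracket)
print(by_subset)
print(by_in)
```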
How to remove rows from a data.frame containing a symbol in a particular column
grep!
mydata <- read.table(textConnection("session first last city
9cf571c8faa67cad2aa9ff41f3a26e38 cat biddix fresno
e30f853d4e54604fd62858badb68113a caleb+joey amos blah
63a5e839510a647c1ff3b8aed684c2a5 me+you amos blah"), header=TRUE, stringsAsFactors=FALSE)
grep("\\+",mydata$first)
which returns
[1] 2 3
This tells you that rows 2 and 3 of the 'first' column (column 2) contain a '+'.
So you could run:
mydata <- mydata[-grep("\\+",mydata$first),]
mydata
And those entire rows would be deleted. I'm not sure whether it's a typo in your question, but you say you want to "remove the rows in the first column": do you mean the entries or the entire rows?
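A variant sketch: with fixed = TRUE, grepl() matches "+" literally, so the regex escape is not needed, and negating a logical vector avoids the empty-match problem of -grep(). The data frame below rebuilds the same three rows directly instead of using read.table().

```r
# Same rows as the read.table() example, built directly
mydata <- data.frame(
  session = c("9cf571c8faa67cad2aa9ff41f3a26e38",
              "e30f853d4e54604fd62858badb68113a",
              "63a5e839510a647c1ff3b8aed684c2a5"),
  first = c("cat", "caleb+joey", "me+you"),
  last  = c("biddix", "amos", "amos"),
  city  = c("fresno", "blah", "blah"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches "+" literally, so no "\\+" escaping is needed
has_plus <- grepl("+", mydata$first, fixed = TRUE)
print(which(has_plus))

# Logical negation keeps every row when nothing matches
mydata <- mydata[!has_plus, ]
print(mydata)
```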
Delete rows containing specific strings in R
This should do the trick:
df[-grep("REVERSE", df$Name), ]
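A safer version uses grepl(), because if nothing matches, grep() returns integer(0) and df[-integer(0), ] selects no rows at all: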
df[!grepl("REVERSE", df$Name),]
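A minimal sketch of that failure mode on hypothetical data in which no Name contains "REVERSE". The frame has two columns on purpose: with a single column, df[i, ] would drop to a plain vector.

```r
# Hypothetical data: no Name contains "REVERSE"
df <- data.frame(Name = c("FWD_1", "FWD_2", "FWD_3"), Count = c(5, 7, 2),
                 stringsAsFactors = FALSE)

hits <- grep("REVERSE", df$Name)
print(hits)  # integer(0)

# -integer(0) is still integer(0), so this selects zero rows
by_minus <- df[-hits, ]

# grepl() is FALSE for every row, so negating it keeps them all
by_grepl <- df[!grepl("REVERSE", df$Name), ]

print(nrow(by_minus))
print(nrow(by_grepl))
```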