Removing Particular Character in a Column in R

How to remove part of characters in data frame column

There are several ways to do this:

  1. Convert the column with as.numeric. This drops leading zeros but turns the column into a number, so it only works when every value is numeric.
raw$Zipcode <- as.numeric(raw$Zipcode)

  2. If you want to keep the column as character, use str_replace from the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+", "")

  3. str_remove, also in stringr, does the same thing without the empty replacement argument.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")

  4. You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)

To remove exactly n leading zeros instead of all of them, replace + with {n}.

For instance, to remove two leading zeros use sub("^0{2}", "", raw$Zipcode). Values with fewer than two leading zeros are left unchanged.
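
As a quick check of how these patterns differ, here is a minimal base R sketch. The zip vector is made-up sample data, not the asker's raw$Zipcode column.

```r
# Hypothetical zip codes stored as character strings
zip <- c("00501", "02134", "10001", "00000")

# "^0+" removes every leading zero
all_removed <- sub("^0+", "", zip)

# "^0{2}" removes exactly two leading zeros, and only where two are present
two_removed <- sub("^0{2}", "", zip)

# as.numeric() also drops leading zeros, but the result is no longer character
as_number <- as.numeric(zip)

print(all_removed)
print(two_removed)
print(as_number)
```

Note that a value made up only of zeros becomes an empty string with "^0+", which may or may not be what you want.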

How to remove a specific character within a column in r?

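You can use the gsub function. Use ^ to anchor the match at the start of the string so that you don't remove 0s elsewhere. Because of the anchor, the pattern "^0" removes only a single leading 0; use "^0+" to remove all of them.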

x$VARIANT_ID <- gsub("^0", "", x$VARIANT_ID)
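
To see what the anchor changes, here is a small sketch on a made-up vector (ids is hypothetical and stands in for x$VARIANT_ID):

```r
# Hypothetical IDs with zeros in different positions
ids <- c("0012", "0304", "1200")

# Anchored, no quantifier: at most one leading zero is removed
one_leading <- gsub("^0", "", ids)

# Anchored with +: all leading zeros are removed
all_leading <- gsub("^0+", "", ids)

# No anchor: every zero is removed, wherever it occurs
every_zero <- gsub("0", "", ids)

print(one_leading)
print(all_leading)
print(every_zero)
```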

How do I remove characters from items in a column?

The pattern .1 will not do here: . is a regex metacharacter that matches any character, so escape it as \\. to match a literal dot. We only need sub, i.e. match once and replace with an empty string. The pattern below matches a . followed by one or more digits (\\d+) at the end ($) of the string.

df$Sample <- sub("\\.\\d+$", "", df$Sample)
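
The effect of the escape is easiest to see on a value that has trailing digits but no dot. A minimal sketch, assuming made-up sample names:

```r
# Hypothetical sample names; "X21" ends in digits but contains no dot
samples <- c("S1.1", "S2.10", "X21")

# Escaped dot: only a literal ".<digits>" suffix is removed
escaped <- sub("\\.\\d+$", "", samples)

# Unescaped dot: "." matches any character, so "X21" is wiped out entirely
unescaped <- sub(".\\d+$", "", samples)

print(escaped)
print(unescaped)
```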

How to remove a character (asterisk) in column values in r?

The stringr package has some very handy functions for vectorized string manipulation.

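In the following code I replace the * with ''. Note that in R, a regex metacharacter that should be matched literally has to be preceded by a double backslash \\ instead of the usual single backslash \. str_replace replaces only the first match in each string; use str_replace_all to remove every *.

library(stringr)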
LocationID <- c('*Yukon','*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays')
AWC <- c(333, 485, 76, 666, 54)
df <- data.frame(LocationID, AWC)

df$location_clean <- stringr::str_replace(df$LocationID, '\\*', '')

Resulting in:

   LocationID AWC location_clean
1      *Yukon 333          Yukon
2 *Lewis Rich 485     Lewis Rich
3     *Kodiak  76         Kodiak
4      Kodiak 666         Kodiak
5       *Rays  54           Rays
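
If you would rather not load a package, base R can do the same. This is a sketch of an alternative, not the method used above: fixed = TRUE makes sub and gsub treat the pattern as a literal string, so no escaping is needed. The loc vector is sample data, with one made-up value ("*A*B") added to show the difference between sub and gsub.

```r
# Sample location IDs; "*A*B" is a made-up value with two asterisks
loc <- c("*Yukon", "*Lewis Rich", "Kodiak", "*A*B")

# Escaped regex: removes the first asterisk in each string
first_regex <- sub("\\*", "", loc)

# fixed = TRUE: "*" is taken literally, same result without escaping
first_fixed <- sub("*", "", loc, fixed = TRUE)

# gsub removes every asterisk, not just the first one
all_fixed <- gsub("*", "", loc, fixed = TRUE)

print(first_regex)
print(all_fixed)
```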

R Remove string characters from a range of rows in a column

To clean the strings and then filter, one option is trimws. By default it trims whitespace from both ends of a string (use which = "left" or which = "right" for one side only), but its whitespace argument accepts a regex. Here the regex matches zero or more uppercase letters followed by zero or more whitespace characters ([A-Z]*\\s*). We then keep only the rows where Date is not an empty string.

library(dplyr)
df %>%
  mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>%
  filter(nzchar(Date))

Output:

        Date Date_Approved
1  1/27/2020     1/28/2020
2  1/29/2020     1/30/2020
3  1/30/2020     1/31/2020
4   2/1/2020      2/2/2020
5   2/9/2020     2/10/2020
6  2/15/2020     2/16/2020
7  2/16/2020     2/17/2020
8  2/17/2020     2/19/2020
9  2/18/2020     2/20/2020
10 2/22/2020     2/23/2020
11 2/25/2020     2/26/2020
12 2/28/2020     2/29/2020
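
The original input is not shown, so here is a rough sketch on invented data that follows the same idea. The rows are hypothetical, the whitespace argument needs R 3.6.0 or later, and base R subsetting stands in for dplyr's filter to keep the example dependency-free.

```r
# Hypothetical input: a date with a letter prefix, a clean date, and a label-only row
dates <- data.frame(
  Date = c("ABC 1/27/2020", "1/29/2020", "TOTAL"),
  Date_Approved = c("1/28/2020", "1/30/2020", ""),
  stringsAsFactors = FALSE
)

# Strip uppercase letters and whitespace from both ends of Date
dates$Date <- trimws(dates$Date, whitespace = "[A-Z]*\\s*")

# Keep only rows where something is left in Date
cleaned <- dates[nzchar(dates$Date), ]

print(cleaned)
```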

Remove a number of character from string in a column

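I think this will work. str_remove deletes the first match of one or more digits followed by an underscore (the capture group around _ is optional):

library(dplyr)
library(stringr)

df %>%
  mutate(col1 = str_remove(col1, "\\d+(_)"))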

  col1
1    A
2    B
3    C
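
The question's input is not reproduced here, so the following base R sketch uses invented values of the form digits, underscore, letter. sub with an empty replacement behaves like str_remove in that it drops only the first match.

```r
# Hypothetical input values; the real col1 is not shown in the thread
col1 <- c("1_A", "22_B", "333_C", "4_D_5_E")

# Remove the first run of digits followed by an underscore
cleaned <- sub("\\d+_", "", col1)

print(cleaned)
```

The last value shows that only the first match is removed; switch to gsub (or str_remove_all in stringr) to remove every occurrence.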

How can I remove certain characters from column headers in R?

We can use sub to match a . (a metacharacter, so it must be escaped) followed by one or more digits (\\d+) at the end ($) of the string and replace it with an empty string ("").

names(df) <- sub("\\.\\d+$", "", names(df))

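NOTE: Stripping the suffix can leave duplicate column names. A data.frame accepts them when they are assigned through names(), but they are not recommended, since selecting a column by name returns only the first match.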
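
A short sketch of the duplicate-name issue, using an invented data frame (wide and its column names are hypothetical). anyDuplicated is one way to check before assigning the new names.

```r
# Hypothetical data frame whose headers carry numeric suffixes
wide <- data.frame(a.1 = 1:2, a.2 = 3:4, b.1 = 5:6)

# Strip the ".<digits>" suffix from each header
new_names <- sub("\\.\\d+$", "", names(wide))

# anyDuplicated() returns the index of the first duplicate, or 0 if there is none
first_dup <- anyDuplicated(new_names)

print(new_names)
print(first_dup)
```

Here two headers collapse to "a", so it is worth deciding how to handle that before overwriting names(wide).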

How to remove characters that don't match the string pattern from a column of a data frame

Just replace the strings that don't contain the word "Zimmer" with an empty string:

flat_cl_one$room[!grepl("Zimmer", flat_cl_one$room)] <- ""

flat_cl_one
#>       room
#> 1  3Zimmer
#> 2  2Zimmer
#> 3  2Zimmer
#> 4  3Zimmer
#> 5
#> 6
#> 7  3Zimmer
#> 8  6Zimmer
#> 9  2Zimmer
#> 10 4Zimmer

Data

flat_cl_one <- data.frame(room = c("3Zimmer", "2Zimmer", "2Zimmer", "3Zimmer",
                                   "9586", "927", "3Zimmer", "6Zimmer",
                                   "2Zimmer", "4Zimmer"))

