Removing a Particular Character from a Column in R

How to remove part of characters in data frame column

There are multiple ways of doing this:

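Use as.numeric on a column of your choice. Converting to a number drops the leading zeros, but the column is then numeric rather than character.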

raw$Zipcode <- as.numeric(raw$Zipcode)
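One thing worth checking before converting: as.numeric returns NA (with a warning) for any entry that is not purely numeric. A minimal sketch, assuming a made-up zips vector (the values are hypothetical, not taken from the question):

```r
# Hypothetical ZIP-like values; "N/A" stands in for a non-numeric entry
zips <- c("00501", "02134", "90210", "N/A")

# Conversion drops leading zeros but turns non-numeric entries into NA
zips_num <- suppressWarnings(as.numeric(zips))
print(zips_num)

# Count how many entries could not be converted
n_failed <- sum(is.na(zips_num))
print(n_failed)
```

If n_failed is greater than zero, one of the character-based approaches is probably the safer choice.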

If you want the column to remain character, you can use the stringr package.

library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+" ,"")

The stringr package also has a function called str_remove.

raw$Zipcode <- str_remove(raw$Zipcode, "^0+")

You can also use sub from base R.

raw$Zipcode <- sub("^0+", "", raw$Zipcode)

To remove exactly n leading zeroes, replace + with {n}.

For instance, to remove two 0's, use sub("^0{2}", "", raw$Zipcode).
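To see how the two quantifiers differ, here is a small base R sketch on a made-up vector (the zips values are hypothetical):

```r
# Hypothetical ZIP-like strings, including one made entirely of zeros
zips <- c("00501", "02134", "90210", "00000")

# "+" strips every leading zero; an all-zero string becomes empty
all_stripped <- sub("^0+", "", zips)
print(all_stripped)

# "{2}" strips exactly two zeros, and only where at least two are present
two_stripped <- sub("^0{2}", "", zips)
print(two_stripped)
```

Note that "02134" is left untouched by the {2} version because it has only one leading zero.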

How to remove a specific character within a column in r?

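You can use the gsub function. Use a ^ to indicate the start of the string so that you don't remove 0s elsewhere. Because the pattern is anchored, it removes only a single leading 0, so sub would work equally well here.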

x$VARIANT_ID <- gsub("^0", "", x$VARIANT_ID)
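A quick sketch of what the anchor changes, using a made-up ids vector (hypothetical values, not the asker's data):

```r
# Hypothetical IDs with zeros in leading and interior positions
ids <- c("0123", "0045", "1050")

# Anchored: at most one zero is removed, and only at the start
anchored <- gsub("^0", "", ids)
print(anchored)

# Unanchored: every zero is removed, wherever it occurs
unanchored <- gsub("0", "", ids)
print(unanchored)
```

In the anchored result "0045" becomes "045"; use "^0+" if all leading zeros should go.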

How do I remove characters from items in a column?

Instead of .1, escape the . as \\. because it is a regex metacharacter that matches any character. Here we only need sub, i.e. match once and replace with blank. The pattern below matches a literal . followed by one or more digits (\\d+) at the end ($) of the string:

df$Sample <- sub("\\.\\d+$", "", df$Sample)
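The difference the escape makes shows up on values that have no dot at all. A minimal sketch with a made-up samples vector (hypothetical values):

```r
# Hypothetical sample names; the last one has no ".digits" suffix
samples <- c("A12.1", "B7.23", "C301")

# Escaped dot: only a literal "." plus trailing digits is removed
escaped <- sub("\\.\\d+$", "", samples)
print(escaped)

# Unescaped dot: "." matches any character, so "C301" is wiped out entirely
unescaped <- sub(".\\d+$", "", samples)
print(unescaped)
```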

How to remove a character (asterisk) in column values in r?

The stringr package has some very handy functions for vectorized string manipulation.

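In the following code I replace the * with ''. Note that * is a regex metacharacter, so it has to be escaped to match literally, and in R the escape is written as a double backslash \\ instead of the usual single backslash \ because the backslash itself must be escaped inside a string.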

library(stringr)
LocationID <- c('*Yukon', '*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays')
AWC <- c(333, 485, 76, 666, 54)
df <- data.frame(LocationID, AWC)

df$location_clean <- stringr::str_replace(df$LocationID, '\\*', '')

Resulting in:

   LocationID AWC location_clean
1      *Yukon 333          Yukon
2 *Lewis Rich 485     Lewis Rich
3     *Kodiak  76         Kodiak
4      Kodiak 666         Kodiak
5       *Rays  54           Rays
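As an alternative to escaping, the pattern can be treated as a literal string. The sketch below uses base R's fixed = TRUE argument on the same LocationID values; this is a swapped-in base R variant, not the stringr call from the answer (stringr offers fixed() for the same purpose).

```r
LocationID <- c("*Yukon", "*Lewis Rich", "*Kodiak", "Kodiak", "*Rays")

# fixed = TRUE matches "*" literally, so no backslashes are needed
location_clean <- sub("*", "", LocationID, fixed = TRUE)
print(location_clean)

# sub (like str_replace) removes only the first match;
# gsub removes every occurrence
all_removed <- gsub("*", "", "*a*b", fixed = TRUE)
print(all_removed)
```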

R Remove string characters from a range of rows in a column

If we want to substring and filter, an option is trimws, which trims characters (whitespace by default) from the ends of a string. Its which argument defaults to 'both'; set it to 'left' or 'right' to trim only one end. Here we pass a regex as the whitespace argument that matches zero or more upper case letters followed by zero or more spaces ([A-Z]*\\s*), and then filter to keep the rows where Date is not blank:

library(dplyr)
df %>% 
  mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>% 
  filter(nzchar(Date))

Output:

        Date Date_Approved
1  1/27/2020     1/28/2020
2  1/29/2020     1/30/2020
3  1/30/2020     1/31/2020
4   2/1/2020      2/2/2020
5   2/9/2020     2/10/2020
6  2/15/2020     2/16/2020
7  2/16/2020     2/17/2020
8  2/17/2020     2/19/2020
9  2/18/2020     2/20/2020
10 2/22/2020     2/23/2020
11 2/25/2020     2/26/2020
12 2/28/2020     2/29/2020
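Since the input data is not shown, here is a self-contained sketch of the same idea in base R. The df below is entirely hypothetical (made-up rows with an upper case prefix and a label-only row), and the whitespace argument of trimws is assumed to be available, which as far as I know requires R 3.6.0 or later.

```r
# Hypothetical input: one prefixed date, one clean date, one label-only row
df <- data.frame(
  Date = c("HOLIDAY 1/27/2020", "1/29/2020", "TOTAL"),
  Date_Approved = c("1/28/2020", "1/30/2020", ""),
  stringsAsFactors = FALSE
)

# Trim upper case letters and spaces from both ends of Date
df$Date <- trimws(df$Date, whitespace = "[A-Z]*\\s*")

# Keep only rows where something is left after trimming
df <- df[nzchar(df$Date), ]
print(df)
```

The base R subsetting on the last step plays the role of filter(nzchar(Date)) in the dplyr version.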

Remove a number of character from string in a column

I think this will work:

library(dplyr)
library(stringr)

df %>%
  mutate(col1 = str_remove(col1, "\\d+(_)"))

  col1
1    A
2    B
3    C
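The input is not shown in the answer, so the following is only a sketch under the assumption that col1 holds values such as "1_A" (hypothetical data). It uses base R's sub as a stand-in for str_remove and drops the capture group, which does not change what is matched.

```r
# Hypothetical input: a numeric prefix, an underscore, then the part to keep
df <- data.frame(col1 = c("1_A", "22_B", "333_C"), stringsAsFactors = FALSE)

# Remove the first run of digits followed by an underscore
df$col1 <- sub("\\d+_", "", df$col1)
print(df)
```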

How can I remove certain characters from column headers in R?

We can use sub to match the . (a metacharacter, so it must be escaped) followed by one or more digits (\\d+) at the end ($) of the string, and replace the match with blank (""):

names(df) <- sub("\\.\\d+$", "", names(df))

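NOTE: If the data is a data.frame, stripping the suffixes can leave duplicate column names. Duplicate names are not recommended, and data.frame() itself makes names unique by default (check.names = TRUE).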
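One way to check for, and optionally repair, duplicates after renaming is sketched below. The df and its column names are hypothetical, and make.unique is just one possible choice for disambiguating the names.

```r
# Hypothetical data frame whose columns differ only by a numeric suffix
df <- data.frame(a.1 = 1:2, a.2 = 3:4, b = 5:6)

# Strip the ".digits" suffix from each column name
new_names <- sub("\\.\\d+$", "", names(df))
print(new_names)

# Detect whether the stripped names collide
has_dups <- anyDuplicated(new_names) > 0
print(has_dups)

# Optionally disambiguate the names before assigning them
names(df) <- make.unique(new_names, sep = "_")
print(names(df))
```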

How to remove characters that don't match the string pattern from a column of a data frame

Just replace the strings that don't contain the word "Zimmer" with an empty string:

flat_cl_one$room[!grepl("Zimmer", flat_cl_one$room)] <- ""

flat_cl_one
#>       room
#> 1  3Zimmer
#> 2  2Zimmer
#> 3  2Zimmer
#> 4  3Zimmer
#> 5         
#> 6         
#> 7  3Zimmer
#> 8  6Zimmer
#> 9  2Zimmer
#> 10 4Zimmer
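If you would rather not overwrite the column in place, a non-destructive variant is sketched below. It uses ifelse on a shortened, made-up room vector (a subset-style stand-in for the data) and writes to a new, hypothetical room_clean variable.

```r
# Shortened stand-in for the room column
room <- c("3Zimmer", "9586", "2Zimmer", "927")

# Keep values containing "Zimmer"; blank out everything else
room_clean <- ifelse(grepl("Zimmer", room, fixed = TRUE), room, "")
print(room_clean)
```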

Data

flat_cl_one <- data.frame(room = c("3Zimmer", "2Zimmer", "2Zimmer", "3Zimmer", 
                                   "9586", "927", "3Zimmer", "6Zimmer", 
                                   "2Zimmer", "4Zimmer"))