Remove Part of a String in Dataframe Column (R)

If your string is not too fancy/complex, it might be easiest to do something like:

gsub("C([0-9]+)_.*", "\\1", df$Col2)
# [1] "607989" "607989" "607989" "607989" "607989" "607989"

The pattern matches a "C", followed by one or more digits, followed by an underscore and then anything else. The digits are captured with (), and the replacement is that capture group (\\1), so only the digits are kept.
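As a minimal sketch of the capture-group idea, the snippet below runs the same pattern on a few made-up values. The contents of df$Col2 are not shown in the answer, so the sample strings here are assumptions chosen only for illustration.

```r
# Hypothetical input; the real contents of df$Col2 are not shown above.
df <- data.frame(Col2 = c("C607989_abc", "C607989_def_1", "X12"),
                 stringsAsFactors = FALSE)

# Keep only the digits captured between "C" and the first underscore.
ids <- gsub("C([0-9]+)_.*", "\\1", df$Col2)
ids
# "607989" "607989" "X12"
```

A value that does not match the pattern (here "X12") is returned unchanged, which is worth checking for before converting the result to numbers.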

How to remove part of string value in column using stringr library?

You can use str_remove, which is a shortcut for str_replace with an empty replacement ("").

as.numeric(stringr::str_remove(df$column1, "value-"))

Or in base R:

as.numeric(sub("value-", "", df$column1))
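A short worked example of the base R version may help. The values below are hypothetical stand-ins for df$column1. Note that sub (like str_remove) replaces only the first match, while gsub (like str_remove_all) replaces every match.

```r
# Hypothetical values standing in for df$column1.
column1 <- c("value-10", "value-2.5", "value-300")

# Strip the prefix, then convert the remaining text to numbers.
nums <- as.numeric(sub("value-", "", column1))
nums                                   # 10.0 2.5 300.0

# sub() replaces only the first match; gsub() replaces every match.
first_only <- sub("-", "", "a-b-c")    # "ab-c"
all_hits   <- gsub("-", "", "a-b-c")   # "abc"
```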

How to remove part of characters in data frame column

There are multiple ways of doing this:

  1. Convert the column with as.numeric, which drops leading zeros because the values become numbers.
raw$Zipcode <- as.numeric(raw$Zipcode)

  2. If you want the column to stay character, use str_replace from the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+", "")

  3. The stringr package also provides str_remove.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")

  4. You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)

To remove exactly n leading zeros, replace + with {n}.

For instance, to remove two 0's, use sub("^0{2}", "", raw$Zipcode).
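The difference between these options can be seen on a small example. The ZIP codes below are made up for illustration and are not from the original question.

```r
# Hypothetical ZIP codes stored as character strings.
zip <- c("00501", "02134", "90210", "00000")

all_zeros <- sub("^0+", "", zip)     # "501" "2134" "90210" ""
two_zeros <- sub("^0{2}", "", zip)   # "501" "02134" "90210" "000"
as_number <- as.numeric(zip)         # 501 2134 90210 0
```

With {2}, a value that starts with only one zero ("02134") is left alone, and an all-zero string becomes empty under ^0+ but 0 under as.numeric.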

How to remove part of a string in a column of dataframe in R?

You can do this with sub and a regular expression.

df$S = sub("\\|.*", "", as.character(df$S))
df
          S        S1        S2        S3        S4
1 100130426    0.0000    0.0000    0.9066    0.0000
2 100133144   16.3644    9.2659   11.6228   12.0894
3 100134869   12.9316   17.3790    9.2294   11.0799
4      3457 1910.3000 2453.5000 2695.3700 1372.3624
5      9834 1660.1300  857.3000 1240.5300 1434.6463
6    ATP5L2    0.0000    0.0000    0.9066    0.0000
7     ATP5L 1510.2900 1270.7900 2965.5400 2397.1866
8     ATP5O 2176.1700 1868.9500 2004.5300 2360.3641

Details:

sub substitutes the second argument for whatever matches the first argument. In this case, we want to match | and everything after it. You can't just write | because it has a special meaning in regular expressions (alternation), so you "escape" it by writing \\|. It is followed by .*. The . means "any character" and * means "any number of times", so together \\|.* means | followed by any number of characters. We replace that with the empty string "". We apply this operation to as.character(df$S) because your error message makes it look like your variable df$S may be a factor rather than a string.
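To make the escaping concrete, here is a small sketch on a factor. The original values of df$S before cleaning are not shown, so the text after each | below is invented for illustration. A character class ([|]) is shown as an alternative way to treat | literally.

```r
# Hypothetical values; the text after "|" is made up for illustration.
S <- factor(c("100130426|x1", "ATP5L2|x2", "3457"))

# Escape the pipe with a double backslash inside the R string.
escaped <- sub("\\|.*", "", as.character(S))

# Alternative: put the pipe in a character class, where it is literal.
bracketed <- sub("[|].*", "", as.character(S))

escaped
# "100130426" "ATP5L2" "3457"
```

Values without a | (here "3457") pass through unchanged.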

How can I remove parts of string based on other column in R?

Replace empty patterns with ^$, which matches only an empty string and therefore removes nothing from non-empty text:

dt$ToRemove[dt$ToRemove == ''] <- '^$'

and then use stringr::str_remove which is vectorised.

dt$result <- stringr::str_remove(dt$SomeText, dt$ToRemove)
dt
#   SomeText ToRemove result
# 1   ABCDEF        A  BCDEF
# 2   ABCDEF      CDE    ABF
# 3   ABCDEF       ^$ ABCDEF
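If stringr is not available, the same row-by-row removal can be sketched in base R with mapply. This is an alternative under stated assumptions, not the method from the answer: the helper name remove_one is hypothetical, and empty patterns are skipped explicitly instead of being replaced with ^$.

```r
# Same example data as above, with an empty pattern in the last row.
dt <- data.frame(SomeText = c("ABCDEF", "ABCDEF", "ABCDEF"),
                 ToRemove = c("A", "CDE", ""),
                 stringsAsFactors = FALSE)

# Hypothetical helper: remove the first match of `pattern` from `text`,
# leaving the text untouched when the pattern is empty.
remove_one <- function(text, pattern) {
  if (nzchar(pattern)) sub(pattern, "", text) else text
}

# Pair each text with its own pattern; USE.NAMES = FALSE drops the names
# mapply would otherwise derive from the first character vector.
dt$result <- mapply(remove_one, dt$SomeText, dt$ToRemove, USE.NAMES = FALSE)
dt$result
# "BCDEF" "ABF" "ABCDEF"
```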

R: How to remove a particular string from column values

As an example, consider this process:

# example data
x = c("Full Name A B", "Full Name F B")
y = c("Playing role G G", "Playing role G M")
dt = data.frame(x,y)

dt

#               x                y
# 1 Full Name A B Playing role G G
# 2 Full Name F B Playing role G M

library(dplyr)

dt %>% mutate_all(~gsub("Full Name |Playing role |Batting style |Bowling style ", "", .))

#     x   y
# 1 A B G G
# 2 F B G M
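For readers who prefer not to load dplyr, the same column-wise replacement can be sketched in base R with lapply. This is an alternative to the answer's approach, shown on the same example data; the variable name labels is introduced here only for readability.

```r
# Same example data as above.
dt <- data.frame(x = c("Full Name A B", "Full Name F B"),
                 y = c("Playing role G G", "Playing role G M"),
                 stringsAsFactors = FALSE)

# Alternation of the label prefixes to strip from every column.
labels <- "Full Name |Playing role |Batting style |Bowling style "

# dt[] keeps the data frame structure while replacing each column.
dt[] <- lapply(dt, function(col) gsub(labels, "", col))
dt$x   # "A B" "F B"
dt$y   # "G G" "G M"
```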

How to remove a specific character within a column in R?

You can use the gsub function. Use a ^ to indicate the start of the string so that you don't remove 0s elsewhere.

x$VARIANT_ID <- gsub("^0", "", x$VARIANT_ID)
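The effect of the anchor is easiest to see side by side. The IDs below are hypothetical and stand in for x$VARIANT_ID.

```r
# Hypothetical IDs standing in for x$VARIANT_ID.
ids <- c("0123", "00123", "1023")

one_leading <- gsub("^0", "", ids)    # "123" "0123" "1023"
all_leading <- gsub("^0+", "", ids)   # "123" "123"  "1023"
every_zero  <- gsub("0", "", ids)     # "123" "123"  "123"
```

With ^0 only a single leading zero is removed, even with gsub, because ^ can match only at the start of the string. Use ^0+ if all leading zeros should go.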

Removing some text string and characters from a column in dataframe in R

We can match a literal dot (\\., escaped because . is a metacharacter that matches any character) followed by one or more digits (\\d+) up to the end ($) of the string, and replace that with an empty string (""). Wrapping the result in gsub then matches every backquote ("`") and removes it.

df$Regression <- gsub("`", "", sub("\\.\\d+$", '', df$Regression))
df$Regression
[1] "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A"
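The two steps can be followed separately on a small example. The original values of df$Regression are not shown, so the input strings below are assumptions constructed to produce the output above.

```r
# Hypothetical raw values; the originals are not shown in the answer.
reg <- c("`TLC~7_A`.1", "`TLC~7_A`.12", "TLC~7_A")

# Step 1: drop a trailing ".<digits>" suffix, if present.
no_suffix <- sub("\\.\\d+$", "", reg)
# "`TLC~7_A`" "`TLC~7_A`" "TLC~7_A"

# Step 2: remove every backquote that remains.
cleaned <- gsub("`", "", no_suffix)
# "TLC~7_A" "TLC~7_A" "TLC~7_A"
```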

