Regex to Remove All Non-Digit Symbols from String in R

How can I remove non-numeric characters from strings using gsub in R?

Simply use the following, which keeps only digits, periods and hyphens (minus signs):

gsub("[^0-9.-]", "", x)

If a string can contain more than one - or ., you can handle that with a second regex.
If you struggle with it, open a new question.


(Replace . with , in the pattern if your data uses a comma as the decimal separator.)
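A quick sketch with made-up input strings (the vector `x` below is hypothetical) shows what this pattern keeps, and why a value with more than one `.` still needs the second pass mentioned above before it can be converted to a number:

```r
# Hypothetical sample strings, for illustration only
x <- c("Price: 1,234.50", "temp -3.5 C", "v1.2.3")

# Keep only digits, periods and hyphens
cleaned <- gsub("[^0-9.-]", "", x)
print(cleaned)
## => [1] "1234.50" "-3.5" "1.2.3"

# "1.2.3" still contains two periods, so as.numeric() cannot parse it (NA)
nums <- suppressWarnings(as.numeric(cleaned))
```

The thousands separator `,` is dropped along with the letters, which is why `"1,234.50"` becomes `"1234.50"`.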

Regex to remove all non-digit symbols from string in R

Remove all non-digit symbols:

x <- c("1010.1-1", "1010.2-1", "1010.3-1", "1030-1", "1040-1", "1060.1-1", "1060.2-1", "1070-1", "1100.1-1", "1100.2-1")
as.numeric(gsub("\\D+", "", x))
## => [1] 101011 101021 101031 10301 10401 106011 106021 10701 110011 110021
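As a sketch (not part of the original answer), `\\D+`, `[^0-9]+` and the POSIX class `[^[:digit:]]+` give the same result on ASCII input. Note that stripping glues the digit runs together; if the pieces should stay separate, extracting them with `gregexpr` is one alternative:

```r
x <- c("1010.1-1", "1030-1")

# Three equivalent ways to drop every non-digit character
a <- gsub("\\D+", "", x)
b <- gsub("[^0-9]+", "", x)
d <- gsub("[^[:digit:]]+", "", x)
stopifnot(identical(a, b), identical(b, d))
print(a)
## => [1] "101011" "10301"

# Extract each run of digits instead of concatenating them
parts <- regmatches(x, gregexpr("[0-9]+", x))
# parts[[1]] is c("1010", "1", "1"); parts[[2]] is c("1030", "1")
```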


Remove non-numeric characters within parentheses

How about replacing

(?:\(([^)\d]+)\)(.*?))?\([^\d)]*(\d{5,6})[^\d)]*\)

with

$1$2($3)
  • (?:\(([^)\d]+)\)(.*?))? the first, optional part captures the contents of a preceding parenthesized group without digits to $1, and anything between it and the parenthesized 5-6 digit part to $2.
  • \([^\d)]*(\d{5,6})[^\d)]*\) the second part captures the 5-6 digits to $3.



In R, using gsub:

gsub(pattern = '(?:\\(([^)\\d]+)\\)(.*?))?\\([^\\d)(]*(\\d{5,6})[^\\d)(]*\\)',
     replacement = '\\1\\2(\\3)',
     x = text,
     perl = TRUE)  # fixed = FALSE is the default, so it can be omitted
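The question's own data is not shown here, so the two input strings below are hypothetical. They show both branches of the pattern: when no digit-free parenthesized group precedes the number, the unmatched groups \1 and \2 are replaced by empty strings; when one does, its contents are kept but its parentheses are dropped, because the replacement writes $1 without them.

```r
# Hypothetical inputs, for illustration only
text <- c("Company X (ref 12345)", "Item (abc) foo (ID: 123456)")

out <- gsub(pattern = '(?:\\(([^)\\d]+)\\)(.*?))?\\([^\\d)(]*(\\d{5,6})[^\\d)(]*\\)',
            replacement = '\\1\\2(\\3)',
            x = text,
            perl = TRUE)
print(out)
## => [1] "Company X (12345)" "Item abc foo (123456)"
```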

Regex to remove all (non numeric OR period)

This should do it:

using System.Text.RegularExpressions;

string s = "joe ($3,004.50)";
s = Regex.Replace(s, "[^0-9.]", ""); // s is now "3004.50"
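The answer above is C#; since the rest of this page uses R, here is a sketch of the same character class with base R's `gsub` on the same input string:

```r
s <- "joe ($3,004.50)"

# Keep only digits and the decimal point
s <- gsub("[^0-9.]", "", s)
print(s)
## => [1] "3004.50"

value <- as.numeric(s)
```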

Remove non numeric values from vector in r

A simple solution is to use Filter:

> vec <- list(1, 2, T, 'x', 'abc', '6', 7, F, F, 10)
> unlist(Filter(is.numeric, vec))
[1] 1 2 7 10
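`is.numeric` drops the string '6'. If strings that parse as numbers should be kept too, one option is a custom predicate; `is_num_like` below is a hypothetical helper, not a base R function:

```r
vec <- list(1, 2, TRUE, 'x', 'abc', '6', 7, FALSE, FALSE, 10)

# Hypothetical helper: TRUE for numbers and for strings that parse as numbers
is_num_like <- function(v) {
  is.numeric(v) || (is.character(v) && !is.na(suppressWarnings(as.numeric(v))))
}

# Keep the matching elements, then convert them all to numeric
kept <- unlist(lapply(Filter(is_num_like, vec), as.numeric))
print(kept)
## => [1] 1 2 6 7 10
```

Logical values such as TRUE are still dropped, because `is.numeric(TRUE)` is FALSE.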

How to replace all non-numeric characters in a string except any NewLine (\n) using regex?

Removing all characters except newlines and digits is straightforward:

Regex.Replace(text, "[^\r\n0-9]", "")

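On Windows a newline is CR (\r) followed by LF (\n), so both characters are kept. 0-9 can also be written as \d, although in .NET \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is set.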
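The same idea in R, as a sketch on a made-up string: here `"\r"` and `"\n"` in the pattern are R string escapes, so the bracket expression contains the actual CR and LF characters.

```r
# Hypothetical input with a Windows (CR LF) and a Unix (LF) line break
x <- "a1b2\r\nc3 d4\n5"

# Remove everything except CR, LF and the digits 0-9
y <- gsub("[^\r\n0-9]", "", x)
y == "12\r\n34\n5"
## => [1] TRUE
```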

remove non-digits except E+ and E- in string

You may extract the numbers using the following regex:

[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

Details

  • [-+]? - an optional + or -
  • [0-9]* - 0 or more digits
  • \.? - an optional .
  • [0-9]+ - 1 or more digits
  • ([eE][-+]?[0-9]+)? - an optional capturing group (add ?: after ( to use a non-capturing group) matching 1 or 0 occurrences of

    • [eE] - e or E
    • [-+]? - an optional - or +
    • [0-9]+ - 1 or more digits

R demo:

vec <- c('1234', '+ 42', '1E+4', 'NR 12', '4.5E+04', '8.6E-02')
res <- regmatches(vec, regexpr("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?", vec))
unlist(res)
## => [1] "1234" "42" "1E+4" "12" "4.5E+04" "8.6E-02"

If you expect multiple matches per item in a character vector, replace regexpr with gregexpr.
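A sketch of the gregexpr variant, using the same pattern on made-up strings that each hold two numbers; `regmatches` then returns a list with one character vector per input element:

```r
# Hypothetical inputs with several numbers per element
vec <- c('1E+4 and 2.5e-3', 'NR 12, 13')
pat <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"

# gregexpr finds every match in each element, not only the first one
res <- regmatches(vec, gregexpr(pat, vec))
print(res[[1]])
## => [1] "1E+4" "2.5e-3"
print(res[[2]])
## => [1] "12" "13"

# The extracted strings can be converted directly
values <- as.numeric(res[[1]])
```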

Split string column on non-numeric characters in R

You want to do something like this?

library(dplyr)
library(tidyr)

df %>%
separate(lat,into = paste0("lat",1:4),sep = "[^0-9]",remove = FALSE) %>%
separate(long,into = paste0("long",1:4),sep = "[^0-9]",remove = FALSE)

# A tibble: 4 x 10
lat lat1 lat2 lat3 lat4 long long1 long2 long3 long4
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "22ª29'56.06\"" 22 29 56 06 "105º21'37.27\"" 105 21 37 27
2 "22°29`53.14\"" 22 29 53 14 "105°21'29.48\"" 105 21 29 48
3 "22º30'00.43\"" 22 30 00 43 "105°21'37.46''" 105 21 37 46
4 "105'29'27.17\"" 105 29 27 17 "105°21'39.68" 105 21 39 68
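If you would rather avoid tidyr, a base R sketch with `strsplit` works on two of the coordinate strings above (the degree-like symbols are written as `\u` escapes to sidestep file-encoding issues). Splitting on runs of non-digits (`[^0-9]+`) avoids empty pieces between adjacent symbols such as `''`, and `strsplit` drops the empty piece that a trailing separator would otherwise produce:

```r
# Two coordinate strings from the example above
coords <- c("22\u00aa29'56.06\"", "105\u00b021'37.46''")

# Split on one or more consecutive non-digit characters
parts <- strsplit(coords, "[^0-9]+")

# Both elements have 4 pieces here, so they can be stacked into a matrix
m <- do.call(rbind, parts)
print(m[1, ])
## => [1] "22" "29" "56" "06"
```

Like the `sep = "[^0-9]"` version above, this splits the seconds at the decimal point; splitting on `[^0-9.]+` instead would keep values such as `56.06` whole.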

