String Split on Last Comma in R

string split on last comma in R

Here's one approach:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"

To also drop the whitespace after the comma, you may want:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

The \\s* version also matches when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
##
## [[2]]
## [1] "UK, USA" "Germany"
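If the two pieces are needed as columns rather than as a list, the same pattern can be wrapped in a small helper. The following is only a sketch: split_last_comma and the column names head and tail are hypothetical names, not part of the original answer, and inputs without any comma are padded with NA.

```r
# Hypothetical helper: split each string at its last comma only.
split_last_comma <- function(x) {
  parts <- strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
  # Pad inputs that contain no comma so every row has two entries.
  parts <- lapply(parts, function(p) c(p, NA_character_)[1:2])
  out <- do.call(rbind, parts)
  colnames(out) <- c("head", "tail")
  out
}

res <- split_last_comma(c("UK, USA, Germany", "UK, USA,Germany", "France"))
print(res)
```

The result is a character matrix with one row per input string, which can be passed to as.data.frame if a data frame is preferred.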

Regex Question: separate string at the last comma in string

Use

sep=",(?=[^,]*$)"


EXPLANATION

--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^,]*                    any character except: ',' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
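To see what the look-ahead does in practice, the sketch below uses regexpr to locate the comma that each pattern picks. It also compares [^,]* with the [^,]+ variant: with *, a trailing comma at the very end of the string still counts as the last comma, while with + it does not. The sample strings are made up for illustration.

```r
# Illustrative inputs: a normal list, a trailing comma, and no comma at all.
x <- c("UK, USA, Germany", "UK, USA,", "UK")

# Position of the matched comma, or -1 when there is no match.
pos_star <- as.integer(regexpr(",(?=[^,]*$)", x, perl = TRUE))
pos_plus <- as.integer(regexpr(",(?=[^,]+$)", x, perl = TRUE))

print(data.frame(x, pos_star, pos_plus))
```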

split string last delimiter

These use no packages. They assume that each element of col2 has at least one underscore. (See note if lifting this restriction is needed.)

1) The first regular expression, (.*)_.*, matches everything up to the last underscore, then the underscore, then everything that remains; the first sub replaces the entire match with the part captured in parentheses. This works because matching is greedy: the first .* takes everything it can, leaving only the rest for the second .* . The second regular expression, .*_, matches everything up to and including the last underscore, and the second sub replaces that with the empty string.

transform(df, col2 = sub("(.*)_.*", "\\1", col2), col3 = sub(".*_", "", col2))

2) Here is a variation that is a bit more symmetric. It uses the same regular expression for both sub calls.

pat <- "(.*)_(.*)"
transform(df, col2 = sub(pat, "\\1", col2), col3 = sub(pat, "\\2", col2))

Note: If we did want to handle strings with no underscore at all such that "xyz" is split into "xyz" and "" then use this for the second sub. It tries to match the left hand side of the | first and if that fails (which will occur if there are no underscores) then the entire string will match the right hand side and sub will replace that with the empty string.

sub(".*_|^[^_]*$", "", col2)
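As a quick check of approach 1 combined with the pattern from the note, here is a minimal sketch on a made-up data frame (the values in df are assumptions for illustration). Both expressions inside transform see the original col2, so col3 is computed from the unsplit strings.

```r
# Hypothetical data: two values with underscores and one without.
df <- data.frame(col2 = c("a_b_c", "x_y", "xyz"), stringsAsFactors = FALSE)

res <- transform(df,
                 col2 = sub("(.*)_.*", "\\1", col2),     # part before the last "_"
                 col3 = sub(".*_|^[^_]*$", "", col2))    # part after it, or ""

print(res)
```

For "xyz" the first sub finds no match and leaves the value unchanged, while the second replaces the whole string with "".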

Split column and take the last object of the resulting list in R

Try:

> sapply(strsplit(col1, ", ", fixed=TRUE), tail, 1)
[1] "3" "11" "8"

If your column is not already a character vector, wrap col1 with as.character.
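A variant of the same idea uses vapply, which checks that every element yields exactly one string. This is a sketch with a made-up col1; the original data is not shown in the answer.

```r
# Hypothetical input column.
col1 <- c("1, 2, 3", "4, 11", "8")

# Split on ", " and keep only the last piece of each element.
pieces <- strsplit(as.character(col1), ", ", fixed = TRUE)
last_items <- vapply(pieces, function(p) tail(p, 1), character(1))

print(last_items)
```

Note that an empty string splits into a zero-length vector, so vapply would stop with an error there, whereas sapply would silently return a list.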

Split comma delimited string

strsplit returns a list of character vectors, so if you want the result as a single vector, wrap the call in unlist:

    unlist(strsplit(string, ","))
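When the commas are followed by spaces, the pieces keep that leading whitespace. A small sketch, assuming a sample value for string, that trims the pieces afterwards with trimws:

```r
# Sample input (an assumption; the original question's string is not shown).
string <- "UK, USA, Germany"

# fixed = TRUE treats "," as a literal separator rather than a regex.
pieces <- trimws(unlist(strsplit(string, ",", fixed = TRUE)))

print(pieces)
```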

Split Character String Using Only Last Delimiter in r

A solution based on stringi and data.table: reverse each string, split it into at most two pieces (n = 2) so that only the first delimiter of the reversed string is used, and then reverse the pieces back:

library(stringi)
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')

lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)

If we want to make a data.frame with this:

y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)

y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))

df <- as.data.frame(c(list(input = x), y))

# > df
#          input output1 output2
# 1    foo - bar     foo     bar
# 2  hey-now-man hey-now     man
# 3 say-now-girl say-now    girl
# 4     fine-now    fine     now
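If installing stringi and data.table is not an option, a similar result can probably be had in base R. The sketch below is a different technique from the reverse-and-split approach above: it uses a lazy capture group so that the last run of hyphens and whitespace is taken as the delimiter. The names pat and df_base are made up here, and strings without any delimiter are left unchanged in both output columns.

```r
x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

# Group 1: everything before the last delimiter run (lazy match).
# Group 2: the final piece, which contains no hyphen or whitespace.
pat <- "^(.*?)[-\\s]+([^-\\s]*)$"

df_base <- data.frame(
  input   = x,
  output1 = sub(pat, "\\1", x, perl = TRUE),
  output2 = sub(pat, "\\2", x, perl = TRUE),
  stringsAsFactors = FALSE
)

print(df_base)
```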

Comma separated string split

Try:

a <- c("1,2,3","344")
scan(text = a, sep = ",", what = "")
# [1] "1" "2" "3" "344"
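If the fields carry spaces around the commas, scan can strip them while reading. A minimal sketch with a made-up input vector; quiet = TRUE only suppresses the "Read N items" message.

```r
# Hypothetical input with spaces after the commas.
a <- c("1, 2, 3", "344")

# strip.white = TRUE removes leading and trailing blanks from each field.
fields <- scan(text = a, sep = ",", what = "", strip.white = TRUE, quiet = TRUE)

print(fields)
```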

Extract the last two strings of words separated by the last comma

[a-z A-Z]+, [a-z A-Z]+$

This might work
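One way to try the pattern in R is with regexpr and regmatches. This is a sketch on a made-up string; because the character class includes a space, the match can begin with the blank that follows the previous comma, so the result is passed through trimws.

```r
# Sample input (an assumption for illustration).
x <- "London, Paris, New York, San Francisco"

pat <- "[a-z A-Z]+, [a-z A-Z]+$"

# Extract the matched text and drop the leading blank.
last_two <- trimws(regmatches(x, regexpr(pat, x)))

print(last_two)
```

The pattern only allows letters and spaces, so items containing digits or punctuation would need a wider character class such as [^,]+.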


