String Split on Last Comma in R

string split on last comma in R

Here's one approach:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"

To also drop the whitespace after the comma, you may want:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

This version also matches when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
## 
## [[2]]
## [1] "UK, USA" "Germany"
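If the split is needed in several places, the pattern above can be wrapped in a small helper. This is only a sketch: the name split_last_comma and the sample vector are made up for illustration. A string with no comma comes back unsplit.

```r
# Hypothetical helper: split each string at its last comma,
# dropping any whitespace that follows that comma.
split_last_comma <- function(x) {
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

countries <- c("UK, USA, Germany", "UK, USA,Germany", "France")
parts <- split_last_comma(countries)

# Number of pieces per input string
lengths(parts)
```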

Regex Question: separate string at the last comma in string

Use

sep=",(?=[^,]*$)"

EXPLANATION

--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^,]*                    any character except: ',' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
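
To see which comma the pattern picks, here is a small sketch that locates the match with regexpr and cuts the string with substr instead of splitting. The variable names and the sample string are illustrative and not part of the original answer.

```r
x <- "UK, USA, Germany"

# Position of the comma that has no further comma after it
pos <- as.integer(regexpr(",(?=[^,]*$)", x, perl = TRUE))

# Text before the last comma, and text after it with whitespace trimmed
before <- substr(x, 1, pos - 1)
after <- trimws(substr(x, pos + 1, nchar(x)))
```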

split string last delimiter

These use no packages. They assume that each element of col2 has at least one underscore. (See note if lifting this restriction is needed.)

1) The first regular expression, (.*)_.*, matches everything up to the last underscore (captured in parentheses), then the underscore, then everything remaining; the first sub replaces the entire match with the captured part. This works because such matches are greedy, so the first .* takes everything it can and leaves the rest for the second .* . The second regular expression, .*_, matches everything up to and including the last underscore, and the second sub replaces that with the empty string.

transform(df, col2 = sub("(.*)_.*", "\\1", col2), col3 = sub(".*_", "", col2))

2) Here is a variation that is a bit more symmetric. It uses the same regular expression for both sub calls.

pat <- "(.*)_(.*)"
transform(df, col2 = sub(pat, "\\1", col2), col3 = sub(pat, "\\2", col2))

Note: To handle strings with no underscore at all, so that "xyz" is split into "xyz" and "", use this for the second sub. It tries to match the left hand side of the | first, and if that fails (which happens when there are no underscores) the entire string matches the right hand side and sub replaces it with the empty string.

sub(".*_|^[^_]*$", "", col2)
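
As a rough illustration, here is option 1 combined with the note, run on a small made-up data frame. The df below and its values are assumptions for demonstration only. Both expressions see the original col2, since transform evaluates its arguments against the input data frame.

```r
# Hypothetical input; the last row has no underscore
df <- data.frame(col1 = 1:3,
                 col2 = c("a_b_c", "d_e", "xyz"),
                 stringsAsFactors = FALSE)

out <- transform(df,
                 col2 = sub("(.*)_.*", "\\1", col2),      # part before the last underscore
                 col3 = sub(".*_|^[^_]*$", "", col2))     # part after it, or "" if none
out
```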

Split column and take the last object of the resulting list in R

Try:

> sapply(strsplit(col1, ", ", fixed=TRUE), tail, 1)
[1] "3"  "11" "8"

If your column is not already a character vector, wrap col1 with as.character.
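
A self-contained sketch of the same idea, using vapply so the result is guaranteed to be a character vector. The col1 values below are invented to be consistent with the output shown above; the original input is not given.

```r
# Hypothetical input column
col1 <- c("1, 2, 3", "10, 11", "8")

# Split on ", " and keep the last piece of each element
last_items <- vapply(strsplit(as.character(col1), ", ", fixed = TRUE),
                     tail, character(1), n = 1)
last_items
```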

Split comma delimited string

strsplit gives you back a list of character vectors, one per input string, so if you want the result as a single vector, use unlist as well.
So,

    unlist(strsplit(string, ","))
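
If the pieces may carry a space after each comma, one possible variation is to split on the comma plus any following whitespace. The sample string here is an assumption for illustration.

```r
string <- "UK, USA, Germany"

# Plain comma split keeps the leading spaces
raw_pieces <- unlist(strsplit(string, ","))

# Splitting on a comma followed by optional whitespace drops them
clean_pieces <- unlist(strsplit(string, ",\\s*"))
```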

Split Character String Using Only Last Delimiter in r

A solution based on stringi and data.table: reverse each string, split it on the first delimiter into at most two pieces (n = 2), and then reverse the pieces back:

library(stringi)
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')

lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)

If we want to make a data.frame with this:

y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)

y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))

df <- as.data.frame(c(list(input = x), y))

# > df
# input output1 output2
# 1    foo - bar     foo     bar
# 2  hey-now-man hey-now     man
# 3 say-now-girl say-now    girl
# 4     fine-now    fine     now
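
For comparison, a sketch of a base R alternative that avoids the reversal by using two sub calls with the same delimiter class. This is a different technique from the stringi answer above, and the names head_part and last_part are made up. Strings without a delimiter are left unchanged by both calls.

```r
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')

# Remove the last run of delimiters and everything after it
head_part <- sub("[-\\s]+[^-\\s]*$", "", x, perl = TRUE)

# Remove everything up to and including the last run of delimiters
last_part <- sub(".*[-\\s]+", "", x, perl = TRUE)

data.frame(input = x, output1 = head_part, output2 = last_part)
```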

Comma separated string split

Try:

a <- c("1,2,3","344")
scan(text = a, sep = ",", what = "")
# [1] "1"   "2"   "3"   "344"

Extract the last two strings of words separated by the last comma

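This pattern might work. It matches the final two runs of letters and spaces, separated by the last comma, at the end of the string:

[a-z A-Z]+, [a-z A-Z]+$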
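
One way to try the pattern in R is with regexpr and regmatches. The input string below is a made-up example. Because the character class includes a space, the match can start with the space that follows the preceding comma, so the sketch trims the result.

```r
# Hypothetical input
x <- "London, Greater London, England, United Kingdom"

# Extract the match, then trim the leading space picked up by the class
m <- regmatches(x, regexpr("[a-z A-Z]+, [a-z A-Z]+$", x))
last_two <- trimws(m)
last_two
```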
