string split on last comma in R
Here's one approach:
strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)
## [[1]]
## [1] "UK, USA" " Germany"
You may want:
strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)
## [[1]]
## [1] "UK, USA" "Germany"
This version also works when there is no space after the comma:
strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)
## [[1]]
## [1] "UK, USA" "Germany"
##
## [[2]]
## [1] "UK, USA" "Germany"
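If you would rather not use perl = TRUE, you can get a similar split with two greedy sub() calls instead of a lookahead. This sketch swaps in a different technique from the answer's. split_last_comma is a hypothetical helper name, and a string with no comma comes back unchanged in both columns.

```r
# Hypothetical helper (not from the answer): split each string at its last comma,
# also dropping any whitespace that follows that comma.
split_last_comma <- function(x) {
  # Greedy (.*) can only stop at the last comma, because the tail [^,]* must
  # reach the end of the string; \\1 is therefore the text before that comma
  before <- sub("^(.*),[[:space:]]*[^,]*$", "\\1", x)
  # Greedy .* runs to the last comma; that prefix (plus trailing spaces) is removed
  after <- sub("^.*,[[:space:]]*", "", x)
  cbind(before = before, after = after)
}

res <- split_last_comma(c("UK, USA, Germany", "UK, USA,Germany"))
print(res)
```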
Regex Question: separate string at the last comma in string
Use
sep=",(?=[^,]*$)"
EXPLANATION
--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^,]*                    any character except: ',' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
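The answer does not say which function receives sep. It looks like the separator argument of something like tidyr::separate(), but that is an assumption. You can check the same pattern in base R with strsplit(), which needs perl = TRUE for the lookahead:

```r
pat <- ",(?=[^,]*$)"  # the pattern from the answer above

x <- c("a,b,c", "one, two, three")
parts <- strsplit(x, pat, perl = TRUE)
print(parts)

# Only the last comma is followed by a comma-free tail that reaches the end of
# the string, so each element is split into two pieces; any space after that
# comma stays at the start of the second piece.
```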
split string last delimiter
These use no packages. They assume that each element of col2 has at least one underscore. (See the note below if this restriction needs to be lifted.)
1) The first regular expression, "(.*)_.*", matches everything up to the last underscore, (.*)_, followed by everything remaining, .*, and the first sub replaces the entire match with the part captured within the parentheses. This works because .* is greedy: the first .* takes as much as it can, which is everything up to the last underscore, leaving the rest for the second .*. The second regular expression, ".*_", matches everything up to and including the last underscore, and the second sub replaces that with the empty string.
transform(df, col2 = sub("(.*)_.*", "\\1", col2), col3 = sub(".*_", "", col2))
2) Here is a variation that is a bit more symmetric. It uses the same regular expression for both sub calls.
pat <- "(.*)_(.*)"
transform(df, col2 = sub(pat, "\\1", col2), col3 = sub(pat, "\\2", col2))
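The question's df is not shown, so this runnable sketch uses a hypothetical data frame with a character column col2 to check that both approaches give the same result. transform() evaluates every argument against the original data frame. That means col3 is computed from the unmodified col2, not the one just rewritten.

```r
# Hypothetical data frame; the question's df is not shown in the answer
df <- data.frame(col2 = c("a_b_c", "x_y", "foo_bar_baz"), stringsAsFactors = FALSE)

# Approach 1: a different pattern for each sub() call
r1 <- transform(df, col2 = sub("(.*)_.*", "\\1", col2), col3 = sub(".*_", "", col2))

# Approach 2: one pattern with two capture groups
pat <- "(.*)_(.*)"
r2 <- transform(df, col2 = sub(pat, "\\1", col2), col3 = sub(pat, "\\2", col2))

print(r1)
```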
Note: If we do want to handle strings with no underscore at all, so that "xyz" is split into "xyz" and "", then use this for the second sub. It tries to match the left-hand side of the | first. If that fails (which happens when there is no underscore), the entire string matches the right-hand side, and sub replaces it with the empty string.
sub(".*_|^[^_]*$", "", col2)
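This small sketch uses hypothetical values to show why the note matters. Without it, a string with no underscore is returned unchanged by both sub calls, so it ends up in both columns:

```r
col2 <- c("a_b_c", "xyz")  # hypothetical values; "xyz" has no underscore

first <- sub("(.*)_.*", "\\1", col2)       # no match for "xyz", so it is returned unchanged
rest_plain <- sub(".*_", "", col2)         # also unchanged for "xyz", duplicating it
rest_note <- sub(".*_|^[^_]*$", "", col2)  # underscore-free strings become ""

print(rbind(first, rest_plain, rest_note))
```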
Split column and take the last object of the resulting list in R
Try:
> sapply(strsplit(col1, ", ", fixed=TRUE), tail, 1)
[1] "3" "11" "8"
If your column is not already a character vector, wrap col1 with as.character.
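The question's col1 is not shown. This sketch assumes a hypothetical factor column chosen to reproduce the output above. It also adds a vectorized sub() alternative, which is not part of the answer:

```r
# Hypothetical column stored as a factor; the question's data is not shown
col1 <- factor(c("1, 2, 3", "10, 11", "8"))

# strsplit() rejects non-character input, hence as.character()
parts <- strsplit(as.character(col1), ", ", fixed = TRUE)
last_apply <- sapply(parts, tail, 1)

# Vectorized alternative (not from the answer): delete everything up to the last ", "
last_sub <- sub(".*, ", "", as.character(col1))

print(last_apply)
```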
Split comma delimited string
strsplit returns a list of character vectors, one per input string. If you want the result as a single vector, wrap it in unlist as well:
unlist(strsplit(string, ","))
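Here is a small sketch of the difference, using a hypothetical input. fixed = TRUE, a standard strsplit() argument, treats the comma literally instead of as a regular expression:

```r
string <- c("apple,banana", "cherry")  # hypothetical input

as_list <- strsplit(string, ",", fixed = TRUE)  # list: one character vector per input string
as_vec <- unlist(as_list)                       # one flat character vector

str(as_list)
print(as_vec)
```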
Split Character String Using Only Last Delimiter in r
A solution based on stringi and data.table: reverse each string, split it at the first run of delimiters into at most two pieces (n = 2), and then reverse each piece back:
library(stringi)
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')
lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
If we want to make a data.frame with this:
y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))
df <- as.data.frame(c(list(input = x), y))
# > df
#          input output1 output2
# 1    foo - bar     foo     bar
# 2  hey-now-man hey-now     man
# 3 say-now-girl say-now    girl
# 4     fine-now    fine     now
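If stringi and data.table are not available, a base-R version might look like this. It swaps in two greedy sub() calls instead of the answer's reverse-and-split approach. It assumes every input contains at least one "-" or whitespace delimiter.

```r
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')

# Remove the last run of delimiters ("-" or whitespace) and everything after it
output1 <- sub("[-[:space:]]+[^-[:space:]]*$", "", x)
# Greedy .* reaches the last delimiter character; keep only what follows it
output2 <- sub(".*[-[:space:]]", "", x)

df <- data.frame(input = x, output1 = output1, output2 = output2,
                 stringsAsFactors = FALSE)
print(df)
```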
Comma separated string split
Try:
a <- c("1,2,3","344")
scan(text = a, sep = ",", what = "")
# [1] "1" "2" "3" "344"
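By default scan() also reports how many items it read, and its quiet = TRUE argument suppresses that message. For comparison, strsplit() and unlist() give the same result. That alternative is not from the answer:

```r
a <- c("1,2,3", "344")

via_scan <- scan(text = a, sep = ",", what = "", quiet = TRUE)
via_strsplit <- unlist(strsplit(a, ",", fixed = TRUE))

print(via_scan)
```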
Extract the last two strings of words separated by the last comma
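This pattern might work. It matches a run of letters and spaces, then a comma and a space, then a final run of letters and spaces that reaches the end of the string. Because neither run can contain a comma, the comma it matches is the last one:
[a-z A-Z]+, [a-z A-Z]+$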
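Here is a sketch of applying that pattern in base R with regexpr() and regmatches(), using a hypothetical input string. The character class includes a space, so the match can start with the space after the previous comma. trimws() removes it.

```r
x <- "UK, USA, Germany"              # hypothetical input
pat <- "[a-z A-Z]+, [a-z A-Z]+$"     # the pattern from the answer above

# regexpr() finds the leftmost match; it may begin with a space
m <- regmatches(x, regexpr(pat, x))
last_two <- trimws(m)

# Split the extracted text into its two parts
pieces <- strsplit(last_two, ", ", fixed = TRUE)[[1]]
print(pieces)
```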