Split Delimited Single Value Character Vector

Split delimited single value character vector

You can use strsplit for that:

wkdays <- "Mon,Tue,Wed,Thu,Fri"
unlist(strsplit(wkdays, ","))

This gives:

> unlist(strsplit(wkdays, ","))
[1] "Mon" "Tue" "Wed" "Thu" "Fri"

An alternative is to use scan:

scan(text = wkdays, sep = ",", what = character())

which gives the same result.
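As a hedged aside, scan can also convert and clean the fields while reading. The sketch below uses two made-up inputs (nums and padded are not from the question): leaving what at its default reads numbers directly, and strip.white = TRUE drops blanks around each field.

```r
# Hypothetical inputs, not taken from the original question.
nums <- "1,2,3.5"
padded <- "Mon, Tue , Wed"

# With `what` left at its default (double()), scan() parses numbers directly.
# quiet = TRUE suppresses the "Read N items" message.
num_vec <- scan(text = nums, sep = ",", quiet = TRUE)

# strip.white = TRUE removes leading and trailing blanks from each field.
day_vec <- scan(text = padded, sep = ",", what = character(),
                strip.white = TRUE, quiet = TRUE)

print(num_vec)
print(day_vec)
```

strsplit always returns character pieces, so the numeric case would otherwise need an extra as.numeric call.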

Split comma delimited string

strsplit returns a list with one character vector per input string, so if you want everything in a single vector, wrap the call in unlist:

    unlist(strsplit(string, ","))
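To see why unlist is needed, here is a small sketch with a made-up two-element input (strings is an assumption for illustration). The result of strsplit has one list element per input string, and unlist flattens them in order.

```r
# Hypothetical input with two strings of different lengths.
strings <- c("a,b", "c,d,e")

# fixed = TRUE treats "," as a literal string rather than a regex.
pieces <- strsplit(strings, ",", fixed = TRUE)

length(pieces)    # one list element per input string
lengths(pieces)   # number of pieces inside each element

# Flatten the list into a single character vector.
flat <- unlist(pieces)
print(flat)
```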

Split all values in column and store them in a single numeric vector

You already have the answer in your code since all the functions you are using are vectorised.

v <- as.numeric(na.omit(unlist(strsplit(df$col, ','))))
v
#[1] 2 6 10 5 10 1
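The question's data frame is not shown here, so the following is only a sketch with an assumed df whose col contains one missing value. It shows what each step contributes: strsplit passes NA through, unlist flattens, na.omit drops the missing entry, and as.numeric converts the rest.

```r
# Hypothetical data; the real df from the question is not available.
df <- data.frame(col = c("2,6,10", NA, "5,10,1"), stringsAsFactors = FALSE)

# strsplit() returns NA for a missing input string, so the NA survives unlist().
pieces <- unlist(strsplit(df$col, ",", fixed = TRUE))

# Drop the NA, then convert; as.numeric() also discards the attributes
# that na.omit() attaches.
v <- as.numeric(na.omit(pieces))
print(v)
```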

How to split an element in a character vector and insert it as a new element?

You are on the right path with strsplit, just use unlist to get a vector.

> unlist(strsplit(string, ","))
[1] "Eric" "John" "Dora" "Michael" " James" "Susan"
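Note the leading space in " James" above: splitting on a bare comma keeps any whitespace that follows it. If that is not wanted, two possible fixes are sketched below. The value of string is an assumption reconstructed to match the output shown.

```r
# Assumed input: one comma is followed by a space.
string <- "Eric,John,Dora,Michael, James,Susan"

# Option 1: split on the literal comma, then trim each piece.
trimmed <- trimws(unlist(strsplit(string, ",", fixed = TRUE)))

# Option 2: split on a regex that consumes optional whitespace after the comma.
by_regex <- unlist(strsplit(string, ",\\s*"))

print(trimmed)
print(by_regex)
```

trimws also removes whitespace at the very start or end of the string, which the regex split does not.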

In R, split a character vector by a specific character; save 3rd piece in new vector

strsplit creates a list, so I would try the following:

lapply(strsplit(oss$id, split='_', fixed=TRUE), `[`, 3) ## Output a list
sapply(strsplit(oss$id, split='_', fixed=TRUE), `[`, 3) ## Output a vector (even though a list is also a vector)

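Here `[` is the subsetting function itself, and the trailing 3 is passed to it as an argument, so the third piece of each split string is extracted. If you prefer a vector, substitute lapply with sapply.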

Here's an example:

mystring <- c("A_B_C", "D_E_F")

lapply(strsplit(mystring, "_"), `[`, 3)
# [[1]]
# [1] "C"
#
# [[2]]
# [1] "F"
sapply(strsplit(mystring, "_"), `[`, 3)
# [1] "C" "F"

If there is an easily definable pattern, gsub might be a good option too, and it avoids splitting altogether. (DWin and Josh O'Brien suggested improved, more robust versions along the same lines in the comments on the original answer.)

gsub(".*_.*_(.*)", "\\1", mystring)
# [1] "C" "F"

And, finally, just for fun, you can expand on the unlist approach to make it work by recycling a vector of TRUEs and FALSEs to extract every third item (since we know in advance that all the splits will result in an identical structure).

unlist(strsplit(mystring, "_"), use.names = FALSE)[c(FALSE, FALSE, TRUE)]
# [1] "C" "F"
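The recycling trick really does depend on every string splitting into the same number of pieces. A short sketch with a made-up ragged input (ragged is an assumption) shows the difference: per-element extraction returns NA for the short string, while the recycled index silently returns a shorter result.

```r
# Hypothetical input where the second string has only two pieces.
ragged <- c("A_B_C", "D_E")
pieces <- strsplit(ragged, "_", fixed = TRUE)

# Per-element extraction keeps one result per input; a missing third
# piece shows up as NA. vapply() also checks the result type.
third <- vapply(pieces, `[`, character(1), 3)

# The recycled logical index runs over the flattened vector
# "A" "B" "C" "D" "E", so only position 3 is selected.
recycled <- unlist(pieces, use.names = FALSE)[c(FALSE, FALSE, TRUE)]

print(third)
print(recycled)
```

So for data that might be irregular, the sapply or vapply form is the safer choice.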

If you're extracting not by numeric position, but just looking to extract the last value after a delimiter, you have a few different alternatives.

Use a greedy regex:

gsub(".*_(.*)", "\\1", mystring)
# [1] "C" "F"

Use a convenience function like stri_extract* from the "stringi" package:

library(stringi)
stri_extract_last_regex(mystring, "[A-Z]+")
# [1] "C" "F"
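If you would rather stay in base R, here is one possible sketch for taking the last piece. It reuses mystring from the example above and adds a made-up uneven input to show that the number of pieces does not matter.

```r
mystring <- c("A_B_C", "D_E_F")
uneven <- c("A_B", "C_D_E_F")   # hypothetical input with differing piece counts

# Take the final element of each split result, whatever its length.
last_of <- function(x) {
  pieces <- strsplit(x, "_", fixed = TRUE)
  vapply(pieces, function(p) p[length(p)], character(1))
}

last_split <- last_of(mystring)
last_uneven <- last_of(uneven)

# Equivalent idea with a greedy regex: delete everything up to the last "_".
last_sub <- sub(".*_", "", uneven)

print(last_split)
print(last_uneven)
print(last_sub)
```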

R - Splitting character vector so that every unique element is added to a new character vector

Your post title suggests you want unique strings, so

unique(unlist(strsplit(myvec, split=",")))

or

unique(unlist(strsplit(myvec, split=", ")))

if you always have a space following the comma.

Efficiently splitting a character vector

The following uses only base R. Append a semicolon to each record, split the records at the semicolons, remove leading and trailing whitespace, replace the first space in each piece with a colon and a space, and read the result in using read.dcf. This gives a matrix m, which we convert to a data frame, using type.convert to get the right column types. (If a matrix is sufficient, omit the second line.)

m <- read.dcf(textConnection(sub(" ",": ",trimws(unlist(strsplit(paste0(vec, ";"),";"))))))
as.data.frame(lapply(as.data.frame(m, stringsAsFactors = FALSE), type.convert))

giving:

  id sex age type
1  a   m  16    1
2  a   m  16   NA
3  a   m  16    3

Split a column of character vectors and return a list

We can split the Variable column at "," and get all the values and select only the unique ones.

unique(unlist(strsplit(df$Variable, ",")))
#[1] "a" "b" "c"

If the Variable column is a factor, convert it to character (for example with as.character) before using strsplit, since strsplit only accepts character input.
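A minimal sketch of the factor case, using an assumed df (the question's data is not shown here). Passing the factor straight to strsplit would stop with a "non-character argument" error, so it is converted first.

```r
# Hypothetical data frame whose column is stored as a factor.
df <- data.frame(Variable = c("a,b", "b,c"), stringsAsFactors = TRUE)
is.factor(df$Variable)

# Convert to character before splitting, then keep the unique values.
vals <- unique(unlist(strsplit(as.character(df$Variable), ",", fixed = TRUE)))
print(vals)
```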

Convert delimited string to numeric vector in dataframe

Here's some sample data that reproduces your error:

data <- data.frame(a = 1:3,
                   badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
                   stringsAsFactors = FALSE)

Here's the error:

library(tidyverse)
mutate(data, converted = as.numeric(unlist(strsplit(badColumn, split=","))))
# Error in mutate_impl(.data, dots) :
# Column `converted` must be length 3 (the number of rows) or one, not 18

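The error occurs because unlist flattens the values from all rows into one vector of length 18, which cannot fit into 3 rows. A straightforward fix is to use strsplit on the entire column and keep the resulting list, using lapply with as.numeric to convert each list element from a character vector to a numeric vector.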

x <- mutate(data, converted = lapply(strsplit(badColumn, ",", fixed = TRUE), as.numeric))
str(x)
# 'data.frame': 3 obs. of 3 variables:
# $ a : int 1 2 3
# $ badColumn: chr "10,20,30,40,50" "1,2,3,4,5,6" "9,8,7,6,5,4,3"
# $ converted:List of 3
# ..$ : num 10 20 30 40 50
# ..$ : num 1 2 3 4 5 6
# ..$ : num 9 8 7 6 5 4 3
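The same list column can be built without dplyr. This is only a base R sketch of the idea, reusing the sample data above, followed by two examples of working with the list column row by row (n_values and row_sums are names made up for illustration).

```r
data <- data.frame(a = 1:3,
                   badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
                   stringsAsFactors = FALSE)

# Assigning a list with one element per row creates a list column.
data$converted <- lapply(strsplit(data$badColumn, ",", fixed = TRUE), as.numeric)

# Each row now holds its own numeric vector.
n_values <- lengths(data$converted)                    # values per row
row_sums <- vapply(data$converted, sum, numeric(1))    # sum per row

print(n_values)
print(row_sums)
```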

