Split delimited single value character vector
You can use strsplit for that:
wkdays <- "Mon,Tue,Wed,Thu,Fri"
unlist(strsplit(wkdays, ","))
This gives:
> unlist(strsplit(wkdays, ","))
[1] "Mon" "Tue" "Wed" "Thu" "Fri"
An alternative is to use scan:
scan(text = wkdays, sep = ",", what = character())
which gives the same result.
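If the values might carry spaces after the commas, scan can trim them while reading. This is a minimal sketch, assuming a made-up input (wkdays_spaced is not from the question); strip.white and quiet are standard scan arguments.

```r
# Hypothetical input: the same weekdays, but with a space after each comma
wkdays_spaced <- "Mon, Tue, Wed, Thu, Fri"

# strip.white = TRUE trims leading/trailing blanks from each field;
# quiet = TRUE suppresses the "Read N items" message
days <- scan(text = wkdays_spaced, sep = ",", what = character(),
             strip.white = TRUE, quiet = TRUE)
days
```

A plain strsplit(wkdays_spaced, ",") would keep the leading spaces, so this is mostly useful when the input formatting is not under your control.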
Split comma delimited string
strsplit gives you back a list of the character vectors, so if you want it in a single vector, use unlist as well.
So,
unlist(strsplit(string, ","))
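To see why unlist is needed, it may help to look at what strsplit returns for more than one input string. The input below is made up for illustration.

```r
# Hypothetical input: two comma-delimited strings
string <- c("a,b,c", "d,e")

# strsplit returns one list element per input string
parts <- strsplit(string, ",")
parts

# unlist flattens the list into a single character vector
flat <- unlist(parts)
flat
```

Note that flattening loses the information about which input each piece came from, so keep the list if that matters.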
Split all values in column and store them in a single numeric vector
You already have the answer in your code since all the functions you are using are vectorised.
v <- as.numeric(na.omit(unlist(strsplit(df$col, ','))))
v
#[1] 2 6 10 5 10 1
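For a self-contained run, here is a sketch with a made-up data frame. The question's actual df is not shown, so the column contents below are an assumption chosen to reproduce the output above.

```r
# Hypothetical data: comma-delimited numbers with one missing row
df <- data.frame(col = c("2,6,10", NA, "5,10,1"), stringsAsFactors = FALSE)

# strsplit works on the whole column, unlist flattens the pieces,
# na.omit drops the NA from the missing row, as.numeric converts the rest
v <- as.numeric(na.omit(unlist(strsplit(df$col, ","))))
v
```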
How to split an element in a character vector and insert it as a new element?
You are on the right path with strsplit; just use unlist to get a vector.
> unlist(strsplit(string, ","))
[1] "Eric" "John" "Dora" "Michael" " James" "Susan"
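Notice the leading space in " James" in the output above. If that is unwanted, one option is to split on a comma followed by optional whitespace, or to trim afterwards. A small sketch, assuming an input string like the one implied by that output:

```r
# Assumed input, reconstructed from the output shown above
string <- "Eric,John,Dora,Michael, James,Susan"

# Option 1: let the split pattern swallow any whitespace after the comma
names_regex <- unlist(strsplit(string, ",\\s*"))

# Option 2: split on the bare comma, then trim each piece
names_trim <- trimws(unlist(strsplit(string, ",")))

names_regex
```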
In R, split a character vector by a specific character; save 3rd piece in new vector
strsplit creates a list, so I would try the following:
lapply(strsplit(oss$id, split='_', fixed=TRUE), `[`, 3) ## Output a list
sapply(strsplit(oss$id, split='_', fixed=TRUE), `[`, 3) ## Output a vector (even though a list is also a vector)
Here [ is the extraction function, and the trailing 3 tells it to take the third element of each split. If you prefer a vector to a list, use sapply instead of lapply.
Here's an example:
mystring <- c("A_B_C", "D_E_F")
lapply(strsplit(mystring, "_"), `[`, 3)
# [[1]]
# [1] "C"
#
# [[2]]
# [1] "F"
sapply(strsplit(mystring, "_"), `[`, 3)
# [1] "C" "F"
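If some strings might have fewer than three pieces, indexing with [ returns NA for them rather than failing. A sketch using vapply, which also guarantees a character vector; the ids vector is made up for illustration.

```r
# Hypothetical input: the second id has only two pieces
ids <- c("A_B_C", "D_E")

parts <- strsplit(ids, "_", fixed = TRUE)

# p[3] is NA when a piece is missing; vapply enforces one string per input
third <- vapply(parts, function(p) p[3], character(1))
third
```

Unlike sapply, vapply raises an error if any result is not a single character value, which can catch surprises early.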
If there is an easily definable pattern, gsub might be a good option too, and avoids splitting. See the comments for improved (more robust) versions along the same lines from DWin and Josh O'Brien.
gsub(".*_.*_(.*)", "\\1", mystring)
# [1] "C" "F"
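One way such a pattern can be tightened, shown here only as an illustration and not necessarily what the commenters proposed, is to anchor it and forbid underscores inside each piece. The input x is made up to show where the two patterns differ.

```r
# Hypothetical input: the second string has four pieces
x <- c("A_B_C", "D_E_F_G")

# Greedy pattern: with more than three pieces it captures the last one
loose <- gsub(".*_.*_(.*)", "\\1", x)

# Anchored pattern: [^_]* cannot cross an underscore, so the capture
# is always the third piece
strict <- sub("^[^_]*_[^_]*_([^_]*).*$", "\\1", x)

loose
strict
```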
And finally, just for fun, you can extend the unlist approach by recycling a logical vector of TRUEs and FALSEs to extract every third item. This works only because we know in advance that every split yields exactly three pieces.
unlist(strsplit(mystring, "_"), use.names = FALSE)[c(FALSE, FALSE, TRUE)]
# [1] "C" "F"
If you want the last value after a delimiter rather than the value at a fixed numeric position, you have a few alternatives.
Use a greedy regex:
gsub(".*_(.*)", "\\1", mystring)
# [1] "C" "F"
Use a convenience function like stri_extract* from the "stringi" package:
library(stringi)
stri_extract_last_regex(mystring, "[A-Z]+")
# [1] "C" "F"
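If you would rather stay in base R and avoid a package dependency, a rough equivalent is to take the last element of each split, or to delete everything up to the final delimiter. Both are sketches under the assumption that every string contains at least one piece.

```r
mystring <- c("A_B_C", "D_E_F")

# Take the last element of each split
last_split <- vapply(strsplit(mystring, "_", fixed = TRUE),
                     function(p) p[length(p)], character(1))

# Or remove everything up to and including the last underscore
last_sub <- sub(".*_", "", mystring)

last_split
```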
R - Splitting character vector so that every unique element is added to a new character vector
Your post title suggests you want unique strings, so
unique(unlist(strsplit(myvec, split=",")))
or
unique(unlist(strsplit(myvec, split=", ")))
if you always have a space following the comma.
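If the spacing after the comma is inconsistent, a regular expression that allows optional whitespace covers both cases. A sketch with a made-up myvec:

```r
# Hypothetical input: some commas are followed by a space, some are not
myvec <- c("a, b,c", "b,c, d")

# ",\\s*" matches a comma plus any whitespace that follows it
u <- unique(unlist(strsplit(myvec, ",\\s*")))
u
```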
Efficiently splitting a character vector
The following uses only base R. Append a semicolon to each record, split the records at the semicolons, trim leading and trailing whitespace, replace the first space in each piece with a colon and a space, and read the result with read.dcf. This gives a matrix m, which we convert to a data frame, using type.convert to get the right column types. (If a matrix is sufficient, omit the second line.)
m <- read.dcf(textConnection(sub(" ",": ",trimws(unlist(strsplit(paste0(vec, ";"),";"))))))
as.data.frame(lapply(as.data.frame(m, stringsAsFactors = FALSE), type.convert))
giving:
  id sex age type
1  a   m  16    1
2  a   m  16   NA
3  a   m  16    3
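To make the intermediate steps visible, here is the same pipeline unrolled on a small input. The format of vec is an assumption (the question's data is not shown): "key value" fields separated by semicolons, with each record ending in a semicolon.

```r
# Hypothetical input: the second record has no "type" field
vec <- c("id a; sex m; age 16; type 1;",
         "id a; sex m; age 16;")

# The appended ";" leaves an empty string after each record
pieces <- unlist(strsplit(paste0(vec, ";"), ";"))

# Trim blanks and turn the first space into ": " to get "key: value" lines;
# the empty strings act as the blank lines that separate DCF records
dcf_lines <- sub(" ", ": ", trimws(pieces))
dcf_lines

# read.dcf returns a character matrix, with NA for fields a record lacks
con <- textConnection(dcf_lines)
m <- read.dcf(con)
close(con)
m
```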
Split a column of character vectors and return a list
We can split the Variable column at "," to get all the values, then select only the unique ones.
unique(unlist(strsplit(df$Variable, ",")))
#[1] "a" "b" "c"
If the Variable column is a factor, convert it to character before using strsplit.
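The factor case can be sketched as follows. The data frame is made up, and stringsAsFactors = TRUE is set explicitly so the column is a factor regardless of R version.

```r
# Hypothetical data with a factor column
df <- data.frame(Variable = c("a,b", "b,c", "a"), stringsAsFactors = TRUE)
is.factor(df$Variable)

# strsplit requires a character argument, so convert first
vals <- unique(unlist(strsplit(as.character(df$Variable), ",")))
vals
```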
Convert delimited string to numeric vector in dataframe
Here's some sample data that reproduces your error:
data <- data.frame(a = 1:3,
                   badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
                   stringsAsFactors = FALSE)
Here's the error:
library(tidyverse)
mutate(data, converted = as.numeric(unlist(strsplit(badColumn, split=","))))
# Error in mutate_impl(.data, dots) :
# Column `converted` must be length 3 (the number of rows) or one, not 18
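A straightforward way is to use strsplit on the entire column, then lapply with as.numeric to convert each element of the resulting list from a character vector to a numeric vector. The result is a list column with one numeric vector per row, so its length matches the number of rows.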
x <- mutate(data, converted = lapply(strsplit(badColumn, ",", TRUE), as.numeric))
str(x)
# 'data.frame': 3 obs. of 3 variables:
# $ a : int 1 2 3
# $ badColumn: chr "10,20,30,40,50" "1,2,3,4,5,6" "9,8,7,6,5,4,3"
# $ converted:List of 3
# ..$ : num 10 20 30 40 50
# ..$ : num 1 2 3 4 5 6
# ..$ : num 9 8 7 6 5 4 3
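The same list column can be built without dplyr, and then worked on row by row. A base R sketch using the sample data from this answer; the lens and sums names are only for illustration.

```r
data <- data.frame(a = 1:3,
                   badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
                   stringsAsFactors = FALSE)

# One numeric vector per row, stored as a list column
data$converted <- lapply(strsplit(data$badColumn, ",", fixed = TRUE), as.numeric)

# Per-row summaries of the list column
lens <- lengths(data$converted)      # number of values in each row
sums <- sapply(data$converted, sum)  # sum of the values in each row
lens
sums
```

The three lengths add up to 18, which is the length that the failing unlist version tried to fit into 3 rows.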