Split a string by a plus sign (+) character
Use
strsplit("(1)+(2)", "\\+")
or
strsplit("(1)+(2)", "+", fixed = TRUE)
Using strsplit("(1)+(2)", "+") doesn't work because, unless specified otherwise, the split argument is a regular expression, and the + character is special in regex. Other characters that also need extra care are:
? * . ^ $ \ | { } [ ] ( )
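The same two fixes apply to the other special characters. A minimal sketch, using a made-up input string x that is not from the original question:

```r
# Hypothetical input containing two regex metacharacters: "." and "|"
x <- "a.b|c"

# Escape the metacharacter so it is matched literally
dot_split <- strsplit(x, "\\.")[[1]]

# Or switch off regex interpretation entirely with fixed = TRUE
pipe_split <- strsplit(x, "|", fixed = TRUE)[[1]]

print(dot_split)   # "a"   "b|c"
print(pipe_split)  # "a.b" "c"
```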
Split a character string by the symbol *
You need to escape the star...
test = "23*45"
strsplit( test , "\\*" )
#[[1]]
#[1] "23" "45"
The split argument is a regular expression, and * means the preceding item is matched zero or more times. You are splitting on nothing, i.e. splitting into individual characters, as noted in the Details section of strsplit(). \\* means treat * as a literal *.
Alternatively, use the fixed argument...
strsplit( test , "*" , fixed = TRUE )
#[[1]]
#[1] "23" "45"
This makes R treat the split pattern as a literal string rather than a regular expression.
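A third spelling that some find easier to read is a one-character bracket expression, since * loses its special meaning inside [ ]. A small sketch reusing the same test string:

```r
test <- "23*45"

# Inside a bracket expression, * is an ordinary character
by_class <- strsplit(test, "[*]")[[1]]
print(by_class)  # "23" "45"

# strsplit() returns character vectors; convert if numbers are needed
as.numeric(by_class)
```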
Split R string into individual characters
You could use
data.frame(Reduce(rbind, strsplit(df$V1, "")))
This returns
X1 X2 X3 X4 X5 X6
init g g g g c c
X c c c c t t
X.1 t t t t t t
X.2 a a a a a a
or
data.frame(do.call(rbind, strsplit(df$V1, "")))
which returns
X1 X2 X3 X4 X5 X6
1 g g g g c c
2 c c c c t t
3 t t t t t t
4 a a a a a a
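The answer does not show df itself, so here is a self-contained sketch with an assumed input reconstructed from the printed output (the column name V1 comes from the answer, the values are inferred):

```r
# Assumed input, reconstructed from the output shown above
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"),
                 stringsAsFactors = FALSE)

# strsplit(x, "") returns one character vector per input string
chars <- strsplit(df$V1, "")

# Every string here has 6 characters, so the vectors stack into a 4 x 6 matrix
char_df <- data.frame(do.call(rbind, chars))
print(char_df)
```

If the strings differ in length, rbind recycles the shorter vectors with a warning, so it may be worth checking nchar(df$V1) first.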
How to split a string after the nth character in r
You can use substr if you always want to split after the second character.
District <- c("AR01", "AZ03", "AZ05", "AZ08", "CA01", "CA05", "CA11", "CA16", "CA18", "CA21")
# characters 1 to 2: the state code
state <- substr(District,1,2)
# characters 3 to 4: the district number
district <- substr(District,3,4)
# put both parts in a data frame if needed
st_dt <- data.frame(state = state, district = district, stringsAsFactors = FALSE)
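To split after an arbitrary nth character rather than always the second, the same idea can be wrapped in a small helper. This is only a sketch; split_at and its column names left and right are hypothetical names introduced for illustration:

```r
# Hypothetical helper: split each string after its n-th character
split_at <- function(x, n) {
  data.frame(
    left  = substr(x, 1, n),     # characters 1..n
    right = substring(x, n + 1), # everything from character n + 1 onward
    stringsAsFactors = FALSE
  )
}

parts <- split_at(c("AR01", "AZ03", "CA11"), 2)
print(parts)
```

substring() is used for the second part because its end position defaults to the end of the string, so the strings do not all need the same length.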
split string each x characters in dataframe
One option is separate from tidyr, which accepts character positions in sep
library(tidyverse)
df %>%
separate(seq, into = paste0("x", 1:3), sep = c(3, 6))
# id x1 x2 x3
#1 1 ABC DEF GHI
#2 2 ZAB CDJ HIA
To make this more generic, compute the split positions from the string length (this assumes every string is as long as the first one)
n1 <- nchar(as.character(df$seq[1])) - 3
s1 <- seq(3, n1, by = 3)
nm1 <- paste0("x", seq_len(length(s1) +1))
df %>%
separate(seq, into = nm1, sep = s1)
Or, in base R, use strsplit to split the 'seq' column after every 3 characters by passing a regex lookbehind, which returns a list, and then rbind the list elements
df[paste0("x", 1:3)] <- do.call(rbind,
strsplit(as.character(df$seq), "(?<=.{3})", perl = TRUE))
NOTE: It is better to avoid column names that start with non-standard characters such as digits, which is why 'x' is prepended to the names.
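If the regex route feels opaque, fixed-width chunks can also be cut by position with substring. A minimal sketch for a single non-empty string; chunk_string is a hypothetical helper name, not part of the answer above:

```r
# Hypothetical helper: cut one non-empty string into pieces of `width` characters
chunk_string <- function(s, width) {
  starts <- seq(1, nchar(s), by = width)
  ends   <- pmin(starts + width - 1, nchar(s))  # last piece may be shorter
  substring(s, starts, ends)
}

even_chunks   <- chunk_string("ABCDEFGHI", 3)
uneven_chunks <- chunk_string("ABCDEFGH", 3)
print(even_chunks)    # "ABC" "DEF" "GHI"
print(uneven_chunks)  # "ABC" "DEF" "GH"
```

To apply it to a whole column, something like lapply(as.character(df$seq), chunk_string, width = 3) followed by do.call(rbind, ...) should work when every string has the same length.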
R: How to split string into pieces
You can try str_extract_all:
stringr::str_extract_all(x, '[A-Za-z_]+')[[1]]
[1] "CN" "Shandong" "Zibo" "ABCDEFGHIJK" "IMG_HAS"
With base R:
regmatches(x, gregexpr('[A-Za-z_]+', x))[[1]]
Here we extract every run of uppercase letters, lowercase letters, or underscores. Everything else is ignored, so any other characters do not appear in the final output.
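The input x is not shown in the answer. As a runnable illustration, here is the base R version applied to an invented string (the separators and digits in it are assumptions chosen only so the pattern has something to skip):

```r
# Invented input; the real x from the question is not shown
x <- "CN/Shandong/Zibo 2021-ABCDEFGHIJK.IMG_HAS"

# gregexpr() locates every run of letters/underscores, regmatches() extracts them
pieces <- regmatches(x, gregexpr("[A-Za-z_]+", x))[[1]]
print(pieces)
# "CN" "Shandong" "Zibo" "ABCDEFGHIJK" "IMG_HAS"
```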
Split character by multiple criteria in R
In base R, we can use strsplit
out <- strsplit("variable1+variable2 + variable3*variable4+ variable5",
"\\s*[*+]\\s*")[[1]]
-output
out
[1] "variable1" "variable2" "variable3" "variable4" "variable5"
The structure is
dput(out)
c("variable1", "variable2", "variable3", "variable4", "variable5"
)
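One thing to watch: the pattern only strips whitespace that sits next to a delimiter, so spaces at the very start or end of the string survive. A short sketch with a made-up input, using trimws to handle that case:

```r
# Made-up input with leading and trailing spaces
s <- " a + b*c "

raw_split     <- strsplit(s, "\\s*[*+]\\s*")[[1]]
trimmed_split <- strsplit(trimws(s), "\\s*[*+]\\s*")[[1]]

print(raw_split)      # " a" "b"  "c "
print(trimmed_split)  # "a" "b" "c"
```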