Split or separate uneven/unequal strings with no delimiter
This works. It fills with blanks rather than NAs, but you can change that post-hoc if you prefer. (fill = 'right' only applies when sep is a character separator, not when it is a vector of explicit positions.)
maxchar = max(nchar(as.character(df$y)))
tidyr::separate(df, y, into = paste0("y", 1:maxchar), sep = 1:(maxchar - 1))
#    x y1 y2 y3 y4 y5 y6
# 1 X1  0  0  L  0
# 2 X2  0
# 3 X3  0  0  0  1  2  L
# 4 X4  0  1  2  3  L  0
# 5 X5  0  D  0
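If you would rather have NA than blanks, the post-hoc change mentioned above could look roughly like this. This is only a sketch: the sample df is hypothetical (the original data is not shown in the answer), and the names y_cols and out are made up for the illustration.

```r
library(tidyr)

# Hypothetical sample data; the df from the question is not shown
df <- data.frame(x = c("X1", "X2"), y = c("00L0", "0"),
                 stringsAsFactors = FALSE)

maxchar <- max(nchar(as.character(df$y)))
y_cols <- paste0("y", 1:maxchar)

# Split at every character position, as in the answer above
out <- separate(df, y, into = y_cols, sep = 1:(maxchar - 1))

# Replace empty strings with NA in the new columns only
out[y_cols] <- lapply(out[y_cols], function(col) replace(col, col %in% "", NA))
```

Restricting the replacement to y_cols leaves the x column untouched, in case an empty string is meaningful there.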
Separate a column with uneven/unequal strings and with no delimiters
The code below may work for you, assuming that the "site", "garden" and "species" columns are of a fixed width.
library(dplyr)
library(tidyr)

# "CAGE" plots take four characters instead of one, which shifts every later field by three
df <- df %>%
  mutate(site = substr(id, 1, 2),
         garden = substr(id, 3, 5),
         plot = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 6, 9), substr(id, 6, 6)),
         year = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 10, 13), substr(id, 7, 10)),
         species = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 14, 17), substr(id, 11, 14)),
         sampledate = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 18, nchar(id)), substr(id, 15, nchar(id)))) %>%
  separate(sampledate, into = c("m", "d", "y"), sep = "/") %>%
  # the first two characters of y are the year; anything after that is the portion
  mutate(portion = substr(y, 3, nchar(y)),
         sampledate = as.Date(paste(m, d, substr(y, 1, 2), sep = "-"), format = "%m-%d-%y"),
         m = NULL,
         d = NULL,
         y = NULL)
R: split uneven length string with missing separator into two cols: separate characters and numbers
You can use extract from tidyr to get the data into two columns: the first column holds everything up to the point where a number is encountered, and the second column holds the number part.
tidyr::extract(df, names, c('chars', 'nums'), '(.*?)(\\d+)', remove = FALSE)
# names chars nums
#1 ALL10 ALL 10
#2 ALL3 ALL 3
#3 CCF8 CCF 8
#4 not_CCF19 not_CCF 19
You can use the same regex in str_match:
stringr::str_match(df$names, '(.*?)(\\d+)')[, -1]
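If the number part should end up as an actual number rather than a string, a minimal sketch is to add convert = TRUE to the extract call. The data frame below is rebuilt from the names shown in the output above, so treat it as an assumption about the original input.

```r
library(tidyr)

# Input reconstructed from the printed output; the original df is not shown
df <- data.frame(names = c("ALL10", "ALL3", "CCF8", "not_CCF19"),
                 stringsAsFactors = FALSE)

# convert = TRUE runs type conversion on the new columns,
# so nums comes back as integer while chars stays character
out <- extract(df, names, c("chars", "nums"), "(.*?)(\\d+)",
               remove = FALSE, convert = TRUE)
```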
How to split a data frame column with no defined delimiter
seriesID <- c('ISU00000000033001',
'ISU00000000033001',
'ISU00000000063001',
'ISU00000000063001')
df <- data.frame(pre  = substr(seriesID, 1, 3),
                 supp = substr(seriesID, 4, 6),
                 ind  = substr(seriesID, 7, 12),
                 data = substr(seriesID, 13, 13),
                 case = substr(seriesID, 14, 14),
                 area = substr(seriesID, 15, 17))
df
  pre supp    ind data case area
1 ISU  000 000000    3    3  001
2 ISU  000 000000    3    3  001
3 ISU  000 000000    6    3  001
4 ISU  000 000000    6    3  001
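The same fixed-width split can also be sketched with tidyr::separate and numeric positions, which is not what the answer above uses but may be convenient if you are already in a tidyverse pipeline. The positions 3, 6, 12, 13 and 14 are simply the end positions of the first five fields from the substr calls; the names raw and out are made up for the illustration.

```r
library(tidyr)

seriesID <- c('ISU00000000033001',
              'ISU00000000033001',
              'ISU00000000063001',
              'ISU00000000063001')

raw <- data.frame(seriesID = seriesID, stringsAsFactors = FALSE)

# A numeric sep is read as the character positions to split after
out <- separate(raw, seriesID,
                into = c("pre", "supp", "ind", "data", "case", "area"),
                sep = c(3, 6, 12, 13, 14))
```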
Using separate() to split differently-sized strings
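You need a regex that splits at every position that is not the start of the string and is followed by an uppercase letter: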
df %>% separate(x,c("size","anim"), sep = "(?!^)(?=[[:upper:]])")
# A tibble: 4 x 3
  size  anim      y
  <chr> <chr> <dbl>
1 big   Ape       1
2 small Ape       2
3 big   Dog       5
4 small Dog       3
Splitting a string column with unequal size into multiple columns using R
This is a good occasion to make use of the extra = 'merge' argument of separate:
library(dplyr)
library(tidyr)  # separate() comes from tidyr, not dplyr
df %>%
  separate(str, c('A', 'B', 'C'), sep = ";", extra = 'merge')
  no    A    B    C
1  1 M 12 M 13 <NA>
2  2 M 24 <NA> <NA>
3  3 <NA> <NA> <NA>
4  4 C 12 C 50 C 78
split column containing strings of unequal length into multiple columns in R
Definitely an odd request, but definitely possible with tidyverse.
library(tidyverse)
df <- uniq %>%
  mutate(n = row_number()) %>%
  separate_rows(seq, sep = ' ') %>%
  group_by(n, Freq) %>%
  mutate(n2 = row_number()) %>%
  spread(n2, seq) %>%
  select(-n)
Freq `1` `2` `3` `4` `5` `6` `7`
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3 T G T T A T T
2 4 G G T G T NA NA
3 50 G G T T NA NA NA
4 172 G NA NA NA NA NA NA
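In current tidyr, spread is superseded by pivot_wider, so the same reshaping could be sketched as below. This is a variation on the answer, not the original code; the uniq data frame is reconstructed from the printed result and wide is a made-up name.

```r
library(dplyr)
library(tidyr)

# Input reconstructed from the output above; the original uniq is not shown
uniq <- data.frame(seq = c("T G T T A T T", "G G T G T", "G G T T", "G"),
                   Freq = c(3, 4, 50, 172),
                   stringsAsFactors = FALSE)

wide <- uniq %>%
  mutate(n = row_number()) %>%          # remember which original row each base came from
  separate_rows(seq, sep = " ") %>%     # one row per base
  group_by(n, Freq) %>%
  mutate(n2 = row_number()) %>%         # position of the base within its sequence
  ungroup() %>%
  pivot_wider(names_from = n2, values_from = seq) %>%
  select(-n)
```

Positions that do not exist in the shorter sequences are filled with NA, as in the spread version.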
Using separate to split uneven number of variables in a column
With separate from the tidyr package:
library(tidyr)
country_info %>%
separate(country_data,
into = sprintf('%s.%s', rep(c('country','player.count'),3), rep(1:3, each=2)))
The result:
  country.1 player.count.1 country.2 player.count.2 country.3 player.count.3
1    France              4   Morroco              8     Italy              2
2  Scotland              6    Mexico              2      <NA>           <NA>
3  Scotland              2      <NA>           <NA>      <NA>           <NA>
By default, separate splits on any run of non-alphanumeric characters (its default sep is "[^[:alnum:]]+"), which is why : and | are treated as separators automatically. If you want to separate on specific characters only, you need to specify that with the sep argument. In this case you could use sep = '[:|]'. This also prevents misbehavior of the automatic detection when there are missing values (see discussion in the comments).
With sprintf you paste together the two vectors rep(c('country','player.count'),3) and rep(1:3, each=2) into a vector of column names, where %s.%s tells sprintf to treat the two vectors as string vectors and paste them together with a dot as separator. See ?sprintf for more info. The each argument tells rep not to repeat the whole vector a number of times, but to repeat each element of the vector a number of times.
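To see the pieces separately, here is a small sketch that builds the column names step by step and then passes an explicit sep. The country_info data frame is an assumption reconstructed from the result table above, and prefix, index, col_names and out are names made up for the illustration.

```r
library(tidyr)

# Assumed input, reconstructed from the result table above
country_info <- data.frame(
  country_data = c("France:4|Morroco:8|Italy:2",
                   "Scotland:6|Mexico:2",
                   "Scotland:2"),
  stringsAsFactors = FALSE
)

prefix <- rep(c("country", "player.count"), 3)  # whole vector repeated 3 times
index <- rep(1:3, each = 2)                     # each element repeated twice: 1 1 2 2 3 3
col_names <- sprintf("%s.%s", prefix, index)    # element-wise paste with a dot

# Explicit separator; fill = "right" pads short rows with NA without a warning
out <- separate(country_info, country_data, into = col_names,
                sep = "[:|]", fill = "right")
```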