How to Split a Column in R

Split data frame string column into multiple columns

Use stringr::str_split_fixed

library(stringr)
str_split_fixed(before$type, "_and_", 2)
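The call above returns a character matrix rather than new data frame columns. A minimal sketch of attaching the pieces back to the data follows; the contents of `before` and the names `type_1` and `type_2` are assumptions for illustration, since the original data frame is not shown.

```r
library(stringr)

# Hypothetical input: the original `before` data frame is not shown
before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c("foo_and_bar", "foo_and_bar_2", "foo_and_bar", "foo_and_bar_2"),
  stringsAsFactors = FALSE
)

# str_split_fixed() returns a character matrix with exactly 2 columns;
# anything after the first "_and_" stays together in the second column
parts <- str_split_fixed(before$type, "_and_", 2)

# Bind the pieces back next to the untouched column(s)
after <- cbind(before["attr"], type_1 = parts[, 1], type_2 = parts[, 2])
```

Because the number of pieces is fixed at 2, rows with fewer pieces get an empty string instead of NA.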

How to split a column into multiple (non equal) columns in R

We could use cSplit from splitstackshape

library(splitstackshape)
cSplit(DF, "Col1",",")

Output

   Col1_1 Col1_2 Col1_3 Col1_4
1:      a      b      c   <NA>
2:      a      b   <NA>   <NA>
3:      a      b      c      d
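If adding splitstackshape is not an option, roughly the same ragged split can be sketched in base R. This is a swapped-in technique, not what cSplit does internally; the sample `DF` is reconstructed from the output above and is an assumption.

```r
# Assumed input, inferred from the output shown above
DF <- data.frame(Col1 = c("a,b,c", "a,b", "a,b,c,d"), stringsAsFactors = FALSE)

# Split each string on the literal comma
pieces <- strsplit(DF$Col1, ",", fixed = TRUE)

# The widest row decides how many columns are needed
width <- max(lengths(pieces))

# Growing a vector with `length<-` pads it with NA
padded <- t(vapply(pieces, function(p) {
  length(p) <- width
  p
}, character(width)))

out <- as.data.frame(padded, stringsAsFactors = FALSE)
names(out) <- paste0("Col1_", seq_len(width))
```

Unlike cSplit, this sketch returns a plain data frame and does no type conversion.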

How to split a dataframe column into two columns

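read.table can split the column text on ":" directly. Setting dec = '/' makes it treat "/" as the decimal mark, so a value such as "1/0" is read as the number 1.0:

read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/')
   V1    V2
1  NA
2 1.0  0.82
3 1.1 1.995
4 0.1 1.146
5  NA
6 1.1 1.995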

If you want each column converted to its natural data type, wrap the call in type.convert:

type.convert(read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/'), as.is = TRUE)
   V1    V2
1  NA    NA
2 1.0 0.820
3 1.1 1.995
4 0.1 1.146
5  NA    NA
6 1.1 1.995


Data:

df <- structure(list(X1 = c(NA, "1/0:0.82", "1/1:1.995", "0/1:1.146", NA,
                            "1/1:1.995")),
                class = "data.frame", row.names = c(NA, -6L))
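The read.table approach turns "1/0" into the number 1.0. If the part before the colon should instead stay as text, a base R sketch with strsplit is one alternative. This is a different technique from the answer above, and the result name `out` is an assumption.

```r
df <- structure(list(X1 = c(NA, "1/0:0.82", "1/1:1.995", "0/1:1.146", NA,
                            "1/1:1.995")),
                class = "data.frame", row.names = c(NA, -6L))

# Split on the literal colon; NA inputs come back as a single NA
parts <- strsplit(df$X1, ":", fixed = TRUE)

out <- data.frame(
  # First piece kept as text, e.g. "1/0"
  V1 = vapply(parts, `[`, character(1), 1),
  # Second piece converted to numeric; missing pieces become NA
  V2 = as.numeric(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)
```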

Splitting a single column into multiple columns in R

A possible solution, based on tidyverse:

library(tidyverse)

df %>%
  filter(table != "_________________________________________________") %>%
  mutate(table = str_trim(table)) %>%
  separate(table, sep = "\\s+(?=\\d+)",
           into = c("Characteristic", "Urban", "Rural", "Total"), fill = "right") %>%
  filter(Characteristic != "") %>%
  slice(-1)

#> # A tibble: 54 × 4
#>    Characteristic           Urban Rural Total
#>    <chr>                    <chr> <chr> <chr>
#>  1 Electricity              <NA>  <NA>  <NA>
#>  2 Yes                      99.8  94.4  98.9
#>  3 No                       0.2   5.6   1.1
#>  4 Total                    100.0 100.0 100.0
#>  5 Source of drinking water <NA>  <NA>  <NA>
#>  6 Piped into residence     97.1  81.4  94.4
#>  7 Public tap               0.0   0.3   0.1
#>  8 Well in residence        1.1   3.7   1.6
#>  9 Public well              0.0   0.4   0.1
#> 10 Spring                   0.0   2.3   0.4
#> # … with 44 more rows
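The separator "\\s+(?=\\d+)" is what keeps multi-word labels intact: it matches whitespace only when a digit follows, so the spaces inside "Piped into residence" are not split points. A small base R sketch of the same pattern, using a few rows copied from the output above as assumed input:

```r
# A few rows in the shape of the original table column (assumed input)
rows <- c(
  "Electricity",
  "Yes 99.8 94.4 98.9",
  "Piped into residence 97.1 81.4 94.4"
)

# perl = TRUE is needed for the lookahead (?=...) in base R.
# Whitespace is consumed as the separator; the digits that follow are only
# "looked at", so they remain part of the next piece.
pieces <- strsplit(rows, "\\s+(?=\\d+)", perl = TRUE)
```

Rows with no numbers, such as "Electricity", come back as a single piece, which is why the tidyverse version needs fill = "right".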

How to split up a column of a dataframe into new columns in R?

With tidyverse, we could start a new group every time c appears in the x column, then pivot the data wide. Duplicate column names are generally discouraged, so I created sequentially numbered names (c_1, c_2, c_3) instead.

library(tidyverse)

results <- df %>%
  group_by(idx = cumsum(x == "c")) %>%
  filter(x != "c") %>%
  mutate(rn = row_number()) %>%
  pivot_wider(names_from = idx, values_from = x, names_prefix = "c_") %>%
  select(-rn)

Output

  c_1   c_2   c_3
  <chr> <chr> <chr>
1 a     b     d
2 a     b     d
3 a     b     d
4 a     b     d

However, if you really want duplicate names, then we could add on set_names:

purrr::set_names(results, "c")

  c     c     c
  <chr> <chr> <chr>
1 a     b     d
2 a     b     d
3 a     b     d
4 a     b     d

Or in base R, we could create the grouping with cumsum, then split those groups, then bind back together with cbind. Then, we remove the first row that contains the c characters.

names(df) <- "c"
do.call(cbind, split(df, cumsum(df$c == "c")))[-1,]

# c c c
#2 a b d
#3 a b d
#4 a b d
#5 a b d
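Both approaches in this answer depend on cumsum(x == "c") to label the groups. A short sketch on a plain vector shows what that produces; the toy vector and the c_ prefix are assumptions modelled on the example.

```r
# Toy vector: each "c" marks the start of a new block (assumed input)
x <- c("c", "a", "a", "c", "b", "b", "c", "d", "d")

# TRUE counts as 1, so the running sum increases at every "c"
idx <- cumsum(x == "c")

# One character vector per block, named "1", "2", "3"
groups <- split(x, idx)

# Drop the leading "c" marker from each block and place blocks side by side
wide <- do.call(cbind, lapply(groups, function(g) g[-1]))
colnames(wide) <- paste0("c_", names(groups))
```

This only lines up cleanly when every block has the same number of values after its marker.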

How to split column by some rules in R?

One option is to use ifelse to find the rows that do not contain any item from list_of_keep_data, and in those rows replace the hyphen with a different delimiter (such as ;>), leaving the "keep it" rows unchanged. We can then call separate with the new delimiter (;>). This removes the trailing text from product and moves it into the tags column in one step.

library(tidyverse)

df_in %>%
  mutate(product = ifelse(
    !str_detect(product, str_c(list_of_keep_data, collapse = "|")),
    str_replace_all(product, pattern = " - ", " ;> "),
    product
  )) %>%
  separate(product, into = c("product", "tags"), sep = " ;> ")

Output

               product tags
1:           Product 1 100g
2:           Product 2 150g
3:           Product 3 <NA>
4: Product 4 - keep it <NA>

Another option could be to filter to the rows that you do want to separate, separate on the -, then bind the rows back to the other rows.

df_in %>%
  filter(!str_detect(product, str_c(list_of_keep_data, collapse = "|"))) %>%
  separate(product, into = c("product", "tags"), sep = " - ") %>%
  bind_rows(filter(df_in, str_detect(product, str_c(list_of_keep_data, collapse = "|"))))

Or here is another option using data.table:

library(data.table)
library(stringr)  # for str_detect() and str_c()

df_in[, tags := as.character(tags)
      ][!str_detect(product, str_c(list_of_keep_data, collapse = "|")),
        c("product", "tags") := tstrsplit(product, " - ")][]
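For a self-contained view of the rule, here is a base R sketch of the same idea. The contents of `df_in` and `list_of_keep_data` are assumptions inferred from the output shown earlier, since neither is given in the original.

```r
# Assumed input, inferred from the output shown earlier
df_in <- data.frame(
  product = c("Product 1 - 100g", "Product 2 - 150g",
              "Product 3", "Product 4 - keep it"),
  stringsAsFactors = FALSE
)
list_of_keep_data <- c("keep it")

# One alternation pattern; note the entries are treated as regular expressions
keep <- grepl(paste(list_of_keep_data, collapse = "|"), df_in$product)

# Split every row once, then decide per row whether to use the pieces
pieces <- strsplit(df_in$product, " - ", fixed = TRUE)
first  <- vapply(pieces, `[`, character(1), 1)
second <- vapply(pieces, `[`, character(1), 2)  # NA when there is no " - "

df_in$tags    <- ifelse(keep, NA_character_, second)
df_in$product <- ifelse(keep, df_in$product, first)
```

If the keep list can contain regex metacharacters, grepl with fixed = TRUE applied per entry would be the safer choice.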

Split one column into two, retain original value if there aren't two values

This is a good use case for dplyr::coalesce, which (akin to the SQL function it's named for) returns the first non-NA element from a set of vectors.

library(dplyr)
library(tidyr)
data %>%
  separate(GeneLocation, c('Start_Position', 'Stop_Position')) %>%
  mutate(Stop_Position = coalesce(Stop_Position, Start_Position))
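A runnable sketch of the same pipeline on made-up data follows. The sample values in `data` are assumptions, and fill = "right" is added here only to make the padding with NA explicit for rows that have a single value.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: some rows have a range, some a single position
data <- data.frame(
  GeneLocation = c("100-200", "350", "400-450"),
  stringsAsFactors = FALSE
)

out <- data %>%
  # The default separator is any run of non-alphanumeric characters;
  # fill = "right" pads the missing second piece with NA
  separate(GeneLocation, c("Start_Position", "Stop_Position"), fill = "right") %>%
  # Where Stop_Position is NA, fall back to Start_Position
  mutate(Stop_Position = coalesce(Stop_Position, Start_Position))
```

Both columns are still character at this point; passing convert = TRUE to separate is one way to get numbers.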

How to split a column in multiple columns using data.table

Use tstrsplit with keep = 1:3 to keep only the first three columns:

dt[, c("bins", "positions", "IDs") := tstrsplit(name, "_", fixed = TRUE, keep = 1:3)]
                                name bins positions IDs
1:                bin1_position1_ID1 bin1 position1 ID1
2:                bin2_position2_ID2 bin2 position2 ID2
3:                bin3_position3_ID3 bin3 position3 ID3
4:                bin4_position4_ID4 bin4 position4 ID4
5: bin5_position5_ID5_another5_more5 bin5 position5 ID5
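A self-contained version of the call above, with a two-row `dt` built from the values in the output (the table construction itself is an assumption):

```r
library(data.table)

# Two of the rows from the output above, one with extra pieces
dt <- data.table(name = c("bin1_position1_ID1",
                          "bin5_position5_ID5_another5_more5"))

# tstrsplit() splits and transposes, returning one vector per piece;
# keep = 1:3 discards the fourth and fifth pieces of the longer row
dt[, c("bins", "positions", "IDs") := tstrsplit(name, "_", fixed = TRUE, keep = 1:3)]
```

Without keep, the call would return five vectors here and the assignment to three column names would fail.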

