Split data frame string column into multiple columns
Use stringr::str_split_fixed
library(stringr)
str_split_fixed(before$type, "_and_", 2)
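A minimal sketch of how the resulting matrix could be attached back to the data frame. The sample values and the names `attr`, `type_1`, and `type_2` are assumptions for illustration; only `before$type` and the `"_and_"` separator come from the answer above.

```r
library(stringr)

# Hypothetical sample data; only the column name `type` is taken from the answer.
before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c("foo_and_bar", "foo_and_bar_2", "foo_and_bar", "foo_and_bar_2"),
  stringsAsFactors = FALSE
)

# str_split_fixed() returns a character matrix with exactly n = 2 columns.
parts <- str_split_fixed(before$type, "_and_", 2)

# Build the result with one column per piece.
after <- data.frame(
  attr = before$attr,
  type_1 = parts[, 1],
  type_2 = parts[, 2],
  stringsAsFactors = FALSE
)
```

Because the number of pieces is fixed in advance, every row yields the same number of columns, which makes the matrix easy to bind to the original data.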
How to split a column into multiple (non equal) columns in R
We could use cSplit from splitstackshape:
library(splitstackshape)
cSplit(DF, "Col1",",")
Output:
Col1_1 Col1_2 Col1_3 Col1_4
1: a b c <NA>
2: a b <NA> <NA>
3: a b c d
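If splitstackshape is not available, roughly the same result can be sketched in base R by padding the shorter rows with NA. This is an alternative technique, not what cSplit does internally, and the input `DF` below is an assumption reconstructed from the output shown above.

```r
# Hypothetical input, reconstructed from the output above.
DF <- data.frame(Col1 = c("a,b,c", "a,b", "a,b,c,d"), stringsAsFactors = FALSE)

# Split each string on the comma; the pieces have unequal lengths.
pieces <- strsplit(DF$Col1, ",", fixed = TRUE)
n <- max(lengths(pieces))

# Pad every row to the longest length with NA, then stack the rows.
padded <- t(sapply(pieces, function(p) c(p, rep(NA_character_, n - length(p)))))
colnames(padded) <- paste0("Col1_", seq_len(n))

result <- as.data.frame(padded, stringsAsFactors = FALSE)
```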
How to split a dataframe column into two columns
read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/')
V1 V2
1 NA
2 1.0 0.82
3 1.1 1.995
4 0.1 1.146
5 NA
6 1.1 1.995
If you want the columns in their respective data types:
type.convert(read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/'), as.is = TRUE)
V1 V2
1 NA NA
2 1.0 0.820
3 1.1 1.995
4 0.1 1.146
5 NA NA
6 1.1 1.995
df <- structure(list(X1 = c(NA, "1/0:0.82", "1/1:1.995", "0/1:1.146", NA,
"1/1:1.995")), class = "data.frame", row.names = c(NA, -6L))
Splitting a single column into multiple columns in R
A possible solution, based on tidyverse:
library(tidyverse)
df %>%
filter(table != "_________________________________________________" ) %>%
mutate(table = str_trim(table)) %>%
separate(table, sep = "\\s+(?=\\d+)",
into = c("Characteristic", "Urban", "Rural", "Total"), fill = "right") %>%
filter(Characteristic != "") %>%
slice(-1)
#> # A tibble: 54 × 4
#> Characteristic Urban Rural Total
#> <chr> <chr> <chr> <chr>
#> 1 Electricity <NA> <NA> <NA>
#> 2 Yes 99.8 94.4 98.9
#> 3 No 0.2 5.6 1.1
#> 4 Total 100.0 100.0 100.0
#> 5 Source of drinking water <NA> <NA> <NA>
#> 6 Piped into residence 97.1 81.4 94.4
#> 7 Public tap 0.0 0.3 0.1
#> 8 Well in residence 1.1 3.7 1.6
#> 9 Public well 0.0 0.4 0.1
#> 10 Spring 0.0 2.3 0.4
#> # … with 44 more rows
How to split up a column of a dataframe into new columns in R?
With tidyverse, we could create a new group every time c appears in the x column, then pivot the data wide. Generally, duplicate names are discouraged, so I created sequential c column names.
library(tidyverse)
results <- df %>%
group_by(idx = cumsum(x == "c")) %>%
filter(x != "c") %>%
mutate(rn = row_number()) %>%
pivot_wider(names_from = idx, values_from = x, names_prefix = "c_") %>%
select(-rn)
Output
c_1 c_2 c_3
<chr> <chr> <chr>
1 a b d
2 a b d
3 a b d
4 a b d
However, if you really want duplicate names, then we could add on set_names:
purrr::set_names(results, "c")
c c c
<chr> <chr> <chr>
1 a b d
2 a b d
3 a b d
4 a b d
Or in base R, we could create the grouping with cumsum, split those groups, then bind them back together with cbind. Then, we remove the first row, which contains the c characters.
names(df) <- "c"
do.call(cbind, split(df, cumsum(df$c == "c")))[-1,]
# c c c
#2 a b d
#3 a b d
#4 a b d
#5 a b d
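The grouping step shared by both approaches can be seen in isolation. The vector `x` below is a hypothetical stand-in for the original column, assuming each "c" marks the start of a new block.

```r
# Hypothetical input: every "c" starts a new block of values.
x <- c("c", "a", "a", "c", "b", "b", "c", "d", "d")

# x == "c" is TRUE at each marker, so the running sum increases by one
# at every marker and stays constant within a block.
idx <- cumsum(x == "c")

# Drop the marker rows, then split the remaining values by block index.
is_marker <- x == "c"
blocks <- split(x[!is_marker], idx[!is_marker])
```

Here `idx` is 1 1 1 2 2 2 3 3 3, so `blocks` is a list with one element per block, which is what the pivot or cbind step then turns into columns.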
How to split column by some rules in R?
One option would be to use an ifelse to find any rows that do not contain an item in list_of_keep_data, then replace the hyphen with something else (like ;>) and leave the "keep it" rows alone. Then, we can use separate with the new delimiter (;>). This simultaneously removes the text from product and puts the other text into the tags column.
library(tidyverse)
df_in %>%
mutate(product = ifelse(
!str_detect(product, str_c(list_of_keep_data, collapse = "|")),
str_replace_all(product, pattern = " - ", " ;> "),
product
)) %>%
separate(product, into = c("product", "tags"), sep = " ;> ")
Output
product tags
1: Product 1 100g
2: Product 2 150g
3: Product 3 <NA>
4: Product 4 - keep it <NA>
Another option could be to filter to the rows that you do want to separate, separate on the -, then bind the rows back to the other rows.
df_in %>%
filter(!str_detect(product, str_c(list_of_keep_data, collapse = "|"))) %>%
separate(product, into = c("product", "tags"), sep = " - ") %>%
bind_rows(filter(df_in, str_detect(product, str_c(list_of_keep_data, collapse = "|"))))
Or here is another option using data.table:
library(data.table)
df_in[, tags := as.character(tags)
][!str_detect(product, str_c(list_of_keep_data, collapse = "|")),
c("product", "tags") := tstrsplit(product, " - ")][]
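For comparison, the same rule (split on " - " unless the row matches an entry in list_of_keep_data) can be sketched in base R without any packages. The sample `df_in` and the single keep phrase are assumptions reconstructed from the output shown earlier, and `df_out` is a name introduced here for illustration.

```r
# Hypothetical input, reconstructed from the output shown above.
df_in <- data.frame(
  product = c("Product 1 - 100g", "Product 2 - 150g", "Product 3", "Product 4 - keep it"),
  stringsAsFactors = FALSE
)
list_of_keep_data <- c("keep it")

# Rows matching any keep phrase must be left untouched.
keep_pattern <- paste(list_of_keep_data, collapse = "|")

# Position of the first " - " in each string (-1 where absent).
pos <- regexpr(" - ", df_in$product, fixed = TRUE)
split_here <- pos > 0 & !grepl(keep_pattern, df_in$product)

df_out <- df_in
df_out$tags <- NA_character_
# Text after the delimiter (3 characters long) becomes the tag.
df_out$tags[split_here] <- substring(df_in$product[split_here], pos[split_here] + 3)
# Text before the delimiter stays as the product name.
df_out$product[split_here] <- substring(df_in$product[split_here], 1, pos[split_here] - 1)
```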
Split one column into two, retain original value if there aren't two values
This is a good use case for dplyr::coalesce, which (akin to the SQL function it's named for) returns the first non-NA element from a set of vectors.
library(dplyr)
library(tidyr)
data %>%
separate(GeneLocation, c('Start_Position', 'Stop_Position')) %>%
mutate(Stop_Position = coalesce(Stop_Position, Start_Position))
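A small worked sketch of that pipeline follows. The sample values in `data` are assumptions; only the column names come from the answer above. `fill = "right"` is added here so that rows with a single value are padded with NA without a warning.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data: the second row has only one position.
data <- tibble(GeneLocation = c("100-200", "350", "400-450"))

result <- data %>%
  # The default separator matches any run of non-alphanumeric characters.
  separate(GeneLocation, c("Start_Position", "Stop_Position"), fill = "right") %>%
  # Where Stop_Position is NA, fall back to Start_Position.
  mutate(Stop_Position = coalesce(Stop_Position, Start_Position))
```

In this sketch the second row ends up with "350" in both columns, while the other rows keep their separate start and stop values.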
How to split a column in multiple columns using data.table
Use tstrsplit with keep = 1:3 to keep only the first three columns:
dt[, c("bins", "positions", "IDs") := tstrsplit(name, "_", fixed = TRUE, keep = 1:3)]
name bins positions IDs
1: bin1_position1_ID1 bin1 position1 ID1
2: bin2_position2_ID2 bin2 position2 ID2
3: bin3_position3_ID3 bin3 position3 ID3
4: bin4_position4_ID4 bin4 position4 ID4
5: bin5_position5_ID5_another5_more5 bin5 position5 ID5
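To see why keep matters, this sketch compares tstrsplit with and without it on two of the strings from the output above. The variable names are introduced here for illustration.

```r
library(data.table)

name <- c("bin1_position1_ID1", "bin5_position5_ID5_another5_more5")

# Without keep, the result has as many elements as the longest split,
# and shorter rows are padded with NA.
all_parts <- tstrsplit(name, "_", fixed = TRUE)

# With keep = 1:3, only the first three pieces are returned, so the
# result lines up with three target column names.
first_three <- tstrsplit(name, "_", fixed = TRUE, keep = 1:3)
```

Assigning `all_parts` to three column names with := would fail because the lengths differ, which is the situation keep avoids.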