How to Split a Character Vector into a Data Frame

How to split a character vector into a data frame?

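# a is the character vector to split; each element has three "-"-separated fields
DF <- data.frame(do.call(rbind, strsplit(a, "-", fixed = TRUE)))
# second field: yyyymmdd date
DF[, 2] <- as.Date(DF[, 2], format = "%Y%m%d")
# third field: drop the ".tsv" suffix and convert to integer
DF[, 3] <- as.integer(gsub(".tsv", "", DF[, 3], fixed = TRUE))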

#         X1         X2 X3
#1 blablabla 1996-01-01  1
#2 blablabla 1996-01-01  2
#3 blablabla 1996-01-01  3
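
The vector a is not shown in the answer above. A minimal sketch, assuming file names of the form prefix-yyyymmdd-n.tsv (the sample values below are made up to match the printed output, and stringsAsFactors = FALSE is added so the result is the same on older R versions):

```r
# Assumed input: the original `a` is not shown, so these values are illustrative.
a <- c("blablabla-19960101-1.tsv",
       "blablabla-19960101-2.tsv",
       "blablabla-19960101-3.tsv")

# Split on the literal "-" and stack the pieces row by row into a matrix.
parts <- do.call(rbind, strsplit(a, "-", fixed = TRUE))
DF <- data.frame(parts, stringsAsFactors = FALSE)

# Convert the second field to Date and the third to integer.
DF[, 2] <- as.Date(DF[, 2], format = "%Y%m%d")
DF[, 3] <- as.integer(sub(".tsv", "", DF[, 3], fixed = TRUE))

str(DF)
```

Note that rbind only gives a clean matrix when every string splits into the same number of pieces; with ragged input R recycles the shorter pieces and issues a warning.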

Is there a way to split a character vector into two columns?

Using v as defined in the Note at the end, call read.table with sep = ".". No packages are needed.

read.table(text = v, sep = ".", as.is = TRUE)

giving this data.frame:

  V1    V2
1 2L 70660
2 2L 80704
3  X 92727

Note

The input in reproducible form:

v <- c("2L.70660", "2L.80704", "X.92727")
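
If the default names V1 and V2 or the guessed column types are not what you want, read.table also accepts col.names and colClasses. A small sketch; the column names chrom and pos are hypothetical and not part of the original answer:

```r
v <- c("2L.70660", "2L.80704", "X.92727")

# col.names and colClasses are standard read.table arguments;
# "chrom" and "pos" are illustrative names only.
res <- read.table(text = v, sep = ".",
                  col.names = c("chrom", "pos"),
                  colClasses = c("character", "integer"))

str(res)
```

This relies on each element containing exactly one ".", since every "." is treated as a field separator.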

Splitting character vector into data frame when the separating character is in the string

A base R attempt which makes use of regular expression grouping:

Data:

mydf <- data.frame(B = c(rep(" 'abcefgh.abc_123.1_123.1'", length.out = 50),
                         rep(" 'ab[+12.1]abcdefgh.abc_123.1_123.1'", length.out = 50)))

Code:

new_df <- do.call(rbind, strsplit(
  gsub("(['\\w\\+\\.\\[]*)(\\]*)([a-z]+)(\\.)([\\w\\.']+)",
       "\\1\\2\\3_\\5",
       trimws(mydf$B),
       perl = TRUE),
  split = "_"))
new_df <- data.frame(new_df)

Output:

# Just a select number of rows
                X1  X2    X3     X4
          'abcefgh abc 123.1 123.1'
          'abcefgh abc 123.1 123.1'
          'abcefgh abc 123.1 123.1'
          'abcefgh abc 123.1 123.1'
          'abcefgh abc 123.1 123.1'
          'abcefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'

Explanation:

The idea here is to match each string as five groups and use gsub to rebuild it from the groups that make up the new columns. Take 'ab[+12.1]abcdefgh.abc_123.1_123.1' as an example. The pattern breaks it into these groups: 'ab[+12.1 then ] then abcdefgh then . then abc_123.1_123.1'. The replacement concatenates the groups back together, except that the fourth group (the dot) is replaced with _. At that point the four fields are all separated by _, so splitting the string on _ yields the four columns.

I hope this helps.
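
The grouping described above can be checked on a single string. The sketch below reuses the pattern from the answer unchanged; the variable names s, pattern, groups, joined and fields are illustrative:

```r
s <- "'ab[+12.1]abcdefgh.abc_123.1_123.1'"
pattern <- "(['\\w\\+\\.\\[]*)(\\]*)([a-z]+)(\\.)([\\w\\.']+)"

# Pull out each capture group in turn. The pattern matches the whole string,
# so replacing the match with "\\i" leaves only group i.
groups <- vapply(1:5,
                 function(i) sub(pattern, paste0("\\", i), s, perl = TRUE),
                 character(1))
print(groups)

# Rebuild the string with the fourth group (the dot) swapped for "_",
# then split on "_" to get the four fields.
joined <- gsub(pattern, "\\1\\2\\3_\\5", s, perl = TRUE)
fields <- strsplit(joined, "_", fixed = TRUE)[[1]]
print(fields)
```

The quote characters stay attached to the first and last fields, as in the output shown earlier; strip them with gsub("'", "", ...) if they are not wanted.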

In R: Split a character vector to find specific characters and return a data frame

We can use a for loop with grepl to achieve this. Adding + 0 converts the logical TRUE/FALSE result to 1 or 0.

for (col in c("A", "B")) {
  dat[[col]] <- grepl(col, dat$rat) + 0
}
dat
#   orgnr         rat A B
# 1     1       A B C 1 1
# 2     2   A F H L H 1 0
# 3     3     H X L O 0 0
# 4     4 X Y Z A B C 1 1
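
The data frame dat is not shown in the answer. A self-contained sketch, with dat reconstructed from the printed output (an assumption) and as.integer used in place of + 0:

```r
# Reconstructed from the printed output; the original `dat` is not shown.
dat <- data.frame(orgnr = 1:4,
                  rat = c("A B C", "A F H L H", "H X L O", "X Y Z A B C"),
                  stringsAsFactors = FALSE)

# One indicator column per code; fixed = TRUE treats the code as a literal string.
for (code in c("A", "B")) {
  dat[[code]] <- as.integer(grepl(code, dat$rat, fixed = TRUE))
}
print(dat)

# Caveat: grepl matches substrings, so "A" is also found inside "AB".
# A word-boundary pattern avoids that when codes can be longer than one letter.
loose  <- grepl("A", "AB C", fixed = TRUE)
strict <- grepl("\\bA\\b", "AB C", perl = TRUE)
```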

If performance is an issue, try this data.table approach.

library(data.table)

# Convert to data.table
setDT(dat)

# Create a helper function
dummy_fun <- function(col, vec) {
  grepl(col, vec) + 0
}

# Apply the function to A and B
dat[, c("A", "B") := lapply(c("A", "B"), dummy_fun, vec = rat)]
dat
#    orgnr         rat A B
# 1:     1       A B C 1 1
# 2:     2   A F H L H 1 0
# 3:     3     H X L O 0 0
# 4:     4 X Y Z A B C 1 1

Converting a vector into a data frame column-wise

You can transpose the vector and convert it into a data frame or tibble.

library(dplyr)  # provides %>% and as_tibble()

t(x) %>% as_tibble()
t(x) %>% data.frame()

#  estimate ci.low ci.up
#1    0.595   0.11 2.004
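
The same result can be had in base R without the pipe. A minimal sketch, assuming x is a named numeric vector (the values below are taken from the printed output, since the original x is not shown):

```r
# Values taken from the printed output; the original `x` is not shown.
x <- c(estimate = 0.595, ci.low = 0.11, ci.up = 2.004)

# t() turns the named vector into a 1 x 3 matrix whose column names are the
# vector's names; as.data.frame() keeps them as the column names.
wide <- as.data.frame(t(x))
print(wide)
```

Without the transpose, as.data.frame(x) would instead give three rows and a single column.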

Can I split character vector based on position in R?

Use gsub() to strip the hyphens from the ID variable of the second data frame, then join the two data frames in a single pipeline. Here is the code using tidyverse functions:

library(tidyverse)
#Code
NewA <- A %>% full_join(B %>% mutate(ID=gsub('-','',ID)))

Output:

    ID A_score B_score
1 A123       8       2
2 B213      10      10
3 C421       9       9
4 C312      10      10
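
For readers without the tidyverse, a base R sketch of the same clean-then-join idea. The inputs A and B are not shown in the answer, so the data frames below are assumptions built from the printed output, with hyphenated IDs such as "A-123" assumed for B:

```r
# Assumed inputs: the original A and B are not shown.
A <- data.frame(ID = c("A123", "B213", "C421", "C312"),
                A_score = c(8, 10, 9, 10),
                stringsAsFactors = FALSE)
B <- data.frame(ID = c("A-123", "B-213", "C-421", "C-312"),
                B_score = c(2, 10, 9, 10),
                stringsAsFactors = FALSE)

# Remove the hyphen so the keys in B match those in A.
B$ID <- gsub("-", "", B$ID, fixed = TRUE)

# all = TRUE keeps unmatched rows from both sides, like a full join.
NewA <- merge(A, B, by = "ID", all = TRUE)
print(NewA)
```

Unlike full_join, merge sorts the result by the key column by default, so the row order can differ from the output shown above.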

