How to split a character vector into a data frame?
DF <- data.frame(do.call(rbind, strsplit(a, "-", fixed=TRUE)))
DF[,2] <- as.Date(DF[,2] , format="%Y%m%d")
DF[,3] <- as.integer(gsub(".tsv", "", DF[,3], fixed=TRUE))
#         X1         X2 X3
#1 blablabla 1996-01-01  1
#2 blablabla 1996-01-01  2
#3 blablabla 1996-01-01  3
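The input vector a is not shown in the answer. As a minimal sketch, assuming file names of the form name-YYYYMMDD-n.tsv (a guess inferred from the output above, not the asker's actual data), the whole pipeline can be run like this:

```r
# Hypothetical input: the original vector `a` is not shown, so these
# file names are an assumption reconstructed from the printed output.
a <- c("blablabla-19960101-1.tsv",
       "blablabla-19960101-2.tsv",
       "blablabla-19960101-3.tsv")

# Split on the literal "-" and bind the pieces row-wise into a matrix
parts <- do.call(rbind, strsplit(a, "-", fixed = TRUE))
DF <- data.frame(parts, stringsAsFactors = FALSE)

# Parse the date column, then strip the ".tsv" suffix from the counter
DF[, 2] <- as.Date(DF[, 2], format = "%Y%m%d")
DF[, 3] <- as.integer(sub(".tsv", "", DF[, 3], fixed = TRUE))
str(DF)
```

Note that do.call(rbind, ...) only gives a clean matrix when every element splits into the same number of pieces.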
Is there a way to split a character vector into two columns?
Using v from the Note at the end, use read.table. No packages are used.
read.table(text = v, sep = ".", as.is = TRUE)
giving this data.frame:
  V1    V2
1 2L 70660
2 2L 80704
3  X 92727
Note
The input in reproducible form:
v <- c("2L.70660", "2L.80704", "X.92727")
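If named and typed columns are wanted, the same call can probably be extended with the standard col.names and colClasses arguments of read.table. The column names chrom and pos below are hypothetical labels chosen for illustration, not names from the question:

```r
v <- c("2L.70660", "2L.80704", "X.92727")

# col.names and colClasses are regular read.table arguments;
# "chrom" and "pos" are made-up names for this sketch
res <- read.table(text = v, sep = ".",
                  col.names = c("chrom", "pos"),
                  colClasses = c("character", "integer"))
str(res)
```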
Splitting character vector into data frame when the separating character is in the string
A base R attempt which makes use of regular expression grouping:
Data:
mydf <- data.frame(B=c(rep(" 'abcefgh.abc_123.1_123.1'",length=50),
rep(" 'ab[+12.1]abcdefgh.abc_123.1_123.1'",length=50)))
Code:
new_df <- do.call(rbind, strsplit(gsub("(['\\w\\+\\.\\[]*)(\\]*)([a-z]+)(\\.)([\\w\\.']+)",
                                       "\\1\\2\\3_\\5",
                                       trimws(mydf$B),
                                       perl = TRUE), split = "_"))
new_df <- data.frame(new_df)
Output:
# Just a select number of rows
X1 X2 X3 X4
'abcefgh abc 123.1 123.1'
'abcefgh abc 123.1 123.1'
'abcefgh abc 123.1 123.1'
'abcefgh abc 123.1 123.1'
'abcefgh abc 123.1 123.1'
'abcefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
'ab[+12.1]abcdefgh abc 123.1 123.1'
Explanation:
The idea here is to group each row into 5 chunks and use gsub to target the chunks that will make up your new columns. Take 'ab[+12.1]abcdefgh.abc_123.1_123.1' as an example. The pattern splits the string into the chunks 'ab[+12.1, ], abcdefgh, . and abc_123.1_123.1', and the replacement concatenates the groups back together except for the fourth group (the dot), which is replaced with _. At this point the four columns you need are all separated by _, so you can go right ahead and split the string on _ to generate 4 different columns.
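To make the two steps visible, here is a small sketch that applies the same pattern to the single example string and keeps the intermediate result. The variable names s, pattern, step1 and pieces are only for illustration:

```r
s <- "'ab[+12.1]abcdefgh.abc_123.1_123.1'"
pattern <- "(['\\w\\+\\.\\[]*)(\\]*)([a-z]+)(\\.)([\\w\\.']+)"

# Step 1: rebuild the string from groups 1, 2, 3 and 5,
# putting "_" where group 4 (the dot) used to be
step1 <- gsub(pattern, "\\1\\2\\3_\\5", s, perl = TRUE)
print(step1)

# Step 2: split on the literal "_" to get the four pieces
pieces <- strsplit(step1, split = "_", fixed = TRUE)[[1]]
print(pieces)
```

The surrounding single quotes stay attached to the first and last piece, which matches the output shown above.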
I hope this helps.
In R: Split a character vector to find specific characters and return a data frame
We can use a for loop with grepl to achieve this task. Adding + 0 converts the column from TRUE or FALSE to 1 or 0.
for (col in c("A", "B")){
dat[[col]] <- grepl(col, dat$rat) + 0
}
dat
#   orgnr         rat A B
# 1     1       A B C 1 1
# 2     2   A F H L H 1 0
# 3     3     H X L O 0 0
# 4     4 X Y Z A B C 1 1
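The question's data frame is not shown, so the following self-contained sketch rebuilds dat from the printed output (an assumption). It also passes fixed = TRUE, which is optional here but makes grepl treat each letter as a literal string rather than a regular expression:

```r
# Data reconstructed from the printed output (assumed, not the
# asker's original object)
dat <- data.frame(orgnr = 1:4,
                  rat = c("A B C", "A F H L H", "H X L O", "X Y Z A B C"),
                  stringsAsFactors = FALSE)

for (col in c("A", "B")) {
  # grepl returns TRUE/FALSE; as.integer turns that into 1/0
  dat[[col]] <- as.integer(grepl(col, dat$rat, fixed = TRUE))
}
print(dat)
```

Keep in mind that this is a substring match: with multi-letter codes, a code such as "AB" would also count as containing "A".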
If performance is an issue, try this data.table approach.
library(data.table)
# Convert to data.table
setDT(dat)
# Create a helper function
dummy_fun <- function(col, vec){
grepl(col, vec) + 0
}
# Apply the function to A and B
dat[, c("A", "B") := lapply(c("A", "B"), dummy_fun, vec = rat)]
dat
#    orgnr         rat A B
# 1:     1       A B C 1 1
# 2:     2   A F H L H 1 0
# 3:     3     H X L O 0 0
# 4:     4 X Y Z A B C 1 1
Converting a vector into a data frame columnwise
You can transpose the vector and convert the result into a data frame or tibble.
library(dplyr)  # provides %>% and re-exports as_tibble()
t(x) %>% as_tibble()
t(x) %>% data.frame()
#  estimate ci.low ci.up
#1    0.595   0.11 2.004
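The vector x is not shown in the answer. Assuming it is a named numeric vector with the values printed above, the same idea works in base R without any packages:

```r
# Assumed input, reconstructed from the printed output
x <- c(estimate = 0.595, ci.low = 0.11, ci.up = 2.004)

# t() turns the named vector into a 1 x 3 matrix whose column names
# are the vector's names; as.data.frame() keeps them as columns
df <- as.data.frame(t(x))
print(df)
```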
Can I split character vector based on position in R?
Try using gsub() to clean the ID variable in the second data frame and then merge the two data frames in a single pipeline. Here is the code using tidyverse functions:
library(tidyverse)
#Code
NewA <- A %>% full_join(B %>% mutate(ID=gsub('-','',ID)))
Output:
    ID A_score B_score
1 A123       8       2
2 B213      10      10
3 C421       9       9
4 C312      10      10
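The data frames A and B are not shown. As a rough base R sketch of the same clean-then-join idea, assume B stores the IDs with a hyphen after the first character (a guess, the real position may differ) and that the scores are the ones printed above. merge() stands in for full_join() here:

```r
# Hypothetical inputs reconstructed from the printed output;
# the hyphen position in B$ID is an assumption
A <- data.frame(ID = c("A123", "B213", "C421", "C312"),
                A_score = c(8, 10, 9, 10),
                stringsAsFactors = FALSE)
B <- data.frame(ID = c("A-123", "B-213", "C-421", "C-312"),
                B_score = c(2, 10, 9, 10),
                stringsAsFactors = FALSE)

# Remove the literal "-" so both ID columns use the same format
B$ID <- gsub("-", "", B$ID, fixed = TRUE)

# all = TRUE keeps unmatched rows from both sides, like a full join
NewA <- merge(A, B, by = "ID", all = TRUE)
print(NewA)
```

Unlike full_join, merge sorts the result by the key column, so the row order can differ from the output above.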