Split character columns and get names of fields in string
Using regex and the stringi package:
library(data.table)
library(stringi)
setDT(myDT) # After creating data.table from structure()
fields <- unique(unlist(stri_extract_all(regex = "[a-z]+(?==)", myDT$info)))
patterns <- sprintf("(?<=%s=)[^;]+", fields)
myDT[, (fields) := lapply(patterns, function(x) stri_extract(regex = x, info))]
myDT[, !"info"]
chr pos type end
1: chr1 <NA> 3 4
2: chr2 <NA> <NA> 6
3: chr4 TRUE 2 5
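For readers without stringi, the same idea (a lookahead to collect the field names, then one lookbehind pattern per field) can be sketched in base R. The info strings and the extract_first() helper below are made up for illustration; they are not the original poster's data or part of any package:

```r
# Hypothetical input: key=value pairs separated by ";"
info <- c("type=3;end=4", "end=6", "pos=TRUE;type=2;end=5")

# Field names are runs of lowercase letters directly followed by "="
fields <- unique(unlist(regmatches(info, gregexpr("[a-z]+(?==)", info, perl = TRUE))))

# One lookbehind pattern per field: everything after "<field>=" up to the next ";"
patterns <- sprintf("(?<=%s=)[^;]+", fields)

# Hypothetical helper: first match per string, NA where the field is absent
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

# One character column per field
res <- as.data.frame(lapply(setNames(patterns, fields), extract_first, x = info),
                     stringsAsFactors = FALSE)
res
```

Note that the lookbehind only checks the characters immediately before the value, so a field whose name is a suffix of another one (say end and trend) would match for both.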
Edit: To get the correct types, it seems (?) type.convert() can be used:
myDT[, (fields) := lapply(patterns, function(x) type.convert(stri_extract(regex = x, info), as.is = TRUE))]
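As a small illustration of what type.convert() does with extracted strings, here is a base R sketch on made-up vectors (the values are assumptions, chosen to resemble the columns above):

```r
# type.convert() picks the simplest type that fits all non-missing values;
# as.is = TRUE keeps text as character instead of turning it into a factor
end_col <- type.convert(c("4", "6", "5"), as.is = TRUE)
pos_col <- type.convert(c(NA, NA, "TRUE"), as.is = TRUE)
chr_col <- type.convert(c("chr1", "chr2", "chr4"), as.is = TRUE)

# Inspect the resulting classes
sapply(list(end = end_col, pos = pos_col, chr = chr_col), class)
```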
R split column names with different occurrences of delimiter into strings and assign unique strings/string counts to a new dataframe
I think if you split at the "underscore, digit, underscore" it provides a solution to your statement above. This does eliminate the digit and the associated information. Does this matter?
names <- c("strainA_1_batch1", "strainA_2_batch2", "strainB_1_batch1", "strainC_1_batch2", "strainC_2_batch2",
"strainD_a_1_batch1", "strainD_b_1_batch1")
#split at the underscore, digit and underscore
splitList <- strsplit(names, "_\\d_")
#convert to dataframe
df <-data.frame(t(as.data.frame.list(splitList)))
#clean up data.frame
rownames(df)<-NULL
names(df)<-c("Strain", "Batch")
df
#report
table(df$Strain)
table(df$Batch)
Another option is to replace the underscore on either side of the digit with a " " (or other character) and then split on the space.
names<-gsub("_(\\d)_", " \\1 ", names)
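A sketch of how that second option might continue, keeping the digit as its own column. The vector is called sample_names here to avoid masking the base function names(), and the column name Replicate is only a guess at what the digit means:

```r
sample_names <- c("strainA_1_batch1", "strainA_2_batch2", "strainB_1_batch1",
                  "strainC_1_batch2", "strainC_2_batch2",
                  "strainD_a_1_batch1", "strainD_b_1_batch1")

# Replace the underscores around the digit with spaces, then split on the space
spaced <- gsub("_(\\d)_", " \\1 ", sample_names)
parts <- do.call(rbind, strsplit(spaced, " ", fixed = TRUE))

# Every name yields exactly three pieces, so the matrix has three columns
df3 <- data.frame(Strain = parts[, 1], Replicate = parts[, 2], Batch = parts[, 3],
                  stringsAsFactors = FALSE)

# report
table(df3$Strain)
table(df3$Batch)
```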
How to split a character column into multiple columns in R
You can get what you want with gsub:
gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", df$District)
[1] "North West" "North West" "North West" "North West" "North West" "North West"
The first argument to gsub ("^.* +- +([A-Za-z ]+) \\(.*$") is a regular expression. It can be interpreted as follows:
From the beginning of the string "^", match any characters ".*" followed by at least one space, a hyphen, and at least one space " +- +". Then capture "()" the next text, which is made up of one or more letters and spaces "[A-Za-z ]+". Stop capturing when you reach a space followed by an opening parenthesis " \\(", then match everything until the end of the text ".*$".
The second argument of gsub, "\\1", says to replace the matched text with the text that was captured by the parentheses.
To assign it to a variable:
df$name <- gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", df$District)
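To see the pattern at work without the original data, here is a sketch on a made-up vector (the strings in district are assumptions, not the poster's District column). It also shows that strings the pattern does not match come back unchanged:

```r
# Hypothetical inputs in a "code - Name (suffix)" layout, plus one non-matching string
district <- c("E1 - North West (NW)", "E2 - South East (SE)", "no separator here")

# The pattern spans the whole string, so the replacement leaves only the captured name
region <- gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", district)
region
```

If non-matching rows should become NA rather than pass through untouched, they would need to be detected separately, for example with grepl() using the same pattern.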
Split data frame string column into multiple columns
Use stringr::str_split_fixed
library(stringr)
str_split_fixed(before$type, "_and_", 2)
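A sketch of how the resulting matrix could be attached back to the data frame. The contents of before and the column names type_1 and type_2 are assumptions for illustration:

```r
library(stringr)

# Hypothetical data; the last row has no "_and_" separator
before <- data.frame(attr = c(1, 30, 4),
                     type = c("foo_and_bar", "foo_and_bar_2", "baz"),
                     stringsAsFactors = FALSE)

# Always returns a character matrix with exactly 2 columns;
# missing pieces are padded with empty strings
parts <- str_split_fixed(before$type, "_and_", 2)

after <- cbind(before["attr"], type_1 = parts[, 1], type_2 = parts[, 2])
after
```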
How to make a row the column names and split up a string into multiple rows
You could probably use this -
df = df[, c(1, 5)]
## Split on comma and add to dataframe
tmp = strsplit(df$molecules, ",")
df = cbind(df[, -2], do.call(rbind, tmp))
## Transpose the dataframe
df = t(df)
rownames(df) = NULL
How to split a column into multiple (non equal) columns in R
We could use cSplit from splitstackshape
library(splitstackshape)
cSplit(DF, "Col1",",")
-output
Col1_1 Col1_2 Col1_3 Col1_4
1: a b c <NA>
2: a b <NA> <NA>
3: a b c d
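The input is not shown above; a DF along the following lines would be consistent with that output (its contents are an assumption reconstructed from the printed result):

```r
library(splitstackshape)

# Hypothetical input with a varying number of comma-separated values per row
DF <- data.frame(Col1 = c("a,b,c", "a,b", "a,b,c,d"), stringsAsFactors = FALSE)

# One new column per piece; shorter rows are padded with NA
wide <- cSplit(DF, "Col1", ",")
names(wide)
dim(wide)
```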
Split an string by number of characters in a column of a data frame to create multiple columns in R?
We can use separate
library(tidyr)
separate(df, ID, into = c("Spl_1", "Spl_2"), sep = 4, remove = FALSE)
# ID Spl_1 Spl_2 Var1 Var2
#1 0334KLM001 0334 KLM001 aa xx
#2 1334HDM002 1334 HDM002 zvv rr
#3 2334WEM003 2334 WEM003 qetr qwe
#4 3334OKT004 3334 OKT004 ff sdf
#5 4334WER005 4334 WER005 ee sdf
#6 5334BBC006 5334 BBC006 qly ssg
#7 6334QQQ007 6334 QQQ007 kk htj
#8 7334AAA008 7334 AAA008 uu yjy
#9 8334CBU009 8334 CBU009 ww wttt
#10 9334MLO010 9334 MLO010 aa dg
If we want 3 columns, we can pass a vector in sep
separate(df, ID, into = c("Spl_1", "Spl_2", "Spl_3"), sep = c(4,8), remove = FALSE)
# ID Spl_1 Spl_2 Spl_3 Var1 Var2
#1 0334KLM001 0334 KLM0 01 aa xx
#2 1334HDM002 1334 HDM0 02 zvv rr
#3 2334WEM003 2334 WEM0 03 qetr qwe
#4 3334OKT004 3334 OKT0 04 ff sdf
#5 4334WER005 4334 WER0 05 ee sdf
#6 5334BBC006 5334 BBC0 06 qly ssg
#7 6334QQQ007 6334 QQQ0 07 kk htj
#8 7334AAA008 7334 AAA0 08 uu yjy
#9 8334CBU009 8334 CBU0 09 ww wttt
#10 9334MLO010 9334 MLO0 10 aa dg
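To make the meaning of the numeric sep positions concrete, here is a rough base R equivalent using substr() and substring() on two of the IDs above. This is only a sketch of what the positions mean, not how separate is implemented:

```r
id <- c("0334KLM001", "1334HDM002")

# sep = 4: split after the 4th character (characters 1-4, then 5 to the end)
two <- cbind(Spl_1 = substr(id, 1, 4), Spl_2 = substring(id, 5))

# sep = c(4, 8): split after the 4th and after the 8th character
three <- cbind(Spl_1 = substr(id, 1, 4),
               Spl_2 = substr(id, 5, 8),
               Spl_3 = substring(id, 9))
two
three
```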
If the numbers at the beginning are not of fixed length, use extract
extract(df, ID, into = c("Spl_1", "Spl_2"), "^([0-9]+)(.*)", remove = FALSE)
and for 3 columns,
extract(df, ID, into = c("Spl_1", "Spl_2", "Spl_3"), "(.{4})(.{4})(.*)", remove = FALSE)
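A small sketch of the variable-length case on made-up IDs (the data frame below is an assumption). Each capture group fills one column, and rows where the regex does not match get NA. Calling tidyr::extract explicitly avoids a clash with magrittr::extract if both packages are loaded:

```r
library(tidyr)

# Hypothetical IDs: numeric prefixes of different lengths, and one without a prefix
df <- data.frame(ID = c("0334KLM001", "12HDM002", "KLM003"), stringsAsFactors = FALSE)

res <- tidyr::extract(df, ID, into = c("Spl_1", "Spl_2"),
                      regex = "^([0-9]+)(.*)", remove = FALSE)
res
```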
How to split a string into multiple columns by a given pattern?
If the strings are always in that same format, the following regular expression should work well:
library(stringr)
x <- "\r\n \r\n How to get a confirm ticket?\r\n \r\n I want to get a tatkal ticket confirm ..."
str_split(x, "(\r\n\\s*)+", simplify = TRUE)[, -1, drop = FALSE]
[,1] [,2]
[1,] "How to get a confirm ticket?" "I want to get a tatkal ticket confirm ..."
If your data actually comes from a table in a text file or from a web page, there are probably more convenient options.
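If stringr is not available, roughly the same split can be sketched with base strsplit(). Here the empty leading piece is dropped with nzchar() instead of by column index, which is a choice made for this sketch:

```r
x <- "\r\n \r\n How to get a confirm ticket?\r\n \r\n I want to get a tatkal ticket confirm ..."

# Split on runs of line breaks plus any surrounding whitespace
pieces <- strsplit(x, "(\r\n\\s*)+", perl = TRUE)[[1]]

# The string starts with a separator, so the first piece is empty; drop empty pieces
pieces <- pieces[nzchar(pieces)]
pieces
```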