R split column by fixed width
One option would be to use extract
from tidyr
library(tidyr)
extract(extract(df, DDMMYY, c("DD","MM", "YY"), "(..)(..)(..)",
convert=TRUE), HHMM, c("HH", "MM"), "(..)(..)", convert=TRUE)
# ID DD MM YY HH MM
#1 1 10 3 10 22 5
#2 2 11 3 10 10 45
#3 3 12 3 10 11 10
#4 4 13 3 10 22 50
Or you could use strsplit
from base R
df[,c("DD", "MM", "YY", "HH", "MM")] <- do.call(data.frame,lapply(df[,-1],
function(x) do.call(rbind,lapply(strsplit(as.character(x),
'(?<=..)(?=..)', perl=TRUE), as.numeric))))
df[,-(2:3)]
# ID DD MM YY HH MM.1
#1 1 10 3 10 22 5
#2 2 11 3 10 10 45
#3 3 12 3 10 11 10
#4 4 13 3 10 22 50
Text to columns by fixed width in R
I regularly use these two functions:
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
and
substrLeft <- function(x, n){
substr(x, 1,n)
}
Which cutoff n characters left or right of the string
Column split by fixed length in R
1) read.fwf This can be done using read.fwf
as mentioned in the comments. This is straight-forward although it requires tedious determination of the field widths.
w <- c(7, 11, 10, 5, 22, 10, 9, 9, 9, 9, 12)
read.fwf("myfile.dat", w)
The remaining methods assume there is at least one space between each field. That is at least the case in the sample input.
2) read.pattern() This requires the development version of gsubfn package. This does require a long regular expression although its a reasonably straight forward one matching spaces (" +"
), non-spaces ("\\S+"
) and optional non-spaces ("\\S*"
).
library(gsubfn)
library(devtools) # source_url supports https whereas source only supports http
source_url("https://gsubfn.googlecode.com/svn/trunk/R/read.pattern.R") # from dev repo
# the third capture group is "\\S*" whereas the others are "\\S+"
pat_ <- rep(c("(\\S+) +", "(\\S*) +", "(\\S+) +", "(\\S+)"), c(2, 1, 7, 1))
pat <- paste0( c( "^ *", pat_, " *$"), collapse = "")
read.pattern("myfile.dat", pattern = pat, as.is = TRUE)
3) gsubfn() With slightly more work we can use gsubfn()
in the same package in which case we can use the CRAN version of the package. Use pat
defined above:
Lines <- readLines("myfile.dat")
tmp <- gsubfn(pat, ... ~ paste(..., sep = ","), Lines)
read.table(text = tmp, sep = ",", as.is = TRUE)
4) count.fields We can explicitly count fields in each line to get another solution. This requires no addon packages. It avoids manual field width counting but does require more code.
Lines <- readLines("myfile.dat")
k <- count.fields("myfile.dat")
Lines2 <- gsub("^ *| *$", "", Lines) # trim whitespace from beginning and end
Lines3 <- ifelse(k == 10, sub("^(\\S+ *\\S+)", "\\1,", Lines2), Lines2) # insert extra ,
Lines4 <- gsub(" +", ",", Lines3) # replace each string of spaces with a ,
read.table(text = Lines4, sep = ",", as.is = TRUE)
5) sub Slightly shorter and still without addon packages is this one. We use sub
to insert an extra comma into the short lines and then replace each string of spaces with a comma and read:
Lines <- readLines("myfile.dat")
pat2 <- "^( *(\\S+ +){2})((\\S+ +){7}\\S+) *$"
g <- gsub(" +", ",", sub(pat2, "\\1,\\3", Lines))
read.table(text = g, sep = ",", as.is = TRUE)
REVISED Additional solutions.
How can I create a DataFrame with separate columns from a fixed width character vector input in R?
You can use textConnection
to read file as text in read.fwf
and supply the widths.
data <- read.fwf(textConnection(text),
widths = c(12, 14, 20), strip.white = TRUE, skip = 3)
data
# V1 V2 V3
#1 AA A134 abcd
#2 AB A123 def
#3 AC A345 ghikl
#4 BA B134 jklmmm
#5 AD A987 mn
data
text <- c(" Report", "Group ID Name", "Number",
"AA A134 abcd", "AB A123 def",
"AC A345 ghikl", "BA B134 jklmmm",
"AD A987 mn")
Split dataframe based on one column in r, with a non-fixed width column
You can use tidyr::separate_rows
:
library(tidyr)
separate_rows(data, treatments)
# id treatments
#1 1 1
#2 1 2
#3 1 3
#4 2 2
#5 2 3
#6 3 8
#7 3 9
#8 3 1
#9 3 2
#10 3 4
Separating a string according to fixed lengths in R, to create columns
Since you mention fixed length delimiting, maybe give read.fwf
a try.
read.fwf(textConnection(text), widths = c(3, 21, 13),
col.names = c("Ticker", "Company", "Date"))
# Ticker Company Date
# 1 ABC ABC BROWNIES COMPANY 1/31/2009
# 2 BCD BCD BROWNIES COMPANY 1/31/2009
You can mess around with the middle 21
value to get it to work on all the data.
Another possibility is to split on three or more spaces.
data.frame(do.call(rbind, strsplit(text, " {3,}")))
# X1 X2 X3
# 1 ABC ABC BROWNIES COMPANY 1/31/2009
# 2 BCD BCD BROWNIES COMPANY 1/31/2009
Using R to separate txt file based on fixed width
We can read the file with readLines
and then use read.table
lines <- readLines('yourfile.csv')
read.table(text = gsub(" ([A-Z]\\w+\\s+?[A-Za-z]+?) ", " '\\1' ", lines),
sep = " ", header = FALSE,
colClasses = rep("character", 7), stringsAsFactors = FALSE)
# V1 V2 V3 V4 V5 V6 V7
#1 2016SVIC NN 01407 MOUNT ALEXANDER 7-12 COLLEGE 3031
Related Topics
Ggplot2: How to Rotate a Graph in a Specific Angle
Combining .Sd with Renamed Variable Messes with Names of .Sd Columns
Create a New Column with Non-Null Columns' Names
Understanding Bandwidth Smoothing in Ggplot2
Counting the Number of Values Greater Than 0 in R in Multiple Columns
Function for Polynomials of Arbitrary Order (Symbolic Method Preferred)
Calculate Difference Between Dates by Group in R
How to Format the X-Axis of the Hard Coded Plotting Function of Spei Package in R
Programmatically Create Tab and Plot in Markdown
Separate a Column into Multiple Columns Using Tidyr::Separate with Sep=""
Plot a Function with Several Arguments in R
How to Decode Postgresql Bytea Column Hex to Int16/Uint16 in R
Create All Subvectors of a Certain Length (Moving Window)
Selecting Multiple Columns in Data Frame Using Partial Column Name
Aggregating Monthly Column Values into Quarterly Values
How to Read Large Numbers Precisely in R and Perform Arithmetic on Them