Read Observations in Fixed Width Files Spanning Multiple Lines in R

read.fwf can handle this directly: it supports records that span more than one line.

Set up some sample data

txt <- 'Acura         Integra        Small   12.9 15.9 18.8 25 31 0 1 4 1.8 140 6300
2890 1 13.2 5 177 102 68 37 26.5 11 2705 0
Acura         Legend         Midsize 29.2 33.9 38.7 18 25 2 1 6 3.2 200 5500
2335 1 18.0 5 195 115 71 38 30.0 15 3560 0
Audi          90             Compact 25.9 29.1 32.3 20 26 1 1 6 2.8 172 5500
2280 1 16.9 5 180 102 67 37 28.0 14 3375 0'

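Read the data with read.fwf, paying attention to the widths argument. For multi-line records, widths should be a list of integer vectors, one per line of the record, each giving the field widths on that line. Here every record spans two lines, so the list has two vectors.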

DF <- read.fwf(textConnection(txt), 
               widths = list(
                 c(14, 15, 8, 5, 5, 5, 3, 3, 2, 2, 2, 4, 4, 4), 
                 c(5, 2, 5, 2, 4, 4, 3, 3, 5, 3, 5, 1)
               ), 
               header = FALSE)

The pander package is used here to pretty-print the table, since it has too many columns to fit on one line.

library(pander)
pandoc.table(DF)
## 
## ---------------------------------------------------
##  V1     V2      V3     V4   V5   V6   V7   V8   V9 
## ----- ------- ------- ---- ---- ---- ---- ---- ----
## Acura Integra  Small  12.9 15.9 18.8  25   31   0  
## 
## Acura Legend  Midsize 29.2 33.9 38.7  18   25   2  
## 
## Audi    90    Compact 25.9 29.1 32.3  20   26   1  
## ---------------------------------------------------
## 
## Table: Table continues below
## 
##  
## -----------------------------------------------
##  V10   V11   V12   V13   V14   V15   V16   V17 
## ----- ----- ----- ----- ----- ----- ----- -----
##   1     4    1.8   140  6300  2890    1   13.2 
## 
##   1     6    3.2   200  5500  2335    1   18.0 
## 
##   1     6    2.8   172  5500  2280    1   16.9 
## -----------------------------------------------
## 
## Table: Table continues below
## 
##  
## -----------------------------------------------
##  V18   V19   V20   V21   V22   V23   V24   V25 
## ----- ----- ----- ----- ----- ----- ----- -----
##   5    177   102   68    37   26.5   11   2705 
## 
##   5    195   115   71    38   30.0   15   3560 
## 
##   5    180   102   67    37   28.0   14   3375 
## -----------------------------------------------
## 
## Table: Table continues below
## 
##  
## -----
##  V26 
## -----
##   0  
## 
##   0  
## 
##   0  
## -----
##
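
As a rough check on how the list form of widths behaves, here is a minimal sketch on made-up data (toy, con and rec are hypothetical names, not part of the answer above). According to ?read.fwf, multi-line records are concatenated into a single line before the widths are applied, so this sketch assumes every non-final line of a record is exactly as long as the sum of its widths.

```r
# Two records, each spanning two physical lines (made-up data)
toy <- c("AB 12",
         "xyz7",
         "CD 34",
         "uvw8")

con <- textConnection(toy)
rec <- read.fwf(con,
                widths = list(c(3, 2),   # fields on line 1 of each record
                              c(3, 1)),  # fields on line 2 of each record
                header = FALSE,
                strip.white = TRUE)      # trim the padding in "AB " and "CD "
close(con)

dim(rec)  # one row per record, one column per field across both lines
```

If a line were shorter than its declared widths, the fields of the next line would likely shift left after concatenation, so it is worth checking that trailing spaces have not been trimmed from the file.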

How to tidy a fixed width file with headers every n (varies) rows?

Another possible solution (no tidyverse) is to read the file line by line, find the header rows, and paste each header onto the end of every data row that follows it. The combined lines are then split on whitespace and put into a data.frame.

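lines <- readLines("asd.dat")

# Indices of header rows (those starting with "4 "), plus a sentinel
# one past the last line so the final block has an end
headers <- c(grep("^4 ", lines), length(lines) + 1)

# Append each header row to every data row in its block
pastedLines <- c()
for (i in seq_len(length(headers) - 1)) {
  pastedLines <- c(pastedLines,
                   paste(lines[(headers[i] + 1):(headers[i + 1] - 1)], lines[headers[i]]))
}

# Split on whitespace and fill a matrix row by row
DF <- as.data.frame(matrix(unlist(strsplit(pastedLines, "\\s+")), nrow = length(pastedLines), byrow = TRUE))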

Output:

           V1 V2 V3     V4 V5                  V6       V7
1  5416001130  1  F 492273  4 64001416230519844TP blahblah
2  5416001140  3  F 492274  4 64001416230519844TP blahblah
3  5416001145  1  F 492275  4 64001416230519844TP blahblah
4  5416001150 19  F 492276  4 64001416230519844TP blahblah
5  5416001155 21  F 492277  4 64001416230519844TP blahblah
6  5416001160 21  F 492278  4 64001416230519844TP blahblah
7  5416001165 13  F 492279  4 64001416230519844TP blahblah
8  5416001170  3  F 492280  4 64001416230519844TP blahblah
9  5416001180  1  F 492281  4 64001416230519844TP blahblah
10 5544001125  1  F 492291  4 64001544250619844RA blahblah
11 5544001130  3  F 492292  4 64001544250619844RA blahblah
12 5544001135  4  F 492293  4 64001544250619844RA blahblah
13 5544001140 11  F 492294  4 64001544250619844RA blahblah
14 5544001145 13  F 492295  4 64001544250619844RA blahblah
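
The same idea can be sketched without a loop by using cumsum to assign each line to its header block. This is only an illustration on made-up lines (sample_lines, is_header, block, pasted and DF2 are hypothetical names), and it assumes the first line is a header and every row splits into the same number of tokens.

```r
# Made-up input: a row starting with "4 " is a block header
sample_lines <- c("4 H1 foo",
                  "a 1",
                  "b 2",
                  "4 H2 bar",
                  "c 3")

is_header <- grepl("^4 ", sample_lines)

# Running count of headers seen so far = block number of each line
block <- cumsum(is_header)

# For every data row, look up the header of its block and append it
pasted <- paste(sample_lines[!is_header],
                sample_lines[is_header][block[!is_header]])

# Split on whitespace and fill a matrix row by row
DF2 <- as.data.frame(matrix(unlist(strsplit(pasted, "\\s+")),
                            nrow = length(pasted), byrow = TRUE))
DF2
```

Compared with the loop, this avoids growing a vector inside the loop and does not break when a header is immediately followed by another header.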

How can I create a DataFrame with separate columns from a fixed width character vector input in R?

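You can wrap the character vector in textConnection so that read.fwf reads it as if it were a file, and supply the column widths. Here skip = 3 drops the three header lines and strip.white = TRUE trims the padding around each field.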

data <- read.fwf(textConnection(text), 
                 widths = c(12, 14, 20), strip.white = TRUE, skip = 3)
data
#  V1   V2     V3
#1 AA A134   abcd
#2 AB A123    def
#3 AC A345  ghikl
#4 BA B134 jklmmm
#5 AD A987     mn

data (text must be defined before running the call above)

text <- c("           Report", "Group        ID           Name", "Number", 
"AA          A134          abcd", "AB          A123          def", 
"AC          A345          ghikl", "BA          B134          jklmmm", 
"AD          A987          mn")
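
If named columns are wanted instead of V1 to V3, one option is to pass col.names, which read.fwf forwards to read.table. The sketch below is an assumption about what the asker might want rather than part of the original answer; report, con and named are hypothetical names, and the column names are typed by hand because, per ?read.fwf, a header line would have to be delimited by sep rather than laid out in fixed width.

```r
# Shortened copy of the sample input (first two data rows only)
report <- c("           Report",
            "Group        ID           Name",
            "Number",
            "AA          A134          abcd",
            "AB          A123          def")

con <- textConnection(report)
named <- read.fwf(con,
                  widths = c(12, 14, 20),
                  skip = 3,               # drop the three header lines
                  strip.white = TRUE,     # trim padding around each field
                  col.names = c("Group", "ID", "Name"))
close(con)

named
```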