Import Data into R with an Unknown Number of Columns

Import data into R with an unknown number of columns?

There is a nice function, count.fields (see its help page), which counts the number of fields on each line of a file:

count.fields("test", sep = "\t")
#[1] 1 2 3 4 5 6 7 8

So, using your second solution:

no_col <- max(count.fields("test", sep = "\t"))
data <- read.table("test", sep = "\t", fill = TRUE, col.names = 1:no_col)
data
# X1 X2 X3 X4 X5 X6 X7 X8
# 1 1 NA NA NA NA NA NA NA
# 2 1 2 NA NA NA NA NA NA
# 3 1 2 3 NA NA NA NA NA
# 4 1 2 3 4 NA NA NA NA
# 5 1 2 3 4 5 NA NA NA
# 6 1 2 3 4 5 6 NA NA
# 7 1 2 3 4 5 6 7 NA
# 8 1 2 3 4 5 6 7 8
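
For anyone who wants to try this without the original file, here is a minimal, self-contained sketch. It assumes a temporary tab-separated file built on the fly (the names path, rows, fields and ragged are made up for illustration), and it passes explicit column names instead of 1:no_col:

```r
# Build a small tab-separated file whose rows have 1, 2, ..., 8 fields.
path <- tempfile(fileext = ".tsv")
rows <- vapply(1:8, function(i) paste(seq_len(i), collapse = "\t"), character(1))
writeLines(rows, path)

# Count the fields on every line, then size col.names from the widest line.
fields <- count.fields(path, sep = "\t")
no_col <- max(fields)

# fill = TRUE pads the short rows with NA up to no_col columns.
ragged <- read.table(path, sep = "\t", fill = TRUE,
                     col.names = paste0("X", seq_len(no_col)))
dim(ragged)
```

Supplying col.names matters here because read.table otherwise guesses the column count from the first few lines only, and those are the short ones in this file.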

Read.table isn't interpreting the number of columns in a table correctly

I'll use data that was temporarily available in your question:

txt <- "column1      column122   column3   column4   column5   column6 
27013443 10.33132 6.622399e-10 2701000 10.33132 6.622399e-10
column12 column123 column44 column55 column67 column18
10.33132 6.622399e-10 0 2.3 0 1.1 "

spl <- strsplit(txt, "[\n\r]+")[[1]]
ind1 <- seq(1, length(spl), by = 2)
ind2 <- seq(2, length(spl), by = 2)
out <- read.table(header = TRUE, text = c(
paste(spl[ind1], collapse = " "),
paste(spl[ind2], collapse = " ")
))
out
# column1 column122 column3 column4 column5 column6 column12 column123 column44 column55 column67 column18
# 1 27013443 10.33132 6.622399e-10 2701000 10.33132 6.622399e-10 10.33132 6.622399e-10 0 2.3 0 1.1

If the problem is the varying amount of whitespace between columns, you can collapse each run of whitespace into a single comma first, which turns the text into CSV:

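# Split into lines first: [[:space:]] also matches newlines, so replacing
# whitespace on the whole string would destroy the line structure.
spl2 <- trimws(strsplit(txt, "[\n\r]+")[[1]])
spl2 <- gsub("[[:space:]]+", ",", spl2)
ind1 <- seq(1, length(spl2), by = 2)
ind2 <- seq(2, length(spl2), by = 2)
out2 <- read.csv(text = c(
paste(spl2[ind1], collapse = ","),
paste(spl2[ind2], collapse = ",")
))
out2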
# column1 column122 column3 column4 column5 column6 column12 column123 column44 column55 column67 column18
# 1 27013443 10.33132 6.622399e-10 2701000 10.33132 6.622399e-10 10.33132 6.622399e-10 0 2.3 0 1.1

Read only select columns with read.table when number of columns is unknown

It is easy to know how many columns you have if you know your separator. You can use a construct such as this for each file:

my.read.table <- function(file, sep = ",", colClasses3 = rep("double", 3), skip = 0, ...) {
  ## Read the first line that read.table will actually parse (after `skip`).
  first.line <- readLines(file, n = skip + 1)[skip + 1]

  ## Split that line on the separator. fixed = TRUE avoids the need to
  ## escape the separator as a regular expression.
  ncols <- length(strsplit(first.line, sep, fixed = TRUE)[[1]])

  ## Keep the first three columns; "NULL" tells read.table to drop the rest.
  read.table(file, sep = sep, skip = skip,
             colClasses = c(colClasses3, rep("NULL", ncols - 3)), ...)
}

And then use your solution:

lapply(files, my.read.table, skip=19, header=TRUE)

Also note that row names and column names in the file need care: when the header line has one fewer field than the data lines, read.table treats the first column as row names, which changes the number of columns you have to describe. The solution above is written assuming the file has neither. Please read about colClasses in ?read.table to tweak this further to suit your needs.
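
To see the colClasses = "NULL" trick in isolation, here is a minimal sketch on a throwaway file. The file contents and the names tmp and first3 are made up for illustration, and it uses count.fields rather than splitting the first line by hand to get the column count:

```r
# A small headerless CSV with five numeric columns.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4,5", "6,7,8,9,10"), tmp)

# Number of columns, taken from the widest line.
ncols <- max(count.fields(tmp, sep = ","))

# Read the first three columns as doubles; "NULL" skips the remaining ones.
first3 <- read.table(tmp, sep = ",",
                     colClasses = c(rep("double", 3), rep("NULL", ncols - 3)))
first3
```

The skipped columns are never stored, so this can also save memory on wide files.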

Separate a column of a dataframe in undefined number of columns with R/tidyverse

You can first count the largest number of pieces any value splits into, and then pass that many column names to separate.

nmax <- max(stringr::str_count(df$x, "\\.")) + 1
tidyr::separate(df, x, paste0("col", seq_len(nmax)), sep = "\\.", fill = "right")

# col1 col2 col3
#1 a <NA> <NA>
#2 a b <NA>
#3 a b c
#4 a b d
#5 a d <NA>
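
If you would rather avoid the tidyverse dependency, roughly the same result can be sketched in base R. This is an alternative to separate, not the method above, and the example data frame df is an assumption reconstructed from the output shown:

```r
# Example input, assumed from the output above.
df <- data.frame(x = c("a", "a.b", "a.b.c", "a.b.d", "a.d"))

# Split each value on a literal dot.
parts <- strsplit(df$x, ".", fixed = TRUE)
nmax <- max(lengths(parts))

# Pad every piece vector with NA up to nmax, then stack them as rows.
padded <- do.call(rbind, lapply(parts, function(p) {
  length(p) <- nmax
  p
}))

res <- as.data.frame(padded)
names(res) <- paste0("col", seq_len(nmax))
res
```

Assigning to length(p) is what pads the shorter vectors with NA, which plays the role of fill = "right".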

How can you read a CSV file in R with different number of columns

Deep in the ?read.table documentation there is the following:

The number of data columns is determined by looking at the first five
lines of input (or the whole file if it has less than five lines), or
from the length of col.names if it is specified and is longer. This
could conceivably be wrong if fill or blank.lines.skip are true, so
specify col.names if necessary (as in the ‘Examples’).

Therefore, let's define col.names to be length X (where X is the max number of fields in your dataset), and set fill = TRUE:

dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")

read.table(dat, header = FALSE, sep = ",",
col.names = paste0("V",seq_len(7)), fill = TRUE)

#      V1             V2             V3      V4           V5     V6             V7
# 1 12223     University
# 2 12227         bridge            Sky
# 3 12828         Sunset
# 4 13801         Ground
# 5 14853  Tranceamerica
# 6 14854  San Francisco
# 7 15595        shibuya         Shrine
# 8 16126            fog  San Francisco
# 9 16520     California          ocean  summer  golden gate  beach  San Francisco

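If the maximum number of fields is unknown, you can use the nifty utility function count.fields (which I found in the read.table example code). Note that read.table reads a text connection through to the end, so re-create dat (or run this step first) before counting: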

count.fields(dat, sep = ',')
# [1] 2 3 2 2 2 2 3 3 7
max(count.fields(dat, sep = ','))
# [1] 7
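
Putting the two steps together, here is a minimal sketch that keeps the raw lines in a character vector so they can be read twice. The names lines, con, n and res are made up for illustration, and strip.white = TRUE is an addition of mine to drop the space after each comma:

```r
lines <- c("12223, University",
           "12227, bridge, Sky",
           "12828, Sunset",
           "13801, Ground",
           "14853, Tranceamerica",
           "14854, San Francisco",
           "15595, shibuya, Shrine",
           "16126, fog, San Francisco",
           "16520, California, ocean, summer, golden gate, beach, San Francisco")

# First pass: find the widest line. The connection is closed explicitly
# because count.fields leaves an already-open connection open.
con <- textConnection(lines)
n <- max(count.fields(con, sep = ","))
close(con)

# Second pass: read with enough column names for the widest line.
res <- read.table(text = lines, header = FALSE, sep = ",", fill = TRUE,
                  strip.white = TRUE, col.names = paste0("V", seq_len(n)))
dim(res)
```

For a file on disk the same idea applies with the file path passed to both count.fields and read.table, so no connection handling is needed.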

Possibly helpful related reading: Only read limited number of columns in R

Is there a way to coerce R into reading a table with a specific number of columns, so that it fills all columns?

I am not exactly sure how many rows and columns you expect in the table, but you can try either

data.table::fread("https://www.physics.mcmaster.ca/~harris/GCS_table.txt",
header = TRUE,skip = 36)

Or

read.table("https://www.physics.mcmaster.ca/~harris/GCS_table.txt",
header = TRUE,skip = 36, fill = TRUE)

