Import data into R with an unknown number of columns?
There is a handy function, count.fields (see ?count.fields), which counts the number of fields in each row:
count.fields("test", sep = "\t")
#[1] 1 2 3 4 5 6 7 8
So, using your second solution:
no_col <- max(count.fields("test", sep = "\t"))
data <- read.table("test",sep="\t",fill=TRUE,col.names=1:no_col)
data
# X1 X2 X3 X4 X5 X6 X7 X8
# 1 1 NA NA NA NA NA NA NA
# 2 1 2 NA NA NA NA NA NA
# 3 1 2 3 NA NA NA NA NA
# 4 1 2 3 4 NA NA NA NA
# 5 1 2 3 4 5 NA NA NA
# 6 1 2 3 4 5 6 NA NA
# 7 1 2 3 4 5 6 7 NA
# 8 1 2 3 4 5 6 7 8
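To try this without the original file, the whole example can be reproduced with a temporary file. This is only a minimal sketch: the file contents are made up to match the output above, and the names `tmp`, `rows` and `n_fields` are hypothetical.

```r
# Build a ragged, tab-separated file: row i has i fields (assumed test data).
tmp <- tempfile(fileext = ".txt")
rows <- vapply(1:8, function(i) paste(seq_len(i), collapse = "\t"), character(1))
writeLines(rows, tmp)

# Count the fields on every line, then size the table to the widest row.
n_fields <- count.fields(tmp, sep = "\t")
no_col <- max(n_fields)

# col.names fixes the column count up front; fill = TRUE pads short rows with NA.
data <- read.table(tmp, sep = "\t", fill = TRUE,
                   col.names = paste0("X", seq_len(no_col)))
dim(data)

# Remove the temporary file again.
unlink(tmp)
```

Passing col.names matters here because read.table otherwise guesses the column count from the first five lines only, which would be too few for this file.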
Read.table isn't interpreting the number of columns in a table correctly
I'll use data that was temporarily available in your question:
txt <- "column1 column122 column3 column4 column5 column6
27013443 10.33132 6.622399e-10 2701000 10.33132 6.622399e-10
column12 column123 column44 column55 column67 column18
10.33132 6.622399e-10 0 2.3 0 1.1 "
spl <- strsplit(txt, "[\n\r]+")[[1]]
ind1 <- seq(1, length(spl), by = 2)
ind2 <- seq(2, length(spl), by = 2)
out <- read.table(header = TRUE, text = c(
paste(spl[ind1], collapse = " "),
paste(spl[ind2], collapse = " ")
))
out
# column1 column122 column3 column4 column5 column6 column12 column123 column44 column55 column67 column18
# 1 27013443 10.33132 6.622399e-10 2701000 10.33132 6.622399e-10 10.33132 6.622399e-10 0 2.3 0 1.1
If the problem is the varying amount of whitespace between columns, you can collapse those gaps first and convert the text to CSV:
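# Split into lines first: [[:space:]] also matches newlines, so substituting
# on the whole string would collapse everything into a single line.
spl2 <- trimws(strsplit(txt, "[\n\r]+")[[1]])
spl2 <- gsub("[[:space:]]+", ",", spl2)
ind1 <- seq(1, length(spl2), by = 2)
ind2 <- seq(2, length(spl2), by = 2)
out2 <- read.csv(text = c(
  paste(spl2[ind1], collapse = ","),
  paste(spl2[ind2], collapse = ",")
))
out2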
# column1 column122 column3 column4 column5 column6 column12 column123 column44 column55 column67 column18
# 1 27013443 10.33132 6.622399e-10 2701000 10.33132 6.622399e-10 10.33132 6.622399e-10 0 2.3 0 1.1
Read only select columns with read.table when number of columns is unknown
It is easy to know how many columns you have if you know your separator. You can use a construct such as this for each file:
my.read.table <- function(file, sep = ",", colClasses3 = rep("numeric", 3), skip = 0, ...) {
  ## Read the first line that read.table will see, i.e. the one after any skipped lines.
  first.line <- readLines(file, n = skip + 1)[skip + 1]
  ## Split that line on the separator.
  ## fixed = TRUE avoids the need to escape the separator when splitting.
  ncols <- length(strsplit(first.line, sep, fixed = TRUE)[[1]])
  ## Keep the first three columns; "NULL" makes read.table drop the rest.
  read.table(file, sep = sep, skip = skip,
             colClasses = c(colClasses3, rep("NULL", ncols - 3)), ...)
}
And then use your solution:
lapply(files, my.read.table, skip=19, header=TRUE)
Also note that you need to consider whether the file contains row names and column names, because read.table applies some heuristics when they are present. The solution above is written assuming there are none. Please read about colClasses in ?read.table to adapt this to your needs.
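The effect of colClasses can be checked on a small file. The sketch below is illustrative only: the file contents and the names `tmp` and `first3` are assumptions, and it reads a header row directly instead of skipping lines.

```r
# Hypothetical sample file: 5 comma-separated columns with a header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e",
             "1,2,3,4,5",
             "6,7,8,9,10"), tmp)

# Count the columns from the header line.
ncols <- length(strsplit(readLines(tmp, n = 1), ",", fixed = TRUE)[[1]])

# Read the first three columns as numeric; "NULL" tells read.table to skip a column.
first3 <- read.table(tmp, sep = ",", header = TRUE,
                     colClasses = c(rep("numeric", 3), rep("NULL", ncols - 3)))
names(first3)

# Remove the temporary file again.
unlink(tmp)
```

Columns marked "NULL" are never stored, so this also saves memory on wide files.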
Separate a column of a dataframe in undefined number of columns with R/tidyverse
You can first count the maximum number of columns the values can be split into and then use separate.
nmax <- max(stringr::str_count(df$x, "\\.")) + 1
tidyr::separate(df, x, paste0("col", seq_len(nmax)), sep = "\\.", fill = "right")
# col1 col2 col3
#1 a <NA> <NA>
#2 a b <NA>
#3 a b c
#4 a b d
#5 a d <NA>
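If the tidyverse packages are not available, the same idea (find the maximum number of pieces, then pad on the right with NA) can be sketched in base R. The input `df` below is an assumption reconstructed from the output above, and `parts`, `padded` and `out` are hypothetical names.

```r
# Assumed input, reconstructed from the output shown above.
df <- data.frame(x = c("a", "a.b", "a.b.c", "a.b.d", "a.d"),
                 stringsAsFactors = FALSE)

# Split each value on a literal dot.
parts <- strsplit(df$x, ".", fixed = TRUE)
nmax <- max(lengths(parts))

# Indexing past the end of a vector yields NA, which pads short rows on the right.
padded <- do.call(rbind, lapply(parts, function(p) p[seq_len(nmax)]))

out <- as.data.frame(padded, stringsAsFactors = FALSE)
names(out) <- paste0("col", seq_len(nmax))
out
```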
How can you read a CSV file in R with different number of columns
Deep in the ?read.table documentation there is the following:
The number of data columns is determined by looking at the first five
lines of input (or the whole file if it has less than five lines), or
from the length of col.names if it is specified and is longer. This
could conceivably be wrong if fill or blank.lines.skip are true, so
specify col.names if necessary (as in the ‘Examples’).
Therefore, let's define col.names to be of length X (where X is the maximum number of fields in your dataset), and set fill = TRUE:
dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")
read.table(dat, header = FALSE, sep = ",",
col.names = paste0("V",seq_len(7)), fill = TRUE)
#      V1             V2             V3      V4           V5     V6             V7
# 1 12223     University
# 2 12227         bridge            Sky
# 3 12828         Sunset
# 4 13801         Ground
# 5 14853  Tranceamerica
# 6 14854  San Francisco
# 7 15595        shibuya         Shrine
# 8 16126            fog  San Francisco
# 9 16520     California          ocean  summer  golden gate  beach  San Francisco
If the maximum number of fields is unknown, you can use the nifty utility function count.fields (which I found in the read.table example code):
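# `dat` must be a file path or a fresh connection here: a text connection
# that read.table() has already read to the end yields no more lines.
n_fields <- count.fields(dat, sep = ',')
n_fields
# [1] 2 3 2 2 2 2 3 3 7
max(n_fields)
# [1] 7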
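Putting the two steps together, the column count can be computed first and then passed to read.table. This is a minimal sketch, assuming the data is held in a character vector (here a shortened copy of the rows above, under the hypothetical name `txt_lines`) so that it can be read more than once; strip.white = TRUE is an optional extra that drops the space after each comma.

```r
# Shortened sample data, kept as a character vector so it can be re-read.
txt_lines <- c("12223, University",
               "12227, bridge, Sky",
               "16520, California, ocean, summer, golden gate, beach, San Francisco")

# Open a fresh connection just for counting, and close it afterwards.
con <- textConnection(txt_lines)
n_max <- max(count.fields(con, sep = ","))
close(con)

# Size col.names to the widest row; fill = TRUE pads the shorter rows.
res <- read.table(text = txt_lines, header = FALSE, sep = ",",
                  col.names = paste0("V", seq_len(n_max)),
                  fill = TRUE, strip.white = TRUE)
dim(res)
```

With a file on disk the connection handling is unnecessary: the path can be passed to both count.fields and read.table.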
Possibly helpful related reading: Only read limited number of columns in R
Is there a way to coerce R into reading a table with a specific number of columns, so that it fills all columns?
I am not exactly sure how many rows and columns you expect in the table, but you can try either
data.table::fread("https://www.physics.mcmaster.ca/~harris/GCS_table.txt",
header = TRUE,skip = 36)
Or
read.table("https://www.physics.mcmaster.ca/~harris/GCS_table.txt",
header = TRUE,skip = 36, fill = TRUE)