How can you read a CSV file in R with different number of columns
Deep in the ?read.table documentation there is the following:
The number of data columns is determined by looking at the first five
lines of input (or the whole file if it has less than five lines), or
from the length of col.names if it is specified and is longer. This
could conceivably be wrong if fill or blank.lines.skip are true, so
specify col.names if necessary (as in the ‘Examples’).
Therefore, let's define col.names to be of length X (where X is the maximum number of fields in your dataset), and set fill = TRUE:
dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")
read.table(dat, header = FALSE, sep = ",",
col.names = paste0("V",seq_len(7)), fill = TRUE)
#      V1 V2            V3            V4     V5          V6    V7
# 1 12223 University
# 2 12227 bridge        Sky
# 3 12828 Sunset
# 4 13801 Ground
# 5 14853 Tranceamerica
# 6 14854 San Francisco
# 7 15595 shibuya       Shrine
# 8 16126 fog           San Francisco
# 9 16520 California    ocean         summer golden gate beach San Francisco
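If the maximum number of fields is unknown, you can use the nifty utility function count.fields (which I found in the read.table example code). Run it on a freshly created connection or on a file path, because read.table has already read dat to its end: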
count.fields(dat, sep = ',')
# [1] 2 3 2 2 2 2 3 3 7
max(count.fields(dat, sep = ','))
# [1] 7
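Putting the two steps together for a file on disk might look like the sketch below. The temporary file, its contents, and the name n_col are made up for illustration, and strip.white = TRUE is an optional extra (not part of the answer above) that trims the space after each comma.

```r
# Hypothetical sample file with a ragged number of fields per line.
path <- tempfile(fileext = ".csv")
writeLines(c("12223, University",
             "12227, bridge, Sky",
             "16520, California, ocean, summer, golden gate, beach, San Francisco"),
           path)

# Count the fields on every line and keep the largest count.
n_col <- max(count.fields(path, sep = ","))

# Supplying that many column names makes read.table allocate every column;
# fill = TRUE pads the shorter rows.
dat <- read.table(path, header = FALSE, sep = ",",
                  col.names = paste0("V", seq_len(n_col)),
                  fill = TRUE, strip.white = TRUE)

dim(dat)
```

Because count.fields is given a file path here rather than an open connection, the file can be read a second time by read.table without any extra work.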
Possibly helpful related reading: Only read limited number of columns in R
write and read.csv different number of columns
This is probably related to the following in ?read.csv:
The number of data columns is determined by looking at the first five
lines of input (or the whole file if it has less than five lines), or
from the length of col.names if it is specified and is longer. This
could conceivably be wrong if fill or blank.lines.skip are true, so
specify col.names if necessary (as in the ‘Examples’).
It just so happens that, in your first example, the row with the most columns is the sixth row, which lies beyond the five lines that read.csv inspects.
I suggest using col.names to get around this, e.g.:
`... read.csv(..., col.names = paste0('V', 1:6))`
As the OP notes in a comment to this answer, you can find out the number of columns required using readLines:
Ncol <- max(unlist(lapply(strsplit(readLines(datfile), ","), length)))
and then modify the above to give:
read.csv(datfile, header = FALSE, colClasses = "character", col.names = paste0("V", 1:Ncol))
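As a self-contained sketch of the readLines approach, the following uses a made-up temporary file whose widest row sits on line six. The file contents and the name fixed are assumptions for illustration; lengths() is used as a shorter equivalent of unlist(lapply(..., length)).

```r
# Hypothetical file: the widest row (6 fields) is on line 6, beyond the
# five lines that read.csv inspects to decide on the number of columns.
datfile <- tempfile(fileext = ".csv")
writeLines(c("1,a", "2,b", "3,c,d", "4,e", "5,f", "6,g,h,i,j,k"), datfile)

# Split every line on commas and take the largest field count.
# Caveat: a plain strsplit does not honour quoted fields that contain commas.
Ncol <- max(lengths(strsplit(readLines(datfile), ",")))

# Passing Ncol column names makes read.csv allocate all the columns up front.
fixed <- read.csv(datfile, header = FALSE, colClasses = "character",
                  col.names = paste0("V", seq_len(Ncol)))
dim(fixed)
```

If the file may contain quoted fields with embedded commas, count.fields(datfile, sep = ",") is probably the safer way to obtain the same number.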
Reading in a .csv with multiple data frames / Different number of columns
df <- read.delim(file.choose(), header = FALSE, sep = ";", fill = TRUE) # choose x.csv from your PC
file.choose() opens up a dialog box for selecting the input file. Hope this helped.
Combine some csv files into one - different number of columns
Your question seems to contain multiple subquestions. I encourage you to separate them.
The first thing you apparently need is to combine data frames with different columns. You can use rbind.fill from the plyr package:
library(plyr)
all_data = do.call(rbind.fill, list_of_data)
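To make the snippet above concrete, here is a minimal sketch with two made-up data frames standing in for list_of_data. It assumes the plyr package is installed; in practice the list would more likely come from something like lapply(files, read.csv).

```r
library(plyr)  # assumed to be installed

# Two hypothetical data frames whose columns only partly overlap.
list_of_data <- list(
  data.frame(id = 1:2, city = c("Tokyo", "Oslo")),
  data.frame(id = 3L, country = "Peru")
)

# rbind.fill stacks the rows and fills columns missing from an input with NA.
all_data <- do.call(rbind.fill, list_of_data)
all_data
```

The result has one row per input row and the union of all column names, so no data frame has to be padded by hand beforehand.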