R read.csv More columns than column names error
That's one wonky CSV file. Multiple headers are tossed about (try pasting it into CSV Fingerprint to see what I mean).
Since I don't know the data, I can't be sure the following produces accurate results for you, but it uses readLines and other R functions to pre-process the text:
# use readLines to get the data
dat <- readLines("N0_07312014.CSV")
# i had to do this to fix grep errors
Sys.setlocale('LC_ALL','C')
# filter out the repeating, and wonky headers
dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)
# turn that vector into a text connection for read.csv
dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
header=FALSE, stringsAsFactors=FALSE)
str(dat_3)
## 'data.frame': 308 obs. of 37 variables:
## $ V1 : chr "Node 0" "Node 0" "Node 0" "Node 0" ...
## $ V2 : chr "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
## $ V3 : chr "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
## $ V4 : chr "" "" "" "" ...
## .. more
## $ V36: chr "" "" "" "" ...
## $ V37: chr "0" "0" "0" "0" ...
# grab the headers
headers <- strsplit(dat[1], ",")[[1]]
# how many of them are there?
length(headers)
## [1] 32
# keep only the first 32 columns, which matches the number of headers
dat_4 <- dat_3[,1:32]
# and add the headers
colnames(dat_4) <- headers
str(dat_4)
## 'data.frame': 308 obs. of 32 variables:
## $ Node Name : chr "Node 0" "Node 0" "Node 0" "Node 0" ...
## $ RTC_date : chr "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
## $ RTC_time : chr "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
## $ N1 Bat (VDC) : chr "" "" "" "" ...
## $ N1 Shinyei (ug/m3): chr "" "" "0.23" "null" ...
## $ N1 CC (ppb) : chr "" "" "null" "null" ...
## $ N1 Aeroq (ppm) : chr "" "" "null" "null" ...
## ... continues
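Before filtering anything, it can help to check how ragged the file really is. The following is a minimal sketch on made-up lines (the `lines` vector is hypothetical, not taken from the real file) that uses count.fields to show the per-line field counts and then applies the same drop-the-repeated-headers idea as above:

```r
# Hypothetical miniature of a file with a repeated header and ragged rows
lines <- c(
  "Node Name,RTC_date,RTC_time",
  "Node 0,07/31/2014,08:58:18,,",
  "Node Name,RTC_date,RTC_time",
  "Node 0,07/31/2014,08:59:22,,"
)

# count.fields() reports how many fields each line has;
# differing counts are what trigger "more columns than column names"
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",")
close(con)
table(n_fields)

# drop every header line, then read the remaining rows without a header
body <- grep("^Node Name,RTC_date", lines, invert = TRUE, value = TRUE)
df <- read.csv(text = body, header = FALSE, stringsAsFactors = FALSE)

# keep as many columns as there are header names, then apply the names
headers <- strsplit(lines[1], ",")[[1]]
df <- df[, seq_along(headers)]
colnames(df) <- headers
```

If the table of field counts shows more than one value, the data rows and the header disagree, and some pre-processing like the above is probably needed.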
more columns than column name on txt file
If myFile contains the path/filename, then replace each of the first 4 stretches of whitespace on every line with a comma and then re-read using read.csv. No packages are used.
L <- readLines(myFile) ##
for(i in 1:4) L <- sub("\\s+", ",", L)
DF <- read.csv(text = L)
giving:
> DF
height Shoesize gender Location
1 181 44 male city center
4 170 43 female city center
5 172 43 female city center
13 175 42 male out of city
14 181 44 male out of city
15 180 43 male out of city
16 177 43 female out of city
17 133 41 male out of city
Note: For purposes of testing we can use this in place of the line marked ## above. (Note that SO can introduce spaces at the beginnings of the lines so we remove them.)
Lines <- " height Shoesize gender Location
1 181 44 male city center
4 170 43 female city center
5 172 43 female city center
13 175 42 male out of city
14 181 44 male out of city
15 180 43 male out of city
16 177 43 female out of city
17 133 41 male out of city"
L <- readLines(textConnection(Lines))
L[-1] <- sub("^\\s+", "", L[-1])
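For a fully self-contained run, here is a small variant of the same idea. It is only a sketch: it assumes a shortened copy of the sample data, and it trims leading whitespace from every line (header included) with trimws, which is a slight departure from the code above.

```r
# Shortened sample; the Location values contain spaces
Lines <- "height Shoesize gender Location
1 181 44 male city center
4 170 43 female city center
13 175 42 male out of city"

# split into lines and strip leading/trailing whitespace from each one
L <- trimws(strsplit(Lines, "\n")[[1]])

# turn the first 4 runs of whitespace on each line into commas;
# the header only has 3, so the 4th pass leaves it unchanged
for (i in 1:4) L <- sub("\\s+", ",", L)

# the header now has one field fewer than the data rows,
# so read.csv uses the first column as row names
DF <- read.csv(text = L)
DF
```

Because only the first four gaps are converted, the spaces inside "city center" and "out of city" survive as part of the Location value.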
import csv-table into R and got multiple errors
You have to define a separator, otherwise R fails to read the data properly. Suppose your data structure is the following:
structure(list(month = 2:5, titles_tmp = structure(c(1L, 1L,
1L, 1L), .Label = "some text", class = "factor"), info_tmp = structure(c(1L,
1L, 1L, 1L), .Label = "More text", class = "factor"), unlist.text = structure(c(1L,
1L, 1L, 1L), .Label = "http://somelink.com", class = "factor")), .Names = c("month",
"titles_tmp", "info_tmp", "unlist.text"), class = "data.frame", row.names = c(NA,
-4L))
Suppose also that the columns in the file are separated by a single tab. That means you need to use sep = "\t" as the data separator. Provided your data file is named "Sz-Iraki2.csv", the following should import your data nicely:
df = read.csv("Sz-Iraki2.csv", sep = "\t", fileEncoding = "UTF-8")
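To see the effect of the separator in isolation, here is a minimal sketch that writes a small tab-separated file to a temporary path and reads it twice. The file contents and the names `tmp`, `bad` and `good` are made up for illustration.

```r
# write a small tab-separated file to a temporary path (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("month\ttitles_tmp\tinfo_tmp\tunlist.text",
             "2\tsome text\tMore text\thttp://somelink.com",
             "3\tsome text\tMore text\thttp://somelink.com"), tmp)

# default comma separator: each whole line lands in a single column
bad <- read.csv(tmp)
ncol(bad)

# explicit tab separator: the four columns are split as intended
good <- read.csv(tmp, sep = "\t")
str(good)
```

read.delim uses a tab separator by default, so it may be a convenient alternative for files like this.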
Issues importing a csv in R
I finally found the solution!
I was going nuts; even my instructor didn't know how to fix it!
This statement works:
o <- read.csv("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/Occ.txt", header=TRUE, sep="\t", fileEncoding="UTF-16LE")
Like I said in my original question: I tried using fileEncoding="UTF-16LE" and it didn't help. After asking the question, I tried using sep="\t", and it didn't help. But using both of them did the trick!
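A likely reason both arguments are needed is that they fix two separate problems: fileEncoding tells R how to decode the bytes, and sep tells it how to split the decoded lines. The sketch below illustrates this on a made-up file (the data and the `tmp` path are hypothetical) written in UTF-16LE with tab separators:

```r
# write a small tab-separated file encoded as UTF-16LE (hypothetical data)
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "w", encoding = "UTF-16LE")
writeLines(c("id\tname", "1\talpha", "2\tbeta"), con)
close(con)

# decode with fileEncoding and split on tabs with sep
o <- read.csv(tmp, header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
str(o)
```

Dropping either argument here should reproduce the kind of failure described above: without fileEncoding the bytes are misread, and without sep each line stays in one column.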