Techniques for Finding Bad Data with read.csv in R

Data cleaning with read.csv

While the exact form you'll need will depend heavily on the particulars of the file in question, for what you've presented here you can hack the data out without too much insanity:

library(tidyverse)

df <- read_csv2(file, col_names = FALSE) %>%
  filter(rowSums(!is.na(.)) > 0) %>%        # drop rows that are entirely empty
  magrittr::set_rownames(.[[1]]) %>%        # first column becomes row names
  select(-1) %>%
  t() %>%                                   # transpose so variables become columns
  as_data_frame() %>%
  type_convert(col_types = cols(Date = col_date('%d.%m.%Y')),
               locale = locale(decimal_mark = ','))

df
#> # A tibble: 3 x 12
#> Name Correction Date Time `T_int [ms]` `Ev [lx]`
#> <chr> <chr> <date> <time> <int> <dbl>
#> 1 #1 <NA> 2016-09-19 12:05:03 806 1310
#> 2 #2 <NA> 2016-09-19 12:06:01 800 1350
#> 3 #3 <NA> 2016-09-19 12:07:00 884 1270
#> # ... with 6 more variables: `Ee [W/sqm] (380-780nm)` <dbl>, `Chrom.
#> # Coord.` <chr>, x <dbl>, y <dbl>, `u'` <dbl>, `v'` <dbl>

Data

file <- ";;;
;;;
;;;
Name;#1;#2;#3
Correction;;;
Date;19.09.2016;19.09.2016;19.09.2016
Time;12:05:03;12:06:01;12:07:00
T_int [ms];806;800;884
Ev [lx];1,31E+03;1,35E+03;1,27E+03
Ee [W/sqm] (380-780nm);4,22E+00;4,38E+00;4,17E+00
;;;
;;;
Chrom. Coord.;;;
x;0,3657;0,3642;0,3643
y;0,3842;0,3831;0,3833
u';0,2126;0,2121;0,2121
v';0,5026;0,502;0,5021
;;;"
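The same reshape can also be attempted in base R, without the tidyverse. Here is a rough sketch of the idea; a shortened copy of the data above is inlined so the snippet runs on its own, and only two of the numeric columns are converted, as illustration:

```r
file <- ";;;
Name;#1;#2;#3
Correction;;;
Date;19.09.2016;19.09.2016;19.09.2016
T_int [ms];806;800;884
Ev [lx];1,31E+03;1,35E+03;1,27E+03
;;;"

# Semicolon-separated, no header; empty fields become NA
raw <- read.csv2(text = file, header = FALSE, stringsAsFactors = FALSE,
                 na.strings = "")
raw <- raw[rowSums(!is.na(raw)) > 0, ]   # drop rows that are entirely empty

# First column holds the variable names; transpose the rest
out <- as.data.frame(t(raw[, -1]), stringsAsFactors = FALSE)
names(out) <- raw[[1]]

# Convert the numeric columns by hand (comma decimal mark)
out$`T_int [ms]` <- as.integer(out$`T_int [ms]`)
out$`Ev [lx]`    <- as.numeric(sub(",", ".", out$`Ev [lx]`, fixed = TRUE))
```

This trades the automatic `type_convert()` step for explicit per-column conversions, which can be easier to debug on messy files.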

In R, how to read a special CSV in which some rows skip the first value?

Updated

You can use the na.strings parameter to replace the empty dates ("") with missing values (NA):

data = read.csv(your_file, header = TRUE, na.strings = c(""))

then,

data$Date = as.Date(data$Date)
data$Date = zoo::na.locf(data$Date)

to carry the last observed date forward into the missing values.

However, credit to @Taran, who commented on your initial question, as I wasn't aware of the zoo::na.locf function.
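For reference, here is a small self-contained sketch of those two steps; the file contents and column names are made up for illustration:

```r
library(zoo)

csv <- "Date,Value
2020-01-01,1
,2
,3
2020-01-02,4"

# Empty strings become NA on read-in
data <- read.csv(text = csv, header = TRUE, na.strings = c(""))
data$Date <- as.Date(data$Date)

# Carry the last observed date forward into the blank rows
data$Date <- zoo::na.locf(data$Date)
```

After this, rows 2 and 3 carry 2020-01-01 and no NA dates remain.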

Reading broken CSV lines from R

There are probably several ways to do this.

UPDATE: Try this then. With the skip = argument in scan() you can specify how many rows to skip.


file <- scan("C:/Users/skupfer/Documents/bisher.txt", strip.white = TRUE, sep = ",",
             what = list("character"), skip = 1)

file_mat <- matrix(file[[1]][file[[1]] != ""], ncol = 5, byrow = TRUE)

file_df <- as.data.frame(file_mat, stringsAsFactors = FALSE)

file_df$Quantity <- as.integer(file_mat[,3])

> file_df
Product Date Quantity Categorie sector
1 ABC 01052019 4510 Food Dry
2 CDE 01052019 222 Drink Cold
3 FGH 01052019 345 Food Dry
4 IJK 01052019 234 Food Cold
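Since the path above is local, here is a self-contained sketch of the same scan-and-reshape step on an inline string; the five-column layout and the trailing empty field on each line are assumptions based on the output above:

```r
txt <- "header line to skip
ABC,01052019,4510,Food,Dry,
CDE,01052019,222,Drink,Cold,"

# what = list("character") reads every field into one character vector
vals <- scan(text = txt, strip.white = TRUE, sep = ",",
             what = list("character"), skip = 1)

# Drop the empty trailing fields, then reshape into five columns
mat <- matrix(vals[[1]][vals[[1]] != ""], ncol = 5, byrow = TRUE)
res <- as.data.frame(mat, stringsAsFactors = FALSE)
res$V3 <- as.integer(res$V3)
```

The key point is that scan() flattens the broken lines into a single stream of fields, so the matrix() call can re-impose the intended row structure.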

Reading in Poor CSV File Structure

Using pandas.read_csv with a regex negative lookahead as the separator. The same regex should work in R as well.

import pandas as pd

df = pd.read_csv(filename, sep=r',(?!\s)')

Filter df for rows in which LOC has a comma, to verify that we've parsed correctly:

df[df.LOC.str.contains(',')]
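In R, read.csv does not accept a regex separator, but the same negative-lookahead pattern can be applied with strsplit before assembling the data frame. A sketch with invented sample lines (only the LOC column name comes from the answer above):

```r
lines <- c("ID,LOC,VAL",
           "1,Paris, France,10",
           "2,Berlin, Germany,20")

# Split on commas NOT followed by whitespace, mirroring the ,(?!\s) pattern
parts <- strsplit(lines, ",(?!\\s)", perl = TRUE)

# First element is the header; the rest become rows
df <- as.data.frame(do.call(rbind, parts[-1]), stringsAsFactors = FALSE)
names(df) <- parts[[1]]
```

As in the pandas version, a comma followed by a space is treated as part of the field, so "Paris, France" survives as a single LOC value.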


Error reading csv as a zoo object - certain lines with 'bad entries'

Your csv has empty values. You can read them in as NA (via na.strings) and then turn the result into a zoo object. You could try this:

x <- read.csv("OakParkR.csv", header = TRUE, na.strings = "")
x <- zoo(x)
x[33:35]
#date imax Tmax imin Tmin irain rain cbl wdsp ihm hm iddhm ddhm ihg hg soil
#33 02-Feb-07 0 9.1 0 -1.7 0 0.1 1026.2 3.9 0 10 0 340 0 14 5.970
#34 03-Feb-07 0 9.2 0 -3.0 0 0.0 <NA> 2.4 0 7 0 130 0 11 3.101
#35 04-Feb-07 0 7.7 0 -3.7 0 0.0 1031.8 3.3 0 8 0 330 0 12 2.668

