Skip Specific Rows Using read.csv in R

One way to do this is with two read.csv calls. The first reads the header row and the second reads the data:

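# file is the path to the CSV file
# Line 2 holds the column names: skip one line, then read a single row
headers <- read.csv(file, skip = 1, header = FALSE, nrows = 1, as.is = TRUE)
# The data starts on line 4: skip the first three lines
df <- read.csv(file, skip = 3, header = FALSE)
colnames(df) <- unlist(headers, use.names = FALSE)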

I've created the following text file to test this:

do not read
a,b,c
previous line are headers
1,2,3
4,5,6

The result is:

> df
  a b c
1 1 2 3
2 4 5 6
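
To try this without creating a file by hand, here is a minimal self-contained sketch. It writes the test file above to a temporary path (tmp is a name introduced here for illustration) and then applies the same two-pass approach:

```r
# Write the sample file to a temporary location (tmp is a hypothetical name)
tmp <- tempfile(fileext = ".csv")
writeLines(c("do not read",
             "a,b,c",
             "previous line are headers",
             "1,2,3",
             "4,5,6"), tmp)

# Pass 1: skip one line and read a single row, which holds the column names
headers <- read.csv(tmp, skip = 1, header = FALSE, nrows = 1, as.is = TRUE)

# Pass 2: skip the first three lines and read the data rows
df <- read.csv(tmp, skip = 3, header = FALSE)

# Apply the names from pass 1 to the data from pass 2
colnames(df) <- unlist(headers, use.names = FALSE)
unlink(tmp)
df
```

Note that skip counts lines from the top of the file, so the two values (1 and 3) have to be adjusted if the layout of the file changes.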

Is there a way to open .csv in R skipping the first X rows, where X is variable based on where specified headers can be found?

Since you know that "Date and Time" appears in the header, try this:

library(data.table)
fread(filename, skip = "Date and Time")

See ?fread for additional arguments which you may or may not need.
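
If data.table is not available, a similar effect can be had in base R by locating the header line first. The sketch below is an alternative to the fread call, not a description of how fread works internally; the file contents and the names tmp, header_line, and dat are made up for illustration:

```r
# Sample file with a variable-length preamble before the header
# (contents and column names are made up for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Instrument log",
             "Exported by some tool",
             "",
             "Date and Time,Value",
             "2020-01-01 00:00,1.5",
             "2020-01-01 01:00,2.5"), tmp)

# Find the first line that contains the known header text
lines <- readLines(tmp)
header_line <- grep("Date and Time", lines, fixed = TRUE)[1]

# Skip everything before that line; the header line itself supplies the names.
# check.names = FALSE keeps the spaces in "Date and Time".
dat <- read.csv(tmp, skip = header_line - 1, header = TRUE, check.names = FALSE)
unlink(tmp)
dat
```

This reads the file twice, which is fine for small files but may be slow for very large ones.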

Skipping rows starting with specific values while importing a CSV file into R using fread

You can read the data with read.csv and fill = TRUE, then keep only the rows whose date column holds a date-formatted value, so that lines such as '<<<<<<< HEAD' or '=======' are dropped. Finally, use readr::type_convert to convert the columns to their appropriate types.

data <- read.csv('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', fill = TRUE)
data <- data[grepl('\\d+-\\d+-\\d+', data$date), ]
data <- readr::type_convert(data)
data

#    date       province country       lat  long type      cases
#    <date>     <chr>    <chr>       <dbl> <dbl> <chr>     <int>
#  1 2020-01-22 NA       Afghanistan  33.9  67.7 confirmed     0
#  2 2020-01-23 NA       Afghanistan  33.9  67.7 confirmed     0
#  3 2020-01-24 NA       Afghanistan  33.9  67.7 confirmed     0
#  4 2020-01-25 NA       Afghanistan  33.9  67.7 confirmed     0
#  5 2020-01-26 NA       Afghanistan  33.9  67.7 confirmed     0
#  6 2020-01-27 NA       Afghanistan  33.9  67.7 confirmed     0
#  7 2020-01-28 NA       Afghanistan  33.9  67.7 confirmed     0
#  8 2020-01-29 NA       Afghanistan  33.9  67.7 confirmed     0
#  9 2020-01-30 NA       Afghanistan  33.9  67.7 confirmed     0
# 10 2020-01-31 NA       Afghanistan  33.9  67.7 confirmed     0
# … with 287,772 more rows

Alternatively, with data.table::fread you can use blank.lines.skip = TRUE:

data <- data.table::fread('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', blank.lines.skip=TRUE)
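
The read-then-filter idea can be tried offline on a small inline sample. This is only a sketch under a few assumptions: the sample rows are made up, base type.convert stands in for readr::type_convert, and the pattern is tightened to a full YYYY-MM-DD match:

```r
# Inline sample with leftover merge-conflict markers (made-up data)
raw <- c("date,country,cases",
         "<<<<<<< HEAD",
         "2020-01-22,Afghanistan,0",
         "=======",
         "2020-01-23,Afghanistan,1",
         ">>>>>>> branch")

# Read everything as text so the junk rows cannot distort column types
data <- read.csv(text = raw, fill = TRUE, colClasses = "character")

# Keep only rows whose date column looks like YYYY-MM-DD
clean <- data[grepl("^\\d{4}-\\d{2}-\\d{2}$", data$date), ]

# Re-guess column types now that only valid rows remain
clean[] <- lapply(clean, type.convert, as.is = TRUE)
clean$date <- as.Date(clean$date)
clean
```

Reading with colClasses = "character" is a design choice here: it postpones type guessing until after the junk rows are gone.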

Skipping specific rows and columns in R

You can "skip" any number of rows by dropping them with negative indices after reading, e.g.:

Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), ]

Similarly for columns:

Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[, -c(2, 4)]

To skip both rows and columns:

Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), -c(2, 4)]
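
Keep in mind that negative indexing removes rows and columns after the whole file has been read, so the row numbers refer to data rows, not to lines of the file. A small sketch with made-up inline data (df_all and the column names are placeholders, not the IMDB file):

```r
# Small stand-in for the contents of a CSV file (made-up values)
df_all <- read.csv(text = "id,title,year,rating
1,A,2001,7.1
2,B,2002,6.5
3,C,2003,8.0
4,D,2004,5.9")

# Drop data rows 2 and 3 (the header line is not counted)
df_rows <- df_all[-c(2, 3), ]

# Drop columns 2 and 4; drop = FALSE keeps a data frame even if one column remains
df_cols <- df_all[, -c(2, 4), drop = FALSE]

# Drop both at once
df_both <- df_all[-c(2, 3), -c(2, 4), drop = FALSE]
df_both
```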

Skip some rows in read.csv in R

This is possible with the sqldf package, using read.csv.sql.

Let's say the contents of sample.csv look like this:

id,name,age
1,"a",23
2,"b",24
3,"c",23

Now to read only rows where age=23:

library(sqldf)

df <- read.csv.sql("sample.csv", "select * from file where age=23")

df
  id name age
1  1  "a"  23
2  3  "c"  23

You can also select only the columns you need:

df <- read.csv.sql("sample.csv", "select id, name from file where age=23")
df
  id name
1  1  "a"
2  3  "c"
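
For comparison, here is a base-R sketch that produces the same selection without sqldf. Unlike read.csv.sql, it reads every row first and filters afterwards, so it is an alternative rather than an equivalent; sample_text, all_rows, and picked are names introduced for illustration:

```r
# Same contents as sample.csv, supplied inline
sample_text <- 'id,name,age
1,"a",23
2,"b",24
3,"c",23'

all_rows <- read.csv(text = sample_text)

# Keep rows where age is 23, and only the id and name columns
picked <- subset(all_rows, age == 23, select = c(id, name))
picked
```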

read.csv, header on first line, skip second line

This should do the trick:

all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)

The first step uses readLines to read the entire file into a character vector, where each element holds one line of the file. Next, you discard the second line, using the fact that negative indexing in R selects everything except the given index. Finally, you pass the remaining lines to read.csv, which parses them into a data.frame.
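
A self-contained variant of these steps, assuming a made-up file with a units line under the header. It passes the lines through the text argument of read.csv instead of an explicit textConnection, so no connection is left open:

```r
# Made-up file: header on line 1, a units line on line 2, then data
tmp <- tempfile(fileext = ".csv")
writeLines(c("time,speed",
             "s,m/s",
             "1,10",
             "2,20"), tmp)

all_content <- readLines(tmp)      # character vector, one element per line
skip_second <- all_content[-2]     # everything except line 2

# read.csv reads the remaining lines through a text connection it manages itself
dat <- read.csv(text = skip_second, header = TRUE, stringsAsFactors = FALSE)
unlink(tmp)
dat
```

Because the units line is removed before parsing, both columns are read as numbers rather than text.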


