Skip Some Rows in Read.CSV in R

Skip specific rows using read.csv in R

One way to do this is to use two read.csv calls: the first reads the headers and the second reads the data:

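# First pass: read only the header line (line 2 of the file)
headers = read.csv(file, skip = 1, header = FALSE, nrows = 1, as.is = TRUE)
# Second pass: read the data, which starts on line 4
df = read.csv(file, skip = 3, header = FALSE)
colnames(df) = headers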

I've created the following text file to test this:

do not read
a,b,c
previous line are headers
1,2,3
4,5,6

The result is:

> df
  a b c
1 1 2 3
2 4 5 6
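
As an alternative sketch (not part of the answer above), the same test file can be read in a single pass by loading the lines with readLines, dropping the unwanted ones by position, and handing the rest to read.csv through its text argument. The temporary file and the variable names tmp and lines are assumptions made for illustration.

```r
# Recreate the sample file in a temporary location (illustrative path)
tmp <- tempfile(fileext = ".csv")
writeLines(c("do not read",
             "a,b,c",
             "previous line are headers",
             "1,2,3",
             "4,5,6"), tmp)

# Read every line as plain text, then drop line 1 and line 3 by position
lines <- readLines(tmp)
df <- read.csv(text = lines[-c(1, 3)], header = TRUE)
df
```

This keeps the header and data in one call, at the cost of holding the whole file in memory as text first.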

Is there a way to open .csv in R skipping the first X rows, where X is variable based on where specified headers can be found?

Since you know that "Date and Time" appears in the header, try this:

library(data.table)
fread(filename, skip = "Date and Time")

See ?fread for additional arguments which you may or may not need.
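
If data.table is not available, a rough base R equivalent is to locate the header line first and derive skip from its position. This is a minimal sketch under the assumption that the header text occurs only once before the data; the sample file contents and the names tmp and header_line are made up for illustration.

```r
# Build a small sample file with a variable-length preamble (illustrative data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("instrument log",
             "exported by hand",
             "Date and Time,value",
             "2020-01-01 00:00,1",
             "2020-01-02 00:00,2"), tmp)

# Find the first line that contains the known header text
header_line <- grep("Date and Time", readLines(tmp), fixed = TRUE)[1]

# Skip everything before that line and treat it as the header
df <- read.csv(tmp, skip = header_line - 1, header = TRUE)
df
```

Note that read.csv rewrites the header into syntactic names by default, so the first column comes back as Date.and.Time unless check.names = FALSE is passed.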

skip some rows in read.csv in R

This is possible with the sqldf package, using read.csv.sql.

Let's say the contents of sample.csv look like this:

id,name,age
1,"a",23
2,"b",24
3,"c",23

Now to read only rows where age=23:

require(sqldf)

df <- read.csv.sql("sample.csv", "select * from file where age=23")

df
  id name age
1  1  "a"  23
2  3  "c"  23

You can also select only the columns you need:

df <- read.csv.sql("sample.csv", "select id, name from file where age=23")
df
  id name
1  1  "a"
2  3  "c"
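
For comparison, here is a base R sketch of the same filter without sqldf. It is a different technique: the whole file is read into memory first and filtered afterwards, so it does not give the benefit of filtering before the data reaches R. The temporary file and the names tmp and all_rows are assumptions for illustration.

```r
# Recreate sample.csv in a temporary location (illustrative path)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,age",
             '1,"a",23',
             '2,"b",24',
             '3,"c",23'), tmp)

# Read everything, then keep rows with age 23 and only the id and name columns
all_rows <- read.csv(tmp, header = TRUE)
df <- subset(all_rows, age == 23, select = c(id, name))
df
```

Unlike the read.csv.sql output shown above, read.csv strips the double quotes from the name values.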

Skipping rows starting with specific values while importing a CSV file into R using FREAD

You can read the data with read.csv using fill = TRUE, then keep only the rows whose date column holds a value in date format, so that stray lines such as '<<<<<<< HEAD' or '=======' are removed. Finally, use readr::type_convert to convert the columns to their appropriate types.

data <- read.csv('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', fill = TRUE)
data <- data[grepl('\\d+-\\d+-\\d+', data$date), ]
data <- readr::type_convert(data)
data

# date province country lat long type cases
# <date> <chr> <chr> <dbl> <dbl> <chr> <int>
# 1 2020-01-22 NA Afghanistan 33.9 67.7 confirmed 0
# 2 2020-01-23 NA Afghanistan 33.9 67.7 confirmed 0
# 3 2020-01-24 NA Afghanistan 33.9 67.7 confirmed 0
# 4 2020-01-25 NA Afghanistan 33.9 67.7 confirmed 0
# 5 2020-01-26 NA Afghanistan 33.9 67.7 confirmed 0
# 6 2020-01-27 NA Afghanistan 33.9 67.7 confirmed 0
# 7 2020-01-28 NA Afghanistan 33.9 67.7 confirmed 0
# 8 2020-01-29 NA Afghanistan 33.9 67.7 confirmed 0
# 9 2020-01-30 NA Afghanistan 33.9 67.7 confirmed 0
#10 2020-01-31 NA Afghanistan 33.9 67.7 confirmed 0
# … with 287,772 more rows

With data.table::fread, you can use blank.lines.skip = TRUE:

data <- data.table::fread('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', blank.lines.skip=TRUE)
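
Another option, sketched here under the assumption that the unwanted lines are Git merge-conflict markers, is to drop lines by their prefix before any parsing happens. The sample lines in raw are invented for illustration and are not taken from the coronavirus dataset.

```r
# Illustrative file contents with merge-conflict markers mixed into the data
raw <- c("date,country,cases",
         "<<<<<<< HEAD",
         "2020-01-22,Afghanistan,0",
         "=======",
         "2020-01-23,Afghanistan,0",
         ">>>>>>> branch")

# Keep only lines that do not start with one of the marker prefixes
keep <- !grepl("^(<<<<<<<|=======|>>>>>>>)", raw)

# Parse the remaining lines as CSV
df <- read.csv(text = raw[keep], header = TRUE)
df
```

For a real file, raw would come from readLines(path). Because the bad lines never reach the parser, the column types are inferred correctly and no later type conversion step is needed.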

Skipping specific rows and columns in R

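You can "skip" any number of rows by dropping them with negative indices after the file has been read (the indices refer to data rows, not to lines in the file), e.g.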

Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), ]

Similar for columns:

Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[, -c(2, 4)]

To skip both rows and columns:

Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), -c(2, 4)]
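
To make the indexing concrete, here is a small self-contained sketch. The column names and values in csv_text are invented for illustration and do not come from IMDB_data.csv.

```r
# A tiny in-memory CSV: one header line followed by four data rows
csv_text <- c("id,title,year,rating",
              "1,A,2001,7.1",
              "2,B,2002,6.5",
              "3,C,2003,8.0",
              "4,D,2004,5.9")

full <- read.csv(text = csv_text, header = TRUE)

# Drop data rows 2 and 3, and drop columns 2 (title) and 4 (rating)
trimmed <- full[-c(2, 3), -c(2, 4)]
trimmed
```

Row 1 here is the first data row, because the header line has already been consumed. If the column selection leaves a single column, add drop = FALSE inside the brackets to keep the result a data frame.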

Skipping last N rows with lapply and then read.csv

Something like this should put you on the right track. It reads each file, removes the last 5 rows, and then binds the results together. I would also suggest not using variable names that might conflict with function names: files and c are functions in base R. Here, I am using all_files instead of files.

all_files <- list.files(path = "./savedfiles", full.names = TRUE)

do.call(rbind, # assuming columns match 1:1; use dplyr::bind_rows() if not 1:1
  lapply(all_files, function(x) {
    head(read.csv(x, header = T, stringsAsFactors = F), -5) # change as per needs
  })
)
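
The approach above can be tried end to end with throwaway files. In this sketch the directory tmp_dir, the file names, and the 8-row data frames are all assumptions made for illustration.

```r
# Create a temporary folder holding two small CSV files of 8 rows each
tmp_dir <- tempfile("savedfiles")
dir.create(tmp_dir)
for (i in 1:2) {
  write.csv(data.frame(id = 1:8, source = i),
            file.path(tmp_dir, paste0("f", i, ".csv")),
            row.names = FALSE)
}

all_files <- list.files(path = tmp_dir, full.names = TRUE)

# head(x, -5) returns everything except the last 5 rows of each file
combined <- do.call(rbind, lapply(all_files, function(x) {
  head(read.csv(x, header = TRUE), -5)
}))
combined
```

Each file contributes its first 3 rows, so combined has 6 rows in total.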

Skip 1st n rows of csv file and read the next line as columns

Here is an example for reference.

read.csv(file="/temp/abc.txt", skip = 6, header = T, as.is = T)

Result:

Type   Data   month  year   date   logs   status
<int>  <chr>  <int>  <int>  <chr>  <chr>  <chr>
1      car    2      2022   12/2   done   success

