Skip specific rows using read.csv in R
One way to do this is to use two read.csv commands: the first reads the headers and the second reads the data:
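# First pass: read only the header line (line 2 of the file) as character data
headers <- read.csv(file, skip = 1, header = FALSE, nrows = 1, as.is = TRUE)
# Second pass: read the data, skipping the first three lines
df <- read.csv(file, skip = 3, header = FALSE)
# unlist() turns the one-row data frame into a plain character vector
colnames(df) <- unlist(headers, use.names = FALSE)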
I've created the following text file to test this:
do not read
a,b,c
previous line are headers
1,2,3
4,5,6
The result is:
> df
  a b c
1 1 2 3
2 4 5 6
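The two-pass approach above can be wrapped in a small helper. This is only a sketch: the function name read_split_header and its arguments header_line and data_line are hypothetical, introduced here for illustration, and it assumes the header and the data each start on a known line of the file.

```r
# Hypothetical helper: header_line and data_line are 1-based line numbers in the file.
read_split_header <- function(path, header_line, data_line) {
  # First pass: read only the header line as character data.
  headers <- read.csv(path, skip = header_line - 1, header = FALSE,
                      nrows = 1, as.is = TRUE)
  # Second pass: read the data, skipping everything before data_line.
  df <- read.csv(path, skip = data_line - 1, header = FALSE)
  colnames(df) <- unlist(headers, use.names = FALSE)
  df
}

# Recreate the test file shown above in a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("do not read", "a,b,c", "previous line are headers",
             "1,2,3", "4,5,6"), tmp)
df <- read_split_header(tmp, header_line = 2, data_line = 4)
print(df)
```

Passing line numbers rather than skip counts keeps the call readable, at the cost of the subtraction inside the helper.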
Is there a way to open .csv in R skipping the first X rows, where X is variable based on where specified headers can be found?
Since you know that Date and Time appears in the header, try this:
library(data.table)
fread(filename, skip = "Date and Time")
See ?fread for additional arguments which you may or may not need.
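If data.table is not available, a similar effect can be sketched in base R by locating the header line with readLines and grep first. The helper name read_from_header and the sample file contents below are made up for illustration. The sketch assumes the marker text appears on the header line and that no quoted field spans multiple lines.

```r
# Hypothetical helper: find the first line containing `marker` and use it as the header.
read_from_header <- function(path, marker) {
  lines <- readLines(path)
  # fixed = TRUE treats the marker as a literal string, not a regular expression.
  hits <- grep(marker, lines, fixed = TRUE)
  if (length(hits) == 0) stop("marker not found in file: ", marker)
  # Skip everything before the first matching line.
  # check.names = FALSE keeps the spaces in "Date and Time".
  read.csv(path, skip = hits[1] - 1, header = TRUE, check.names = FALSE)
}

# Made-up sample file with two preamble lines before the header.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Exported by some logger", "units: volts",
             "Date and Time,Value",
             "2020-01-01 00:00,1.5", "2020-01-01 00:01,1.7"), tmp)
df <- read_from_header(tmp, "Date and Time")
print(df)
```

This reads the file twice, which is usually acceptable for small files; for large ones the fread call above is likely the better choice.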
Skip some rows in read.csv in R
This is possible with the sqldf package, using read.csv.sql.
Let's say the contents of sample.csv look like this:
id,name,age
1,"a",23
2,"b",24
3,"c",23
Now to read only rows where age=23:
require(sqldf)
df <- read.csv.sql("sample.csv", "select * from file where age=23")
df
id name age
1 1 "a" 23
2 3 "c" 23
It is also possible to select only the necessary columns:
df <- read.csv.sql("sample.csv", "select id, name from file where age=23")
df
id name
1 1 "a"
2 3 "c"
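For comparison, here is a base R sketch of the same filter, applied after reading rather than during it. It assumes the file is small enough to load in full; the variable names csv_text and all_rows are illustrative only.

```r
# Sample data from above, supplied inline via the text argument.
csv_text <- 'id,name,age
1,"a",23
2,"b",24
3,"c",23'
all_rows <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# Keep rows where age is 23, and only the id and name columns.
df <- subset(all_rows, age == 23, select = c(id, name))
print(df)
```

Unlike the sqldf output shown above, read.csv strips the quotes around the name values. The trade-off is that every row is parsed before any is discarded.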
Skipping rows starting with specific values while importing a CSV file into R using FREAD
You can read the data with read.csv and fill = TRUE, keep only the rows whose date column holds a value in date format, so that values like '<<<<<<< HEAD' or '=======' are removed, and then use type_convert to convert the columns to their respective types.
data <- read.csv('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', fill = TRUE)
data <- data[grepl('\\d+-\\d+-\\d+', data$date), ]
data <- readr::type_convert(data)
data
# date province country lat long type cases
# <date> <chr> <chr> <dbl> <dbl> <chr> <int>
# 1 2020-01-22 NA Afghanistan 33.9 67.7 confirmed 0
# 2 2020-01-23 NA Afghanistan 33.9 67.7 confirmed 0
# 3 2020-01-24 NA Afghanistan 33.9 67.7 confirmed 0
# 4 2020-01-25 NA Afghanistan 33.9 67.7 confirmed 0
# 5 2020-01-26 NA Afghanistan 33.9 67.7 confirmed 0
# 6 2020-01-27 NA Afghanistan 33.9 67.7 confirmed 0
# 7 2020-01-28 NA Afghanistan 33.9 67.7 confirmed 0
# 8 2020-01-29 NA Afghanistan 33.9 67.7 confirmed 0
# 9 2020-01-30 NA Afghanistan 33.9 67.7 confirmed 0
#10 2020-01-31 NA Afghanistan 33.9 67.7 confirmed 0
# … with 287,772 more rows
With data.table::fread you can use blank.lines.skip=TRUE.
data <- data.table::fread('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', blank.lines.skip=TRUE)
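Another option is to drop the unwanted lines before parsing. The sketch below uses made-up sample lines and assumes the lines to skip all begin with a merge-conflict marker (seven <, = or > characters); adjust the pattern for other prefixes.

```r
# Made-up sample with merge-conflict markers mixed into the data.
raw <- c("date,country,cases",
         "<<<<<<< HEAD",
         "2020-01-22,Afghanistan,0",
         "=======",
         "2020-01-23,Afghanistan,0",
         ">>>>>>> branch")
# Drop any line that starts with a conflict marker.
keep <- !grepl("^(<{7}|={7}|>{7})", raw)
# read.csv accepts the remaining lines directly through its text argument.
data <- read.csv(text = raw[keep], stringsAsFactors = FALSE)
print(data)
```

For a real file, raw would come from readLines on the file path or URL. Filtering first means the column types are inferred from clean data, so no separate type conversion step is needed.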
Skipping specific rows and columns in R
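You can "skip" any number of rows by dropping them after reading, using negative indices. The indices refer to data rows (after the header), not to lines in the file, e.g.
Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), ]
Similarly for columns:
Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[, -c(2, 4)]
To skip both rows and columns:
Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), -c(2, 4)]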
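To see what negative indexing does, here is a small self-contained sketch. The column names and values are made up and merely stand in for IMDB_data.csv.

```r
# Made-up sample data standing in for IMDB_data.csv.
csv_text <- "title,year,rating,votes
A,2001,7.1,100
B,2002,6.5,200
C,2003,8.0,300
D,2004,5.9,400"
full <- read.csv(text = csv_text, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE)
# Drop data rows 2 and 3 and columns 2 and 4 in one step.
Df <- full[-c(2, 3), -c(2, 4)]
print(Df)
```

The remaining rows keep their original row names (1 and 4 here). Use rownames(Df) <- NULL if consecutive numbering is wanted.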
Skipping last N rows with lapply and then read.csv
Something like this should put you on the right track. It reads each file, removes the last 5 rows, and finally binds the results together. I would also suggest not using variable names that might conflict with function names: files and c are functions in base R. Here, I am using all_files instead of files.
all_files <- list.files(path = "./savedfiles", full.names = TRUE)
do.call(rbind, # assuming columns match 1:1; use dplyr::bind_rows() if not 1:1
  lapply(all_files, function(x) {
    head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -5) # change as per needs
  })
)
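A runnable sketch of the same idea, using two made-up files written to a temporary directory so it does not depend on ./savedfiles. The names dir_path, drop_last and combined are illustrative only.

```r
# Create two small made-up CSV files in a fresh temporary directory.
dir_path <- tempfile("savedfiles_")
dir.create(dir_path)
for (i in 1:2) {
  write.csv(data.frame(file_id = i, value = 1:8),
            file.path(dir_path, paste0("file", i, ".csv")),
            row.names = FALSE)
}

all_files <- list.files(path = dir_path, pattern = "\\.csv$", full.names = TRUE)
drop_last <- 5  # hypothetical name for the number of trailing rows to drop
combined <- do.call(rbind, lapply(all_files, function(x) {
  # head() with a negative n returns all rows except the last abs(n).
  head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -drop_last)
}))
print(combined)
```

Each file has 8 rows, so 3 rows per file survive. If a file has no more than drop_last rows, head() returns zero rows for it rather than failing.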
Skip 1st n rows of csv file and read the next line as columns
Here is an example for reference.
read.csv(file="/temp/abc.txt", skip = 6, header = T, as.is = T)
Result:
Type Data month year date logs status
<int> <chr> <int> <int> <chr> <chr> <chr>
1 car 2 2022 12/2 done success
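A self-contained sketch of the same call follows. The six preamble lines and the reduced set of columns are made up for illustration; the only assumption is that the header sits on the line immediately after the skipped ones.

```r
tmp <- tempfile(fileext = ".txt")
# Six made-up preamble lines, then the header, then one data row.
writeLines(c(paste("comment line", 1:6),
             "Type,Data,month,year",
             "1,car,2,2022"), tmp)
# skip = 6 discards the preamble; header = TRUE takes names from line 7.
df <- read.csv(file = tmp, skip = 6, header = TRUE, as.is = TRUE)
str(df)
```

Because as.is = TRUE is set, the Data column stays character rather than becoming a factor in older versions of R.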