Skip specific rows using read.csv in R
One way to do this is to use two read.csv commands: the first one reads the headers and the second one reads the data:
# file is the path to your CSV file
headers <- read.csv(file, skip = 1, header = FALSE, nrows = 1, as.is = TRUE)
df <- read.csv(file, skip = 3, header = FALSE)
colnames(df) <- unlist(headers)
I've created the following text file to test this:
do not read
a,b,c
previous line are headers
1,2,3
4,5,6
The result is:
> df
  a b c
1 1 2 3
2 4 5 6
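If you need this in more than one place, the two-call approach can be wrapped in a small helper. The function name read_with_header_at and its arguments are hypothetical, and the sample file below simply recreates the test file above in a temporary location:

```r
# Hypothetical helper: header_line and data_line are 1-based line numbers in the file
read_with_header_at <- function(path, header_line, data_line) {
  # Read only the header line, as a one-row data frame of strings
  headers <- read.csv(path, skip = header_line - 1, header = FALSE,
                      nrows = 1, as.is = TRUE)
  # Read everything from data_line onwards, without a header
  df <- read.csv(path, skip = data_line - 1, header = FALSE)
  colnames(df) <- unlist(headers)
  df
}

# Recreate the test file from above in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("do not read", "a,b,c", "previous line are headers",
             "1,2,3", "4,5,6"), path)

df <- read_with_header_at(path, header_line = 2, data_line = 4)
print(df)
```

With header_line = 2 and data_line = 4 this performs exactly the skip = 1 and skip = 3 calls shown above.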
Is there a way to open .csv in R skipping the first X rows, where X is variable based on where specified headers can be found?
Since you know that "Date and Time" appears in the header, try this:
library(data.table)
fread(filename, skip = "Date and Time")
See ?fread for additional arguments that you may or may not need.
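If you would rather not depend on data.table, a base-R sketch of the same idea is to locate the header line with readLines and grep, then pass its position to read.csv's skip argument. The sample file contents below are made up for illustration:

```r
# Made-up sample file: two junk lines above the real header
path <- tempfile(fileext = ".csv")
writeLines(c("Exported by logger", "site: A",
             "Date and Time,Value",
             "2020-01-01 10:00,5",
             "2020-01-02 11:00,7"), path)

# Index of the first line containing the header text
# (fixed = TRUE: literal match, not a regular expression)
header_line <- grep("Date and Time", readLines(path), fixed = TRUE)[1]
if (is.na(header_line)) stop("header line not found")

# Skip everything above the header; read.csv then uses that line as the header.
# check.names = FALSE keeps "Date and Time" instead of renaming it Date.and.Time
df <- read.csv(path, skip = header_line - 1, check.names = FALSE)
print(df)
```

This reads the file twice (once with readLines, once with read.csv), which may matter for very large files.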
Skipping rows starting with specific values while importing a CSV file into R using FREAD
You can read the data with read.csv using fill = TRUE and keep only the rows whose date column contains a date. This removes leftover Git merge-conflict lines such as '<<<<<<< HEAD' or '======='. Then use readr::type_convert to convert the columns to their appropriate types.
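# fill = TRUE pads short lines (such as the conflict markers) with blank fields
data <- read.csv('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', fill = TRUE, stringsAsFactors = FALSE)
# keep only rows whose date column looks like a date (digits-digits-digits)
data <- data[grepl('\\d+-\\d+-\\d+', data$date), ]
# re-guess the types of the character columns (e.g. date becomes Date)
data <- readr::type_convert(data)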
data
# date province country lat long type cases
# <date> <chr> <chr> <dbl> <dbl> <chr> <int>
# 1 2020-01-22 NA Afghanistan 33.9 67.7 confirmed 0
# 2 2020-01-23 NA Afghanistan 33.9 67.7 confirmed 0
# 3 2020-01-24 NA Afghanistan 33.9 67.7 confirmed 0
# 4 2020-01-25 NA Afghanistan 33.9 67.7 confirmed 0
# 5 2020-01-26 NA Afghanistan 33.9 67.7 confirmed 0
# 6 2020-01-27 NA Afghanistan 33.9 67.7 confirmed 0
# 7 2020-01-28 NA Afghanistan 33.9 67.7 confirmed 0
# 8 2020-01-29 NA Afghanistan 33.9 67.7 confirmed 0
# 9 2020-01-30 NA Afghanistan 33.9 67.7 confirmed 0
#10 2020-01-31 NA Afghanistan 33.9 67.7 confirmed 0
# … with 287,772 more rows
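To see the filtering step without downloading anything, here is a small self-contained sketch with made-up data containing Git conflict markers. It assumes the readr package is installed:

```r
library(readr)

# Made-up CSV lines with leftover merge-conflict markers
txt <- c("date,country,cases",
         "2020-01-22,Afghanistan,0",
         "<<<<<<< HEAD",
         "2020-01-23,Afghanistan,1",
         "=======",
         "2020-01-24,Afghanistan,2")

# fill = TRUE pads the one-field marker lines with blank/NA values
data <- read.csv(text = txt, fill = TRUE, stringsAsFactors = FALSE)

# Keep only rows whose date column looks like a date
data <- data[grepl("\\d+-\\d+-\\d+", data$date), ]

# Re-guess the types of the character columns (date becomes a Date)
data <- type_convert(data)
str(data)
```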
With data.table::fread, you can use blank.lines.skip = TRUE, which tells fread to ignore empty lines:
data <- data.table::fread('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', blank.lines.skip=TRUE)
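A small sketch of what blank.lines.skip does, using made-up lines passed through fread's text argument instead of a URL (it assumes data.table is installed):

```r
library(data.table)

# Made-up input with an empty line in the middle
txt <- c("a,b", "1,2", "", "3,4")

# blank.lines.skip = TRUE tells fread to ignore the empty line
dt <- fread(text = txt, blank.lines.skip = TRUE)
print(dt)
```

Keep in mind that blank.lines.skip only ignores lines that are empty; non-empty junk lines still have to be filtered out some other way, for example as in the read.csv approach above.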
Skipping specific rows and columns in R
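You can "skip" any number of rows after reading the file by indexing with negative values, for example:
Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), ]
Similarly for columns:
Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[, -c(2, 4)]
To skip both rows and columns:
Df <- read.csv("IMDB_data.csv", header = TRUE, sep = ",")[-c(2, 3, 5:9), -c(2, 4)]
Note that the whole file is still read into memory; the rows and columns are dropped afterwards.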
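A self-contained sketch with made-up data (inline text standing in for IMDB_data.csv) that drops rows and columns in one step. The row indices count data rows, so the header line is not included in the numbering:

```r
# Made-up stand-in for IMDB_data.csv
txt <- c("title,year,rating",
         "A,2001,7.1",
         "B,2002,6.5",
         "C,2003,8.0",
         "D,2004,5.9")

# B and C are data rows 2 and 3; column 2 is year
Df <- read.csv(text = txt, stringsAsFactors = FALSE)[-c(2, 3), -2]
print(Df)
```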
skip some rows in read.csv in R
This is possible with the read.csv.sql function from the sqldf package.
Let's say the contents of sample.csv look like this:
id,name,age
1,"a",23
2,"b",24
3,"c",23
Now, to read only the rows where age = 23:
library(sqldf)
df <- read.csv.sql("sample.csv", "select * from file where age=23")
df
  id name age
1  1  "a"  23
2  3  "c"  23
You can also select only the columns you need:
df <- read.csv.sql("sample.csv", "select id, name from file where age=23")
df
  id name
1  1  "a"
2  3  "c"
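For comparison, a base-R sketch of the same filter is to read the whole file and then subset. Unlike read.csv.sql, which loads the file into a temporary SQLite database and filters there, this reads every row into R first, so it suits smaller files. Note also that read.csv strips the double quotes around name, while the read.csv.sql output above keeps them. The data is the sample.csv content from above, passed inline:

```r
# sample.csv contents from above, passed as inline text
txt <- c("id,name,age", '1,"a",23', '2,"b",24', '3,"c",23')
df <- read.csv(text = txt, stringsAsFactors = FALSE)

# Rows where age == 23, all columns
rows_23 <- subset(df, age == 23)

# Same rows, only the id and name columns
cols_23 <- subset(df, age == 23, select = c(id, name))
print(cols_23)
```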
read.csv, header on first line, skip second line
This should do the trick:
all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)
The first step, readLines, reads the entire file into a character vector, where each element represents one line of the file. Next, you discard the second line, using the fact that a negative index in R means "select all but this element". Finally, you feed the remaining lines to read.csv to parse them into a data.frame.
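A self-contained sketch of the same steps, using a made-up file whose second line holds units (a common reason to skip it). It passes the lines through read.csv's text argument, which reads them via a text connection just like the textConnection call above:

```r
# Made-up file: header, then a units line that should be skipped
path <- tempfile(fileext = ".csv")
writeLines(c("time,temp", "s,degC", "0,20.5", "1,21.0"), path)

all_content <- readLines(path)   # one character string per line
skip_second <- all_content[-2]   # drop line 2 (the units)
dat <- read.csv(text = skip_second, header = TRUE, stringsAsFactors = FALSE)
print(dat)
```

Without dropping line 2, the temp column would be read as character because of the "degC" entry.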