Read specific, non-consecutive rows using data.table::fread (equivalent to the “Select” argument, but for rows)?
One method (although a little brute-force) is to use sed
to cut the lines.
Recall that fread
takes file=
as well as cmd=
, as in
library(data.table)
fwrite(iris, "iris.csv")
fread(cmd = "head -n 3 iris.csv")
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1: 5.1 3.5 1.4 0.2 setosa
# 2: 4.9 3.0 1.4 0.2 setosa
(Two rows since head
doesn't know/care about the header row.)
Try this:
want_rows <- c(1, 3, 147:149)
# due to the header row, add the header and 1 to each of want
paste0(c(1, 1+want_rows), "p")
# [1] "1p" "2p" "4p" "148p" "149p" "150p"
writeLines(paste0(c(1, 1+want_rows), "p"), "commands.sed")
fread(cmd = "sed -n -f commands.sed iris.csv")
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1: 5.1 3.5 1.4 0.2 setosa
# 2: 4.7 3.2 1.3 0.2 setosa
# 3: 6.3 2.5 5.0 1.9 virginica
# 4: 6.5 3.0 5.2 2.0 virginica
# 5: 6.2 3.4 5.4 2.3 virginica
iris[want_rows,]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 147 6.3 2.5 5.0 1.9 virginica
# 148 6.5 3.0 5.2 2.0 virginica
# 149 6.2 3.4 5.4 2.3 virginica
If you have significant "ranges", then you could optimize this a little for sed
, to have an effective command line of sed -ne '1p;2p;4p;148,150p'
for the same effect.
There is another method ala "every so many rows" listed here: https://www.thegeekstuff.com/2009/09/unix-sed-tutorial-printing-file-lines-using-address-and-patterns/. I don't know how tightly you can control this (every nth row starting from some arbitrary number, for instance). I don't know that this is your intent or need, though, it sounds like will have arbitrary line numbers.
data.table fread, how to read in csv file last N rows
You can have fread
read the last n rows of a .csv using cmd
param
fread(cmd="tail -10 iris.csv")
If you want the header information also, you can wrap the above in a call to data.table::setnames
, where the names are obtained by reading the file without any rows (nrows=0
):
setnames(
fread(cmd="tail -10 iris.csv"),
names(fread("iris.csv",nrows = 0))
)
If you have full path, you can do this:
fpath = "C:/Users/xxxx/Desktop/folder 1/iris.csv"
setnames(
fread(cmd=paste0("tail -10 '", fpath, "'")),
names(fread(fpath,nrows = 0))
)[]
How can I use do.call() to read specific rows?
There are several strategies on how to tackle this. You can import all the data into a list using lapply
and then from each list element filter out based on your filter. You would use data.table::rbindlist
to make the final data.table. Another one would be to do this in one step, e.g. (not tested, obviously)
library(data.table)
files = list.files(path = "G:/SafeGraph201708MidWest",
pattern = "*.csv",
recursive = TRUE,
full.names = TRUE)
xy <- lapply(files, FUN = function(x) {
out <- fread(x)
out <- filter(out, longitude >= -86.97 & longitude <= -86.78,
latitude >= 40.35 & latitude <= 40.49)
out
})
xy <- rbindlist(xy)
Using fread() to select rows and columns, the way read.csv.sql() does
It is not possible to select rows with fread()
as with read.csv.sql()
yet. But it is still better to read the entire data (memory permitting) and then subset it as per your criteria. For a 200 mb file, fread()
+ subset()
gave ~ 4 times better performance than read.csv.sql()
.
So, using @Arun's suggestion,
ans = rbindlist(lapply(files, function(x) fread(x)[, fn := x]))
subset(ans, 'your criteria')
is better than the approach in the original question.
Related Topics
Correctly Color Vertices in R Igraph
Showing Equation of Nls Model with Ggpmisc
How to Declare a Thousand Separator in Read.Csv
Remove "Showing 1 to N of N Entries" Shiny Dt
Add Raster to Ggmap Base Map: Set Alpha (Transparency) and Fill Color to Inset_Raster() in Ggplot2
Extract Digit from Numeric in R
Find Matching Strings Between Two Vectors in R
Aggregating All Unique Values of Each Column of Data Frame
Format Ttest Output by R for Tex
How to Read Specific Rows of CSV File with Fread Function
Count the Number of Unique Characters in a String
Reshape Multi Id Repeated Variable Readings from Long to Wide
Error in File(File, "Rt"):Invalid 'Description' Argument in Complete.Cases Program
Any Way to Force Fread() of Data.Table Not to Stop on Empty Lines