Count the Length (Number of Lines) of a CSV File

Count the length (number of lines) of a CSV file?

Another way to get the number of lines (in Ruby) is:

file.readlines.size

How to obtain the total number of rows from a CSV file in Python?

You need to count the number of rows:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

Using sum() with a generator expression makes for an efficient counter, avoiding storing the whole file in memory.

If you have already read 2 rows before counting, add those 2 rows to your total: rows that the reader has already consumed are not counted again.
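As a minimal sketch of the point above, the snippet below reads a header row first and then counts what is left. The in-memory sample data and the helper name count_remaining_rows are assumptions made for illustration; with a real file you would pass the object returned by open() to csv.reader instead.

```python
import csv
import io

def count_remaining_rows(reader):
    """Count the rows that have not yet been consumed from a csv.reader."""
    # The generator yields 1 per row, so only one row is held at a time.
    return sum(1 for _ in reader)

# Hypothetical sample data standing in for an opened CSV file.
sample = io.StringIO("name,age\nalice,30\nbob,25\n")
reader = csv.reader(sample)

header = next(reader)                     # one row is consumed here
remaining = count_remaining_rows(reader)  # counts only the rows left
total = remaining + 1                     # add back the row already read

print(remaining, total)
```

Here remaining is 2 and total is 3, because the header row was read before the count started.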

How can I count the lines of multiple CSV files that are in one folder?

The advantage of count.fields is that it doesn't load the file's data into memory.
It should therefore be faster than read.csv or similar functions.

Get the list of files:

files <- list.files(path, full.names = TRUE)

Get the number of rows in each file:

lapply(X = files, FUN = function(x) {
  length(count.fields(x, skip = 1))
})

Benchmark

library(rbenchmark)

benchmark("count.fields" = {
            lapply(X = files, FUN = function(x) {
              length(count.fields(x, skip = 1))
            })
          },
          "read.csv" = {
            lapply(X = files, FUN = function(x) {
              nrow(read.csv(x, skip = 1))
            })
          },
          "fread" = {
            lapply(X = files, FUN = function(x) {
              nrow(data.table::fread(x, skip = 1))
            })
          },
          replications = 1000,
          columns = c("test", "replications", "elapsed",
                      "relative", "user.self", "sys.self"))

          test replications elapsed relative user.self sys.self
1 count.fields         1000    0.81    1.000      0.28     0.50
3        fread         1000    6.24    7.704      4.57     1.66
2     read.csv         1000    2.93    3.617      2.16     0.76
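
For comparison, the same per-folder idea can be sketched in Python. This is not the R answer's method, only a rough equivalent under stated assumptions: the function name count_rows_per_file is hypothetical, files are matched by a ".csv" suffix, and each file is assumed to have one header row that should not be counted (mirroring skip = 1 above).

```python
import csv
from pathlib import Path

def count_rows_per_file(folder, skip_header=True):
    """Return {file name: row count} for every .csv file directly in folder."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # newline="" lets the csv module handle line endings itself.
        with open(path, newline="") as f:
            # Rows are streamed one at a time rather than stored in a list.
            n = sum(1 for _ in csv.reader(f))
        # Assumption: the first row is a header and is excluded when asked.
        counts[path.name] = max(n - 1, 0) if skip_header else n
    return counts
```

Calling count_rows_per_file("some/folder") returns a dictionary keyed by file name. Note that csv.reader counts records, so a quoted field that spans several physical lines still counts as one row.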

Row count in a csv file

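import csv

# adresse is the path to the CSV file
with open(adresse, "r", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    data = list(reader)    # reads every row once and keeps them
    row_count = len(data)  # count the stored rows instead of re-reading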

The problem is that you are trying to read the file twice. Once the rows have been saved into the data list, the file pointer is already at the end of the file, so a second pass over the reader finds nothing left to read.
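
The effect can be reproduced with a small sketch. The in-memory io.StringIO sample is an assumption standing in for the opened file; the same behaviour applies to a file object returned by open().

```python
import csv
import io

# Hypothetical sample data standing in for open(adresse, "r").
f = io.StringIO("a,b\n1,2\n3,4\n")
reader = csv.reader(f)

first_pass = list(reader)          # consumes every row
second_pass = list(reader)         # nothing left: the reader is exhausted

f.seek(0)                          # rewind the underlying file object
third_pass = list(csv.reader(f))   # a fresh reader sees all rows again

print(len(first_pass), len(second_pass), len(third_pass))
```

The first and third passes each see 3 rows, while the second sees 0. Rewinding with seek(0) is one way to read the file again; counting len(data) from the list you already built avoids the second read entirely.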

Python 3 Count number of rows in a CSV

If you are using pandas, you can do this easily with very little code.

import pandas as pd

df = pd.read_csv('filename.csv')

## Fastest would be using length of index

print("Number of rows ", len(df.index))

## If you want the column and row count then

row_count, column_count = df.shape

print("Number of rows ", row_count)
print("Number of columns ", column_count)

Rails: how to get the row count from a CSV fast

Eventually I found a solution: CSV.read(file.path).length is faster than CSV.parse(f.read).length.


