How to Get the Number of Rows in a CSV File Without Opening It

Best way to find out number of rows in csv without loading the full thing

Assuming there are no quoted fields with embedded newlines or other shenanigans in your CSV file, an accurate (but hacky) solution is to not even parse the file, but simply count the number of newlines in it:

import numpy as np

chunk = 1024 * 1024  # Process 1 MB at a time.
f = np.memmap("test.csv", mode="r")  # Maps the file as raw bytes without loading it.
num_newlines = sum(np.sum(f[i:i + chunk] == ord('\n'))
                   for i in range(0, len(f), chunk))
del f
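If NumPy isn't available, the same chunked newline count can be sketched with the standard library alone (the same caveat about embedded newlines applies, and the path is just a placeholder):

```python
def count_newlines(path, chunk=1024 * 1024):
    # Stream the file in binary chunks and count newline bytes,
    # without ever holding the whole file in memory.
    count = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            count += block.count(b"\n")
    return count
```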

How to obtain the total number of rows from a CSV file in Python?

You need to count the number of rows:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

Using sum() with a generator expression makes for an efficient counter, avoiding storing the whole file in memory.

If you already read 2 rows before counting, you need to add those 2 rows to your total, because rows that have already been consumed from the reader are not counted again.
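As a small illustration of that point (the data and names here are made up), counting the remainder of a reader after two rows were already consumed:

```python
import csv
import io

data = "name,age\nalice,30\nbob,25\ncarol,41\n"
reader = csv.reader(io.StringIO(data))

header = next(reader)   # row 1, consumed before counting
first = next(reader)    # row 2, consumed before counting
remaining = sum(1 for row in reader)  # counts only the rows left: 2
total_rows = remaining + 2            # add back the 2 rows already read
```

Here `total_rows` is 4, matching the four lines in `data`.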

How to know CSV line count before loading in python?

Yes, you can:

with open('/path/to/file.csv') as f:
    lines = sum(1 for line in f)

but be aware that pandas will then read the whole file again when you actually load it.

If you are sure that the whole file will fit into memory, you can do this instead:

import io
import pandas as pd

with open('/path/to/file.csv') as f:
    data = f.readlines()
lines = len(data)
df = pd.read_csv(io.StringIO(''.join(data)))  # plus any other read_csv arguments

Row count in a csv file

import csv

with open(adresse, "r") as f:
    reader = csv.reader(f, delimiter=",")
    data = list(reader)
    row_count = len(data)

The original problem was trying to read the file twice: after building the data list, the file pointer has already reached the end of the file, so a second read finds nothing. Rewind with f.seek(0), or simply reuse data.
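A minimal sketch of that pitfall, using an in-memory buffer in place of a real file:

```python
import csv
import io

buf = io.StringIO("a,b\n1,2\n3,4\n")

reader = csv.reader(buf)
row_count = sum(1 for row in reader)   # consumes the stream: 3 rows

second_pass = list(csv.reader(buf))    # empty -- the pointer is at EOF

buf.seek(0)                            # rewind to the start
rows = list(csv.reader(buf))           # now all 3 rows are read again
```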

Is there a way to get rows from a CSV without loading the whole file?

Apparently your file has grown so big that Excel (and, according to your comment, also Python) can no longer handle it.

So I would suggest another approach: command-line tools.

  1. You can start working with PowerShell, though batch processing there is quite difficult and the possibilities are quite limited.
  2. Another approach is the UNIX-like command-line tools. You can get those either by installing Cygwin (UNIX commands rewritten as Windows command-line programs) or WSL (Windows Subsystem for Linux), which installs a Linux environment on your PC.

On my PC I use WSL, and I have tried counting the number of entries in a 4 GB CSV file: I got the answer within 2-3 seconds, without eating up my RAM.

As for the feature you were talking about (showing some specific rows): to show just the 100,000th line, pipe head into tail:

head -n 100000 test.csv | tail -n 1

Time consumption: almost immediate.

RAM consumption: negligible.
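The same trick is easy to stream in pure Python, too (a sketch; the file path and 1-based numbering are assumptions): itertools.islice skips to the n-th line without reading the file into memory.

```python
import itertools

def nth_line(path, n):
    # Return the n-th (1-based) line of the file, or None if the
    # file has fewer than n lines. Only reads up to line n.
    with open(path) as f:
        return next(itertools.islice(f, n - 1, n), None)
```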

Counting rows with fread without reading the whole file

1) count.fields. Not sure whether count.fields reads the whole file into R at once; try it to see if it works:

length(count.fields("myfile.csv", sep = ","))

If the file has a header subtract one from the above.

2) sqldf. Another possibility is:

library(sqldf)
read.csv.sql("myfile.csv", sep = ",", sql = "select count(*) from file")

You may need other arguments as well depending on header, etc. Note that this does not read the file into R at all -- only into sqlite.

3) wc. Use the system command wc, which should be available on all platforms that R runs on.

shell("wc -l myfile.csv", intern = TRUE)  # on non-Windows, use system() instead of shell()

or to directly get the number of lines in the file

read.table(pipe("wc -l myfile.csv"))[[1]]

or

read.table(text = shell("wc -l myfile.csv", intern = TRUE))[[1]]

Again, if there is a header subtract one.

If you are on Windows be sure that Rtools is installed and use this:

read.table(pipe("C:\\Rtools\\bin\\wc -l myfile.csv"))[[1]]

Alternately on Windows without Rtools try this:

read.table(pipe('find /v /c "" myfile.csv'))[[3]]

See How to count no of lines in text file and store the value into a variable using batch script?
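The answers above are R-specific; as a cross-platform fallback when neither wc nor Rtools is available, the same header-aware line count is a short plain-Python sketch (not part of the original answer; the path is a placeholder):

```python
def wc_l(path, has_header=True):
    # Stream the file line by line, like `wc -l`, and subtract one
    # if the first line is a header rather than a data row.
    with open(path, "rb") as f:
        total = sum(1 for _ in f)
    return total - 1 if has_header else total
```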


