Count the Number of Lines in a File Without Reading Entire File into Memory

Count the number of lines in a file without reading the entire file into memory?

If you are in a Unix environment, you can just let wc -l do the work.

It will not load the whole file into memory, and since wc is optimized for streaming through a file and counting words and lines, its performance is good enough that there is little reason to stream the file yourself in Ruby.

SSCCE:

filename = 'a_file/somewhere.txt'
# wc -l prints "<count> <filename>"; keep the first field as an integer
line_count = `wc -l "#{filename}"`.strip.split(' ')[0].to_i
p line_count

Or if you want a collection of files passed on the command line:

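# With two or more files, wc -l ends with a "<count> total" line.
# (A single file produces no total line, so match would return nil.)
wc_output = `wc -l "#{ARGV.join('" "')}"`
line_count = wc_output.match(/^ *([0-9]+) +total$/).captures[0].to_i
p line_count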
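For comparison, here is a minimal Python sketch of the same idea. It assumes a Unix-like system with wc on the PATH, and the helper name wc_line_count is hypothetical. Passing the arguments as a list to subprocess.run means no shell is involved, so the filename needs none of the quoting that the Ruby backtick version relies on.

```python
import subprocess

def wc_line_count(path: str) -> int:
    """Return the line count reported by `wc -l` (assumes wc is on the PATH)."""
    # No shell is involved, so spaces or quotes in `path` need no escaping.
    result = subprocess.run(
        ["wc", "-l", path], capture_output=True, text=True, check=True
    )
    # wc -l prints "<count> <path>", possibly with leading spaces.
    return int(result.stdout.split()[0])
```

Keep in mind that wc -l counts newline characters, so a last line without a trailing newline is not included in the count.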

How to get line count in a file without reading

  1. Read the size (in bytes) of the file -- the OS will tell you this.
  2. Read the first 1000 lines (and process them).
  3. Calculate the average line size.
  4. Divide the file size by this average line size.
  5. Now you have an estimate of the number of lines in the file, accurate enough for something like a progress bar display.
  6. If this is not accurate enough, recompute every now and then as you read the file.
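A minimal Python sketch of steps 1 to 4 follows. It reads the file in binary mode so that line lengths are measured in bytes, which matches the units of the file size. The function name and the default sample of 1000 lines are illustrative choices.

```python
import os
from itertools import islice

def estimate_line_count(path: str, sample_lines: int = 1000) -> int:
    """Estimate the number of lines from the file size and a sample of lines."""
    file_size = os.path.getsize(path)  # step 1: size in bytes, from the OS
    with open(path, "rb") as f:
        sample = list(islice(f, sample_lines))  # step 2: first lines only
    if not sample:
        return 0
    # step 3: average line size in bytes (newline included)
    avg_line_size = sum(len(line) for line in sample) / len(sample)
    # step 4: file size divided by the average line size
    return round(file_size / avg_line_size)
```

If the file has fewer lines than the sample size, the sample covers the whole file and the result is exact. For step 6, the same division can be redone later using the bytes and lines seen so far.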

How to get Number Of Lines without Reading File To End

No. You have to read the file. If you want to find the count quickly without counting, consider storing the line count at the beginning of the file, or in a separate file, when you write the file.
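A minimal Python sketch of the separate-file variant follows. It assumes the writer controls every line it emits and that no line contains an embedded newline. The "<path>.count" sidecar naming and both function names are hypothetical conventions, not a standard.

```python
from pathlib import Path
from typing import Iterable

def write_with_line_count(path: str, lines: Iterable[str]) -> int:
    """Write lines to `path` and store their count in a sidecar `<path>.count`."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")  # assumes `line` has no embedded newline
            count += 1
    # Hypothetical sidecar convention: the count lives next to the data file.
    Path(path + ".count").write_text(str(count), encoding="utf-8")
    return count

def read_line_count(path: str) -> int:
    """Read the stored count without touching the data file itself."""
    return int(Path(path + ".count").read_text(encoding="utf-8"))
```

The trade-off is that anything else that modifies the data file must also update the sidecar. Otherwise the stored count goes stale.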

Note that you can stream the file, and it's surprisingly fast:

// Needs using System.IO and System.Linq; File.ReadLines streams lazily
int count = File.ReadLines(path).Count();
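The same streaming idea in Python is shown below as a rough analogue of File.ReadLines(path).Count(), not a translation of any .NET API. Iterating over a file object yields one line at a time, so memory use is bounded by the longest line rather than the file size. The function name is illustrative.

```python
def count_lines_streaming(path: str) -> int:
    """Count lines by iterating over the file, one line in memory at a time."""
    with open(path, "rb") as f:
        # A final line without a trailing newline is still yielded and counted.
        return sum(1 for _ in f)
```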

Because in some cases I need the total number of lines to compare with the current line and display a percentage, and just for a percentage display it seems wasteful to read all the content first and then read it again to display the raw text to the user.

Oh, then just get the file size and the length of each line in bytes, keep a cumulative count of the number of bytes processed so far, and divide that by the file size to get the percentage.
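Here is a small Python sketch of that approach. The file is read only once, and the percentage is the number of bytes consumed divided by the file size, so no separate counting pass is needed. The generator name is hypothetical. Binary mode keeps line lengths in bytes so they match os.path.getsize, and lines can be decoded for display as needed.

```python
import os
from typing import Iterator, Tuple

def lines_with_progress(path: str) -> Iterator[Tuple[bytes, float]]:
    """Yield each line together with the percentage of the file processed so far."""
    total = os.path.getsize(path)
    processed = 0
    with open(path, "rb") as f:
        for line in f:
            processed += len(line)  # cumulative bytes, newline included
            percent = 100.0 * processed / total if total else 100.0
            yield line, percent
```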

Find number of lines in csv without reading it

Yes, you need to read the whole file before knowing how many lines are in it, although you can read it piece by piece instead of holding it all in memory at once.
Just think of the file as one long string, Aaaaabbbbbbbcccccccc\ndddddd\neeeeee\n: to know how many 'lines' are in the string, you need to find how many \n characters are in it.
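That idea translates directly into code. Below is a minimal Python sketch that reads the file in fixed-size binary chunks, so only one chunk is in memory at a time. It assumes Unix (\n) or Windows (\r\n) line endings, both of which contain exactly one \n per line break. The 1 MiB default chunk size is an arbitrary choice.

```python
def count_newlines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes, reading at most `chunk_size` bytes at a time."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count
```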

If you want an approximate number, you can read a few lines (~20), see how many characters there are per line, and then estimate the count from the file's size (which the OS can report for the open file descriptor).

Counting rows with fread without reading the whole file

1) count.fields: I'm not sure whether count.fields reads the whole file into R at once. Try it to see if it works.

length(count.fields("myfile.csv", sep = ","))

If the file has a header, subtract one from the above.

2) sqldf: Another possibility is:

library(sqldf)
read.csv.sql("myfile.csv", sep = ",", sql = "select count(*) from file")

You may need other arguments as well, depending on whether there is a header, etc. Note that this does not read the file into R at all -- only into SQLite.

3) wc: Use the system command wc, which is available on Unix-like platforms and, on Windows, through Rtools (see below).

system("wc -l myfile.csv", intern = TRUE)

or to directly get the number of lines in the file

read.table(pipe("wc -l myfile.csv"))[[1]]

or

read.table(text = system("wc -l myfile.csv", intern = TRUE))[[1]]

Again, if there is a header, subtract one.

If you are on Windows, make sure that Rtools is installed and use this:

read.table(pipe("C:\\Rtools\\bin\\wc -l myfile.csv"))[[1]]

Alternatively, on Windows without Rtools, try this:

read.table(pipe('find /v /c "" myfile.csv'))[[3]]

See How to count no of lines in text file and store the value into a variable using batch script?
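When fields can contain quoted line breaks, wc -l counts physical lines, which is not the same as the number of CSV records. As an analogous approach outside R (not part of the answer above), Python's csv module can count records while streaming the file. The sketch assumes UTF-8 text, and it subtracts a header row by default, mirroring the advice above. The function name is illustrative.

```python
import csv

def count_csv_records(path: str, has_header: bool = True) -> int:
    """Count CSV records (not physical lines) while streaming the file."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    # Subtract the header row if there is one.
    return max(n - 1, 0) if has_header else n
```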

How to get the number of lines in a text file without opening it?

You can't count the number of lines in a file without reading it. The operating systems your code runs on do not store the number of lines as some kind of metadata. They don't even generally distinguish between binary and text files! You just have to read the file and count the newlines.

However, you can probably do this faster than you are doing it now, if your files have a large number of lines.

This line of code is what I'm worried about:

nbLines = (file.split("\n")).length;

Calling split here creates a large number of memory allocations, one for each line in the file.

My hunch is that it would be faster to count the newlines directly in a for loop:

function lineCount( text ) {
    var nLines = 0;
    // Scan each character once, counting '\n' without creating substrings
    for( var i = 0, n = text.length; i < n; ++i ) {
        if( text[i] === '\n' ) {
            ++nLines;
        }
    }
    return nLines;
}

This counts the newline characters without any memory allocations, and most JavaScript engines should do a good job of optimizing this code.

You may also want to adjust the final count slightly depending on whether the file ends with a newline or not, according to how you want to interpret that. But don't do that inside the loop, do it afterward.
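The sketch below shows, in Python, what the counting loop and the after-the-loop adjustment might look like. It is not a translation of the JavaScript above, because it reads the file in binary chunks instead of taking a string. It assumes you want a final line without a trailing newline to count as a line. It uses the := operator, so it needs Python 3.8 or later.

```python
def count_lines(path: str, chunk_size: int = 1 << 16) -> int:
    """Count lines, treating a final line without a trailing newline as a line."""
    newlines = 0
    last_byte = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]  # remember only the file's final byte
    # Adjust once, after the loop: a non-empty file whose last byte is not
    # a newline has one more line than it has newline characters.
    if last_byte and last_byte != b"\n":
        newlines += 1
    return newlines
```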

Efficiently counting the number of lines of a text file (200 MB+)

This will use less memory, since it doesn't load the whole file into memory:

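$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
// fgets returns false at end of file; testing its result instead of feof()
// avoids counting one extra "line" when the file ends with a newline.
while (($line = fgets($handle)) !== false) {
    $linecount++;
}

fclose($handle);

echo $linecount;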

fgets loads a single line into memory (if the second argument $length is omitted it will keep reading from the stream until it reaches the end of the line, which is what we want). This is still unlikely to be as quick as using something other than PHP, if you care about wall time as well as memory usage.

The only danger with this is if any lines are particularly long (what if you encounter a 2 GB file without line breaks?). In that case you're better off reading it in chunks and counting end-of-line characters:

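$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
// Read at most 4095 bytes per call, so a huge line never has to fit in memory.
// Count "\n" rather than PHP_EOL: it matches both LF and CRLF files on any OS.
while (($chunk = fgets($handle, 4096)) !== false) {
    $linecount += substr_count($chunk, "\n");
}

fclose($handle);

echo $linecount;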

Is it possible to determine how many lines exist in a file without per-line iteration?

In general it's not possible to do better than reading every character in the file and counting newline characters.

It may be possible if you know details about the internal structure of the file. For example, if the file is 1024kB long, and every line is 1kB in length, then you can deduce there are 1024 lines in the file.
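Under that assumption, the count is just arithmetic on the file size, which the OS reports without the contents being read. Below is a minimal Python sketch with a hypothetical helper name. The line_length value must include the newline byte or bytes.

```python
import os

def count_fixed_length_lines(path: str, line_length: int) -> int:
    """Count lines in a file whose lines are all exactly `line_length` bytes."""
    size = os.path.getsize(path)
    lines, remainder = divmod(size, line_length)
    # A non-zero remainder means the fixed-length assumption does not hold.
    if remainder:
        raise ValueError(f"{size} bytes is not a multiple of {line_length}")
    return lines
```

For the example above, taking 1 kB as 1024 bytes, the file is 1,048,576 bytes and divmod(1048576, 1024) gives (1024, 0).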


