How can I get the count of lines in a file in an efficient way?
BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
int lines = 0;
while (reader.readLine() != null) lines++;
reader.close();
Update: To answer the performance question raised here, I made a measurement. First of all: 20,000 lines are too few to get the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java without parameters like -server or -XX options) needed around 11 seconds on my box. The same with wc -l (the UNIX command-line tool for counting lines): 11 seconds. The solution that reads every single character and looks for '\n' needed 104 seconds, 9-10 times as much.
How can I get the count of lines in a file in an efficient way in Dart?
var myFile = new File('file.txt');
// assuming a utf8 encoding
var numberOfLines = myFile.readAsLinesSync().length;
See the readAsLinesSync docs to provide another encoding.
What's the fastest way to count the total lines of a text file in C#?
Here are a few ways this can be accomplished quickly:
StreamReader:
int lineCount = 0;
using (var sr = new StreamReader(path))
{
    // Check against null, not IsNullOrEmpty: an empty string is a valid
    // (blank) line and must not terminate the loop early.
    while (sr.ReadLine() != null)
        lineCount++;
}
FileStream:
var lineBuffer = new byte[65536]; // 64 KB
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, lineBuffer.Length))
{
    int readBuffer = 0;
    while ((readBuffer = fs.Read(lineBuffer, 0, lineBuffer.Length)) > 0)
    {
        for (int i = 0; i < readBuffer; i++)
        {
            if (lineBuffer[i] == 0xD) // carriage return ('\r')
                lineCount++;
        }
    }
}
Multithreading:
Arguably, the number of threads shouldn't affect the read speed, but real-world benchmarking can sometimes prove otherwise. Try different buffer sizes and see if you get any gains at all with your setup. Note: this method contains a race condition; use with caution.
var tasks = new Task[Environment.ProcessorCount]; // 1 per core
var fileLock = new ReaderWriterLockSlim();
int bufferSize = 65536; // 64 KB
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, bufferSize, FileOptions.RandomAccess))
{
    for (int i = 0; i < tasks.Length; i++)
    {
        tasks[i] = Task.Factory.StartNew(() =>
        {
            int readBuffer = 0;
            var lineBuffer = new byte[bufferSize];
            while (fileLock.TryEnterReadLock(10) &&
                   (readBuffer = fs.Read(lineBuffer, 0, lineBuffer.Length)) > 0)
            {
                fileLock.ExitReadLock();
                for (int n = 0; n < readBuffer; n++)
                    if (lineBuffer[n] == 0xD)
                        Interlocked.Increment(ref lineCount);
            }
        });
    }
    Task.WaitAll(tasks);
}
Number of lines in a file in Java
This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}
EDIT, 9 1/2 years later: I have practically no Java experience anymore, but anyway I tried to benchmark this code against the LineNumberReader solution below, since it bothered me that nobody had done it. It seems that especially for large files my solution is faster, although it takes a few runs until the optimizer does a decent job. I've played a bit with the code and produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }
        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i = 0; i < 1024; ) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        // count remaining characters
        while (readChars != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}
Benchmark results for a 1.3GB text file, y axis in seconds. I performed 100 runs with the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers while countLinesNew has none, and although it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
Is there a better way to determine the number of lines in a large text file (1-2 GB)?
I'm just thinking out loud here, but chances are performance is I/O bound and not CPU bound. In any case, I'm wondering if interpreting the file as text may be slowing things down, as it will have to convert between the file's encoding and string's native encoding. If you know the encoding is ASCII or compatible with ASCII, you might be able to get away with just counting the number of times a byte with the value 10 appears (which is the character code for a line feed).
What if you had the following:
long lineCount = 0;
byte[] buffer = new byte[1024 * 1024];
int bytesRead;
using (var fs = new FileStream("path.txt", FileMode.Open, FileAccess.Read,
                               FileShare.None, 1024 * 1024))
{
    do
    {
        bytesRead = fs.Read(buffer, 0, buffer.Length);
        for (int i = 0; i < bytesRead; i++)
            if (buffer[i] == '\n')
                lineCount++;
    }
    while (bytesRead > 0);
}
My benchmark results for a 1.5GB text file, timed 10 times, averaged:
StreamReader approach: 4.69 seconds
File.ReadLines().Count() approach: 4.54 seconds
FileStream approach: 1.46 seconds
Count lines in large files
Try: sed -n '$=' filename
Also, cat is unnecessary: wc -l filename is enough for your present approach.
How to get the line count of a large file, at least 5G
Step 1: head -n 5 filename > newfile  (get the first n lines into newfile, e.g. n = 5)
Step 2: Get the huge file's size, A.
Step 3: Get newfile's size, B.
Step 4: (A/B)*n is approximately equal to the exact line count.
Set n to different values, repeat a few more times, then take the average.
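The steps above can be sketched in code. This is a minimal Python illustration of the same idea (the function name, the default sample size, and sampling via line iteration rather than head are my own choices); it assumes line lengths at the start of the file are representative of the whole file:

```python
import os

def estimate_line_count(path, sample_lines=1000):
    """Estimate the line count of a large file from a small sample.

    Reads the first `sample_lines` lines, computes their total byte size,
    and scales by the file's total size, as in the head-based recipe above.
    """
    sample_bytes = 0
    n = 0
    with open(path, "rb") as f:
        for line in f:
            sample_bytes += len(line)
            n += 1
            if n >= sample_lines:
                break
    if sample_bytes == 0:
        return 0  # empty file
    total_bytes = os.path.getsize(path)
    # Scale the sampled line density to the full file size.
    return round(total_bytes * n / sample_bytes)
```

The estimate is exact when all lines have the same length and degrades as line lengths vary, which is why the recipe suggests averaging over several values of n.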
How to get line count of a large file cheaply in Python?
You can't get much better than that. After all, any solution will have to read the entire file, figure out how many \n characters it contains, and return that result. Do you have a better way of doing that without reading the entire file? Not sure... The best solution will always be I/O-bound; the best you can do is make sure you don't use unnecessary memory, and it looks like you have that covered.
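For completeness, here is one way to keep the memory use bounded in Python: read the file as binary in fixed-size chunks and count newline bytes, the same byte-counting technique used in the Java and C# answers above (the function name and buffer size are my own choices):

```python
def count_lines(path, buf_size=1024 * 1024):
    """Count newline bytes in a file, reading fixed-size 1 MB chunks
    so memory use stays constant regardless of file size."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:
                break
            # bytes.count is implemented in C, so this stays I/O-bound.
            count += chunk.count(b"\n")
    return count
```

Note that, like wc -l, this counts newline characters, so a final line without a trailing newline is not counted.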
Fastest way to count lines in a file
for /f "tokens=1 delims=:" %%# in ('find /c /v "" ^< FILENAME') do set "linescount=%%#"
echo %linescount%
Fastest way to find the number of lines in a text file (C++)
The only way to find the line count is to read the whole file and count the number of line-end characters. The fastest way to do this is probably to read the whole file into a large buffer with one read operation and then go through the buffer counting the '\n' characters.
As your current file size appears to be about 60 MB, reading it all at once is not an attractive option. You can get most of the speed by not reading the whole file in one go, but reading it in chunks, say of size 1 MB. You also say that a database is out of the question, but it really does look to be the best long-term solution.
Edit: I just ran a small benchmark on this, and the buffered approach (buffer size 1024K) seems to be a bit more than twice as fast as reading a line at a time with getline(). Here's the code; my tests were done with g++ at the -O2 optimisation level:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <ctime>
using namespace std;

unsigned int FileRead( istream & is, vector <char> & buff ) {
    is.read( &buff[0], buff.size() );
    return is.gcount();
}

unsigned int CountLines( const vector <char> & buff, int sz ) {
    int newlines = 0;
    const char * p = &buff[0];
    for ( int i = 0; i < sz; i++ ) {
        if ( p[i] == '\n' ) {
            newlines++;
        }
    }
    return newlines;
}

int main( int argc, char * argv[] ) {
    time_t now = time(0);
    if ( argc == 1 ) {
        cout << "lines\n";
        ifstream ifs( "lines.dat" );
        int n = 0;
        string s;
        while( getline( ifs, s ) ) {
            n++;
        }
        cout << n << endl;
    }
    else {
        cout << "buffer\n";
        const int SZ = 1024 * 1024;
        std::vector <char> buff( SZ );
        ifstream ifs( "lines.dat" );
        int n = 0;
        while( int cc = FileRead( ifs, buff ) ) {
            n += CountLines( buff, cc );
        }
        cout << n << endl;
    }
    cout << time(0) - now << endl;
}