Java: Read Last N Lines of a Huge File

Java : Read last n lines of a HUGE file

If you use a RandomAccessFile, you can use length() and seek() to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth-last line begins, you can seek there and just read-and-print.

An initial best guess can be made from your data's properties. For example, if it's a text file, the line lengths may average no more than 132 characters, so to get the last five lines, start 660 characters before the end. If that guess was wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust the next guess: if those 660 characters held just three lines, the next try could be 660 / 3 * 5, plus a bit extra just in case).
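The seek-and-retry idea above can be sketched in Java. This is a minimal sketch, not a definitive implementation: the `TailFile` class and `lastLines` method are names I made up, the initial guess uses the 132-character average from the answer, and `RandomAccessFile.readLine()` assumes single-byte (ISO-8859-1-style) line content.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TailFile {
    // Seek to a guessed offset before the end, read forward, and widen
    // the window until at least n lines have been seen.
    static List<String> lastLines(String path, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long fileLen = raf.length();
            long guess = 132L * n;               // assumed average line length
            while (true) {
                long start = Math.max(0, fileLen - guess);
                raf.seek(start);
                if (start > 0) {
                    raf.readLine();              // discard the (likely partial) first line
                }
                Deque<String> lines = new ArrayDeque<>();
                String line;
                while ((line = raf.readLine()) != null) {
                    if (lines.size() == n) lines.removeFirst();
                    lines.addLast(line);
                }
                if (lines.size() >= n || start == 0) {
                    return new ArrayList<>(lines);
                }
                guess *= 2;                      // not enough lines: back up further
            }
        }
    }
}
```

Doubling the guess is the simplest retry policy; as the answer notes, you could instead estimate from the lines actually seen in the first window.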

How to read the last n lines of a HUGE compressed file without decompressing the whole file to disk

You can't do random access on compressed stream contents. You either need to decompress to a temp file, or figure out a way to get what you need in one pass through the stream: read through it, keeping only the last N lines in memory, and when you reach the end of the stream you have the last N lines.
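The one-pass approach can be sketched like this (class and method names are my own; assumes UTF-8 text inside the gzip stream):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class GzipTail {
    // One pass through the compressed stream, keeping only the last n
    // lines in a bounded deque; nothing is ever written to disk.
    static List<String> lastLines(InputStream gzipped, int n) throws IOException {
        Deque<String> window = new ArrayDeque<>(n);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped),
                                      StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (window.size() == n) window.removeFirst();
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }
}
```

Memory stays bounded at N lines, regardless of how large the compressed file is.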

Quickly read the last line of a text file?

Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.

Basically it's not a terribly easy thing to do in general. As MSalters points out, UTF-8 does make it easy to spot \r or \n: the UTF-8 representation of those characters is the same as in ASCII, and those byte values never occur inside a multi-byte character.

So basically, take a buffer of (say) 2K, and progressively read backwards (seek to 2K before where you were, read the next 2K), checking for a line terminator. Then seek to exactly the right place in the stream, create an InputStreamReader on top of it, and a BufferedReader on top of that. Then just call BufferedReader.readLine().
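That backward scan might look like the following sketch (the `LastLine` class is hypothetical; it assumes \n line terminators in UTF-8 or single-byte text, and uses the 2K buffer suggested above). For brevity it reads the final line with readFully rather than layering the readers as described:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LastLine {
    // Scan backwards in 2K chunks for the '\n' that precedes the final
    // line, then read forward from just after it. Safe for UTF-8: the
    // byte 0x0A never occurs inside a multi-byte sequence.
    static String lastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            if (end > 0) {                        // ignore a trailing newline
                raf.seek(end - 1);
                if (raf.read() == '\n') end--;
            }
            long pos = end;
            long lineStart = 0;
            byte[] buf = new byte[2048];
            outer:
            while (pos > 0) {
                int toRead = (int) Math.min(buf.length, pos);
                pos -= toRead;
                raf.seek(pos);
                raf.readFully(buf, 0, toRead);
                for (int i = toRead - 1; i >= 0; i--) {
                    if (buf[i] == '\n') {         // found the previous line's end
                        lineStart = pos + i + 1;
                        break outer;
                    }
                }
            }
            raf.seek(lineStart);
            byte[] line = new byte[(int) (end - lineStart)];
            raf.readFully(line);
            String s = new String(line, StandardCharsets.UTF_8);
            return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
        }
    }
}
```

The final check strips a stray \r so CRLF-terminated files work too.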

How to read a text file from a server and display only e.g. the last 100 lines of it in a TextView

One solution would be to add everything to an ArrayList and then build your text out of the last hundred entries. To keep memory bounded, start removing lines from the top once the count reaches a hundred.

Here is the code snippet:

/* Iterate file: br is a BufferedReader over the server response */
List<String> lst = new ArrayList<>();
String line;
while ((line = br.readLine()) != null) {
    if (lst.size() == 100) {
        lst.remove(0);
    }
    lst.add(line);
}
br.close();

/* Build the text for the TextView */
StringBuilder sb = new StringBuilder();
for (String s : lst) {
    sb.append(s).append("\n");
}
String text = sb.toString();

/* Clear the ArrayList */
lst.clear();

Reading Last *n* lines from GZIPInputStream

Is it possible to do that without readline until eof?

No, for the following two reasons:

  1. You cannot read a stream backwards.
  2. You cannot un(g)zip backwards.

Just read the entire stream, ignoring the lines you aren't interested in.

Reading the last n lines from a huge text file

I'd use scan for this, if you know how many lines the log has:

scan("foo.txt",sep="\n",what="char(0)",skip=100)

If you have no clue how many lines you need to skip, your options are:

  • reading in everything and taking the last n lines (in case that's feasible),
  • using scan("foo.txt",sep="\n",what=list(NULL)) to figure out how many records there are, or
  • using some algorithm to go through the file, keeping only the last n lines every time

The last option could look like :

ReadLastLines <- function(x, n, ...) {
  con <- file(x)
  open(con)
  out <- scan(con, n, what="char(0)", sep="\n", quiet=TRUE, ...)

  while (TRUE) {
    tmp <- scan(con, 1, what="char(0)", sep="\n", quiet=TRUE)
    if (length(tmp) == 0) { close(con); break }
    out <- c(out[-1], tmp)
  }
  out
}

allowing :

ReadLastLines("foo.txt",100)

or

ReadLastLines("foo.txt",100,skip=1e+7)

in case you know you have more than 10 million lines. This can save on reading time once your logs get extremely big.


EDIT: In fact, I'd not even use R for this, given the size of your file. On Unix, you can use the tail command. There is a Windows version of that as well, somewhere in a toolkit, though I haven't tried it yet.
