How to Read a Large Text File Line by Line Using Java

Java: Read a Large Text File with 70 Million Lines of Text

1) I am sure there is no difference speed-wise; both use a FileInputStream internally along with buffering

2) You can take measurements and see for yourself

3) Though there is no performance benefit, I like the Java 7 approach:

try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
    for (String line = null; (line = br.readLine()) != null; ) {
        // process the line
    }
}

4) Scanner-based version:

try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}

5) This may be faster than the rest

try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    StringBuilder line = new StringBuilder();
    while (ch.read(bb) != -1) { // -1 means end of file
        bb.flip();
        // decode the bytes and append chars to line, emitting a String at each '\n'
        // ...
        bb.clear();
    }
}

It requires a bit more coding, but it can be noticeably faster because of ByteBuffer.allocateDirect, which allows the OS to read bytes from the file into the ByteBuffer directly, without an extra copy.

6) Parallel processing can further increase speed. Allocate a big byte buffer, run several tasks that read bytes from the file into that buffer in parallel, and when a chunk is ready, find the first end of line, build a String, find the next one, and so on.
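A rough sketch of that idea, assuming positional reads through a single shared FileChannel and a file split into roughly equal byte ranges (imports and exception handling are omitted for brevity, and stitching lines that cross a chunk boundary is left as a comment):

// Sketch only. FileChannel.read(ByteBuffer, position) does not touch the channel's own position,
// so several tasks can read different ranges of the same file concurrently.
try (FileChannel ch = FileChannel.open(Paths.get("test.txt"), StandardOpenOption.READ)) {
    int tasks = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(tasks);
    long chunk = ch.size() / tasks;
    List<Future<ByteBuffer>> parts = new ArrayList<>();
    for (int i = 0; i < tasks; i++) {
        long start = i * chunk;
        long len = (i == tasks - 1) ? ch.size() - start : chunk;
        parts.add(pool.submit(() -> {
            ByteBuffer bb = ByteBuffer.allocateDirect((int) len); // assumes each chunk fits in an int
            ch.read(bb, start); // positional read, independent of other tasks
            bb.flip();
            return bb;
        }));
    }
    for (Future<ByteBuffer> part : parts) {
        ByteBuffer bb = part.get();
        // scan bb for '\n', build Strings, stitch the line that spills into the next chunk...
    }
    pool.shutdown();
}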

Using Java Spark to read large text files line by line

JavaRDD<String> lines = sc.textFile(path);

JavaRDD<String[]> jsonList = lines.map(line -> line.split("/"));

Or you can define the transformation inline inside the map:

JavaRDD<String> jsonList = lines.map(line -> {
    String newline = line.replace("", "");
    return newline;
});

// Then convert the JavaRDD to a DataFrame

Converting a JavaRDD to a DataFrame in Spark (Java)

dfTobeSaved.write().format("json").save("/root/data.json");
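As a hedged sketch of the conversion step itself, assuming a SparkSession named spark and a hypothetical bean class Record describing one parsed line (neither is in the original code; imports omitted), the JavaRDD can be turned into a DataFrame with createDataFrame and then written out:

// Hypothetical bean; createDataFrame infers the schema from its getters/setters
public static class Record implements java.io.Serializable {
    private String value;
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

JavaRDD<Record> records = jsonList.map(line -> {
    Record r = new Record();
    r.setValue(line);
    return r;
});
Dataset<Row> dfTobeSaved = spark.createDataFrame(records, Record.class);
dfTobeSaved.write().format("json").save("/root/data.json");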

How to read/list 30 lines per page in a large text file report in Java?

As I already stated in my comment, you'd need to count the cars you've already processed and print a new page header whenever you've hit a multiple of 30 cars.

First, I'd suggest moving your header print statements to a separate method, e.g. printPageHeader(int pageNumber). Then change your loop like this:

final int pageSize = 30;
int carCounter = 0;

while (carsScanner.hasNextLine()) {
    if (carCounter % pageSize == 0) {
        printPageHeader(carCounter / pageSize + 1);
    }

    // read car lines here

    // print your car
    System.out.printf(format, ... );

    // count the car
    carCounter++;
}

Reading a Large Text File With Over 130,000 Lines of Text

Since you are processing a large file, you should process the data in chunks. Your file reading is fine, but you keep appending every row to a StringBuffer and finally pass the whole thing to Toast.makeText(), which creates a big footprint in memory. Instead, you can read 100 lines at a time and call Toast.makeText() on each chunk. One more thing: use a StringBuilder instead of a StringBuffer to avoid the unwanted overhead of synchronization. Also, you initialize the wwwdf2 variable inside the method, but it looks like it is an instance variable, which is not required; declare it inside the method to keep its scope as short as possible.
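A minimal sketch of that chunked approach, where processChunk() and the file name are hypothetical stand-ins for whatever you do with each block of 100 lines (e.g. the Toast.makeText() call):

try (BufferedReader br = new BufferedReader(new FileReader("report.txt"))) { // file name is an assumption
    StringBuilder chunk = new StringBuilder(); // StringBuilder: no synchronization overhead
    String line;
    int count = 0;
    while ((line = br.readLine()) != null) {
        chunk.append(line).append('\n');
        if (++count % 100 == 0) {           // hand off every 100 lines
            processChunk(chunk.toString()); // hypothetical callback
            chunk.setLength(0);             // reuse the builder for the next chunk
        }
    }
    if (chunk.length() > 0) {
        processChunk(chunk.toString());     // leftover lines
    }
}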

Reading very large text files in java

InputStreamReader is a facility to convert a raw InputStream (a stream of bytes) into a stream of characters, according to some charset. FileInputStream is a stream of bytes (it extends InputStream) read from a given file. You can use InputStreamReader to read text from a socket as well, for instance, since socket.getInputStream() also returns an InputStream.

InputStreamReader is a Reader, the abstract class for a stream of characters. Using an InputStreamReader alone would be inefficient, as each read would actually hit the file. When you decorate it with a BufferedReader, it reads a larger chunk of the file at once, keeps it in memory, and serves subsequent reads from that buffer.
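Put together, the layering described above looks roughly like this (a minimal sketch reusing the same test.txt file as the earlier examples; imports omitted):

// FileInputStream: raw bytes -> InputStreamReader: characters (UTF-8) -> BufferedReader: buffered lines
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("test.txt"), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line
    }
}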

About the size: the documentation does not state the default value:

https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html

The buffer size may be specified, or the default size may be used. The
default is large enough for most purposes.

You must check the source file to find the value.

https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/io/BufferedReader.java

This is the implementation in the OpenJDK:

 private static int defaultCharBufferSize = 8192;

Oracle's closed-source JDK implementation may be different.
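If the default is not suitable, the buffer size can also be passed explicitly through the two-argument constructor; the 64 KiB value below is only an illustration, not a recommendation:

try (BufferedReader br = new BufferedReader(new FileReader("test.txt"), 64 * 1024)) {
    // ...
}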

What's the most efficient way to process a large text file line by line

boolean found = false;
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (line.equalsIgnoreCase("Your string")) {
            found = true;
            break; // no need to keep reading once the string has been found
        }
    }
}

