How to load a large xlsx file with Apache POI?
I was in a similar situation in a web server environment. Uploads were typically ~150k rows, and it wouldn't have been good for a single request to consume a ton of memory. The Apache POI Streaming API works well for this, but it requires a total redesign of your read logic. I already had a bunch of read logic using the standard API that I didn't want to redo, so I wrote this instead: https://github.com/monitorjbl/excel-streaming-reader
It's not entirely a drop-in replacement for the standard XSSFWorkbook class, but if you're just iterating through rows it behaves similarly:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)  // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)   // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)      // index of sheet to use (defaults to 0)
        .read(is);          // InputStream or File for XLSX file (required)

// Rows are pulled from the stream as the loop advances
for (Row r : reader) {
    for (Cell c : r) {
        System.out.println(c.getStringCellValue());
    }
}
There are some caveats to using it; due to the way XLSX sheets are structured, not all data is available in the current window of the stream. However, if you're just trying to read simple data out from the cells, it works pretty well for that.
Processing large xlsx file
Try using the event API. See Event API (HSSF only) and XSSF and SAX (Event API) in the POI documentation for details. A couple of quotes from that page:
HSSF:
The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. Its relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.
XSSF:
If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint.
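To give a rough idea of what "process it yourself" can look like, here is a minimal sketch that counts the rows of one worksheet by streaming its XML through a SAX handler. It deliberately uses only JDK classes (ZipInputStream and SAX) rather than POI's own event API, so it is a stand-in for the technique, not the code from the POI documentation. The class name SheetRowCounter, the method countRows, and the part name passed in are assumptions for illustration. The first sheet is conventionally stored as xl/worksheets/sheet1.xml, but that is not guaranteed, and the handler assumes the sheet XML uses unprefixed element names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams one worksheet part out of an .xlsx (a zip archive)
// and counts its <row> elements without building the whole sheet in memory.
final class SheetRowCounter {

    static int countRows(InputStream xlsx, String sheetPart) throws Exception {
        final int[] rows = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default parser is not namespace-aware, so qName is the raw tag name
                if ("row".equals(qName)) {
                    rows[0]++;
                }
            }
        };
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(sheetPart)) {
                    // SAX reads the entry incrementally and fires one callback per element
                    SAXParserFactory.newInstance().newSAXParser()
                            .parse(new InputSource(zip), handler);
                    return rows[0];
                }
            }
        }
        throw new IOException("Part not found: " + sheetPart);
    }
}
```

A call would look like SheetRowCounter.countRows(new FileInputStream("/path/to/workbook.xlsx"), "xl/worksheets/sheet1.xml"). Reading real cell values takes more work, since string cells usually refer to a separate shared strings part of the archive.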
For output, one possible approach is described in the blog post Streaming xlsx files. (Basically, use XSSF to generate a container XML file, then stream the actual content as plain text into the appropriate xml part of the xlsx zip archive.)
Java - OutOfMemoryError when writing large Excel file with Apache POI
This line:
String csvStr = new String(Files.readAllBytes(Paths.get(inputFilePath)), StandardCharsets.UTF_8);
Issue:
You are loading the whole file into memory with Files.readAllBytes, and the memory allocated to the JVM process running this program is not enough to hold it.
Possible Solution:
You may want to read the file using streams/buffers such as BufferedReader, or look up other Readers that let you read the file in pieces so that it is never held in memory all at once.
Further Modifications:
You will also have to modify the writing side of your program: after you read a piece of data, process it and write it to the file, and each time you write to the file again, append.
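A minimal sketch of that read, process, append loop might look like the following. It covers only the CSV side and does not touch the POI writing code from the question. The class name CsvChunkProcessor and the process step (which just trims each line) are placeholders I made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads the input one line at a time and appends each
// processed line to the output, so the whole file is never held in memory.
final class CsvChunkProcessor {

    static long copyProcessed(Path input, Path output) throws IOException {
        long lines = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(process(line)); // only the current line is in memory
                out.newLine();
                lines++;
            }
        }
        return lines;
    }

    // Placeholder transformation (an assumption); replace with the real per-line logic.
    static String process(String line) {
        return line.trim();
    }
}
```

Because the writer is opened with APPEND, calling copyProcessed again with the same output path adds to the end of the file instead of overwriting it.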