How to add a UTF-8 BOM in Java?
To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write(): print() runs the character through the stream's charset encoder, while write() emits raw bytes.
Also, if you want a BOM in a CSV entry inside a zip file, I guess you need to print the BOM right after putNextEntry().
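A minimal sketch of that idea (the entry name and column values are made up for the demo): open the zip entry, then let print() emit the BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class BomInZipEntry {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ZipOutputStream zip = new ZipOutputStream(buf);
        // print() runs the char through the UTF-8 encoder, producing the
        // three BOM bytes EF BB BF; write() would emit raw bytes instead.
        PrintStream ps = new PrintStream(zip, false, "UTF-8");
        zip.putNextEntry(new ZipEntry("data.csv")); // BOM goes after putNextEntry()
        ps.print('\uFEFF');
        ps.print("name,city\n");
        ps.flush();
        zip.closeEntry();
        zip.close();

        // Read the entry back and confirm it starts with the UTF-8 BOM.
        ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        in.getNextEntry();
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        System.out.println((head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF); // true
    }
}
```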
Write a text file encoded in UTF-8 with a BOM through java.nio
As far as I know, there's no direct way in the standard Java NIO library to write text files in UTF-8 with BOM format.
But that's not a problem, since the BOM is nothing but a special character at the start of a text stream, represented as \uFEFF. Just add it manually to the CSV file, e.g.:
List<String> lines =
    Arrays.asList("\uFEFF" + "Übernahme", "Außendarstellung", "€", "@", "UTF-8?");
...
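For completeness, a runnable sketch of this approach with Files.write (the temp-file target is just for demonstration):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class NioBomWrite {
    public static void main(String[] args) throws Exception {
        Path csv = Files.createTempFile("bom-demo", ".csv"); // throwaway demo file
        // Prepend U+FEFF to the first element only; Files.write emits one line per element.
        List<String> lines = Arrays.asList("\uFEFF" + "Übernahme", "Außendarstellung", "€");
        Files.write(csv, lines, StandardCharsets.UTF_8);

        // The file now starts with the three UTF-8 BOM bytes EF BB BF.
        byte[] head = Files.readAllBytes(csv);
        System.out.println((head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF); // true
    }
}
```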
How to add a UTF-8 BOM in Kotlin?
The BOM is a single Unicode character, U+FEFF. You can easily add it yourself, if it's required.
File(fileName).writeText("\uFEFF" + source, Charsets.UTF_8)
The harder part is that the BOM is not stripped automatically when the file is read back in. This is why people recommend not adding a BOM when it's not needed.
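If you do add a BOM, the reading side has to strip it itself. A plain-Java sketch with no external libraries (skipBom is a made-up helper name):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StripBom {
    // Peek at the first char of a reader and skip it if it is the BOM.
    static Reader skipBom(Reader r) throws IOException {
        BufferedReader br = new BufferedReader(r);
        br.mark(1);
        if (br.read() != '\uFEFF') {
            br.reset(); // no BOM: put the first char back
        }
        return br;
    }

    public static void main(String[] args) throws IOException {
        Reader r = skipBom(new StringReader("\uFEFFhello"));
        StringBuilder sb = new StringBuilder();
        for (int c; (c = r.read()) != -1; ) sb.append((char) c);
        System.out.println(sb); // hello
    }
}
```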
Reading UTF-8 - BOM marker
In Java, you have to consume the UTF-8 BOM manually if it's present. This behaviour is documented in the Java bug database, here and here. There will be no fix for now, because it would break existing tools like JavaDoc or XML parsers. Apache Commons IO provides a BOMInputStream to handle this situation.
Take a look at this solution: Handle UTF8 file with BOM
Send CSV file encoded in UTF-8 with BOM in Java
I didn't do much to fix my issue, and I'm still not sure what was wrong. I only had to change the PrintWriter to a Writer, and add the charset in my JavaScript code.
Backend service
public void exportStreetsToCsv(Set<Street> streets, Writer writer) throws IOException {
    writer.write('\uFEFF'); // Write BOM
    // ...
}
Frontend download
const blobFile = new Blob([response.data], { type: 'text/csv;charset=utf-8' });
this.FileSaver.saveAs(blobFile, 'test.csv');
Add BOM in the beginning of a String
It turns out Postman does not show the BOM. Otherwise the linked solution works perfectly.
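For reference, prefixing a String with the BOM really is plain concatenation (the payload below is made up):

```java
public class BomPrefix {
    public static void main(String[] args) {
        String payload = "id,name\n1,Zoë\n";
        String withBom = "\uFEFF" + payload; // the BOM is just the first char
        System.out.println(withBom.charAt(0) == '\uFEFF'); // true
        System.out.println(withBom.length() - payload.length()); // 1
    }
}
```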
Java: UTF-8 and BOM
Yes, it is still true that Java cannot handle the BOM in UTF-8 encoded files automatically. I came across this issue when parsing several XML files for data formatting purposes. Since you can't know when you might come across them, I would suggest stripping the BOM marker out if you find it at runtime, or following the advice that tchrist gave.
Java Spring returning CSV file encoded in UTF-8 with BOM
I have just come across this same problem. The solution which works for me is to get the output stream from the response object and write to it as follows:
// first create an array for the Byte Order Mark
final byte[] bom = new byte[] { (byte) 239, (byte) 187, (byte) 191 };
try (OutputStream os = response.getOutputStream()) {
    os.write(bom);
    final PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(data);
    w.flush();
    w.close();
} catch (IOException e) {
    // log it
}
So UTF-8 is specified on the OutputStreamWriter.
As an addendum to this, I should add that the same application needs to allow users to upload files, which may or may not have BOMs. This can be dealt with by using the class org.apache.commons.io.input.BOMInputStream, then using that to construct an org.apache.commons.csv.CSVParser.
The BOMInputStream includes a method hasBOM() to detect whether the file has a BOM. One gotcha I first fell into is that hasBOM() reads (obviously!) from the underlying stream, so the way to deal with this is to first mark the stream and then, if it turns out not to have a BOM, reset it. The code I use for this looks like the following:
try (InputStream is = uploadFile.getInputStream();
        BufferedInputStream buffIs = new BufferedInputStream(is);
        BOMInputStream bomIn = new BOMInputStream(buffIs)) {
    buffIs.mark(LOOKAHEAD_LENGTH);
    // this should allow us to deal with CSVs with or without BOMs
    final boolean hasBOM = bomIn.hasBOM();
    final BufferedReader buffReadr = new BufferedReader(
            new InputStreamReader(hasBOM ? bomIn : buffIs, StandardCharsets.UTF_8));
    // if this stream does not have a BOM, then we must reset the stream as the test
    // for a BOM will have consumed some bytes
    if (!hasBOM) {
        buffIs.reset();
    }
    // collect the validated entity details
    final CSVParser parser = CSVParser.parse(buffReadr,
            CSVFormat.DEFAULT.withFirstRecordAsHeader());
    // Do stuff with the parser
    ...
    // Catch and clean up
// Catch and clean up
Hope this helps someone.
write UTF-8 BOM with supercsv
Since Super CSV ultimately wraps a Writer, write the BOM to the Writer before handing it over:
Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
writer.write('\uFEFF'); // BOM for UTF-8
... new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);