How to Add a UTF-8 BOM in Java

How to add a UTF-8 BOM in Java?

To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write(): print('\uFEFF') encodes the character with the stream's charset (EF BB BF for UTF-8), whereas write() only emits the low-order byte of its argument.

Also, if you want the BOM in a CSV entry of a ZIP archive, you presumably need to print the BOM right after putNextEntry(), as in the sketch below.
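
A minimal sketch of that idea, assuming the CSV entry is written into a ZipOutputStream (the archive and entry names here are made up):

ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream("export.zip"));
zipOut.putNextEntry(new ZipEntry("data.csv"));

// PrintStream with an explicit UTF-8 encoding; print() encodes '\uFEFF' as EF BB BF
PrintStream ps = new PrintStream(zipOut, false, "UTF-8");
ps.print('\uFEFF');
ps.println("col1;col2"); // CSV content follows the BOM
ps.flush();

zipOut.closeEntry();
zipOut.close();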

Write a text file encoded in UTF-8 with a BOM through java.nio

As far as I know, there's no direct way in the standard Java NIO library to write text files in UTF-8 with a BOM.

But that's not a problem, since the BOM is nothing but a special character at the start of a text stream, represented as \uFEFF. Just add it manually to the CSV file, e.g.:

List<String> lines =
        Arrays.asList("\uFEFF" + "Übernahme", "Außendarstellung", "€", "@", "UTF-8?");
...
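
Writing those lines out then works with the ordinary java.nio.file API; a minimal sketch, assuming the target file is called out.csv:

// The BOM character only needs to appear once, at the very start of the first line
Files.write(Paths.get("out.csv"), lines, StandardCharsets.UTF_8);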

How to add a UTF-8 BOM in Kotlin?

The BOM is a single Unicode character, U+FEFF. You can easily add it yourself, if it's required.

File(fileName).writeText("\uFEFF" + source, Charsets.UTF_8)

The harder part is that the BOM is not stripped automatically when the file is read back in. This is why people recommend not adding a BOM when it's not needed.

Reading UTF-8 - BOM marker

In Java, you have to consume the UTF-8 BOM manually if it is present. This behaviour is documented in the Java bug database; there will be no fix for now because it would break existing tools like JavaDoc or XML parsers. Apache Commons IO provides a BOMInputStream to handle this situation.

Take a look at this solution: Handle UTF8 file with BOM
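
For reference, a minimal sketch of the BOMInputStream approach from Apache Commons IO, assuming a hypothetical input file input.csv:

// BOMInputStream silently skips a leading UTF-8 BOM if one is present
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new BOMInputStream(new FileInputStream("input.csv")), StandardCharsets.UTF_8))) {
    String firstLine = reader.readLine(); // no "\uFEFF" prefix, even if the file had a BOM
    // ...
}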

Send CSV file encoded in UTF-8 with BOM in Java

I didn't do much to fix my issue, and I'm still not sure what was wrong. I only had to change the PrintWriter to a Writer and add the charset in my JavaScript code.

Backend service

public void exportStreetsToCsv(Set<Street> streets, Writer writer) throws IOException {
    writer.write('\uFEFF'); // write the BOM as the very first character
    // ...
}

Frontend download

const blobFile = new Blob([response.data], { type: 'text/csv;charset=utf-8' });
this.FileSaver.saveAs(blobFile, 'test.csv');

Add a BOM at the beginning of a String

It turns out Postman does not show the BOM. Otherwise, the linked solution works perfectly.
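
For completeness, adding the BOM to a String is just concatenation with the U+FEFF character; a minimal sketch with made-up content:

String body = "id;name\n1;Müller";
String withBom = "\uFEFF" + body; // the BOM becomes the first character of the payload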

Java: UTF-8 and BOM

Yes, it is still true that Java does not handle the BOM in UTF-8 encoded files for you. I came across this issue when parsing several XML files for data formatting purposes. Since you can't know when you might come across them, I would suggest stripping the BOM marker out at runtime if you find it (a sketch follows), or following the advice that tchrist gave.
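
A minimal sketch of stripping the marker at runtime, assuming Java 11+ and a hypothetical file name data.xml:

String text = Files.readString(Path.of("data.xml"), StandardCharsets.UTF_8);
if (text.startsWith("\uFEFF")) {
    text = text.substring(1); // drop the BOM before handing the text to a parser
}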

Java Spring returning CSV file encoded in UTF-8 with BOM

I have just come across this same problem. The solution which works for me is to get the output stream from the response object and write to it as follows:

// first create an array for the Byte Order Mark (0xEF 0xBB 0xBF)
final byte[] bom = new byte[] { (byte) 239, (byte) 187, (byte) 191 };
try (OutputStream os = response.getOutputStream()) {
    os.write(bom);

    final PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(data);
    w.flush();
    w.close();
} catch (IOException e) {
    // log it
}

So UTF-8 is specified on the OutputStreamWriter.


As an addendum to this, I should add that the same application needs to allow users to upload files, which may or may not have BOMs. This can be dealt with by using the class org.apache.commons.io.input.BOMInputStream, then using that to construct an org.apache.commons.csv.CSVParser.
The BOMInputStream includes a method hasBOM() to detect whether the file has a BOM or not.
One gotcha I first fell into was that the hasBOM() method reads (obviously!) from the underlying stream, so the way to deal with this is to first mark the stream and then, if the test finds no BOM, reset the stream. The code I use for this looks like the following:

try (InputStream is = uploadFile.getInputStream();
        BufferedInputStream buffIs = new BufferedInputStream(is);
        BOMInputStream bomIn = new BOMInputStream(buffIs)) {
    buffIs.mark(LOOKAHEAD_LENGTH);
    // this should allow us to deal with CSVs with or without BOMs
    final boolean hasBOM = bomIn.hasBOM();
    final BufferedReader buffReadr = new BufferedReader(
            new InputStreamReader(hasBOM ? bomIn : buffIs, StandardCharsets.UTF_8));

    // if this stream does not have a BOM, then we must reset the stream as the test
    // for a BOM will have consumed some bytes
    if (!hasBOM) {
        buffIs.reset();
    }

    // collect the validated entity details
    final CSVParser parser = CSVParser.parse(buffReadr,
            CSVFormat.DEFAULT.withFirstRecordAsHeader());
    // Do stuff with the parser
    ...
    // Catch and clean up
// Catch and clean up

Hope this helps someone.

Write a UTF-8 BOM with Super CSV

Since Super CSV just wraps a Writer that you supply, write the BOM yourself before handing the writer over:

Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
writer.write('\uFEFF'); // the BOM character, encoded by the writer's charset
... new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);
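
A slightly fuller sketch along the same lines, assuming a hypothetical Person bean and an existing OutputStream out:

Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
writer.write('\uFEFF'); // BOM first, before Super CSV writes any data

try (ICsvBeanWriter beanWriter = new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE)) {
    String[] header = { "firstName", "lastName" };
    beanWriter.writeHeader(header);
    beanWriter.write(person, header); // 'person' is a hypothetical bean instance
}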

