How to Write a UTF-8 File with Java

How to write a UTF-8 file with Java?

Instead of using FileWriter, create a FileOutputStream. You can then wrap this in an OutputStreamWriter, which allows you to pass an encoding to the constructor. Then you can write your data to that inside a try-with-resources statement:

try (OutputStreamWriter writer =
         new OutputStreamWriter(new FileOutputStream(PROPERTIES_FILE), StandardCharsets.UTF_8)) {
    // do stuff
}
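
Put together, a minimal self-contained sketch might look like this (the file name and the property line written here are placeholders, not anything from the original question):

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class Utf8FileWriter {
    public static void main(String[] args) throws IOException {
        String propertiesFile = "settings.properties"; // hypothetical file name
        try (OutputStreamWriter writer =
                 new OutputStreamWriter(new FileOutputStream(propertiesFile), StandardCharsets.UTF_8)) {
            writer.write("greeting=Grüß Gott\n"); // non-ASCII characters are written as UTF-8
        }
    }
}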

Set the encoding to UTF-8 for a FileWriter

Java has extensive, highly informative documentation. Keep it bookmarked. Refer to it first, whenever you have difficulty. You'll find it's frequently helpful.

In this case, the documentation for FileWriter says:

The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.

If you want to be sure your file will be written as UTF-8, replace this:

FileWriter fstream = null;
BufferedWriter out = null;
try {
    fstream = new FileWriter(mergedFile, false);

with this:

Writer fstream = null;
BufferedWriter out = null;
try {
    fstream = new OutputStreamWriter(new FileOutputStream(mergedFile), StandardCharsets.UTF_8);
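
Filled out, and assuming mergedFile is a java.io.File as in the original snippet, the whole fragment could look roughly like this (the text written is only an example):

Writer fstream = null;
BufferedWriter out = null;
try {
    fstream = new OutputStreamWriter(new FileOutputStream(mergedFile), StandardCharsets.UTF_8);
    out = new BufferedWriter(fstream);
    out.write("première ligne"); // encoded as UTF-8 regardless of the platform default
    out.newLine();
} finally {
    if (out != null) {
        out.close(); // closing the BufferedWriter also closes the writer and stream beneath it
    }
}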

Creating a UTF-8 File in Java

First, files themselves don't have encodings. They're a bunch of 0s and 1s. If you write "asdf" in UTF-8, it's completely indistinguishable from plain old 7-bit ASCII.

If you were writing in, say, UTF-16, then the byte-order mark (BOM) would be a pretty clear indication that the file is UTF-16, even for an empty string, but UTF-8 does not require such a marker to be present.

Therefore, your editor has no way of knowing that this file is supposed to be read as UTF-8. You could write UTF-8's BOM to your file like this:

out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }); // the three bytes of the UTF-8 BOM

However, in this case, out would have to be an OutputStream, such as a FileOutputStream. (BufferedWriter and OutputStreamWriter do not accept byte arrays as input.)
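
Putting that together, a rough sketch (the file name is just a placeholder): write the BOM bytes to the raw stream first, then the text through a writer wrapped around the same stream.

try (FileOutputStream os = new FileOutputStream("readme.txt"); // hypothetical file name
     OutputStreamWriter writer = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
    os.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }); // the UTF-8 BOM, as raw bytes
    writer.write("asdf"); // the text body, encoded as UTF-8
}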

Write a text file encoded in UTF-8 with a BOM through java.nio

As far as I know, there's no direct way in the standard Java NIO library to write text files in UTF-8 with BOM format.

But that's not a problem, since the BOM is nothing but a special character at the start of a text stream, represented as \uFEFF. Just add it manually to the CSV file, e.g.:

List<String> lines = 
Arrays.asList("\uFEFF" + "Übernahme", "Außendarstellung", "€", "@", "UTF-8?");
...
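
The elided part is typically a single java.nio call; a minimal sketch, assuming a hypothetical target path and the imports java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths and java.nio.charset.StandardCharsets:

Path path = Paths.get("export.csv"); // hypothetical file name
Files.write(path, lines, StandardCharsets.UTF_8); // writes the BOM character and every line as UTF-8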

Write to a file with a specific encoding in Java

Now, first the worrisome part: FileWriter and FileReader are old utility classes that use the platform's default encoding. Run the same code on another machine and it may produce a different file, or be unable to read a file written elsewhere.

ISO-8859-15 is a single-byte encoding. Java, however, holds text in Unicode, so it can combine all scripts, and a char is a UTF-16 code unit. In general a char index will not be a byte index, though in your case it probably works. And a line break might be one char (\n) or two (\r\n), depending on the platform.
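
A quick illustration of the char-versus-byte point: the euro sign is one char, three bytes in UTF-8, and a single byte in ISO-8859-15 (Charset.forName is needed because ISO-8859-15 is not in StandardCharsets).

String euro = "€";
System.out.println(euro.length());                                         // 1 char
System.out.println(euro.getBytes(StandardCharsets.UTF_8).length);          // 3 bytes in UTF-8
System.out.println(euro.getBytes(Charset.forName("ISO-8859-15")).length);  // 1 byte in ISO-8859-15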


Personally I think UTF-8 is well established, and it is easier to use:

byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
string = new String(bytes, StandardCharsets.UTF_8);

That way all special quotes, the euro sign, and so on will always be available.

At least specify the encoding:

Files.newBufferedWriter(file.toPath(), Charset.forName("ISO-8859-15"));
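
For example, a minimal sketch in a try-with-resources (the text written is just a placeholder; file is the java.io.File from the question):

try (BufferedWriter writer = Files.newBufferedWriter(file.toPath(), Charset.forName("ISO-8859-15"))) {
    writer.write("Prix : 10 €"); // the euro sign exists in ISO-8859-15, so this round-trips
    writer.newLine();
}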

