Java FileReader encoding issue
Yes, you need to specify the encoding of the file you want to read.
Yes, this means that you have to know the encoding of the file you want to read.
No, there is no general way to guess the encoding of any given "plain text" file.
The one-argument constructors of FileReader always use the platform default encoding, which is generally a bad idea.
Since Java 11, FileReader has also gained constructors that accept an encoding: new FileReader(file, charset) and new FileReader(fileName, charset).
In earlier versions of Java, you need to use new InputStreamReader(new FileInputStream(pathToFile), <encoding>).
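A minimal sketch of the two options above, assuming Java 11 or later for the FileReader variant. The class name, method names, and the temporary file are hypothetical and exist only for illustration.

```java
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetReaders {

    // Java 11+: FileReader accepts the charset directly.
    public static String readWithFileReader(String fileName, Charset charset) throws IOException {
        try (Reader reader = new FileReader(fileName, charset)) {
            return drain(reader);
        }
    }

    // Earlier Java versions: wrap a FileInputStream in an InputStreamReader.
    public static String readWithInputStreamReader(String fileName, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(fileName), charset)) {
            return drain(reader);
        }
    }

    // Reads every character from the reader into a String.
    private static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: "café" written as UTF-8 (\u00e9 is 'é').
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        Files.write(tmp, "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(readWithFileReader(tmp.toString(), StandardCharsets.UTF_8));
        System.out.println(readWithInputStreamReader(tmp.toString(), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Both methods should return the same text. The second one is the form to reach for when the code must also compile on Java 8.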
Read file utf-8
The problem is new FileReader(fileName). As indicated in the documentation:
The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
So, if your file is encoded using UTF-8, and your default encoding is not UTF-8, that won't work. The documentation explains what must be done in this case:
new InputStreamReader(new FileInputStream(fileName), "UTF-8")
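To make "that won't work" concrete, here is a small sketch of what happens when UTF-8 bytes are decoded with a different charset. ISO-8859-1 stands in for a non-UTF-8 platform default; that choice, and the class and method names, are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Decodes raw bytes with the given charset, much as a Reader does internally.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 'é' (U+00E9) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 1: the single character 'é'
        System.out.println(wrong.length()); // 2: each byte became its own character
    }
}
```

The bytes on disk are identical in both cases; only the charset used for decoding differs.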
Found reliance on default encoding in FileReader
Use an explicit character encoding when opening a file instead of relying on the platform default (which can differ from one platform to another), unless, of course, you intend to use the platform default. You can use InputStreamReader to convert a FileInputStream to a Reader using an explicit character encoding.
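If the platform default really is what you want, one way to make that intent visible is to pass Charset.defaultCharset() explicitly. The sketch below assumes that reading is acceptable for your code base; the class and method names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class PlatformDefaultOnPurpose {

    // Reads the whole file, decoding it with the given charset.
    public static String readAll(String fileName, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new FileInputStream(fileName), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // The platform default is still used, but the choice is spelled out in the code.
    public static String readWithPlatformDefault(String fileName) throws IOException {
        return readAll(fileName, Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("default-demo", ".txt");
        Files.write(tmp, "plain ascii".getBytes(Charset.defaultCharset()));
        System.out.println(readWithPlatformDefault(tmp.toString()));
        Files.delete(tmp);
    }
}
```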
Wrong output when attempting to read a text file
Your file starts with a byte-order mark (U+FEFF). It should only occur as the first character of the file. It's not terribly widely used, but various Windows tools do include it, including Notepad. You can just strip it from the start of the first line.
As an aside, I'd strongly recommend not using FileReader: before Java 11 it doesn't allow you to specify the encoding. I'd use Files.newBufferedReader, and either specify the encoding or let it default to UTF-8 (rather than the system default encoding which FileReader uses). When you're using BufferedReader, you can then just read a line at a time with readLine() too:
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line.replace("\uFEFF", ""));
}
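The loop above removes U+FEFF from every line. A slightly stricter sketch strips it only where a byte-order mark can legitimately appear, at the very start of the first line. The class name and readLines helper are hypothetical, and the file is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BomAwareLineReader {

    private static final char BOM = '\uFEFF';

    // Reads all lines as UTF-8, dropping a leading byte-order mark if present.
    public static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Java's UTF-8 decoder passes the BOM through as U+FEFF.
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: UTF-8 BOM (EF BB BF) followed by two lines.
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '\n', 'b'});
        for (String line : readLines(tmp)) {
            System.out.println(line);
        }
        Files.delete(tmp);
    }
}
```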
If you really want to read a character at a time, it's worth getting in the habit of using a StringBuilder instead of repeated string concatenation in a loop. Also note that your variable name of ascii is misleading: it's actually the UTF-16 code unit, which may or may not be an ASCII character.
The encoding you specify should match the encoding used to write the file - at that point you should see the correct output instead of an extra character between each "real" character when using Unicode and Unicode big endian.
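A rough sketch tying these points together: it reads one character at a time into a StringBuilder and shows the "extra character" effect when a UTF-16 file is read with a single-byte charset. The class name, the readAll helper, and the sample file are assumptions for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharAtATimeReader {

    // Reads the file one character at a time using the given charset.
    public static String readAll(Path path, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = Files.newBufferedReader(path, charset)) {
            int codeUnit; // a UTF-16 code unit, or -1 at end of stream
            while ((codeUnit = reader.read()) != -1) {
                text.append((char) codeUnit);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: "Hi" written as UTF-16LE, i.e. bytes 48 00 69 00.
        Path tmp = Files.createTempFile("utf16-demo", ".txt");
        Files.write(tmp, "Hi".getBytes(StandardCharsets.UTF_16LE));

        String matching = readAll(tmp, StandardCharsets.UTF_16LE);
        String mismatched = readAll(tmp, StandardCharsets.ISO_8859_1);

        System.out.println(matching.length());   // 2
        System.out.println(mismatched.length()); // 4: a NUL follows each letter
        Files.delete(tmp);
    }
}
```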
How to read a file in Java with specific character encoding?
So, first, as a heads up, do realize that fileName.getBytes() as you have there gets the bytes of the filename, not the file itself.
Second, reading inside the docs of FileReader:
The constructors of this class assume that the default character
encoding and the default byte-buffer size are appropriate. To specify
these values yourself, construct an InputStreamReader on a
FileInputStream.
So, sounds like FileReader actually isn't the way to go. If we take the advice in the docs, then you should just change your code to have:
String fileName = getFileNameToReadFromUserInput();
FileInputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is, getCorrectCharsetToApply());
BufferedReader buffReader = new BufferedReader(isr);
and not try to make a FileReader at all.
Java text encoding
Your problem is probably that you're opening a reader using the platform encoding.
You should manually specify the encoding whenever you convert between bytes and characters. If you know that the appropriate encoding is UTF-8 you can open a file thus:
FileInputStream inputFile = new FileInputStream(myFile);
try {
    // FileReader cannot wrap a stream; InputStreamReader takes the stream and the charset name.
    Reader reader = new InputStreamReader(inputFile, "UTF-8");
    // Maybe buffer reader and do something with it.
} finally {
    inputFile.close();
}
Libraries like Guava can make this whole process easier.
Reading from a file Russian characters(javaSE)
You need to specify the encoding to be able to read the Russian characters. Don't use FileReader, as it will use the default platform encoding.
Instead use
new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF-8"));