How to Read File on Windows and Linux from Java

Java: Accessing Windows files from Linux machine

To access windows directory you should mount windows shared directory to your linux server.

See https://wiki.centos.org/TipsAndTricks/WindowsShares

After this you'll be able to use this windows directory as a local linux directory.

Reading File from Windows and Linux yields different results (character encoding?)

� is a sequence of three characters - 0xEF 0xBF 0xBD, and is UTF-8 representation of the Unicode codepoint 0xFFFD. The codepoint in itself is the replacement character for illegal UTF-8 sequences.

Apparently, for some reason, the set of routines involved in your source code (on Linux) is handling the PNG header inaccurately. The PNG header starts with the byte 0x89 (and is followed by 0x50, 0x4E, 0x47), which is correctly handled in Windows (which might be treating the file as a sequence of CP1252 bytes). In CP1252, the 0x89 character is displayed as .

On Linux, however, this byte is being decoded by a UTF-8 routine (or a library that thought it was good to process the file as a UTF-8 sequence). Since, 0x89 on it's own is not a valid codepoint in the ASCII-7 range (ref: the UTF-8 encoding scheme), it cannot be mapped to a valid UTF-8 codepoint in the 0x00-0x7F range. Also, it cannot be mapped to a valid codepoint represented as a multi-byte UTF-8 sequence, for all of multi-byte sequences start with a minimum of 2 bits set to 1 (11....), and since this is the start of the file, it cannot be a continuation byte as well. The resulting behavior is that the UTF-8 decoder, now replaces 0x89 with the UTF-8 replacement characters 0xEF 0xBF 0xBD (how silly, considering that the file is not UTF-8 to begin with), which will be displayed in ISO-8859-1 as �.

If you need to resolve this problem, you'll need to ensure the following in Linux:

  • Read the bytes in the PNG file, using the suitable encoding for the file (i.e. not UTF-8); this is apparently necessary if you are reading the file as a sequence of characters*, and not necessary if you are reading bytes alone. You might be doing this correctly, so it would be worthwhile to verify the subsequent step(s) also.
  • When you are viewing the contents of the file, use a suitable editor/view that does not perform any internal decoding of the file to a sequence of UTF-8 bytes. Using a suitable font will also help, for you might want to prevent the unprecedented scenario where the glyph (for 0xFFFD it is actually the diamond character �) cannot be represented, and might result in further changes (unlikely, but you never know how the editor/viewer has been written).
  • It is also a good idea to write the files out (if you are doing so) in the suitable encoding - ISO-8859-1 perhaps, instead of UTF-8. If you are processing and storing the file contents in memory as bytes instead of characters, then writing these to an output stream (without the involvement of any String or character references) is sufficient.

* Apparently, the Java Runtime will perform decoding of the byte sequence to UTF-16 codepoints, if you convert a sequence of bytes to a character or a String object.

How to specify the path to access a file in both windows and linux uniquely using slash seperator

This works fine when i used file.separator.

public static final String QWD = File.separator +"qark-master" + File.separator +"qark" + File.separator +"qarkMain.py";
public static final String MANIFESTPATH=File.separator +"apktool"+ File.separator +"AndroidManifest.xml";


Related Topics



Leave a reply



Submit