Reading InputStream as UTF-8

Solved my own problem. This line:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

needs to be:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

or since Java 7:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));

How to read an InputStream with UTF-8?

Once you have your InputStream, read byte[]s from it. When you create your Strings, pass in the charset name "UTF-8". Example:

byte[] buffer = new byte[contentLength];
int bytesRead = inputStream.read(buffer);
String page = new String(buffer, 0, bytesRead, "UTF-8");

Note, you're probably going to want to make your buffer some sane size (like 1024) and call inputStream.read(buffer) repeatedly until it returns -1.
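One catch with reading in a loop: if every chunk is decoded on its own, a multi-byte UTF-8 character that straddles two reads gets mangled. Below is a minimal sketch of the difference. The class name `ChunkedDecodeDemo` and both method names are made up for illustration, and the one-byte buffer in `main` is only there to force a split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class ChunkedDecodeDemo {

    // Risky: decodes every chunk separately, so a character split
    // across two reads is replaced by U+FFFD replacement characters.
    static String decodePerChunk(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        StringBuilder sb = new StringBuilder();
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, bytesRead, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Safer: collects all bytes first and decodes once at the end.
    static String decodeAfterCollecting(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            collected.write(buffer, 0, bytesRead);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // U+011F (g with breve) takes two bytes in UTF-8
        byte[] data = "\u011f".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodePerChunk(new ByteArrayInputStream(data), 1));
        System.out.println(decodeAfterCollecting(new ByteArrayInputStream(data), 1));
    }
}
```

With a realistic buffer size the per-chunk version works most of the time, which is exactly why the occasional broken character is hard to track down.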


@Amir Pashazadeh

Yes, you can also use an InputStreamReader. Try changing the parse() line to:

Document doc = db.parse(new InputSource(new InputStreamReader(in, "UTF-8")));

How can I make System.in Input Stream read utf-8 characters?

Wrap the InputStream in an InputStreamReader.

int read = new InputStreamReader(System.in).read();
System.out.println((char) read); // prints 'ğ'

If necessary, you can pass a specific Charset to the reader's constructor. By default it uses the platform default charset, which is probably correct for console input.
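If the default turns out to be wrong, the charset can be passed explicitly. A small sketch, assuming the terminal really does send UTF-8. `ConsoleUtf8` and `utf8Reader` are hypothetical names, and the helper accepts any InputStream so it can be tried without a console.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original answer.
final class ConsoleUtf8 {

    // Wraps the stream in a reader that decodes its bytes as UTF-8.
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Create the reader once and reuse it; it buffers ahead internally.
        BufferedReader console = utf8Reader(System.in);
        String line = console.readLine(); // null if there is no input
        System.out.println(line);
    }
}
```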

inputStream and utf 8 sometimes shows ? characters

To read characters from a byte stream with a given encoding, use a Reader. In your case it would be something like:

InputStreamReader isr = new InputStreamReader(inputStream, "UTF-8");
char[] inputBuffer = new char[BUFFER_SIZE];
int charsRead;

// read() returns -1 at end of stream, which ends the loop
while ((charsRead = isr.read(inputBuffer, 0, BUFFER_SIZE)) > 0) {
    String read = new String(inputBuffer, 0, charsRead);
    str += read;
}

As you can see, the bytes are decoded into characters as they are read: it is the reader's job to know how many bytes (one to four in UTF-8) it needs to build each character in the buffer. It's basically your approach, but decoding while the bytes are being read instead of afterwards.

How do I read / convert an InputStream into a String in Java?

A nice way to do this is to use IOUtils from Apache Commons IO to copy the InputStream into a StringWriter, something like:

StringWriter writer = new StringWriter();
IOUtils.copy(inputStream, writer, encoding);
String theString = writer.toString();

or even

// NB: does not close inputStream, you'll have to use try-with-resources for that
String theString = IOUtils.toString(inputStream, encoding);

Alternatively, you could use a ByteArrayOutputStream if you don't want to mix your Streams and Writers.
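The ByteArrayOutputStream route needs nothing beyond the JDK. A minimal sketch, assuming Java 9 or later for `InputStream.transferTo`; the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original answer.
final class StreamToString {

    // Copies the whole stream into memory, then decodes it once as UTF-8.
    // Like IOUtils.toString, this does not close the stream.
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.transferTo(out); // Java 9+: copies until end of stream
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the stream for us
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(readUtf8(in));
        }
    }
}
```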

Java InputStream read locale dependent?

The constructor you chose, String(byte[] encoded, int offset, int length), uses the default platform encoding to convert bytes to characters. It explicitly depends on the environment in which it runs.

This is a bad choice for portable code. For network applications, explicitly specify the encoding to be used. You can negotiate this as part of the network protocol, or specify a useful default like UTF-8.

There are a variety of APIs that encode and decode text. For example, the String constructor String(byte[] encoded, int offset, int length, Charset encoding) can be used like this:

String str = new String(backbytes, 0, c, StandardCharsets.UTF_8);
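To see why the charset matters, here is a short sketch that decodes the same bytes twice. The class name `CharsetMismatchDemo` and the helper `decode` are made up for this illustration, and ISO-8859-1 merely stands in for "some other platform default".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class CharsetMismatchDemo {

    // Decodes the whole array with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, 0, bytes.length, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) is sent as the two bytes C3 A9 in UTF-8
        byte[] wire = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(wire, StandardCharsets.UTF_8);      // one character: U+00E9
        String wrong = decode(wire, StandardCharsets.ISO_8859_1); // two characters: U+00C3 U+00A9

        System.out.println(right.length() + " vs " + wrong.length());
    }
}
```

The four-argument constructor gives the same result on every machine, whereas the three-argument one behaves like `decode` with whatever charset the host happens to default to.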

How to read write this in utf-8?

First, you need to call output.close() (or at least call output.flush()) before you reopen the file for input. That's probably the main cause of your problems.

Then, you shouldn't use FileReader or FileWriter the way the question does, because those constructors always use the platform-default encoding (which is often not UTF-8). From the docs for FileReader:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.

You have the same problem when using a FileWriter. Replace this:

BufferedReader br = new BufferedReader(new FileReader("DirectionResponse.xml" ));

with something like this:

BufferedReader br = new BufferedReader(new InputStreamReader(
        new FileInputStream("DirectionResponse.xml"), "UTF-8"));

and similarly for fstream.
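Putting both points together, a write-then-read round trip might look like the sketch below. The class and method names are hypothetical, and the file path is whatever the caller passes in. try-with-resources closes, and therefore flushes, the writer before the file is reopened for reading.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original answer.
final class Utf8FileRoundTrip {

    // Writes the text as UTF-8, replacing any existing file content.
    static void write(String path, String text) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            out.write(text);
        } // closed here, so everything is flushed to disk
    }

    // Reads the first line of the file, decoding it as UTF-8.
    static String readFirstLine(String path) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }
}
```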

How to load a UTF-8 text file with InputStream

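The following code reads the whole stream into a byte buffer and converts it to a String:

public String inputStreamToString(InputStream is) throws IOException {
    // available() is only an estimate, so read until end of stream instead
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = is.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
    }
    return out.toString("UTF-8");
}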

