UTF-8 byte[] to String

How to convert UTF-8 byte[] to string

string result = System.Text.Encoding.UTF8.GetString(byteArray);

How to convert Strings to and from UTF8 byte arrays in Java

Convert from String to byte[]:

String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);

Convert from byte[] to String:

byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);

You should, of course, use the encoding that actually matches your data. My examples used US-ASCII and UTF-8, two commonly used encodings.
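A minimal round-trip sketch of the two conversions above, using the same charset in both directions. The class name `Utf8RoundTrip` and the sample text are only illustrative assumptions, not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encode a String to UTF-8 bytes.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"; the last character is outside US-ASCII
        byte[] utf8 = encode(original);

        // The accented character takes two bytes in UTF-8,
        // so 4 characters become 5 bytes.
        System.out.println(utf8.length);                   // 5
        System.out.println(decode(utf8).equals(original)); // true
    }
}
```

The point of the sketch is that the byte count and the character count can differ, which is why the charset has to be stated explicitly on both sides.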

UTF-8 byte[] to String

Look at the String constructor that takes a byte array and a Charset:

String str = new String(bytes, StandardCharsets.UTF_8);

And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:

String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
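If adding a library is not an option, a similar result can be sketched with the standard library alone, assuming Java 9 or later for `InputStream.readAllBytes()`. The class and method names here (`StreamToString`, `readUtf8`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamToString {
    // Read the whole stream into memory and decode it as UTF-8.
    // Requires Java 9+ for InputStream.readAllBytes().
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "some text here".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(readUtf8(in)); // some text here
        }
    }
}
```

This reads everything into memory at once, so it is only appropriate for streams of modest size.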

byte [] to String conversion and again back to byte [] using UTF-8 encoding is not giving same byte array

A sequence of bytes has to follow strict rules to be valid UTF-8 encoded text. What you have in the array does not follow these rules, and can't be converted into a string without losing information.

The rules are explained, for example, at https://en.wikipedia.org/wiki/UTF-8
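To see the information loss concretely, here is a small sketch. It assumes nothing about the asker's actual array; the two sample arrays and the class name `InvalidUtf8Demo` are made up for illustration. The lenient `String` constructor replaces malformed input with U+FFFD, while a `CharsetDecoder` configured with `CodingErrorAction.REPORT` can be used to detect the problem up front.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class InvalidUtf8Demo {
    // Lenient round trip: malformed input is silently replaced with U+FFFD.
    static byte[] roundTrip(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Strict check: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xC3, (byte) 0xA9};   // UTF-8 encoding of U+00E9
        byte[] invalid = {(byte) 0xFF, (byte) 0xFE}; // 0xFF and 0xFE never occur in UTF-8

        System.out.println(Arrays.equals(valid, roundTrip(valid)));     // true
        System.out.println(Arrays.equals(invalid, roundTrip(invalid))); // false
        System.out.println(isValidUtf8(valid));                         // true
        System.out.println(isValidUtf8(invalid));                       // false
    }
}
```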

Array of bytes to UTF-8 string in PHP?

A string is nothing more than an array of bytes. So a UTF-8 string is the very same as an array of bytes, except that in addition you know what the array of bytes represents.

So your input array of bytes needs one additional piece of information as well: the character set (character encoding). If you know the input character set, you can convert the array of bytes to another array of bytes representing a UTF-8 string.

The PHP method for doing that is called mb_convert_encoding().

PHP itself does not know of character sets (character encodings). So a string really is nothing more than an array of bytes. The application has to know how to handle that.

So if you have an array of bytes and want to turn that into a PHP string in order to convert the character set using mb_convert_encoding(), try the following:

$input = array(0x53, 0x68, 0x69);
$output = '';
for ($i = 0, $j = count($input); $i < $j; ++$i) {
    $output .= chr($input[$i]);
}
$output_utf8 = mb_convert_encoding($output, 'utf-8', 'enter input encoding here');

(Instead of the single example above, have a look at more examples at https://stackoverflow.com/a/5473057/530502.)

$output_utf8 then will be a PHP string of the input array of bytes converted to UTF-8.
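For comparison with the Java answers on this page, the same two-step idea (interpret the bytes in their known source encoding, then re-encode as UTF-8) could be sketched in Java as follows. The source encoding ISO-8859-1 and the names `Transcode` and `toUtf8` are assumptions chosen for the example; substitute whatever encoding your input really uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Decode the bytes with the charset they were written in,
    // then re-encode the resulting text as UTF-8.
    static byte[] toUtf8(byte[] input, Charset sourceCharset) {
        String text = new String(input, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE9 is the accented e (U+00E9) in ISO-8859-1, the assumed source encoding.
        byte[] latin1 = {(byte) 0x53, (byte) 0x68, (byte) 0xE9};
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);

        // The single byte 0xE9 becomes the two bytes 0xC3 0xA9 in UTF-8.
        System.out.println(utf8.length); // 4
    }
}
```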

UTF-8 is not working for converting byte[] to string

UTF-8 is not an appropriate way of encoding arbitrary bytes as a string. Rather, it encodes arbitrary strings as bytes (and vice versa, as long as the bytes are in the correct format). There is no reason to think that HBaseGenericHelper.GetBigEndianByteArray(startEpochInDays) returns UTF-8 data, so encoding.GetString is entirely inappropriate and is actually using the Encoding backwards. This is the first topic I discussed here, so don't panic: you're in good company. People make this mistake all the time.

What you should be using is something like base-16 (hexadecimal) or base-64.

To get hex: BitConverter.ToString(byte[]). To get base-64: Convert.ToBase64String(byte[])

If you need the data to be in a particular format that isn't base-64 or base-16, then you'll have to be specific about what format you want. But: it isn't "UTF-8 used backwards".
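The same principle applies outside .NET. As a rough Java analogue (not the C# API named above, and with different output formatting than BitConverter.ToString), arbitrary bytes can be turned into text with hex or `java.util.Base64` and recovered exactly. The class name `BinaryAsText` and the sample bytes are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class BinaryAsText {
    // Hexadecimal: two lowercase hex digits per byte, no separators.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Base64 text for arbitrary bytes.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Inverse of toBase64: recovers the original bytes exactly.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Arbitrary binary data; 0xFF is not valid anywhere in UTF-8.
        byte[] raw = {0x00, (byte) 0xFF, (byte) 0x80, 0x41};

        System.out.println(toHex(raw)); // 00ff8041
        String b64 = toBase64(raw);
        System.out.println(Arrays.equals(raw, fromBase64(b64))); // true
    }
}
```

Unlike decoding the bytes as UTF-8, both representations are lossless for any input.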

How to convert array of byte to String in Java?

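You need to specify the encoding you want, e.g. for UTF-8:

String doc = ....
// StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException
// thrown by the overloads that take the charset name as a String.
byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
String doc2 = new String(bytes, StandardCharsets.UTF_8);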

doc and doc2 will be the same.

To decode a byte[] you need to know what encoding was used to be sure it will decode correctly.
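A short sketch of what goes wrong when the decoding charset does not match the one used for encoding. The sample text and the names `WrongCharsetDemo` and `decodeAs` are illustrative assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Decode the given bytes using the given charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // a single accented e
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        // Correct charset: the original text comes back.
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8).equals(original)); // true

        // Wrong charset: each of the two bytes is read as its own ISO-8859-1
        // character, giving two characters (U+00C3, U+00A9) instead of one.
        String garbled = decodeAs(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2
    }
}
```

No exception is thrown in the wrong-charset case, which is why this kind of mistake tends to show up as garbled text rather than as an error.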


