How to Convert UTF-8 byte[] to String

How to convert Strings to and from UTF-8 byte arrays in Java

Convert from String to byte[]:

String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);

Convert from byte[] to String:

byte[] b = {(byte) 99, (byte) 97, (byte) 116}; // the ASCII codes for "cat"
String s = new String(b, StandardCharsets.US_ASCII);

You should, of course, use the encoding that actually matches your data. My examples use US-ASCII and UTF-8, two commonly used encodings.
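A minimal round-trip sketch, assuming Java 7+ (for StandardCharsets); the sample string "café" is only an illustration. It shows that UTF-8 can use more than one byte per character, so the byte count and the character count of a String can differ:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTrip {
    // Encode a String to UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café": 4 chars, but 'é' (U+00E9) needs 2 bytes in UTF-8.
        String original = "caf\u00e9";
        byte[] bytes = encode(original);
        System.out.println(bytes.length);                   // 5
        System.out.println(Arrays.toString(bytes));         // [99, 97, 102, -61, -87]
        System.out.println(decode(bytes).equals(original)); // true

        // The US-ASCII bytes from the example above decode to "cat".
        byte[] ascii = {(byte) 99, (byte) 97, (byte) 116};
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // cat
    }
}
```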

How to convert UTF-8 byte[] to string in C#

string result = System.Text.Encoding.UTF8.GetString(byteArray);

UTF-8 byte[] to String

Use the String constructor that takes a Charset:

String str = new String(bytes, StandardCharsets.UTF_8);
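If the text occupies only part of a larger array, String also has a constructor that takes an offset and a length, String(byte[] bytes, int offset, int length, Charset charset). A small sketch; the buffer layout (two header bytes followed by text) and the decodeSlice helper are hypothetical:

```java
import java.nio.charset.StandardCharsets;

class DecodeSlice {
    // Decode `length` bytes starting at `offset` as UTF-8, without copying the slice first.
    static String decodeSlice(byte[] buffer, int offset, int length) {
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical layout: two header bytes, then the UTF-8 text "hi".
        byte[] buffer = {0x01, 0x02, 0x68, 0x69};
        System.out.println(decodeSlice(buffer, 2, 2)); // hi
    }
}
```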

And if your bytes come from an InputStream and you're feeling lazy, you can use the Apache Commons IO library to convert the stream to a String directly:

String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
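Without Commons IO, Java 9 and later can do roughly the same with InputStream.readAllBytes() followed by the String constructor. A sketch, assuming the whole stream fits comfortably in memory; the readUtf8 helper name is made up for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamToString {
    // Read the entire stream into memory, then decode it as UTF-8 (readAllBytes needs Java 9+).
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // ByteArrayInputStream stands in for a real file or network stream.
        try (InputStream in = new ByteArrayInputStream(data)) {
            String text = readUtf8(in);
            System.out.println(text.length()); // 4
        }
    }
}
```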

byte[] to String conversion and back to byte[] using UTF-8 encoding does not give the same byte array

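A sequence of bytes has to follow strict rules to be valid UTF-8 encoded text. What you have in the array does not follow these rules, so it can't be converted into a string without losing information: invalid sequences are replaced during decoding, and re-encoding the result produces different bytes.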

The rules are explained, for example, in https://en.wikipedia.org/wiki/UTF-8
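A minimal Java sketch of the problem, using the JDK's standard UTF-8 charset. The byte 0xFF never occurs in valid UTF-8, so new String(...) silently replaces it with U+FFFD, and re-encoding yields different bytes. If you would rather be told about bad input, a CharsetDecoder configured with CodingErrorAction.REPORT throws instead. The helper names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class InvalidUtf8 {
    // Lenient: new String(...) replaces malformed input with U+FFFD, so information is lost.
    static byte[] roundTrip(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict: throw a CharacterCodingException instead of substituting U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, 0x41}; // 0xFF is never valid in UTF-8
        // U+FFFD encodes as EF BF BD, so the round trip does not return the input.
        System.out.println(Arrays.toString(roundTrip(bytes))); // [-17, -65, -67, 65]
        try {
            decodeStrict(bytes);
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```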

How to convert UTF-8 byte array to a string of given length

In .NET, it looks like System.Text.Decoder has your back here, in particular its somewhat huge Convert method, which decodes only as many bytes as fit into the output char array. I think you'd want:

var decoder = Encoding.UTF8.GetDecoder();
var chars = new char[4];
decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
    true, out int bytesUsed, out int charsUsed, out bool completed);

Complete sample using the data in your question:

using System;
using System.Text;

public class Test
{
    static void Main()
    {
        var bytes = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
        var decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[4];
        decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
            true, out int bytesUsed, out int charsUsed, out bool completed);
        Console.WriteLine($"Completed: {completed}");
        Console.WriteLine($"Bytes used: {bytesUsed}");
        Console.WriteLine($"Chars used: {charsUsed}");
        Console.WriteLine($"Text: {new string(chars, 0, charsUsed)}");
    }
}
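For comparison, a rough Java counterpart (not a translation of the .NET API) uses java.nio.charset.CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean), which stops with an OVERFLOW result once the output buffer is full and leaves the input position just past the bytes it consumed. The decodeAtMost helper is a hypothetical name, and this sketch replaces malformed input with U+FFFD rather than reporting it:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecodeIntoFixedBuffer {
    // Decodes UTF-8 from `in` into at most maxChars chars and advances
    // in.position() past exactly the bytes that were consumed.
    static String decodeAtMost(ByteBuffer in, int maxChars) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer out = CharBuffer.allocate(maxChars);
        // Returns OVERFLOW if `out` fills up before `in` is exhausted.
        decoder.decode(in, out, true);
        out.flip(); // switch `out` from writing to reading
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x61, 0x62, (byte) 0xc4, (byte) 0x85, (byte) 0xc4, (byte) 0x87, 0x01, 0x02, 0x03};
        ByteBuffer in = ByteBuffer.wrap(bytes);
        String text = decodeAtMost(in, 4);
        System.out.println("Completed: " + !in.hasRemaining()); // Completed: false
        System.out.println("Bytes used: " + in.position());     // Bytes used: 6
        System.out.println("Chars used: " + text.length());     // Chars used: 4
    }
}
```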

UTF-8 is not working for converting byte[] to string

UTF-8 is not an appropriate way of encoding arbitrary bytes as a string. Rather, it encodes arbitrary strings as bytes (and vice versa, as long as the bytes are in the correct format). There is no reason to think that HBaseGenericHelper.GetBigEndianByteArray(startEpochInDays) returns UTF-8 data, so encoding.GetString is entirely inappropriate: it is using the Encoding backwards. Don't panic, though: you're in good company, because people make this mistake all the time.

What you should be using is something like base-16 (hexadecimal) or base-64.

To get hex, use BitConverter.ToString(byte[]) (its output separates the bytes with hyphens, e.g. "00-0F-FF"). To get base-64, use Convert.ToBase64String(byte[]).

If you need the data to be in a particular format that isn't base-64 or base-16, then you'll have to be specific about what format you want. But: it isn't "UTF-8 used backwards".
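In Java, the closest standard tool for base-64 is java.util.Base64 (Java 8+); for hex, this sketch uses a small hand-written loop (Java 17 also provides java.util.HexFormat). The sample bytes are made up and merely stand in for non-text data such as a big-endian integer:

```java
import java.util.Arrays;
import java.util.Base64;

class BytesAsText {
    // Base-64: compact, fully reversible text form of arbitrary bytes.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String s) {
        return Base64.getDecoder().decode(s);
    }

    // Base-16 (hex): two lowercase hex digits per byte, no separators.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high nibble
            sb.append(Character.forDigit(b & 0xF, 16));        // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x0F, (byte) 0xFF}; // hypothetical non-text data
        System.out.println(toBase64(data)); // AA//
        System.out.println(toHex(data));    // 000fff
        System.out.println(Arrays.equals(fromBase64(toBase64(data)), data)); // true
    }
}
```

Unlike decoding the bytes as UTF-8, both forms survive the round trip unchanged for any byte values.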

How to convert array of byte to String in Java?

You need to specify the encoding you want, e.g. for UTF-8:

import java.nio.charset.StandardCharsets;

String doc = ....
byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
String doc2 = new String(bytes, StandardCharsets.UTF_8);

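doc and doc2 will be equal (doc.equals(doc2) is true), as long as doc contains no unpaired surrogate chars: UTF-8 cannot encode those, and getBytes replaces each one with '?'.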

To decode a byte[] correctly, you need to know which encoding was used to produce it.
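A short sketch of what happens when the wrong encoding is assumed: the UTF-8 bytes of "é" (0xC3 0xA9) decoded as ISO-8859-1 turn into two unrelated characters, "Ã©", and no error is raised. The decodeWithWrongCharset helper name is illustrative:

```java
import java.nio.charset.StandardCharsets;

class WrongCharset {
    // Encode as UTF-8 but (wrongly) decode as ISO-8859-1.
    static String decodeWithWrongCharset(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // 'é', two bytes in UTF-8
        String garbled = decodeWithWrongCharset(original);
        System.out.println(original.length()); // 1
        System.out.println(garbled.length());  // 2
        System.out.println(garbled.equals("\u00c3\u00a9")); // true ("Ã©")
    }
}
```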


