How to convert Strings to and from UTF8 byte arrays in Java
Convert from String to byte[]:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
Convert from byte[] to String:
byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);
You should, of course, use the encoding that the bytes were actually written in. My examples used US-ASCII and UTF-8, two commonly used encodings.
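To see why the encoding has to match, here is a minimal sketch. The class name, the helper roundTrip, and the sample text are made up for illustration; only getBytes(Charset), new String(byte[], Charset), and StandardCharsets come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are illustrative only.
class EncodingMismatchDemo {

    // Encodes the text with one charset, then decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Same charset in both directions: the text comes back unchanged.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original));

        // Decoding UTF-8 bytes as ISO-8859-1 turns the two-byte sequence
        // for the accented letter into two separate characters.
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.equals(original));
    }
}
```

Pure ASCII text happens to survive most mismatches, which is why this kind of bug often goes unnoticed until the first accented character shows up.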
How to convert UTF-8 byte[] to string
string result = System.Text.Encoding.UTF8.GetString(byteArray);
UTF-8 byte[] to String
Look at the constructor for String:
String str = new String(bytes, StandardCharsets.UTF_8);
And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:
String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
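If you would rather not pull in a library, a rough equivalent using only the JDK might look like the sketch below. It assumes Java 9 or later (for InputStream.readAllBytes) and a stream small enough to hold in memory; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of any library.
class StreamToStringDemo {

    // Reads the whole stream into memory and decodes it as UTF-8.
    // Assumes Java 9+ (readAllBytes) and a stream that fits in memory.
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Rethrown unchecked to keep the example short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(data)));
    }
}
```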
byte [] to String conversion and again back to byte [] using UTF-8 encoding is not giving same byte array
A sequence of bytes has to follow strict rules to be valid UTF-8 encoded text. What you have in the array does not follow these rules, so it can't be converted into a string without losing information.
The rules are explained, for example, at https://en.wikipedia.org/wiki/UTF-8
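As an illustration of the information loss in Java: new String(bytes, UTF_8) silently substitutes U+FFFD for malformed input, so re-encoding gives different bytes. The sketch below (hypothetical class and method names, made-up sample bytes) shows that, plus one way to check validity up front with a strict CharsetDecoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names and sample bytes are illustrative only.
class InvalidUtf8Demo {

    // True if decoding as UTF-8 and encoding again yields the same bytes.
    static boolean survivesRoundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    // True if the bytes are well-formed UTF-8. The decoder is configured to
    // report malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = {0x63, 0x61, 0x74};   // "cat", valid UTF-8
        byte[] binary = {(byte) 0xFF, 0x41}; // 0xFF never appears in UTF-8

        System.out.println(survivesRoundTrip(text) + " " + isValidUtf8(text));
        System.out.println(survivesRoundTrip(binary) + " " + isValidUtf8(binary));
    }
}
```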
How to convert utf8 byte array to a string of given length
It looks like Decoder has your back here, in particular with the somewhat huge Convert method. I think you'd want:
var decoder = Encoding.UTF8.GetDecoder();
var chars = new char[4];
decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
    true, out int bytesUsed, out int charsUsed, out bool completed);
Complete sample using the data in your question:
using System;
using System.Text;

public class Test
{
    static void Main()
    {
        var bytes = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
        var decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[4];
        decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
            true, out int bytesUsed, out int charsUsed, out bool completed);
        Console.WriteLine($"Completed: {completed}");
        Console.WriteLine($"Bytes used: {bytesUsed}");
        Console.WriteLine($"Chars used: {charsUsed}");
        Console.WriteLine($"Text: {new string(chars, 0, charsUsed)}");
    }
}
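For readers on the Java side, a rough counterpart of the same idea is sketched below: give a CharsetDecoder an output buffer with room for only the number of chars you want, and it stops when the buffer is full. This is an analogy under my own assumptions, not a port of Decoder.Convert; the class and method names are hypothetical, and the sample bytes are the ones from the C# sample above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a Java analogue of the fixed-size decode, not a port.
class FixedLengthDecodeDemo {

    // Decodes at most maxChars UTF-16 chars from the start of the byte array.
    // Decoding stops early when the output buffer is full, and never splits
    // a multi-byte sequence.
    static String decodePrefix(byte[] bytes, int maxChars) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(maxChars);

        // The returned CoderResult (overflow, underflow, or malformed input)
        // is ignored here; a real caller would probably want to inspect it.
        decoder.decode(in, out, true);

        out.flip(); // switch the buffer from writing to reading
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x61, 0x62, (byte) 0xc4, (byte) 0x85,
                        (byte) 0xc4, (byte) 0x87, 0x01, 0x02, 0x03};
        System.out.println(decodePrefix(bytes, 4));
    }
}
```

After the call, in.position() tells you how many bytes were consumed, which plays the role of bytesUsed in the C# version.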
UTF-8 is not working for converting byte[] to string
UTF-8 is not an appropriate way of encoding arbitrary bytes as a string. Rather: it encodes arbitrary strings as bytes (and vice-versa, as long as the bytes are in the correct format). There is no reason to think that HBaseGenericHelper.GetBigEndianByteArray(startEpochInDays) returns UTF-8 data, so encoding.GetString is entirely inappropriate and is actually using the Encoding backwards. This is the first topic I discussed here - so don't panic: you're in good company - people make this mistake all the time.
What you should be using is something like base-16 (hexadecimal) or base-64.
To get hex: BitConverter.ToString(byte[]). To get base-64: Convert.ToBase64String(byte[]).
If you need the data to be in a particular format that isn't base-64 or base-16, then you'll have to be specific about what format you want. But: it isn't "UTF-8 used backwards".
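The same advice applies in Java. A minimal sketch, with a hypothetical class name and helper methods of my own: java.util.Base64 handles base-64, and a small loop over String.format produces hex. Note that this hex output has no separators, unlike the dash-separated output of BitConverter.ToString.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; helper names are illustrative only.
class BinaryToTextDemo {

    // Two uppercase hex digits per byte, no separators.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Standard (RFC 4648) base-64 encoding.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Inverse of toBase64: recovers the exact original bytes.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {0x00, (byte) 0xFF, 0x10}; // arbitrary bytes, not valid UTF-8
        System.out.println(toHex(data));
        System.out.println(toBase64(data));
        System.out.println(Arrays.equals(data, fromBase64(toBase64(data))));
    }
}
```

Unlike decoding as UTF-8, both representations are reversible for any byte array.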
How to convert array of byte to String in Java?
You need to specify the encoding you want, e.g. for UTF-8:
String doc = ....
byte[] bytes = doc.getBytes("UTF-8");
String doc2 = new String(bytes, "UTF-8");
doc and doc2 will be the same.
To decode a byte[] you need to know what encoding was used to be sure it will decode correctly.