Java Byte Array to String to Byte Array

You can't just take the returned string and construct a string from it. It's not a byte[] any more, it's already a string, so you need to parse it. For example:

String response = "[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97]";      // response from the Python script

String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];

for (int i = 0, len = bytes.length; i < len; i++) {
    bytes[i] = Byte.parseByte(byteValues[i].trim());
}

String str = new String(bytes);

** EDIT **

You get a hint of your problem in your question, where you say "Whatever I seem to try I end up getting a byte array which looks as follows... [91, 45, ...". 91 is the byte value for [ and 45 is the byte value for -, so [91, 45, ... is the byte array of the string "[-47, 1, 16, ..." itself.

The method Arrays.toString() returns a String representation of the specified array, meaning that the returned value is not an array any more. For example:

byte[] b1 = new byte[] {97, 98, 99};

String s1 = Arrays.toString(b1);
String s2 = new String(b1);

System.out.println(s1); // -> "[97, 98, 99]"
System.out.println(s2); // -> "abc"

As you can see, s1 holds the string representation of the array b1, while s2 holds the string representation of the bytes contained in b1.

Now, in your problem, your server returns a string similar to s1, therefore to get the array representation back, you need the opposite constructor method. If s2.getBytes() is the opposite of new String(b1), you need to find the opposite of Arrays.toString(b1), thus the code I pasted in the first snippet of this answer.
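To make the "opposite of Arrays.toString()" idea reusable, the parsing from the first snippet could be wrapped in a small helper. This is only a sketch: the class name ByteArrayStringParser and the method parse are made up for illustration, and it assumes the input has exactly the shape Arrays.toString(byte[]) produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayStringParser {

    // Hypothetical helper: reverses Arrays.toString(byte[]),
    // e.g. "[97, 98, 99]" -> {97, 98, 99}.
    static byte[] parse(String text) {
        String trimmed = text.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Expected a string like \"[1, 2, 3]\": " + text);
        }
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return new byte[0]; // "[]" is the representation of an empty array
        }
        String[] parts = body.split(",");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Byte.parseByte throws NumberFormatException for values outside -128..127
            bytes[i] = Byte.parseByte(parts[i].trim());
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] original = "abc".getBytes(StandardCharsets.US_ASCII);
        String asText = Arrays.toString(original);
        byte[] parsed = parse(asText);

        System.out.println(asText);                                        // [97, 98, 99]
        System.out.println(Arrays.equals(original, parsed));               // true
        System.out.println(new String(parsed, StandardCharsets.US_ASCII)); // abc
    }
}
```

The helper only undoes the textual array representation; turning the resulting bytes into readable text is still a separate step that needs a charset.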

How to convert byte array to string and vice versa?

Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:

byte[] bytes = {...};
String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding

There are many encodings you can use; see the list of supported encodings in the Oracle Javadocs.
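As a rough illustration of why the encoding matters when negative values are present, here is a sketch that decodes the same bytes as UTF-8 and as ASCII. The class name, the helper decodeUtf8 and the sample bytes (the UTF-8 encoding of "café") are chosen for the example only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeExample {

    // Hypothetical helper: decodes bytes that are assumed to be UTF-8 text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', then 0xC3 0xA9, which is the UTF-8 encoding of U+00E9 (e with acute)
        byte[] bytes = {99, 97, 102, -61, -87};

        String utf8 = decodeUtf8(bytes);
        System.out.println(utf8.length());           // 4
        System.out.println(utf8.equals("caf\u00e9")); // true

        // Decoded as ASCII, each negative byte becomes the replacement character U+FFFD.
        String ascii = new String(bytes, StandardCharsets.US_ASCII);
        System.out.println(ascii.length());           // 5
        System.out.println(ascii.indexOf('\uFFFD'));  // 3
    }
}
```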

Java: Converting byte string to byte array

When converting a byte[] to a String, you should use this:

new String(b, "UTF-8");

instead of:

b.toString();

When you convert a byte array to a String, always specify a character encoding, and use the same encoding when converting the String back to a byte array. UTF-8 is usually the best choice: it is compact for ASCII-heavy text and can represent every Unicode character (over a million code points). If you don't specify a character encoding, the platform's default encoding is used, which may not be able to represent all characters properly.

Handled properly, your method should look something like this:

public static void main(String args[]) throws Exception {
    byte[] b = myFunction();
    // String bstring = b.toString(); // don't do this
    String bstring = new String(b, "UTF-8");
    // Convert bstring back to a byte[] and call it ser
    byte[] ser = bstring.getBytes("UTF-8");
    String deser = new String(ser, "UTF-8");
}

How to convert a String array to a Byte array? (java)

To go from array to array you have to convert each element yourself, in both directions. For a single String you can use String.getBytes() and new String(byte[] data), like this:

public static void main(String[] args) {
    String[] strings = new String[]{"first", "second"};
    System.out.println(Arrays.toString(strings));
    byte[][] byteStrings = convertToBytes(strings);
    strings = convertToStrings(byteStrings);
    System.out.println(Arrays.toString(strings));
}

private static String[] convertToStrings(byte[][] byteStrings) {
    String[] data = new String[byteStrings.length];
    for (int i = 0; i < byteStrings.length; i++) {
        data[i] = new String(byteStrings[i], Charset.defaultCharset());
    }
    return data;
}

private static byte[][] convertToBytes(String[] strings) {
    byte[][] data = new byte[strings.length][];
    for (int i = 0; i < strings.length; i++) {
        String string = strings[i];
        data[i] = string.getBytes(Charset.defaultCharset()); // you can choose the charset
    }
    return data;
}

To get a single byte[] from a String[] you have to:

  • to the byte array: concatenate the byte arrays of each string, separated by some delimiter
  • from the byte array: split by the same delimiter and create each String as
    described above.
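The two steps above could be sketched roughly as follows. The class name, the method names and the choice of a NUL character as delimiter are all assumptions for the example; any delimiter works as long as it never occurs inside the strings themselves.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimitedStrings {

    // Assumption: this separator never appears inside any of the strings.
    static final String DELIMITER = "\0";

    // String[] -> one byte[]: join with the delimiter, then encode once.
    static byte[] toBytes(String[] strings) {
        return String.join(DELIMITER, strings).getBytes(StandardCharsets.UTF_8);
    }

    // byte[] -> String[]: decode once, then split on the same delimiter.
    static String[] toStrings(byte[] bytes) {
        String joined = new String(bytes, StandardCharsets.UTF_8);
        // A limit of -1 keeps trailing empty strings instead of dropping them.
        return joined.split(Pattern.quote(DELIMITER), -1);
    }

    public static void main(String[] args) {
        String[] strings = {"first", "second"};
        byte[] packed = toBytes(strings);
        String[] unpacked = toStrings(packed);

        System.out.println(Arrays.toString(unpacked));       // [first, second]
        System.out.println(Arrays.equals(strings, unpacked)); // true
    }
}
```

One limitation of this sketch: an empty String[] and an array holding a single empty string both produce an empty byte[], so they cannot be told apart afterwards. A length-prefixed format would avoid that, at the cost of more code.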

encoding decoding of byte array to string without data loss

First: ISO-8859-1 does not cause any data loss if an arbitrary byte array is converted to string using this encoding. Consider the following program:

public class BytesToString {
    public static void main(String[] args) throws Exception {
        // array that will contain all the possible byte values
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) (i + Byte.MIN_VALUE);
        }

        // converting to string and back to bytes
        String str = new String(bytes, "ISO-8859-1");
        byte[] newBytes = str.getBytes("ISO-8859-1");

        if (newBytes.length != 256) {
            throw new IllegalStateException("Wrong length");
        }
        boolean mismatchFound = false;
        for (int i = 0; i < 256; i++) {
            if (newBytes[i] != bytes[i]) {
                System.out.println("Mismatch: " + bytes[i] + "->" + newBytes[i]);
                mismatchFound = true;
            }
        }
        System.out.println("Whether a mismatch was found: " + mismatchFound);
    }
}

It builds an array of bytes with all possible byte values, then it converts it to String using ISO-8859-1 and then back to bytes using the same encoding.

This program outputs Whether a mismatch was found: false, so bytes->String->bytes conversion via ISO-8859-1 yields the same data as it was in the beginning.

But, as it was pointed out in the comments, String is not a good container for binary data. Specifically, such a string will almost surely contain unprintable characters, so if you print it or try to pass it via HTML or some other means, you will get some problems (data loss, for example).

If you really need to convert a byte array to a string (and use it opaquely), use Base64 encoding:

String stringRepresentation = Base64.getEncoder().encodeToString(bytes);
byte[] decodedBytes = Base64.getDecoder().decode(stringRepresentation);

It takes about a third more space (every 3 bytes become 4 characters), but the resulting string is safe to print.

How to convert Java String into byte[]?

The object your method decompressGZIP() needs is a byte[].

So the basic, technical answer to the question you have asked is one of:

byte[] b = string.getBytes();                         // platform default charset
byte[] b = string.getBytes(Charset.forName("UTF-8"));
byte[] b = string.getBytes(StandardCharsets.UTF_8);   // Java 7+ only

However, the problem you appear to be wrestling with is that this doesn't display very well. Calling toString() on a byte[] just gives you the default Object.toString(), which is the class name, an @, and the object's hash code in hexadecimal. In your result [B@38ee9f13, the [B means byte[] and 38ee9f13 is the hash code (often loosely described as a memory address).

For display purposes you can use:

Arrays.toString(bytes);

But this will just display as a sequence of comma-separated integers, which may or may not be what you want.

To get a readable String back from a byte[], use:

String string = new String(bytes, charset); // bytes is a byte[], charset is a Charset

The reason the Charset version is favoured is that a Java String is a sequence of UTF-16 code units, independent of any byte encoding. When converting to a byte[], the same characters produce different byte sequences depending on the chosen charset, so it is safer to name the charset explicitly.
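To see how the byte breakdown depends on the charset, here is a small sketch that encodes the same one-character string three ways. The class name and the helper encode are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetByteLengths {

    // Hypothetical helper: encodes text with an explicitly named charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // a single character: e with acute accent

        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_8)));      // [-61, -87]
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1))); // [-23]
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16BE)));   // [0, -23]
    }
}
```

Decoding with a different charset than the one used for encoding is what usually produces garbled text, which is why both directions should name the same charset.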

Problems converting byte array to string and back to byte array

It is not a good idea to store encrypted data in Strings because they are for human-readable text, not for arbitrary binary data. For binary data it's best to use byte[].

However, if you must do it you should use an encoding that has a 1-to-1 mapping between bytes and characters, that is, where every byte sequence can be mapped to a unique sequence of characters, and back. One such encoding is ISO-8859-1, that is:

String decoded = new String(encryptedByteArray, "ISO-8859-1");
System.out.println("decoded:" + decoded);

byte[] encoded = decoded.getBytes("ISO-8859-1");
System.out.println("encoded:" + java.util.Arrays.toString(encoded));

String decryptedText = encrypter.decrypt(encoded);

Other common encodings that don't lose data are hexadecimal and Base64. Before Java 8 you needed a helper library for them; the standard API now provides java.util.Base64 (since Java 8) and java.util.HexFormat (since Java 17).
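For hexadecimal, a hand-rolled version is short enough to sketch without depending on any particular library or Java version. The class HexCodec and its two methods are hypothetical names for this example, not an existing API.

```java
import java.util.Arrays;

public class HexCodec {

    // Encodes each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Decodes a hex string produced by toHex (upper or lower case digits).
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit at index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {-47, 1, 16};
        String hex = toHex(data);
        System.out.println(hex);                               // d10110
        System.out.println(Arrays.equals(data, fromHex(hex))); // true
    }
}
```

Hex doubles the size of the data, so Base64 is usually the better choice when space matters and hex when readability of individual bytes matters.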

With UTF-16 the program would fail for two reasons:

  1. String.getBytes("UTF-16") adds a byte order mark (BOM) to the output to identify the order of the bytes. Use UTF-16LE or UTF-16BE to avoid this.
  2. Not all sequences of bytes can be mapped to characters in UTF-16. First, text encoded in UTF-16 must have an even number of bytes. Second, UTF-16 encodes Unicode characters beyond U+FFFF as surrogate pairs, so there are sequences of 4 bytes that map to a single Unicode character. For this to be possible, the first 2 bytes of the 4 (a high surrogate) don't encode any character on their own, so an unpaired surrogate cannot be decoded.
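Both points can be checked with a short experiment. This is only a sketch; the class name and the helper roundTrips are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Pitfalls {

    // Hypothetical helper: true if bytes -> String -> bytes gives back the same bytes.
    static boolean roundTrips(byte[] data, Charset charset) {
        byte[] back = new String(data, charset).getBytes(charset);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        // 1. "UTF-16" prepends a byte order mark when encoding; "UTF-16BE" does not.
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16)));   // [-2, -1, 0, 65]
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16BE))); // [0, 65]

        // 2. Some byte sequences are not valid UTF-16 and do not survive the round trip.
        byte[] oddLength = {0, 65, 66};
        byte[] loneSurrogate = {(byte) 0xD8, 0x00}; // high surrogate with no low surrogate
        System.out.println(roundTrips(oddLength, StandardCharsets.UTF_16BE));     // false
        System.out.println(roundTrips(loneSurrogate, StandardCharsets.UTF_16BE)); // false

        // ISO-8859-1 maps every byte value to exactly one character, so it round-trips.
        System.out.println(roundTrips(loneSurrogate, StandardCharsets.ISO_8859_1)); // true
    }
}
```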

byte array to string and inverse : recovered byte array does NOT match the original byte array in Java

You cannot convert an arbitrary sequence of bytes to String and expect the reverse conversion to work. You will need to use an encoding like Base64 to preserve an arbitrary sequence of bytes. (This is available from several places -- built into Java 8, and also available from Guava and Apache Commons.)

For example, with Java 8,

String encoded = Base64.getEncoder().encodeToString(myByteArray);

is reversible with

byte[] decoded = Base64.getDecoder().decode(encoded);
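A small comparison may make the difference concrete. The sketch below, with a made-up class name and helper names, pushes the same bytes through a plain UTF-8 String conversion and through Base64, assuming Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {

    // Lossy for arbitrary data: invalid UTF-8 sequences are replaced while decoding.
    static byte[] viaUtf8String(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: Base64 represents any byte sequence using printable ASCII characters.
    static byte[] viaBase64(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x01}; // not valid UTF-8

        System.out.println(Arrays.equals(data, viaUtf8String(data))); // false
        System.out.println(Arrays.equals(data, viaBase64(data)));     // true
    }
}
```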

convert ByteArray to String to ByteArray

[B@... is the format that a JVM byte array's .toString returns: [B (which means "byte array"), an @, and a hex string which is analogous to the memory address at which the array resides (I'm deliberately not calling it a pointer but it's similar; the precise mapping of that hex string to a memory address is JVM-dependent and could be affected by things like which garbage collector is in use). The important thing is that two different arrays with the same bytes in them will generally have different .toStrings. Note that in some places (e.g. the REPL), Scala will print something like Array(-127, 0, 0, 1) rather than calling .toString, which may cause confusion.

It appears that toByteArray emits a new array each time it's called. So the first time you call newPerson.toByteArray, you get an array at a location corresponding to 50da041d. The second time you call it you get a byte array with the same contents at a location corresponding to 7709e969 and you save the string [B@7709e969 into the variable l. When you then call getBytes on that string (saving it in l1), you get a byte array which is an encoding of the string "[B@7709e969" at the location corresponding to f44b405.

So at the locations corresponding to 50da041d and 7709e969 you have two different byte arrays which happen to contain the same elements (those elements being the bytes in the proto representation of newPerson). At the location corresponding to f44b405 you have a byte array where the bytes encode the text [B@7709e969 in the platform's default character set (typically UTF-8), since getBytes was called without a charset.
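The same behaviour can be reproduced on the JVM without protobuf. The sketch below is in Java rather than Scala, and freshCopy is a made-up stand-in for a method like toByteArray that returns a new array on every call.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ArrayToStringDemo {

    // Hypothetical stand-in for toByteArray: a fresh array with the same contents each time.
    static byte[] freshCopy() {
        return new byte[] {-127, 0, 0, 1};
    }

    public static void main(String[] args) {
        byte[] first = freshCopy();
        byte[] second = freshCopy();

        System.out.println(first == second);              // false: two distinct array objects
        System.out.println(Arrays.equals(first, second)); // true: same contents

        // toString describes the array object, not its contents.
        String label = second.toString(); // "[B@" followed by a hex hash
        System.out.println(label.startsWith("[B@")); // true

        // getBytes encodes the characters of that label, not the original data.
        byte[] labelBytes = label.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(labelBytes, StandardCharsets.UTF_8).equals(label)); // true
        System.out.println(Arrays.equals(labelBytes, second));                            // false
    }
}
```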

Because a proto isn't really a string, there's no general way to get a useful string (depending on what definition of useful you're dealing with). You could try interpreting a byte array from toByteArray as a string with a given character encoding, but there's no guarantee that any given proto will be valid in an arbitrary character encoding.

An encoding which is purely 8-bit, like ISO-8859-1, is guaranteed to at least be decodable from a byte array, but there could be non-printable or control characters, so it's not likely to be that useful:

val iso88591Representation = new String(newPerson.toByteArray, java.nio.charset.StandardCharsets.ISO_8859_1)

Alternatively, you might want a representation like how the Scala REPL will (sometimes) render it:

"Array(" + newPerson.toByteArray.mkString(", ") + ")"

