Convert a String Representation of a Hex Dump to a Byte Array Using Java

Convert a string representation of a hex dump to a byte array using Java?

Update (2021) - Java 17 now includes java.util.HexFormat (only took 25 years):

HexFormat.of().parseHex(s)


For older versions of Java:

Here's a solution that I think is better than any posted so far:

/* s must be an even-length string. */
public static byte[] hexStringToByteArray(String s) {
    int len = s.length();
    byte[] data = new byte[len / 2];
    for (int i = 0; i < len; i += 2) {
        data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                             + Character.digit(s.charAt(i + 1), 16));
    }
    return data;
}

Reasons why it is an improvement:

  • Safe with leading zeros (unlike BigInteger) and with negative byte values (unlike Byte.parseByte)

  • Doesn't convert the String into a char[], or create StringBuilder and String objects for every single byte.

  • No library dependencies that may not be available

Feel free to add argument checking via assert or exceptions if the argument is not known to be safe.
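
As a rough sketch of what that argument checking could look like, the variant below rejects odd-length input and non-hex characters with an IllegalArgumentException. The class name HexDecoder and the exception messages are only illustrative choices, not part of the original answer.

```java
import java.util.Arrays;

final class HexDecoder {
    private HexDecoder() {}

    /** Decodes a hex string, throwing IllegalArgumentException on malformed input. */
    static byte[] hexStringToByteArray(String s) {
        int len = s.length();
        if ((len & 1) != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + len);
        }
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            // Character.digit returns -1 for characters that are not valid in radix 16.
            int hi = Character.digit(s.charAt(i), 16);
            int lo = Character.digit(s.charAt(i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException(
                        "Non-hex character at index " + (hi < 0 ? i : i + 1));
            }
            data[i / 2] = (byte) ((hi << 4) | lo);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hexStringToByteArray("00ff7f80"))); // [0, -1, 127, -128]
    }
}
```

Without the check, an invalid character would silently produce a wrong byte, because the -1 returned by Character.digit is simply folded into the result.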

In Java, how do I convert a hex string to a byte[]?

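String s = "f263575e7b00a977a8e9a37e08b9c215feb9bfb2f992b2b8f11e";
// Caveat: toByteArray() prepends a 0x00 sign byte when the top bit is set
// (as it is here, because the string starts with 'f') and drops leading zero bytes.
byte[] b = new BigInteger(s, 16).toByteArray();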
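
To make the BigInteger behaviour concrete, here is a small demonstration of the two cases where the resulting array does not have one byte per pair of hex digits. The class and method names are made up for this illustration.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class BigIntegerHexPitfalls {
    private BigIntegerHexPitfalls() {}

    static byte[] viaBigInteger(String hex) {
        return new BigInteger(hex, 16).toByteArray();
    }

    public static void main(String[] args) {
        // Top bit set: a 0x00 sign byte is prepended, so one hex byte becomes two bytes.
        System.out.println(Arrays.toString(viaBigInteger("f2")));   // [0, -14]
        // Leading zero bytes are dropped: two hex bytes become one byte.
        System.out.println(Arrays.toString(viaBigInteger("0012"))); // [18]
    }
}
```

If the exact byte count matters, which it usually does for keys, hashes and similar data, the result would have to be trimmed or padded afterwards.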

Is there a way to convert Hex string to bytes using Java streams?

The simplest way to convert a hex string to a byte array is JDK 17’s HexFormat.parseHex(…).

byte[] bytes = HexFormat.of().parseHex("c0ffeec0de");
System.out.println(Arrays.toString(bytes));
System.out.println(HexFormat.of().formatHex(bytes));

Output:

[-64, -1, -18, -64, -34]
c0ffeec0de

This is the most convenient method, as it can also handle formatted input, e.g.

byte[] bytes = HexFormat.ofDelimiter(" ").withPrefix("0x")
.parseHex("0xc0 0xff 0xee 0xc0 0xde");

Note that if you have to process an entire file, even a straightforward

String s = Files.readString(pathToYourFile);
byte[] bytes = HexFormat.of().parseHex(s);

may run with reasonable performance, as long as you have enough temporary memory. If the preconditions are met, which is the case for ASCII based charsets and hex strings, the readString method will read into an array which will become the resulting string’s backing buffer. In other words, the implicit copying between buffers, intrinsic to other approaches, is skipped.

There’s some time spent in checking the preconditions though, which we could skip:

String s = Files.readString(pathToYourFile, StandardCharsets.ISO_8859_1);
byte[] bytes = HexFormat.of().parseHex(s);

This enforces the same encoding used by the compact strings since JDK 9. Since hex strings consist of ASCII characters only, it will correctly interpret all sources whose charset is ASCII based¹. Only for invalid input might the offending characters be misrepresented in the exception message.

It’s hard to beat that, and if using JDK 17 is an option, trying an alternative is not worth the effort. But if you are using an older JDK, you may parse a file like this:

byte[] bytes;
try (FileChannel fch = FileChannel.open(pathToYourFile, StandardOpenOption.READ)) {
    bytes = hexStringToBytes(fch.map(FileChannel.MapMode.READ_ONLY, 0, fch.size()));
}

public static byte[] hexStringToBytes(ByteBuffer hexBytes) {
    // Each pair of ASCII hex digits in the buffer yields one output byte.
    byte[] bytes = new byte[hexBytes.remaining() >> 1];
    for (int i = 0; i < bytes.length; i++)
        bytes[i] = (byte) ((Character.digit(hexBytes.get(), 16) << 4)
            | Character.digit(hexBytes.get(), 16));
    return bytes;
}

This also exploits the fact that hex digits are ASCII characters, so unless you use a rather uncommon charset/encoding, we can process the file data directly and skip the charset conversion. This approach will also work if there’s not enough physical memory to keep the entire file, but then, the performance will be lower, of course.

The file also must not be larger than 2GiB to use a single memory mapping operation. Performing the operation in multiple memory mapping steps is possible, but you’ll soon run into the array length limit for the result, so if that’s an issue, you have to rethink the entire approach anyway.

¹ So this won’t work for UTF-16 or EBCDIC, the only two counterexamples you might have to deal with in real life, though even these are very rare.
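
If "streams" in the question is read as the java.util.stream API rather than file I/O, a variant along the following lines is possible. This is only a sketch: the class and method names are invented here, and there is no reason to expect it to outperform the plain loop or HexFormat.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class HexStreams {
    private HexStreams() {}

    /** Decodes an even-length hex string using an IntStream over the output indices. */
    static byte[] parseHexWithStream(String s) {
        if ((s.length() & 1) != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + s.length());
        }
        byte[] out = new byte[s.length() / 2];
        // Every index writes to its own array slot, so the lambda has no shared mutable state.
        IntStream.range(0, out.length).forEach(i ->
                out[i] = (byte) ((Character.digit(s.charAt(2 * i), 16) << 4)
                        | Character.digit(s.charAt(2 * i + 1), 16)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseHexWithStream("c0ffeec0de"))); // [-64, -1, -18, -64, -34]
    }
}
```

There is no ByteStream in the JDK, which is why the sketch fills a preallocated array instead of collecting the stream's elements.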

Hex string to binary in Java

I recommend the fairly simple method BigInteger::toString(int radix), which returns the String representation in the given radix. Use 2 for the binary representation.

// 100100011010001010110011110001001101010111100110111101111
new BigInteger("0123456789ABCDEF", 16).toString(2);

Note that the string must not contain whitespace, and if your hex strings come from an array, you have to convert them one by one. Leading zero bits are not preserved in the output.
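
If every hex digit should map to exactly four bits, the result can be left-padded with zeros. The helper below is a minimal sketch under the assumption that the input is a non-empty, unsigned hex string; the name hexToBinary is made up for the example.

```java
import java.math.BigInteger;

final class HexToBinary {
    private HexToBinary() {}

    /** Converts an unsigned hex string to binary, keeping four bits per hex digit. */
    static String hexToBinary(String hex) {
        String bits = new BigInteger(hex, 16).toString(2);
        int width = hex.length() * 4;
        StringBuilder sb = new StringBuilder(width);
        // Restore the leading zeros that BigInteger.toString(2) leaves out.
        for (int i = bits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println(hexToBinary("0F")); // 00001111
    }
}
```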

Convert string representation of a hexadecimal byte array to a string with non ascii characters in Java

The stuff within the square brackets seems to be characters encoded in UTF-8 but converted into a hexadecimal string in a weird way. What you can do is find each instance that looks like [0xc3] and convert it into the corresponding byte, and then create a new string from the bytes.

Unfortunately there are no good tools for working with byte arrays. Here's a quick and dirty solution that uses regex to find and replace these hex codes with the corresponding character in latin-1, and then fixes that by re-interpreting the bytes.

String bracketDecode(String str) {
    Pattern p = Pattern.compile("\\[(0x[0-9a-f]{2})\\]");
    Matcher m = p.matcher(str);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
        String group = m.group(1);
        Integer decode = Integer.decode(group);
        // assume latin-1 encoding; quote the replacement so that '$' (0x24)
        // and '\' (0x5c) are not treated as group references or escapes
        m.appendReplacement(sb, Matcher.quoteReplacement(Character.toString(decode)));
    }
    m.appendTail(sb);
    // oh no, latin1 is not correct! re-interpret bytes in utf-8
    byte[] bytes = sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    return new String(bytes, StandardCharsets.UTF_8);
}
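
An alternative that avoids the latin-1 detour is to collect the bytes directly in a ByteArrayOutputStream. This is a different technique from the one above, sketched under two assumptions: the literal text outside the brackets should pass through unchanged, and upper-case hex digits should be accepted too. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketBytes {
    private static final Pattern HEX_BYTE = Pattern.compile("\\[0x([0-9a-fA-F]{2})\\]");

    private BracketBytes() {}

    /** Replaces [0xNN] markers by raw bytes and decodes the whole sequence as UTF-8. */
    static String decode(String str) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(str.length());
        Matcher m = HEX_BYTE.matcher(str);
        int last = 0;
        while (m.find()) {
            // Literal text before the marker is written as UTF-8 bytes.
            byte[] literal = str.substring(last, m.start()).getBytes(StandardCharsets.UTF_8);
            out.write(literal, 0, literal.length);
            // The marker itself contributes exactly one raw byte.
            out.write(Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        byte[] tail = str.substring(last).getBytes(StandardCharsets.UTF_8);
        out.write(tail, 0, tail.length);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xc3 0xa9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        System.out.println(decode("caf[0xc3][0xa9]").equals("caf\u00e9")); // true
    }
}
```

Because no replacement string is involved, characters such as '$' need no special treatment here.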

How to convert a byte array to a hex string in Java?

From the discussion here, and especially this answer, this is the function I currently use:

private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
    char[] hexChars = new char[bytes.length * 2];
    for (int j = 0; j < bytes.length; j++) {
        int v = bytes[j] & 0xFF;
        hexChars[j * 2] = HEX_ARRAY[v >>> 4];
        hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
    }
    return new String(hexChars);
}

My own tiny benchmarks (a million bytes a thousand times, 256 bytes 10 million times) showed it to be much faster than any other alternative, taking about half the time on long arrays. Compared to the answer I took it from, switching to bitwise ops (as suggested in the discussion) cut about 20% off the time for long arrays. (Edit: When I say it's faster than the alternatives, I mean the alternative code offered in the discussions. Performance is equivalent to Commons Codec, which uses very similar code.)

2020 version, taking Java 9 compact strings into account:

private static final byte[] HEX_ARRAY = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
public static String bytesToHex(byte[] bytes) {
    byte[] hexChars = new byte[bytes.length * 2];
    for (int j = 0; j < bytes.length; j++) {
        int v = bytes[j] & 0xFF;
        hexChars[j * 2] = HEX_ARRAY[v >>> 4];
        hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
    }
    return new String(hexChars, StandardCharsets.UTF_8);
}
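
A quick way to gain some confidence in an encoder like this is a round trip over all 256 byte values, which also exercises the negative bytes that make the & 0xFF mask necessary. The sketch below pairs the char[]-based encoder with a simple loop decoder; the class name and the hexToBytes and roundTripsAllByteValues helpers are illustrative only.

```java
import java.util.Arrays;

final class HexRoundTrip {
    private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();

    private HexRoundTrip() {}

    static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int j = 0; j < bytes.length; j++) {
            int v = bytes[j] & 0xFF; // mask so that negative bytes map to 0..255
            hexChars[j * 2] = HEX_ARRAY[v >>> 4];
            hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
        }
        return new String(hexChars);
    }

    // Assumes well-formed, even-length input; no validation is done here.
    static byte[] hexToBytes(String s) {
        byte[] data = new byte[s.length() / 2];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ((Character.digit(s.charAt(2 * i), 16) << 4)
                    | Character.digit(s.charAt(2 * i + 1), 16));
        }
        return data;
    }

    /** Encodes and decodes every possible byte value and compares the result. */
    static boolean roundTripsAllByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return Arrays.equals(all, hexToBytes(bytesToHex(all)));
    }

    public static void main(String[] args) {
        System.out.println(bytesToHex(new byte[] {(byte) 0xC0, (byte) 0xFF, 0x0E})); // C0FF0E
        System.out.println(roundTripsAllByteValues()); // true
    }
}
```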

