What Does ["string"].pack('H*') Mean?

What does [string].pack('H*') mean?

It interprets the string as hexadecimal digits, two characters per byte, and converts it to a string whose bytes have those values. For values in the ASCII range, that means the characters with the corresponding ASCII codes:

["464F4F"].pack('H*')  # =>  "FOO", 0x46 is the code for 'F', 0x4F the code for 'O'

For the opposite conversion, use unpack:

'FOO'.unpack('H*')     # => ["464f4f"]

It takes one more step for strings in encodings other than ASCII-8BIT, because pack always returns an ASCII-8BIT (binary) string:

"á".encoding                                 # => #<Encoding:UTF-8>
"á".unpack('H*')                             # => ["c3a1"]
['c3a1'].pack('H*')                          # => "\xC3\xA1"
['c3a1'].pack('H*').encoding                 # => #<Encoding:ASCII-8BIT>
['c3a1'].pack('H*').force_encoding('UTF-8')  # => "á"
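For comparison, the same round trip can be sketched in Python 3, where bytes.fromhex plays the role of pack('H*') and bytes.hex the role of unpack('H*'). This is only an illustration in another language, not part of the Ruby API, and the variable names are made up for the example.

```python
# Hex digits -> raw bytes, two digits per byte (like Ruby's pack('H*'))
packed = bytes.fromhex("464F4F")

# Raw bytes -> lowercase hex digits (like Ruby's unpack('H*'))
unpacked = b"FOO".hex()

# Non-ASCII text: encode to bytes first, and decode explicitly on the way back
hex_of_a_acute = "á".encode("utf-8").hex()
text_again = bytes.fromhex("c3a1").decode("utf-8")

print(packed, unpacked, hex_of_a_acute, text_again)
```

As in Ruby, the packed value carries no text encoding of its own; the explicit decode("utf-8") corresponds to force_encoding('UTF-8').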

Validate a string using pack('H*')

Your $key is the return value from pack, which in this case is a binary string (essentially raw binary values). See the first line in the documentation for the pack() function return value: http://php.net/manual/en/function.pack.php

Pack given arguments into binary string [emphasis added] according to format.

You would normally base64 encode a binary string before attempting any kind of output, because by definition, a binary string may (and often does) include non-printable characters, or worse - terminal control/escape sequences which can hose up your screen.

Think of it like printing a raw Word or Excel file: you'll probably see some recognizable values (in this case, the occasional alphanumeric character), but lots of garbage too.

Base64 encoding is a technique to inspect these strings in a safe way.
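A minimal sketch of that idea in Python 3, assuming a made-up hex key (hex_key is hypothetical and not taken from the question): the packed bytes are turned into Base64 text before being printed, and decoding the text recovers the same bytes.

```python
import base64

# Hypothetical hex-encoded key, used only for illustration
hex_key = "deadbeef00ff"

raw_key = bytes.fromhex(hex_key)  # binary string, may contain unprintable bytes
printable = base64.b64encode(raw_key).decode("ascii")

print(printable)
# Decoding the Base64 text gives back exactly the same bytes
assert base64.b64decode(printable) == raw_key
```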

But what your question implies is that you are very new to this territory. You should probably take a look at the Matasano crypto tutorial here: http://www.matasano.com/articles/crypto-challenges/. It is an excellent starting point, and completing exercise #1 in it (maybe 20 minutes of work) will shed complete light on your question above.

What is this unpack doing? Can someone help me understand just a few letters?

Here is a good explanation of Ruby's pack and unpack methods.

According to your question:

> ['A'].pack('H')
=> "\xA0"

A byte consists of 8 bits and a nibble of 4 bits, so a byte has two nibbles. Take the letter ‘h’ as an example: its ASCII value is 104, which is 68 in hex. This 68 is stored in two nibbles: the first nibble (4 bits) contains the value 6 and the second nibble contains the value 8. In general we deal with the high nibble first, so going from left to right we pick the value 6 and then 8.

In the above case the input ‘A’ is not ASCII ‘A’ but the hex digit ‘A’. Why is it hex ‘A’? Because the directive ‘H’ tells pack to treat the input as hex digits. Since ‘H’ means high nibble first and the input has only one nibble, the second nibble is zero. So the input effectively changes from ['A'] to ['A0'].

Since the hex value A0 does not correspond to anything in the ASCII table, the output is left as is, and hence the result is \xA0. The leading \x indicates that the value that follows is written in hex.
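The padding rule described above can be sketched in Python 3. The two helper functions are hypothetical names written for this illustration; they only imitate the nibble handling and are not how Ruby implements pack.

```python
def pack_high_nibble_first(digits: str) -> bytes:
    """Pad an odd number of hex digits with a zero low nibble, then pack."""
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


def pack_low_nibble_first(digits: str) -> bytes:
    """Same padding, but each pair of digits is swapped before packing."""
    if len(digits) % 2:
        digits += "0"
    swapped = "".join(digits[i + 1] + digits[i] for i in range(0, len(digits), 2))
    return bytes.fromhex(swapped)


print(pack_high_nibble_first("A"))   # b'\xa0'
print(pack_low_nibble_first("A"))    # b'\n', i.e. the byte 0x0A
print(pack_high_nibble_first("68"))  # b'h'
```

With the high nibble first, the lone digit A lands in the upper half of the byte (0xA0); with the low nibble first it lands in the lower half (0x0A).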

When would you use unpack('h*' ...) or pack('h*' ...)?

Recall that in the bad ol' days of MS-DOS, certain OS functions were controlled by setting the high and low nibbles of a register and performing an Interrupt xx. For example, Int 21 accessed many file functions. You would set the high nibble as the drive number -- who will have more than 15 drives?? The low nibble was the requested function on that drive, etc.

Here is some old CPAN code that uses pack as you describe to set the registers to perform an MS-DOS system call.

Blech!!! I don't miss MS-DOS at all...

--Edit

Here is specific source code: Download Perl 5.00402 for DOS HERE, unzip,

In the files Opcode.pm and Opcode.pl you see the use of unpack("h*",$_[0]); here:

sub opset_to_hex ($) {
    return "(invalid opset)" unless verify_opset($_[0]);
    unpack("h*",$_[0]);
}

I did not follow the code all the way through, but my suspicion is this is to recover info from an MS-DOS system call...

In perlport for Perl 5.8.8, you have these suggested tests for the endianness of the target:

Different CPUs store integers and floating point numbers in different
orders (called endianness) and widths (32-bit and 64-bit being the
most common today). This affects your programs when they attempt to transfer
numbers in binary format from one CPU architecture to another,
usually either “live” via network connection, or by storing the
numbers to secondary storage such as a disk file or tape.

Conflicting storage orders make utter mess out of the numbers. If a
little-endian host (Intel, VAX) stores 0x12345678 (305419896 in
decimal), a big-endian host (Motorola, Sparc, PA) reads it as
0x78563412 (2018915346 in decimal). Alpha and MIPS can be either:
Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses
them in big-endian mode. To avoid this problem in network (socket)
connections use the pack and unpack formats n and N, the
“network” orders. These are guaranteed to be portable.

As of perl 5.8.5, you can also use the > and < modifiers
to force big- or little-endian byte-order. This is useful if you want
to store signed integers or 64-bit integers, for example.

You can explore the endianness of your platform by unpacking a
data structure packed in native format such as:

    print unpack("h*", pack("s2", 1, 2)), "\n";
    # '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
    # '00100020' on e.g. Motorola 68040

If you need to distinguish between endian architectures you could use
either of the variables set like so:

    $is_big_endian    = unpack("h*", pack("s", 1)) =~ /01/;
    $is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;

Differing widths can cause truncation even between platforms of equal
endianness. The platform of shorter width loses the upper parts of the
number. There is no good solution for this problem except to avoid
transferring or storing raw binary numbers.

One can circumnavigate both these problems in two ways. Either
transfer and store numbers always in text format, instead of raw
binary, or else consider using modules like Data::Dumper (included in
the standard distribution as of Perl 5.005) and Storable (included as
of perl 5.8). Keeping all data as text significantly simplifies matters.

The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's
how far EBCDIC, or more precisely UTF-EBCDIC will go.
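The endianness probe from the perlport excerpt above can be approximated in Python 3. This is a sketch, not a translation endorsed by perlport: hex_low_nibble_first is a hypothetical helper that imitates unpack("h*", ...), and struct's "=h" format stands in for Perl's native "s".

```python
import struct
import sys


def hex_low_nibble_first(data: bytes) -> str:
    """Imitate Perl's unpack("h*", ...): print each byte's low nibble first."""
    return "".join(f"{b & 0x0F:x}{b >> 4:x}" for b in data)


# "=h" packs a signed 16-bit integer in native byte order without padding
probe = hex_low_nibble_first(struct.pack("=h", 1))
is_little_endian = probe.startswith("1")  # same idea as =~ /^1/
is_big_endian = "01" in probe             # same idea as =~ /01/

print(probe, is_little_endian, is_big_endian, sys.byteorder)

# Network (big-endian) order is the same on every platform
print(struct.pack(">I", 0x12345678))
```

In everyday Python code sys.byteorder answers the question directly; the nibble-swapping helper is only there to mirror what the "h*" format prints.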

It seems that unpack("h*",...) is used more often than pack("h*",...). I did note that return qq'unpack("F", pack("h*", "$hex"))'; is used in Deparse.pm and IO-Compress uses pack("*h",...) in Perl 5.12

If you want further examples, here is a Google Code Search list. You can see pack|unpack("h*"...) is fairly rare and mostly to do with determining platform endianness...

PHP to Python pack('H')

I think you need it the other way around. "Dummy String" is not a valid number in hex. You can hexlify it:

>>> binascii.hexlify('Dummy String')
'44756d6d7920537472696e67'

but not unhexlify it. unhexlify takes a string of hex digits and turns it into the characters those digits encode:

>>> binascii.unhexlify('44756d6d7920537472696e67')
'Dummy String'

What you need is to md5 the string ("Dummy String" in our case) and unhexlify its hash:

import binascii
import hashlib

the_hash = hashlib.md5('Dummy String').hexdigest()
print the_hash
the_unhex = binascii.unhexlify(the_hash)
print the_unhex

Which yields the hash, and the unhexlified hash:

ec041da9f891c09b3d1617ba5057b3f5
ЛLЬ-ю?=¦PWЁУ

Note: although the output doesn't look exactly like yours ("??????=?PW??"), the "PW" and "=" appearing in both make me pretty certain it's correct.
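The session above is Python 2. A Python 3 sketch of the same steps, assuming the input is plain ASCII text, has to encode the string to bytes before hashing. It also shows that unhexlifying the hex digest gives the same bytes as asking hashlib for the raw digest directly.

```python
import binascii
import hashlib

text = "Dummy String"

hex_digest = hashlib.md5(text.encode("ascii")).hexdigest()
raw_digest = binascii.unhexlify(hex_digest)

# digest() returns the same 16 raw bytes without going through hex
assert raw_digest == hashlib.md5(text.encode("ascii")).digest()

print(hex_digest)
print(raw_digest)
```

In Python 3 the second print shows an escaped bytes literal rather than raw terminal garbage, which makes the comparison with PHP's output less error-prone.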

More on hashlib and binascii

How can I convert a String into a Char list?

As was already said, String is simply a synonym for [Char]

type String = [Char]

so both can be used interchangeably.

In particular, "hello" :: [Char] is exactly the same as "hello" :: String, both are just more elegant ways of writing ['h','e','l','l','o'].

That said, you'll find that not everything that would be a “String” in other languages is a String in Haskell. The list implementation is actually really inefficient, particularly memory-wise: for an ASCII string, most languages take either 8 or 16 bits per character, but with Haskell's String type each character is a 64-bit Char plus a reference to the next character, for a total of 128 bits!

That's why most modern Haskell libraries avoid String, except for short things like file names. (Incidentally,

type FilePath = String

so that is also interchangeable.)

What these libraries use for general strings is typically Text, which is indeed a different type, corresponding more to other languages' implementations (it used UTF-16 under the hood until text-2.0, which switched to UTF-8).

If you want to filter a value of that type, you can either convert it to a listy-String with unpack, or you can simply use the dedicated version of filter provided by the text library.

In standard Haskell, Text values cannot be defined as string or list literals; you'd need to explicitly wrap them, like pack ['h','e','l','l','o']. However, they can still be defined with a simple string literal, provided that you turn on {-# LANGUAGE OverloadedStrings #-}:

ghci> :m +Data.Text
ghci> "hello" :: Text

<interactive>:5:1: error:
    • Couldn't match expected type ‘Text’ with actual type ‘[Char]’
    • In the expression: "hello" :: Text
      In an equation for ‘it’: it = "hello" :: Text

ghci> :set -XOverloadedStrings
ghci> "hello" :: Text
"hello"

With another extension, this also works for the list syntax:

ghci> ['h','e'] :: Text

<interactive>:9:1: error:
    • Couldn't match expected type ‘Text’ with actual type ‘[Char]’
    • In the expression: ['h', 'e'] :: Text
      In an equation for ‘it’: it = ['h', 'e'] :: Text

ghci> :set -XOverloadedLists
ghci> ['h','e'] :: Text
"he"

Ruby pack and unpack. How is this hexadecimal conversion being done? Could use some assistance

encrypted_message is a string starting with the characters .YI. Let's convert those characters to Hex and then binary using the ASCII table:

ASCII   .         | Y         | I
Hex     2    e    | 5    9    | 4    9
Binary  0010 1110 | 0101 1001 | 0100 1001

Notice that the hex is what you see at the beginning of the unpack("H*") result. If you were to call encrypted_message.unpack("B*") (bit string), you would similarly see it start with

001011100101100101001001

The point is

  1. There is no "encrypted message format". encrypted_message is meaningless, structureless binary data.
  2. When you call unpack, you're saying "Take this meaningless binary data, and show it to me with a different representation." In this case, hexadecimal digits. You can similarly see it as binary, like I did above. Or you can look at it as ASCII characters (the default), with \x indicating a byte that doesn't have an ASCII representation. It's all the same binary data just being presented in different human-readable ways.
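To make the "same data, different representations" point concrete, here is a small Python 3 sketch. It uses only the three leading bytes .YI mentioned above, since the rest of encrypted_message is not shown, and the variable names are made up for the example.

```python
# The first three bytes of the message, as raw binary data
data = b".YI"

as_hex = data.hex()                                 # hexadecimal digits
as_bits = "".join(f"{byte:08b}" for byte in data)   # bit string, MSB first
as_text = data.decode("ascii")                      # ASCII characters

print(as_hex)
print(as_bits)
print(as_text)
```

All three values are derived from the same bytes; none of them changes the underlying data.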

Python equivalent for Perl's pack(H*, $string)

>>> import struct
>>> T = (1, 2, 3)
>>> struct.pack('H' * len(T), *T)
'\x01\x00\x02\x00\x03\x00'

EDIT:

>>> "01020304deadbeef".decode('hex')
'\x01\x02\x03\x04\xde\xad\xbe\xef'
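Both snippets above are Python 2 (str.decode('hex') no longer exists in Python 3). A Python 3 sketch of the two ideas follows. Note that struct's 'H' means "unsigned 16-bit integer", not "hex string", so only bytes.fromhex corresponds to Perl's pack("H*", $string). The explicit "<" byte order is an assumption added here to make the output the same on every platform.

```python
import struct

# struct's "H" packs unsigned 16-bit integers; "<" forces little-endian order
packed_shorts = struct.pack("<3H", 1, 2, 3)

# bytes.fromhex is the counterpart of Perl's pack("H*", $string)
from_hex = bytes.fromhex("01020304deadbeef")

print(packed_shorts)  # b'\x01\x00\x02\x00\x03\x00'
print(from_hex)       # b'\x01\x02\x03\x04\xde\xad\xbe\xef'
```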

