Encoding byte data into digits
You can think of a (single byte character) string as a base-256 encoded number where "\x00" represents 0, ' ' (space, i.e., "\x20") represents 32 and so on until "\xFF", which represents 255.
A representation using only the digits 0-9 can be obtained simply by converting that number to base 10.
Note that "base64 encoding" is not actually a base conversion. base64 breaks the input into groups of 3 bytes (24 bits) and does the base conversion on those groups individually. This works well because a number with 24 bits can be represented with four digits in base 64 (2^24 = 64^4).
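As a rough illustration of that grouping, the Python sketch below encodes one 3-byte group by hand and prints it next to the standard library's result. The helper name encode_group is made up for this example; base64.b64encode is the only library call involved.

```python
import base64

# The standard base64 alphabet: value 0 -> "A", ..., value 63 -> "/".
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 base-64 digits."""
    if len(group) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    n = int.from_bytes(group, "big")  # treat the group as one 24-bit number
    digits = []
    for _ in range(4):  # 64**4 == 2**24, so four digits always suffice
        n, d = divmod(n, 64)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))  # most significant digit first


sample = b"Mar"
print(encode_group(sample))
print(base64.b64encode(sample).decode("ascii"))
```

Both lines should print the same four characters, since a full group needs no padding.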
This is more or less what el.pescado does: he splits the input data into 8-bit pieces and then converts each piece into base 10. However, this technique has one disadvantage relative to base64 encoding: it does not align with the byte boundary. To represent an 8-bit number (0-255 when unsigned) we need three digits in base 10, but the left-most digit carries less information than the others, since it can only be 0, 1 or 2 (for unsigned numbers).
A digit in base 10 stores log(10)/log(2) bits (about 3.32). Because no power of 10 is also a power of 2, no chunk size you choose will ever align the representation with 8-bit bytes (in the sense of "aligning" described in the previous paragraph). Consequently, the most compact representation is a base conversion of the whole input (which you can see as a "base encoding" with only one big chunk).
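To make the size difference concrete, here is a small Python sketch that compares the two approaches. The function names are invented for this illustration, and the little-endian byte order is an assumption chosen to match the bcmath example that follows.

```python
import math


def bytes_to_decimal(data: bytes) -> str:
    """Whole-input base conversion (first byte is least significant)."""
    return str(int.from_bytes(data, "little"))


def decimal_to_bytes(digits: str) -> bytes:
    """Inverse of bytes_to_decimal; trailing zero bytes are not recovered."""
    n = int(digits)
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "little")


def per_byte_decimal(data: bytes) -> str:
    """Chunked approach: always three decimal digits per byte."""
    return "".join(f"{b:03d}" for b in data)


data = b"Mary had a little lamb"
whole = bytes_to_decimal(data)
chunked = per_byte_decimal(data)

# Input length in bytes, then the length of each decimal form.
print(len(data), len(whole), len(chunked))
# Decimal digits needed per byte in the limit.
print(8 / math.log2(10))
```

The whole-input form needs roughly 2.41 digits per byte, against a fixed 3 for the chunked form.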
Here is an example with bcmath.
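bcscale(0);

// Interprets $string as a little-endian base-256 number
// (the first byte is the least significant digit).
function base256ToBase10(string $string): string {
    $result = "0";
    for ($i = strlen($string) - 1; $i >= 0; $i--) {
        $result = bcadd($result,
            bcmul((string) ord($string[$i]), bcpow("256", (string) $i)));
    }
    return $result;
}

// Inverse of base256ToBase10(): emits the least significant byte first.
// Trailing "\x00" bytes of the original string are not recovered.
function base10ToBase256(string $number): string {
    $result = "";
    $n = $number;
    do {
        $remainder = bcmod($n, "256");
        $n = bcdiv($n, "256");
        $result .= chr((int) $remainder);
    } while (bccomp($n, "0") > 0); // compare as arbitrary-precision numbers
    return $result;
}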
For
$string = "Mary had a little lamb";
$base10 = base256ToBase10($string);
echo $base10,"\n";
$base256 = base10ToBase256($base10);
echo $base256;
we get
36826012939234118013885831603834892771924668323094861
Mary had a little lamb
Since each digit encodes only log(10)/log(2)=~3.32193 bits, expect the number to tend to be 140% longer (not 200% longer, as would be with el.pescado's answer).
Convert byte array to numbers in JavaScript
This was the only way I could think of off the top of my head to do it.
function bytesToDouble(str, start) {
    start *= 8;
    // Reverse the 8 little-endian bytes so that data[0] is the most significant.
    var data = [str.charCodeAt(start+7),
                str.charCodeAt(start+6),
                str.charCodeAt(start+5),
                str.charCodeAt(start+4),
                str.charCodeAt(start+3),
                str.charCodeAt(start+2),
                str.charCodeAt(start+1),
                str.charCodeAt(start+0)];
    var sign = (data[0] & 1<<7)>>7;
    var exponent = (((data[0] & 127) << 4) | (data[1]&(15<<4))>>4);
    // Zero comes out as 0; denormals are not handled and also come out as 0.
    if(exponent == 0) return 0;
    // A set sign bit means negative. NaN is not distinguished from infinity here.
    if(exponent == 0x7ff) return (sign) ? Number.NEGATIVE_INFINITY : Number.POSITIVE_INFINITY;
    var mul = Math.pow(2,exponent - 1023 - 52);
    // 52 stored mantissa bits plus the implicit leading 1.
    var mantissa = data[7]+
        data[6]*Math.pow(2,8*1)+
        data[5]*Math.pow(2,8*2)+
        data[4]*Math.pow(2,8*3)+
        data[3]*Math.pow(2,8*4)+
        data[2]*Math.pow(2,8*5)+
        (data[1]&15)*Math.pow(2,8*6)+
        Math.pow(2,52);
    return Math.pow(-1,sign)*mantissa*mul;
}
var data = atob("AAAAAABsskAAAAAAAPmxQAAAAAAAKrF");
alert(bytesToDouble(data,0)); // 4716.0
alert(bytesToDouble(data,1)); // 4601.0
This should give you a push in the right direction, though it took me a while to remember how to deal with doubles. One big caveat to note, though:
This relies on atob to do the base64 decoding, which is not supported everywhere, and aside from that probably isn't a great idea anyway. What you really want to do is unroll the base64-encoded string into an array of numbers (bytes would be the easiest to work with, although not the most efficient thing on the planet). The reason is that when atob does its magic, it returns a string, which is far from ideal. Depending on the encoding and the code points it maps to (especially for code points between 128 and 255), the resulting .charCodeAt() may not return what you expect.
And there may be some accuracy issues, because after all I am using a double to calculate a double, but I think it might be okay.
Base64 is fairly trivial to work with, so you should be able to figure that part out.
If you did switch to an array (rather than the str string now), then you would obviously drop the .charCodeAt() reference and just get the indices you want directly.
There is a functioning fiddle here
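For comparison, here is a minimal Python sketch of the "decode to bytes first" idea, where the base64 step yields a byte array instead of a string and the standard struct module does the IEEE 754 work. SAMPLE is an assumption: it is the first 16 bytes of the sample string above, re-encoded with padding so that a strict decoder accepts it. The function name is also made up for this example.

```python
import base64
import struct

# Assumption: first 16 bytes of the sample above, re-encoded with padding.
SAMPLE = "AAAAAABsskAAAAAAAPmxQA=="


def doubles_from_base64(text: str) -> list:
    """Decode base64 to raw bytes, then read consecutive little-endian doubles."""
    raw = base64.b64decode(text)  # bytes, so no character-encoding surprises
    count = len(raw) // 8
    # "<" = little-endian, "d" = 8-byte IEEE 754 double
    return list(struct.unpack(f"<{count}d", raw[:count * 8]))


print(doubles_from_base64(SAMPLE))
```

Working on bytes rather than on a decoded string sidesteps the charCodeAt() concern entirely.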
How to convert from []byte to int in Go Programming
You can use encoding/binary's ByteOrder to do this for 16-, 32- and 64-bit types.
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// Eight bytes, read as one big-endian unsigned 64-bit integer.
	var mySlice = []byte{244, 244, 244, 244, 244, 244, 244, 244}
	data := binary.BigEndian.Uint64(mySlice)
	fmt.Println(data)
}
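For readers coming from another language, the same big-endian read can be sketched in Python in two equivalent ways. This is only a cross-check of the idea, not part of the Go answer.

```python
import struct

raw = bytes([244] * 8)

# Big-endian unsigned 64-bit read, the counterpart of binary.BigEndian.Uint64.
as_int = int.from_bytes(raw, "big")

# struct spells the same thing as ">Q" (big-endian, unsigned, 8 bytes).
(as_struct,) = struct.unpack(">Q", raw)

print(as_int, as_int == as_struct)
```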
How to encode a 8 byte block using only digits (numeric characters)?
Treat the 8 bytes as a 64-bit unsigned integer and convert it to decimal and pad it to the left with zeroes. That should make for the shortest possible string, as it utilizes all available digits in all positions except the starting one.
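A minimal Python sketch of this fixed-width scheme could look as follows. The function names and the big-endian byte order are assumptions made for the example; any byte order works as long as both sides agree.

```python
def encode_block(block: bytes) -> str:
    """Encode exactly 8 bytes as a 20-digit, zero-padded decimal string."""
    if len(block) != 8:
        raise ValueError("expected exactly 8 bytes")
    return f"{int.from_bytes(block, 'big'):020d}"


def decode_block(digits: str) -> bytes:
    """Inverse of encode_block; rejects anything that is not 20 ASCII digits."""
    if len(digits) != 20 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 20 decimal digits")
    # to_bytes raises OverflowError if the value does not fit in 64 bits.
    return int(digits).to_bytes(8, "big")


block = bytes([0, 1, 2, 3, 4, 5, 6, 7])
text = encode_block(block)
print(text, decode_block(text) == block)
```

The fixed width is what lets several encoded blocks be concatenated and split again without separators.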
If your data is not uniformly distributed, there are other alternatives, such as looking into Huffman coding so that the most common data patterns can be represented by shorter strings. One way is to use the first digit to encode the length of the string. All numbers except 1 in the first position can be treated as a length specifier. That way the maximum length of 20 digits will never be exceeded. (The 20th digit can only be 0 or 1; the highest 64-bit number is 18,446,744,073,709,551,615.) The exact mapping of the other digits into lengths should be based on the distribution of your patterns. If you have 10 patterns which occur VERY often, you could e.g. reserve "0" to mean that one digit represents a complete sequence.
Any such more complicated encoding will however introduce the need for more complex packing/unpacking code and maybe even lookup tables, so it might not be worth the effort.
Problems converting byte array to string and back to byte array
It is not a good idea to store encrypted data in Strings because they are for human-readable text, not for arbitrary binary data. For binary data it's best to use byte[].
However, if you must do it you should use an encoding that has a 1-to-1 mapping between bytes and characters, that is, where every byte sequence can be mapped to a unique sequence of characters, and back. One such encoding is ISO-8859-1, that is:
String decoded = new String(encryptedByteArray, "ISO-8859-1");
System.out.println("decoded:" + decoded);
byte[] encoded = decoded.getBytes("ISO-8859-1");
System.out.println("encoded:" + java.util.Arrays.toString(encoded));
String decryptedText = encrypter.decrypt(encoded);
Other common encodings that don't lose data are hexadecimal and base64. Older versions of the standard API did not define classes for them, so you needed a helper library; since Java 8 the standard library provides java.util.Base64.
With UTF-16 the program would fail for two reasons:
- String.getBytes("UTF-16") adds a byte-order-marker character to the output to identify the order of the bytes. You should use UTF-16LE or UTF-16BE for this to not happen.
- Not all sequences of bytes can be mapped to characters in UTF-16. First, text encoded in UTF-16 must have an even number of bytes. Second, UTF-16 has a mechanism for encoding Unicode characters beyond U+FFFF. This means that, for example, there are sequences of 4 bytes that map to only one Unicode character. For this to be possible, the first 2 bytes of the 4 don't encode any character on their own in UTF-16.
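The same properties can be checked quickly outside Java. The Python sketch below round-trips every byte value through ISO-8859-1 and then probes UTF-16 with a few inputs. Note the difference in behaviour: Python's strict decoder raises an error on malformed input, whereas Java's String constructor substitutes replacement characters, so treat this as an illustration of the encoding properties rather than of the Java API. The helper utf16_accepts is invented for the example.

```python
# Every possible byte value, as a stand-in for arbitrary encrypted data.
payload = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one character and back.
text = payload.decode("iso-8859-1")
round_trip = text.encode("iso-8859-1")
print(round_trip == payload)

# Plain "utf-16" prepends a 2-byte byte-order mark; "utf-16-le" does not.
with_bom = "A".encode("utf-16")
without_bom = "A".encode("utf-16-le")
print(len(with_bom), len(without_bom))


def utf16_accepts(data: bytes) -> bool:
    """Return True if data is well-formed little-endian UTF-16."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


print(utf16_accepts(b"A\x00"))     # one complete code unit
print(utf16_accepts(b"A"))         # odd number of bytes
print(utf16_accepts(b"\x00\xd8"))  # lone high surrogate (0xD800)
```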
base64 to reduce digits required to encode a decimal number
Each character in Base64 can represent 6 bits, so divide your ID length in bits by 6 (rounding up) to see how many characters it will be. Raw binary data holds 8 bits per byte so it will always be shorter, but the bytes won't all be readable.
Base64 will make the ID readable, but it still won't be good if the ID needs to be hand entered, like a key. For that you'll want to restrict the character set further.
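As a quick sanity check of the arithmetic, this Python sketch compares the decimal and Base64 lengths of a 64-bit ID. The 64-bit width, the URL-safe alphabet and the stripped padding are all assumptions made for the example.

```python
import base64


def base64_length(bits: int) -> int:
    """Characters needed for `bits` bits at 6 bits per character, rounded up."""
    return -(-bits // 6)


ident = 2 ** 64 - 1                # largest 64-bit ID
raw = ident.to_bytes(8, "big")     # 8 bytes of binary data

# URL-safe alphabet, with the trailing "=" padding removed.
encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

# Decimal digits, Base64 characters, and the predicted Base64 length.
print(len(str(ident)), len(encoded), base64_length(64))
```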
How to convert a string of bytes into an int?
You can also use the struct module to do this (shown for Python 3, where the input must be a bytes object):
>>> import struct
>>> struct.unpack("<L", b"y\xcc\xa6\xbb")[0]
3148270713
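In Python 3, int.from_bytes is another option that is not tied to a fixed width. The sketch below cross-checks it against struct for the same four bytes; the variable names are just for illustration.

```python
import struct

data = b"y\xcc\xa6\xbb"

# Same interpretation as struct's "<L": little-endian, unsigned.
little = int.from_bytes(data, "little")

# Other byte order and signedness, for comparison.
big = int.from_bytes(data, "big")
signed = int.from_bytes(data, "little", signed=True)

print(little, big, signed)
```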