How to Compress/Decompress a Long Query String in PHP

How to compress/decompress a long query string in PHP?

The basic premise is very difficult. Transporting any value in the URL means you're restricted to a subset of ASCII characters. Using any sort of compression like gzcompress would reduce the size of the string, but result in a binary blob. That binary blob can't be transported in the URL though, since it would produce invalid characters. To transport that binary blob using a subset of ASCII you need to encode it in some way and turn it into ASCII characters.

So, you'd turn ASCII characters into something else which you'd then turn into ASCII characters.

But actually, most of the time the ASCII characters you start out with are already the optimal length. Here's a quick test:

$str = 'Hello I am a very very very very long search string';
echo $str . "\n";
echo base64_encode(gzcompress($str, 9)) . "\n";
echo bin2hex(gzcompress($str, 9)) . "\n";
echo urlencode(gzcompress($str, 9)) . "\n";

Hello I am a very very very very long search string
eNrzSM3JyVfwVEjMVUhUKEstqkQncvLz0hWKUxOLkjMUikuKMvPSAc+AEoI=
78daf348cdc9c957f05448cc554854284b2daa442772f2f3d2158a53138b9233148a4b8a32f3d201cf801282
x%DA%F3H%CD%C9%C9W%F0TH%CCUHT%28K-%AAD%27r%F2%F3%D2%15%8AS%13%8B%923%14%8AK%8A2%F3%D2%01%CF%80%12%82

As you can see, the original string is the shortest. Among the encoded compressions, base64 is the shortest since it uses the largest alphabet to represent the binary data. It's still longer than the original though.

For some very specific combination of characters with some very specific compression algorithm that compresses to ASCII representable data it may be possible to achieve some compression, but that's rather theoretical.

Update: Actually, that sounds too negative. The thing is, you need to figure out if compression makes sense for your use case. Different data compresses differently and different encoding algorithms work differently. Also, longer strings may achieve a better compression ratio. There's probably a sweet spot somewhere where some compression can be achieved. You need to figure out if you're in that sweet spot most of the time or not.
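One way to look for that sweet spot is to measure it on your own data. The following is a minimal sketch, not part of the original answer: the helper name query_value_lengths and the sample inputs are made up for illustration, and it assumes the zlib extension is available.

```php
<?php
// Hypothetical helper: compares the URL-encoded length of a raw string with
// the length of its compressed, base64-encoded, URL-encoded form.
function query_value_lengths(string $value): array
{
    $plain  = urlencode($value);
    $packed = urlencode(base64_encode(gzcompress($value, 9)));

    return ['plain' => strlen($plain), 'compressed' => strlen($packed)];
}

// Made-up sample inputs: one short, one long and highly repetitive.
$samples = [
    'short' => 'Hello I am a very very very very long search string',
    'long'  => str_repeat('category=books&sort=price&order=asc;', 40),
];

foreach ($samples as $label => $value) {
    $lengths = query_value_lengths($value);
    printf("%-5s plain=%d compressed=%d\n", $label, $lengths['plain'], $lengths['compressed']);
}
```

If the compressed length is not consistently smaller for the values your application really sends, compression is probably not worth the added complexity.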

Something like md5 is unsuitable since md5 is a hash, which means it's non-reversible. You can't get the original value back from it.

I'm afraid you can only send the parameter via POST, if it doesn't work in the URL.

Compress string, then uncompress the string?

Yes you can compress and uncompress strings in PHP (Demo):

$str = 'Hello I am a very very very very long string';
$compressed = gzcompress($str, 9);
$uncompressed = gzuncompress($compressed);

echo $str, "\n";
echo $uncompressed, "\n";
echo base64_encode($compressed), "\n";
echo bin2hex($compressed), "\n";
echo urlencode($compressed), "\n";

However, MD5 is a hash function, not a compression function, so it cannot be reversed.

See as well: How to compress/decompress a long query string in PHP?

Shortest possible query string for a numerically indexed array in PHP

Default PHP way

What http_build_query does is a common way to serialize arrays to URL. PHP automatically deserializes it in $_GET.
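As a small illustration of that default behaviour, the sketch below builds a query string from an array and parses it back. The parameter names c and fs are only examples; parse_str is used here to mimic what PHP does when it fills $_GET.

```php
<?php
// Illustrative parameters only.
$params = ['c' => 'asdf', 'fs' => [5, 12, 99]];

// Each array element becomes its own percent-encoded key: fs[0], fs[1], ...
$query = http_build_query($params);
echo $query, "\n";

// parse_str() mirrors how PHP populates $_GET from a query string.
// Note that the values come back as strings, not integers.
parse_str($query, $parsed);
var_dump($parsed['fs']);
```

The bracketed keys are what make this format long: every element repeats the parameter name and its encoded index.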

When wanting to serialize just a (non-associative) array of integers, you have other options.

Small arrays

For small arrays, converting to an underscore-separated list is quite convenient and efficient. It is done by $fs = implode('_', $fs). Then your URL would look like this:

http://example.com/?c=asdf&fs=5_12_99

The downside is that you’ll have to explicitly explode('_', $_GET['fs']) to get the values back as an array.
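A minimal sketch of that round trip follows. The helper names ints_to_param and param_to_ints are hypothetical, and the validation step is an addition of this sketch: since the value comes from the URL, it seems prudent to reject anything that is not a list of unsigned integers.

```php
<?php
// Hypothetical helpers; the names are not from the original answer.
function ints_to_param(array $ints): string
{
    return implode('_', $ints);
}

function param_to_ints(string $param): array
{
    if ($param === '') {
        return [];
    }
    // Accept only unsigned decimal integers separated by single underscores.
    if (!preg_match('/^\d+(?:_\d+)*\z/', $param)) {
        throw new InvalidArgumentException('Malformed integer list');
    }
    return array_map('intval', explode('_', $param));
}

$fs = ints_to_param([5, 12, 99]);
echo $fs, "\n";
var_dump(param_to_ints($fs));
```

In a request handler, the input would typically be $_GET['fs'] ?? '' instead of a literal.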

Other delimiters may be used too. Underscore is considered alphanumeric and as such rarely has special meaning. In URLs, it is usually used as a space replacement (e.g. by MediaWiki). Its downside is that it is hard to distinguish in underlined text. Hyphen is another common replacement for space, but it is also often used as a minus sign. Comma is a typical list separator, but unlike underscore and hyphen it is percent-encoded by http_build_query and has special meaning almost everywhere. The situation is similar with the vertical bar (“pipe”).

Large arrays

When you have large arrays in URLs, you should first stop coding and start thinking. This almost always indicates bad design. Wouldn’t the POST HTTP method be more appropriate? Don’t you have a more readable and space-efficient way of identifying the addressed resource?

URLs should ideally be easy to understand and (at least partially) remember. Placing a large blob inside is really a bad idea.

Now you have been warned. If you still need to embed a large array in a URL, go ahead. Compress the data as much as you can, base64-encode it to convert the binary blob to text, and url-encode the text to sanitize it for embedding in a URL.

Modified base64

Mmm. Or better, use a modified version of base64. The one of my choice

  • uses - instead of +,
  • uses _ instead of / and
  • omits the padding =.

define('URL_BASE64_FROM', '+/');
define('URL_BASE64_TO', '-_');

// Encode binary data as URL-safe base64 without the trailing padding.
function url_base64_encode($data) {
    $encoded = base64_encode($data);
    if ($encoded === false) {
        return false;
    }
    return str_replace('=', '', strtr($encoded, URL_BASE64_FROM, URL_BASE64_TO));
}

// Decode URL-safe base64; returns false if $data is not a string.
function url_base64_decode($data) {
    if (!is_string($data)) {
        return false;
    }
    $len = strlen($data);
    // str_pad() expects the total target length, so round up to a multiple of 4.
    $padded = str_pad($data, $len + (4 - $len % 4) % 4, '=', STR_PAD_RIGHT);
    return base64_decode(strtr($padded, URL_BASE64_TO, URL_BASE64_FROM));
}

This saves two bytes on each character that would otherwise be percent-encoded. There is no need to call the urlencode function either.
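The saving can be seen on a tiny input. The sketch below is illustrative only: the four bytes are chosen arbitrarily so that the standard base64 output contains +, / and =, and the URL-safe variant is written inline with strtr and rtrim rather than through the functions above.

```php
<?php
// Arbitrary bytes picked so that standard base64 yields '+', '/' and '='.
$data = "\xfb\xff\xfe\xff";

// Standard base64 needs urlencode(), which turns each special character into %XX.
$standard = urlencode(base64_encode($data));

// URL-safe alphabet with the padding stripped; no further encoding needed.
$urlSafe = rtrim(strtr(base64_encode($data), '+/', '-_'), '=');

printf("standard: %s (%d chars)\n", $standard, strlen($standard));
printf("url-safe: %s (%d chars)\n", $urlSafe, strlen($urlSafe));
```

Real data will contain far fewer special characters than this contrived input, so the typical saving is smaller.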

Compression

A choice between gzip (gzcompress) and bzip2 (bzcompress) has to be made. I did not want to invest time in a thorough comparison, but gzip looked better on several relatively small inputs (around 100 characters) for any block size setting.
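If you want to check this on your own inputs, a rough comparison could look like the sketch below. The helper name compressed_sizes and the sample input are made up; bzcompress is only called when the bz2 extension is loaded, which is an assumption about your installation.

```php
<?php
// Hypothetical helper: compressed size in bytes for each available algorithm.
function compressed_sizes(string $data): array
{
    $sizes = ['gzcompress' => strlen(gzcompress($data, 9))];

    // bzcompress() exists only when the bz2 extension is loaded.
    if (function_exists('bzcompress')) {
        $sizes['bzcompress'] = strlen(bzcompress($data, 9));
    }
    return $sizes;
}

// Made-up sample of roughly 100 characters.
$sample = implode(',', range(1, 40));
print_r(compressed_sizes($sample));
```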

Packing

But what data should be fed into the compression algorithm?

In C, one would cast array of integers to array of chars (bytes) and hand it over to the compression function. That’s the most obvious way to do things. In PHP the most obvious way to do things is converting all the integers to their decimal representation as strings, then concatenation using delimiters, and only after that compression. What a waste of space!

So, let’s use the C approach! We’ll get rid of the delimiters and otherwise wasted space and encode each integer in 2 bytes using pack:

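// 'n*' = any number of unsigned 16-bit integers in big-endian byte order.
define('PACK_NUMS_FORMAT', 'n*');

function pack_nums($num_arr) {
    // pack() takes the format first, then one argument per value.
    array_unshift($num_arr, PACK_NUMS_FORMAT);
    return call_user_func_array('pack', $num_arr);
}

function unpack_nums($packed_arr) {
    // Note: unpack() returns an array indexed from 1, not 0.
    return unpack(PACK_NUMS_FORMAT, $packed_arr);
}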

Warning: the values must fit into 16 bits (0 to 65535); anything larger is silently truncated. The n format code always uses big-endian byte order, so the result does not depend on the machine. Machine-dependent codes such as S are a different matter: byte order could change between machines. That is rarely a problem in practice, because an application does not usually run on two systems with different endianness at the same time. When integrating multiple systems, though, the problem might arise, and if you switch to a system with different endianness, links generated on the original one will break.
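To make the behaviour of the n format code concrete, here is a small sketch with made-up sample values. Per the PHP manual, n is an unsigned 16-bit integer that is always stored big-endian. The spread operator is used instead of call_user_func_array, which assumes PHP 5.6 or later.

```php
<?php
$nums = [5, 12, 99, 65535];

// 'n*' packs every argument as an unsigned 16-bit big-endian integer,
// so the result is exactly two bytes per value.
$packed = pack('n*', ...$nums);
echo bin2hex($packed), "\n";

// unpack() returns an array indexed from 1; array_values() reindexes from 0.
$restored = array_values(unpack('n*', $packed));
var_dump($restored === $nums);

// Values outside 0..65535 do not survive: only the low 16 bits are kept.
var_dump(unpack('n', pack('n', 65536))[1]);
```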

Encoding together

Now packing, compression and modified base64, all in one:

function url_embed_array($arr) {
    return url_base64_encode(gzcompress(pack_nums($arr)));
}

function url_parse_array($data) {
    return unpack_nums(gzuncompress(url_base64_decode($data)));
}

See the result on IdeOne. It is better than OP’s answer: on his 40-element array, my solution produced 91 characters while his produced 98. With range(1, 1000) (which generates array(1, 2, 3, …, 1000)) as a benchmark, OP’s solution produces 2712 characters while mine produces just 2032. This is about 25% better.

For the sake of completeness, OP’s solution is

function url_embed_array($arr) {
    return urlencode(base64_encode(gzcompress(implode(',', $arr))));
}
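To reproduce a comparison like this yourself, the sketch below restates both approaches in compact form under hypothetical demo_* names so that it runs on its own. It is not the code behind the figures quoted above, and the exact lengths may vary with the zlib version.

```php
<?php
// Packed approach: 16-bit big-endian integers, zlib, URL-safe base64.
function demo_embed_packed(array $nums): string
{
    $binary = gzcompress(pack('n*', ...$nums));
    return rtrim(strtr(base64_encode($binary), '+/', '-_'), '=');
}

function demo_parse_packed(string $data): array
{
    // base64_decode() in its default non-strict mode tolerates missing padding.
    $binary = gzuncompress(base64_decode(strtr($data, '-_', '+/')));
    return array_values(unpack('n*', $binary));
}

// Comma-separated approach, as in OP's solution.
function demo_embed_csv(array $nums): string
{
    return urlencode(base64_encode(gzcompress(implode(',', $nums))));
}

$nums = range(1, 1000);
printf("packed: %d chars\n", strlen(demo_embed_packed($nums)));
printf("csv:    %d chars\n", strlen(demo_embed_csv($nums)));

// Round-trip check: decoding must give back the original array.
var_dump(demo_parse_packed(demo_embed_packed($nums)) === $nums);
```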

How to implement mysql compress() function in php

I haven't ever done this, but here are some thoughts:

1) Find the length of the uncompressed string. The strlen() function ought to work.

2) Compress the string. You've already done this part.

3) Pack both together for storage in MySQL, formatting the number as MySQL wants it:

PHP's pack function: it sounds like you need to use the format code "V" for the length (unsigned long, 32 bit, little-endian byte order).
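Put together, the three steps might look like the sketch below. It assumes the storage format described in the MySQL manual for COMPRESS(): a 4-byte little-endian length of the uncompressed string followed by zlib-compressed data, with empty strings stored as empty strings. The function names are hypothetical, and the sketch does not handle the extra trailing "." that MySQL appends when the compressed data ends with a space, so verify the output against a real server before relying on it.

```php
<?php
// Sketch of MySQL's COMPRESS() format: 4-byte little-endian uncompressed
// length, then the zlib stream. Hypothetical function names.
function mysql_compress_sketch(string $data): string
{
    if ($data === '') {
        return ''; // Empty strings are stored as empty strings.
    }
    return pack('V', strlen($data)) . gzcompress($data);
}

function mysql_uncompress_sketch(string $blob): string
{
    if ($blob === '') {
        return '';
    }
    // Skip the 4-byte length prefix before inflating.
    return gzuncompress(substr($blob, 4));
}

$blob = mysql_compress_sketch('hello hello hello');
echo bin2hex(substr($blob, 0, 4)), "\n"; // length prefix in hex
echo mysql_uncompress_sketch($blob), "\n";
```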

Compress JSON string in PHP and decompress in Javascript for Database query for Google API

@Gavin gave the right answer in a comment above:

His answer was: Look into GZIP'ing your content. If your server is set up correctly, it can compress any application/json content, and then your browser "should" automatically decompress it. http://bearpanther.com/2012/04/11/gzip-json-generated-on-the-fly
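If the web server cannot be configured to do this, the same effect can be approximated in the application. The sketch below is one possible approach and not what the linked article describes: the helper name json_response is hypothetical, and it compresses with gzencode only when the client's Accept-Encoding header mentions gzip.

```php
<?php
// Hypothetical helper: returns the response body plus the headers to send with it.
function json_response(array $payload, string $acceptEncoding): array
{
    $body    = json_encode($payload);
    $headers = ['Content-Type: application/json'];

    // Compress only when the client advertises gzip support.
    if (stripos($acceptEncoding, 'gzip') !== false) {
        $body      = gzencode($body, 9);
        $headers[] = 'Content-Encoding: gzip';
        $headers[] = 'Vary: Accept-Encoding';
    }
    return ['headers' => $headers, 'body' => $body];
}

// Typical use inside a request handler:
// $response = json_response($data, $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '');
// foreach ($response['headers'] as $line) { header($line); }
// echo $response['body'];
```

The browser decompresses the body transparently, so the JavaScript side receives plain JSON and needs no changes.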


