JavaScript Implementation of Gzip

Is there a way to use the browser's native gzip decompression from JavaScript?

I feel that there should be a way to use the browser's native functionality to un-gzip. If I am wrong, can someone explain why this browser functionality is not exposed as a JavaScript API?

No, there shouldn't be, unless it is in a W3C standard, and it is not. The only standard that says anything about gzip compression is the HTTP standard.

I really believe it won't become a standard: there are thousands of algorithms (for compression, encryption, etc.) that you might want to use, and browsers cannot handle them all. It would also be unfair to create an interface for one algorithm while not creating one for another.

The HTTP protocol is a kind of exception: compression there makes life easier for millions of people. HTTP is a bottleneck in web performance, so as long as compression is available there, I cannot imagine a case where you would need compression elsewhere in JavaScript. The only case I know of is compressing items in localStorage / IndexedDB, but even there gzip won't work, because it does not produce UTF-16 output.
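For completeness, that localStorage case is exactly what a library like lz-string targets; a minimal sketch, with largeObject as a placeholder for whatever you want to cache:

// lz-string, unlike gzip, produces UTF-16-safe strings that
// localStorage can store without corruption.
var compressed = LZString.compressToUTF16(JSON.stringify(largeObject));
localStorage.setItem("cache", compressed);

var restored = JSON.parse(
    LZString.decompressFromUTF16(localStorage.getItem("cache")));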

That is why this is not in the standard and this is why it most likely won't appear there.

Your particular case is a server-side implementation error. Serving compressed output without the proper header really smells. Either don't use compression or do it right.

And is there a way to un-gzip a file in JavaScript without adding an additional library?

Actually, other than that, there is one possible solution: create a browser extension that injects the proper header into the server response. Then you won't need a library, but you will need to distribute the extension to your users. That could be even worse, but it may still work.

Implement GZIP decompression with deflate algorithm

I found that the deflate spec does not include headers, but the zlib format does, and many libraries call zlib decompression inflate, while calling true raw inflation inflateRaw. I changed my method calls from inflate to inflateRaw, and it immediately worked.
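For example, the pako library follows exactly this naming convention; zlibData, rawDeflateData and gzipData below are placeholder Uint8Arrays:

// pako.inflate expects a zlib stream (2-byte header + Adler-32 trailer),
// pako.inflateRaw expects a raw deflate stream with no wrapper at all,
// pako.ungzip expects a gzip stream (10-byte header + CRC-32/size trailer).
var fromZlib = pako.inflate(zlibData, { to: "string" });
var fromRaw  = pako.inflateRaw(rawDeflateData, { to: "string" });
var fromGzip = pako.ungzip(gzipData, { to: "string" });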

I need JSZip and gzip for my web page, and JSZip has all the ingredients, but hides them in ways I can't crack

As Evert was saying, I should have first checked the build instructions in the documentation: https://stuk.github.io/jszip/documentation/contributing.html

From that it is clear: first you need git and a local clone of the repository. Then you need to set up the grunt command line, which requires npm, which comes with Node.js. Once grunt runs, there are other dependencies that need to be installed with npm. It's the usual story of little things being off and not working, but enough Googling and brute-force retrying gets it done.

Now, jszip/lib/index.js contains the resource that is finally exported: the JSZip object. So, just to play with the internal components, I can add them to the JSZip object. For example, it already contains:

JSZip.external = require("./external");
module.exports = JSZip;

and so we can easily add other resources we want to play with:

JSZip.flate = require("./flate");
JSZip.DataWorker = require("./stream/DataWorker");
JSZip.DataLengthProbe = require("./stream/DataLengthProbe");
JSZip.Crc32Probe = require("./stream/Crc32Probe");
JSZip.StreamHelper = require("./stream/StreamHelper");
JSZip.pako = require("pako");

Now with that, I can create a proof of concept in the Chrome debugger:

(new JSZip.StreamHelper(
    (new JSZip.DataWorker(Promise.resolve("Hello World! Hello World! Hello World! Hello World! Hello World! Hello World!")))
        .pipe(new JSZip.DataLengthProbe("uncompressedSize"))
        .pipe(new JSZip.Crc32Probe())
        .pipe(JSZip.flate.compressWorker({}))
        .pipe(new JSZip.DataLengthProbe("compressedSize"))
        .on("end", function(event) { console.log("onEnd: ", this.streamInfo); }),
    "uint8array", "")
).accumulate(function(data) { console.log("acc: ", data); })
 .then(function(data) { console.log("then: ", data); });

and this works. I then made myself a gzip file stream with proper gzip header and trailer, creating everything correctly. I put a jszip/lib/generate/GZipFileWorker.js in as follows:

'use strict';

var external = require('../external');
var utils = require('../utils');
var flate = require('../flate');
var GenericWorker = require('../stream/GenericWorker');
var DataWorker = require('../stream/DataWorker');
var StreamHelper = require('../stream/StreamHelper');
var DataLengthProbe = require('../stream/DataLengthProbe');
var Crc32Probe = require('../stream/Crc32Probe');

function GZipFileWorker() {
    GenericWorker.call(this, "GZipFileWorker");
    this.virgin = true;
}
utils.inherits(GZipFileWorker, GenericWorker);

GZipFileWorker.prototype.processChunk = function(chunk) {
    if(this.virgin) {
        this.virgin = false;
        var headerBuffer = new ArrayBuffer(10);
        var headerView = new DataView(headerBuffer);
        headerView.setUint16(0, 0x8b1f, true); // GZip magic
        headerView.setUint8(2, 0x08);          // compression algorithm DEFLATE
        headerView.setUint8(3, 0x00);          // flags
                                               // bit 0 FTEXT
                                               // bit 1 FHCRC
                                               // bit 2 FEXTRA
                                               // bit 3 FNAME
                                               // bit 4 FCOMMENT
        headerView.setUint32(4, (new Date()).getTime()/1000 >>> 0, true);
        headerView.setUint8(8, 0x00);          // no extension headers
        headerView.setUint8(9, 0x03);          // OS type UNIX
        this.push({data: new Uint8Array(headerBuffer)});
    }
    this.push(chunk);
};

GZipFileWorker.prototype.flush = function() {
    var trailerBuffer = new ArrayBuffer(8);
    var trailerView = new DataView(trailerBuffer);
    trailerView.setUint32(0, this.streamInfo["crc32"] >>> 0, true);
    trailerView.setUint32(4, this.streamInfo["originalSize"] >>> 0 & 0xffffffff, true);
    this.push({data: new Uint8Array(trailerBuffer)});
};

exports.gzip = function(data, inputFormat, outputFormat, compressionOptions, onUpdate) {
    var mimeType = data.contentType || data.mimeType || "";
    if(! (data instanceof GenericWorker)) {
        inputFormat = (inputFormat || "").toLowerCase();
        data = new DataWorker(
            utils.prepareContent(data.name || "gzip source",
                                 data,
                                 inputFormat !== "string",
                                 inputFormat === "binarystring",
                                 inputFormat === "base64"));
    }
    return new StreamHelper(
        data
            .pipe(new DataLengthProbe("originalSize"))
            .pipe(new Crc32Probe())
            .pipe(flate.compressWorker(compressionOptions || {}))
            .pipe(new GZipFileWorker()),
        outputFormat.toLowerCase(), mimeType).accumulate(onUpdate);
};

and in jszip/lib/index.js I need just this:

var gzip = require("./generate/GZipFileWorker");
JSZip.gzip = gzip.gzip;

and it works like this:

JSZip.gzip("Hello World! Hello World! Hello World! Hello World! Hello World! Hello World!", "string", "base64", {level: 3}).then(function(result) { console.log(result); })

I can paste the result into a UNIX pipe like this:

$ echo -n "H4sIAOyR/VsAA/NIzcnJVwjPL8pJUVTwoJADAPCORolNAAAA" |base64 -d |zcat

and it correctly returns

Hello World! Hello World! Hello World! Hello World! Hello World! Hello World!

It can also be used with files:

JSZip.gzip(file, "", "Blob").then(function(blob) { 
xhr.setRequestProperty("Content-encoding", "gzip");
xhr.send(blob);
})

and I can send the blob to my web server. I have checked that indeed the large file is processed in chunks.

The only thing I don't like about this is that the final Blob is still assembled as one big Blob, so I am assuming it holds all the compressed data in memory. It would be better if that Blob were an end-point of the worker pipeline, so that when xhr.send grabs the data chunk-wise from the Blob, it would consume chunks from the worker pipeline only then. However, the impact is lessened a lot given that the Blob only holds compressed content, and likely (for me at least) large files would be multimedia files that won't need to be gzip-compressed anyway.
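As a sketch of what that could look like: if a variant of the function returned the StreamHelper instead of accumulating, the caller could drain chunks as they are produced. gzipStream, uploadChunk and finishUpload below are hypothetical names; on/resume is the event API JSZip's StreamHelper already exposes:

// Hypothetical variant of gzip() that returns the StreamHelper without
// accumulating, so compressed chunks can be consumed as they appear.
JSZip.gzipStream(file, "", "uint8array")
    .on("data", function(chunk, metadata) { uploadChunk(chunk); }) // hypothetical uploader
    .on("error", function(e) { console.error(e); })
    .on("end", function() { finishUpload(); })                     // hypothetical finalizer
    .resume();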

I did not write a gunzip function because, frankly, I don't need one, and I don't want to make one that fails to properly parse extension headers in the gzip header. As soon as I have uploaded compressed content to the server (S3 in my case), I assume the browser will do the decompressing for me when I fetch it again. I haven't checked that, though. If it becomes a problem, I'll come back and edit this answer.

Here is my fork on github: https://github.com/gschadow/jszip, pull request already entered.

Decompress gzip and zlib string in JavaScript

I was able to solve my problem with zlib. I fixed my code as below:

var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
compressData = compressData.split('').map(function(e) {
    return e.charCodeAt(0);
});
var inflate = new Zlib.Inflate(compressData); // Zlib.Inflate comes from the zlib.js library
var output = inflate.decompress();
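For comparison, a sketch of the same decompression with pako (assuming pako is loaded; the base64 payload above is a zlib stream, so inflate rather than inflateRaw is the right call):

// Decode base64 to bytes, then inflate the zlib stream.
var bytes = Uint8Array.from(atob(base64Data), function(c) {
    return c.charCodeAt(0);
});
var text = pako.inflate(bytes, { to: "string" });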

How to require or evaluate gzipped JavaScript file with Node.js and zlib?

Here's a possible implementation:

const Module = require('module');
const zlib = require('zlib');
const fs = require('fs');

function requireGZ(filename) {
    // Read and decompress the gzipped source, then compile it
    // as a module and return its exports.
    let code = zlib.gunzipSync(fs.readFileSync(filename)).toString();
    let mod = new Module();

    mod._compile(code, filename);

    return mod.exports;
}

// Use:
let test = requireGZ('./test.js.gz');

Compressing efficiency

The optimization you are referring to might be called "token replacement" or something like that and is a reasonable approach to domain-specific compression.

This type of transformation doesn't prevent matching+entropy based algorithms like gzip from working, and so you aren't likely to get a larger final size after applying this transformation. That said, the replacement you are doing is exactly the type of thing that gzip is good at doing, so doing it yourself before invoking gzip may be a bit redundant.

To know for sure, you can simply test! What are the typical results of your token replacement + gzip, versus gzip alone?
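For instance, a self-contained Node.js sketch of such a test; the dictionary and sample data are invented for illustration:

const zlib = require('zlib');

// Toy token replacement: swap frequent JSON keys for short codes.
const dict = { '"customerName"': '"\u00011"', '"orderNumber"': '"\u00012"' };
function tokenize(s) {
    return Object.keys(dict).reduce(
        (out, key) => out.split(key).join(dict[key]), s);
}

const original = JSON.stringify(
    Array.from({ length: 500 }, (_, i) =>
        ({ customerName: 'c' + i, orderNumber: i })));
const replaced = tokenize(original);

console.log('gzip only:         ', zlib.gzipSync(Buffer.from(original)).length);
console.log('replace then gzip: ', zlib.gzipSync(Buffer.from(replaced)).length);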

Even without testing, here are some advantages and disadvantages of the token replacement-before-gzip based approach:

Advantages

  1. Sometimes you can do the token replacement more efficiently than gzip, if you do it early in the output generation, and can use this more compact form through most of your processing chain. You may get speedups in other parts of your code because things like string comparisons are now replaced by very fast integer comparisons (however, see disadvantage #1).
  2. gzip has certain limitations, based partly on its age and its huge range of target hardware, that you can work around with your token replacement. For example, it only finds matches in a 32 KiB window, while your token replacement works on the entire input.
  3. Token replacement effectively takes advantage of your knowledge of the format to encode more efficiently than gzip could. For example, although gzip will perform largely the same substitutions as you are for frequently occurring keys, your replacement will do well for infrequently occurring keys (but this doesn't matter if there are a lot of them). In effect, you are able to make use of a pre-distributed out of band dictionary, which helps compression.

Disadvantages

  1. Gzip is usually efficiently implemented using native libraries provided by your OS or runtime environment. Especially since you are calling gzip anyway, just using gzip may be faster than doing your token replacement at the language level. How much faster (or slower) depends a lot on your implementation.
  2. Compression-wise gzip is largely doing the same replacement operation you are (plus many more), so replacing tokens yourself is somewhat redundant. That is, gzip looks for "matches": strings that already occurred earlier in the text, and replaces them with tokens, just like you do. Plus it does a lot of other good stuff, like entropy coding the tokens, so you want it as a final step no matter what.
  3. Adding your own pre-compression step may not harm compression (even if it doesn't help much), but it adds complexity, a source of bugs, possible performance issues and so on. You also need to somehow transmit the key <-> integer mapping to remote clients, etc.

Basically, I would recommend against it, unless your testing shows that it provides a significant performance boost. Usually it won't, since gzip is already removing most of the redundancy, but it depends on the specifics of your situation.

How Gzip in Tomcat works

There are two separate issues going on here: compression and minification.

Compression is the process by which the server compresses content (HTML, CSS, JS) to send to the client (the browser). The browser then decompresses the content back to exactly what it was before it was compressed. By the time you use view-source or look at the developer tools in your browser, you're seeing the original script. Think of it like sending a zip file to someone: the original file is still there exactly as it was, just wrapped up in a zip.

Compression can be enabled in several places in your app's architecture. You can enable it in your web server (you linked to Apache httpd's docs), in your app server (Tomcat supports compression), or in your own code (search for 'servlet compression filter' for examples).
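As a sketch of the "in your own code" option, here is the mechanism in miniature, in Node.js rather than a servlet (the Content-Encoding negotiation is the same idea):

const http = require('http');
const zlib = require('zlib');

// The server compresses the body and labels it with Content-Encoding;
// the browser decompresses transparently, which is why view-source
// still shows the original markup.
http.createServer(function(req, res) {
    const body = '<html><body>Hello</body></html>';
    if (/\bgzip\b/.test(req.headers['accept-encoding'] || '')) {
        res.writeHead(200, { 'Content-Type': 'text/html',
                             'Content-Encoding': 'gzip' });
        res.end(zlib.gzipSync(body));
    } else {
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(body);
    }
}).listen(8080);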

Minification (which is what YUI Compressor and other tools do) permanently changes a script, usually creating a -min.js version of the file. This file will be missing newlines and may have variables renamed. Because this altered file is what the server is sending, that's what you'll see in the browser, and yes, it's hard to debug. Browser makers have recognized this, and Chrome, Firefox and IE11+ support sourcemaps, which tell the browser how to map from the minified version of the code back to the original file. YUI Compressor doesn't support sourcemaps, but other tools like uglify do.
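For example, minifying with a sourcemap through the uglify-js API (a sketch; option names vary between versions):

const UglifyJS = require('uglify-js');
const fs = require('fs');

// Minify app.js and emit a sourcemap so dev tools can map the
// minified code back to the original source.
const result = UglifyJS.minify(
    { 'app.js': fs.readFileSync('app.js', 'utf8') },
    { sourceMap: { filename: 'app.min.js', url: 'app.min.js.map' } });

fs.writeFileSync('app.min.js', result.code);
fs.writeFileSync('app.min.js.map', result.map);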

You can use minification and compression together and there are benefits to doing so. See this discussion for more detail.


