Client-Side Image Processing

This is the sort of issue that software architects run up against all the time. As usual, there is no ideal solution; you need to select the compromise that is most acceptable to your business.

To summarise your problem: most of your image-processing software is written in .NET. You'd like to run it client-side on mobile devices, but .NET has limited penetration on mobiles. The alternatives with higher penetration (e.g. Flash) would require you to rewrite your code, which you can't afford to do. In addition, these alternatives are not supported on the iPhone/iPad.

What you ideally want is a way to run all your .NET code on most existing platforms, including iPhone/iPad. I can say with some confidence that no such solution currently exists - there is no "silver bullet" answer that you have overlooked.

So what will you need to compromise on? It seems to me that even if you redevelop in Flash, you will still miss out on a major market (the iPhone). And redeveloping software is extremely costly anyway.

Here is my suggestion: compromise on your "client-side execution" constraint. If you execute server-side, you keep your existing code and can deploy to just about every mobile client, including the iPhone.

You said your server power is limited, but server processing power is cheap compared to software development costs. Indeed, it is not all that expensive to outsource your server component and pay only for what you use. Your application will most likely have low uptake to start with; as the business grows, you will be able to afford to upgrade your server capacity.

I believe this is the best solution to your problem.

Image manipulation on the server or client side?

I can see that you need the user to be able to manipulate images, so it would be more efficient to allow them to do so client-side.

For Client-Side:

There are a few JavaScript libraries available. FabricJS and CamanJS use the <canvas> element to provide image manipulation capabilities. CamanJS should be sufficient for your needs.

It is recommended that you not do the image processing server-side, but here are some libraries for that purpose, for information's sake.

For Server-Side: Use Pillow, which is a fork of PIL, the Python Imaging Library.

It is one of the best image manipulation tools and can handle the cropping, thumbnail generation, etc. that your website requires.

I have used it on a server to process images and then uploaded the results to S3.
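As a rough sketch of what that looks like with Pillow (the in-memory image here stands in for an uploaded file, and all sizes and box coordinates are illustrative):

```python
from PIL import Image

# Stand-in for an uploaded photo; in practice you would use Image.open("upload.jpg")
img = Image.new("RGB", (800, 600), color=(40, 120, 200))

# Crop a region: the box is (left, upper, right, lower) in pixels
cropped = img.crop((100, 100, 400, 400))

# Make a thumbnail: thumbnail() resizes in place, preserves aspect ratio,
# and never upscales, so copy first if you still need the original
thumb = img.copy()
thumb.thumbnail((128, 128))

# cropped.size is (300, 300); thumb.size is (128, 96) for this 4:3 source
```

From there, saving with cropped.save(...) or uploading the bytes to S3 works on any ordinary file-like object.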

Client-Side Image Manipulation (Cropping)

Basically I would go this way:

  1. Load your image into a <canvas>
  2. Crop the Image: http://www.html5canvastutorials.com/tutorials/html5-canvas-image-crop/
  3. Save an image from the canvas: http://www.html5canvastutorials.com/advanced/html5-canvas-save-drawing-as-an-image/

The pages linked above make a very good tutorial.

Disclaimer: I haven't tested this yet, but I've heard this approach works.

Also, using background-size:cover; or background-size:contain; can get around nasty dimension problems.

Image processing on the client side

You can use a JavaScript image processing framework like MarvinJ. The example below demonstrates how to crop and scale an image in JavaScript on the client side.

var canvas1 = document.getElementById("canvas1");
var canvas2 = document.getElementById("canvas2");
var canvas3 = document.getElementById("canvas3");

image = new MarvinImage();
image.load("https://i.imgur.com/gaW8OeL.jpg", imageLoaded);

function imageLoaded() {
    imageOut = image.clone();
    image.draw(canvas1);

    // Crop
    Marvin.crop(image, imageOut, 50, 50, 100, 100);
    imageOut.draw(canvas2);

    // Scale
    Marvin.scale(image, imageOut, 100);
    imageOut.draw(canvas3);
}

<script src="https://www.marvinj.org/releases/marvinj-0.7.js"></script>
<canvas id="canvas1" width="200" height="200"></canvas>
<canvas id="canvas2" width="200" height="200"></canvas><br/>
<canvas id="canvas3" width="200" height="200"></canvas>

Why does browser/client side image re-sizing have such a large performance hit?

There are a few reasons that performance is degraded by client-side resizing (covered in more detail in the link below):

  • Bandwidth cost of the larger image, which is discarded anyway
  • CPU/GPU cost of the actual image transformation

I may be wrong, but I'm pretty sure both CSS and JS resizing use the same underlying browser libraries, so I wouldn't expect much difference between them.

Also, client-side manipulation leaves you at the mercy of whatever algorithm the browser chooses, so you don't get much input on quality vs. speed, lossy vs. lossless, etc. There are a lot of resampling algorithms to choose from, all with different trade-offs.

Is resizing images within the browser a good strategy?
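To put a rough number on the bandwidth bullet (the figures here are illustrative, not taken from the linked question): even before compression is considered, the pixel count of a camera-resolution image dwarfs the thumbnail actually displayed.

```python
# Illustrative arithmetic: pixels shipped to the client vs. pixels shown
full_pixels = 4000 * 3000   # a 12 MP source image
thumb_pixels = 200 * 150    # the thumbnail actually rendered

overhead = full_pixels // thumb_pixels
print(overhead)  # 400 -- i.e. 400x more pixel data transferred than displayed
```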

Server-side image processing

Thanks for providing the gist; it made it quite easy to fiddle around with the code to see the performance effects. Also, +1 for using JMH.

First, here are the baseline results on my machine (2009 MacBook Pro, 2.8GHz Core2Duo, JDK 8u5):

Benchmark                              Mode   Samples        Score  Score error    Units
c.s.q.ShaderFunc.testProcess          thrpt         5        7.191        1.140    ops/s
c.s.q.ShaderFunc.testProcessInline    thrpt         5        7.592        0.465    ops/s
c.s.q.ShaderFunc.testProcessProc      thrpt         5        7.326        1.242    ops/s

(c.s.q is com.stackoverflow.questions)

The differences between the techniques are smaller in my runs, though the error bars are somewhat wider, and the inline version is still fastest. Since the results were so close, I started off optimizing testProcess, the variant that calls the blur method directly, since that's the code you've included here. For other readers' convenience, the code that calls the blur method is this:

int width = 4000;
int height = 2000;
int[][] nextData = new int[width][height];
for (int i = 0; i < width; ++i) {
    for (int j = 0; j < height; ++j) {
        nextData[i][j] = blur(blurData, i, j);
    }
}

My first observation is that there are a lot of conditionals in the blur method that avoid stepping off the edges of the matrix. Conveniently, the accumulations that are done at the edges have the same result if the value "off the edge" is zero (I think this is true of most image processing kernels). This means that if we pad around the edges of the matrix with zeroes and run the loops from 1 to limit-1 instead of 0 to limit, we can drop the conditionals. The loop changes to this:

int width = 4002;
int height = 2002;
int[][] nextData = new int[width][height];
for (int i = 1; i < width-1; ++i) {
    for (int j = 1; j < height-1; ++j) {
        nextData[i][j] = blur(blurData, i, j);
    }
}

(You also have to make corresponding changes to the randomMatrix function that generates the input data.) If you remove the conditionals from the blur method it now looks like this:

public int blur(final int[][] data, final int x, final int y) {
    float accumulator = 0;
    int[] col = data[x];
    accumulator += col[y];
    accumulator += data[x-1][y] * 0.5f;
    accumulator += data[x+1][y] * 0.5f;
    accumulator += col[y-1] * 0.5f;
    accumulator += col[y+1] * 0.5f;
    return Math.round(accumulator / 3f);
}
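To sanity-check the claim that zero padding reproduces the bounds-checked results, here is a small Python sketch (not the benchmark code itself) comparing the two on random data:

```python
import random

def blur_checked(data, x, y):
    # Original style: guard every neighbor access; off-edge pixels contribute 0
    h, w = len(data), len(data[0])
    acc = float(data[x][y])
    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= nx < h and 0 <= ny < w:
            acc += data[nx][ny] * 0.5
    return int(acc / 3 + 0.5)  # Math.round-style half-up for non-negative values

def blur_padded(data, x, y):
    # Padded style: caller guarantees a 1-pixel zero border, so no conditionals
    acc = (data[x][y]
           + data[x - 1][y] * 0.5
           + data[x + 1][y] * 0.5
           + data[x][y - 1] * 0.5
           + data[x][y + 1] * 0.5)
    return int(acc / 3 + 0.5)

h, w = 20, 30
grid = [[random.randrange(256) for _ in range(w)] for _ in range(h)]

# Zero-pad: one extra row/column of zeroes on every side
padded = [[0] * (w + 2) for _ in range(h + 2)]
for i in range(h):
    for j in range(w):
        padded[i + 1][j + 1] = grid[i][j]

# Same answer at every pixel, including the edges
assert all(blur_checked(grid, i, j) == blur_padded(padded, i + 1, j + 1)
           for i in range(h) for j in range(w))
```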

The results for this version are maybe 15% faster:

Benchmark                        Mode   Samples        Score  Score error    Units
c.s.q.ShaderFunc.testProcess    thrpt         5        8.424        1.035    ops/s

Now let's take a closer look at the calculations. The input is all int data, but we're accumulating into a float variable. And then the output is an int as well. Instead of repeated multiplications by 0.5f we can accumulate double the amount and then divide by 6f at the end. (There is the possibility of overflow here, though, if the input data is in the 2 billion range.) With some additional simplifications, the revised code looks like this:

public int blur(final int[][] data, final int x, final int y) {
    int[] col = data[x];
    int accumulator = 2 * col[y]
            + data[x-1][y]
            + data[x+1][y]
            + col[y-1]
            + col[y+1];
    return Math.round(accumulator / 6f);
}

And the results are more than 80% faster!

Benchmark                        Mode   Samples        Score  Score error    Units
c.s.q.ShaderFunc.testProcess    thrpt         5       15.397        1.897    ops/s

With the simplified blur method, let's reconsider inlining. I won't reproduce the code, since it's just the body of the blur method refactored into the nested for loops above (adjusting variable names, etc.). Doing this gives the following results:

Benchmark                              Mode   Samples        Score  Score error    Units
c.s.q.ShaderFunc.testProcessInline    thrpt         5       15.619        1.607    ops/s

Just a little bit faster, but within the margin of error, so it's hard to tell for sure. It might not be worth inlining if keeping the functions separate makes it easier to plug in different algorithms.

The big win here is getting rid of floating point operations, particularly floating point multiplies. Many multi-core systems have more integer than floating-point hardware available, so avoiding FP on such systems will still help.

Ah, that gives me another idea. Can we get rid of the Math.round call and the FP divide? Again, depending on your input's numeric ranges, we can do integer-based rounding. Instead of

Math.round(accumulator / 6f)

we can do something more-or-less equivalent like:

(1000 * accumulator + 3000) / 6000

The results with this change are another 25% improvement!

Benchmark                              Mode   Samples        Score  Score error    Units
c.s.q.ShaderFunc.testProcessInline    thrpt         5       19.517        2.894    ops/s

Today's lesson: to speed things up, replace floating point with integer computation. Of course, you have to pay attention to overflow and integer truncation issues. But if you can make it work, it's worth it.
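The integer-rounding equivalence is easy to check in isolation. This Python sketch uses the simplest form of the idea, (acc + 3) / 6, rather than the scaled expression above; recall that Java's Math.round(x) computes floor(x + 0.5):

```python
import math

def round_fp(acc):
    # What Math.round(acc / 6f) computes: floor(acc/6 + 0.5)
    return math.floor(acc / 6 + 0.5)

def round_int(acc):
    # Integer-only half-up rounding of acc / 6, valid for non-negative acc
    return (acc + 3) // 6

# Agrees everywhere in a range far beyond typical pixel accumulator values
assert all(round_fp(a) == round_int(a) for a in range(100_000))
```

Adding half the divisor before the integer division is the standard trick; it breaks down only for negative accumulators or values large enough to overflow the intermediate sum.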

UPDATE

In response to a suggestion from Alexey Shipilev (author of JMH) I've run an alternative version that multiplies by the reciprocal of 6.0f instead of dividing. The code is:

static final float ONE_SIXTH = 1.0f / 6.0f;

...

Math.round(accumulator * ONE_SIXTH);

This replaces a floating-point division with a floating-point multiplication. The results are:

Benchmark                              Mode   Samples        Score  Score error    Units
c.s.q.ShaderFunc.testProcessInline    thrpt         5       17.144        0.869    ops/s

Noticeably faster than the FP division, but not quite as fast as the integer computation.
