Fast Color Quantization in OpenCV

There are many ways to quantize colors. Here I describe four.

Uniform quantization

Here we are using a color map with uniformly distributed colors, whether they exist in the image or not. In MATLAB-speak you would write

qimg = round(img*(N/255))*(255/N);

to quantize each channel into N levels (assuming the input is in the range [0,255]). You can also use floor, which is more suitable in some cases. This leads to N^3 different colors; for example, with N=8 you get 512 unique RGB colors.
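The same uniform quantization can be sketched in Python with NumPy (a hypothetical equivalent of the MATLAB one-liner above; the function name is mine):

```python
import numpy as np

def uniform_quantize(img, n_levels):
    """Quantize each channel into uniformly spaced levels.

    Mirrors the MATLAB one-liner round(img*(N/255))*(255/N);
    the input is assumed to be uint8 in [0, 255].
    """
    img = img.astype(np.float64)
    q = np.round(img * (n_levels / 255.0)) * (255.0 / n_levels)
    return np.round(q).astype(np.uint8)

# Example: a gradient quantized with N=8
gradient = np.arange(256, dtype=np.uint8).reshape(1, 256, 1).repeat(3, axis=2)
quantized = uniform_quantize(gradient, 8)
print(np.unique(quantized))  # the handful of remaining channel values
```

Note that, because of the `round`, values 0 and 255 both map to themselves, so the endpoints of the range are preserved.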

K-means clustering

This is the "classical" method to generate an adaptive palette. Obviously it is going to be the most expensive. The OP is applying k-means to the collection of all pixels. Instead, k-means can be applied to the color histogram. The process is identical, but instead of 10 million data points (a typical image nowadays), you have only maybe 32^3 = 33 thousand. The quantization caused by the histogram's reduced number of bins has little effect here when dealing with natural photographs. If you are quantizing a graphic, which has a limited set of colors, you don't need k-means clustering at all.

You do a single pass through all pixels to create the histogram. Next, you run the regular k-means clustering, but on the histogram bins. Each data point now also has a weight (the number of pixels within that bin) that you need to take into account. This affects the step in the algorithm that determines the cluster centers: you need to compute the weighted mean of the data points instead of the regular mean.

As with any k-means run, the result is affected by the initialization.
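A minimal sketch of this weighted k-means over histogram bins, in plain NumPy (the function name and random initialization are mine, not from the answer):

```python
import numpy as np

def kmeans_histogram(hist_bins, counts, k, iters=20, seed=0):
    """Weighted k-means: each data point is a histogram bin center, weighted
    by its pixel count. Centers are updated as weighted means."""
    rng = np.random.default_rng(seed)
    centers = hist_bins[rng.choice(len(hist_bins), size=k, replace=False)].astype(float)
    labels = np.zeros(len(hist_bins), dtype=int)
    for _ in range(iters):
        # assignment step: nearest center for every bin
        d = ((hist_bins[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: weighted mean of the bins assigned to each center
        for c in range(k):
            mask = labels == c
            if counts[mask].sum() > 0:
                centers[c] = (hist_bins[mask] * counts[mask][:, None]).sum(axis=0) / counts[mask].sum()
    return centers, labels
```

In practice you would build `hist_bins` and `counts` from a 32-bins-per-channel histogram of the image, then feed them to this function instead of the raw pixels.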

Octree quantization

An octree is a data structure for spatial indexing, where the volume is recursively divided into 8 sub-volumes by cutting each axis in half. The tree is thus formed of nodes with 8 children each. For color quantization, the RGB cube is represented by an octree, and the number of pixels per node is counted (this is equivalent to building a color histogram and constructing an octree on top of it). Next, leaf nodes are removed until the desired number is left. Removing leaf nodes happens 8 at a time, such that the node one level up becomes a leaf. There are different strategies for picking which nodes to prune, but they typically revolve around pruning nodes with low pixel counts.

This is the method that Gimp uses.

Because the octree always splits nodes down the middle, it is not as flexible as k-means clustering or the next method.
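A simplified sketch of octree quantization (not Gimp's actual implementation; the names and the "merge the cheapest all-leaf parent" pruning rule are my assumptions):

```python
import numpy as np

class Node:
    def __init__(self):
        self.children = [None] * 8
        self.count = 0
        self.rgb_sum = np.zeros(3)

def child_index(color, level):
    # one bit per channel (bit 7-level) selects one of the 8 octants
    shift = 7 - level
    return (((color[0] >> shift) & 1) << 2) | \
           (((color[1] >> shift) & 1) << 1) | \
           ((color[2] >> shift) & 1)

def build_octree(pixels, depth=4):
    root = Node()
    for p in pixels:
        node = root
        for level in range(depth):
            i = child_index(p, level)
            if node.children[i] is None:
                node.children[i] = Node()
            node = node.children[i]
        node.count += 1  # leaf: count pixels and accumulate colors
        node.rgb_sum = node.rgb_sum + np.asarray(p, dtype=float)
    return root

def leaves(node):
    ch = [c for c in node.children if c is not None]
    return [node] if not ch else [l for c in ch for l in leaves(c)]

def prune_to(root, max_leaves):
    while len(leaves(root)) > max_leaves:
        # pick the prunable node (all children are leaves) with fewest pixels
        best, best_count = None, None
        stack = [root]
        while stack:
            n = stack.pop()
            ch = [c for c in n.children if c is not None]
            if ch and all(not any(c.children) for c in ch):
                total = sum(c.count for c in ch)
                if best is None or total < best_count:
                    best, best_count = n, total
            stack.extend(ch)
        for c in best.children:  # merge the children into their parent
            if c is not None:
                best.count += c.count
                best.rgb_sum = best.rgb_sum + c.rgb_sum
        best.children = [None] * 8
    # palette: mean color of each surviving leaf
    return [l.rgb_sum / l.count for l in leaves(root) if l.count]
```

The fixed build depth plays the role of the reduced-resolution histogram: a depth of 4 corresponds to 16 bins per channel.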

Minimum variance quantization

MATLAB's rgb2ind, which the OP mentions, does uniform quantization and something they call "minimum variance quantization":

Minimum variance quantization cuts the RGB color cube into smaller boxes (not necessarily cubes) of different sizes, depending on how the colors are distributed in the image.

I'm not sure what this means. The documentation doesn't give away anything more, but it has a figure that looks like a k-d tree partitioning of the RGB cube. K-d trees are spatial indexing structures that divide spatial data in half recursively. At each level, you pick the dimension with the most separation and split along that dimension, producing one additional leaf node. In contrast to octrees, the split can happen at an optimal location; it is not down the middle of the node.

The advantage of using a spatial indexing structure (either k-d trees or octrees) is that the color lookup is really fast. You start at the root, and make a binary decision based on either R, G or B value, until you reach a leaf node. There is no need to compute distances to each prototype cluster, as is the case of k-means.
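The descent-based lookup can be sketched like this (the tuple-based node layout is a made-up illustration, not any library's API):

```python
# An internal node is (axis, threshold, left, right); a leaf is ('leaf', index).
# Lookup walks down the tree with one comparison per level -- no distance
# computations to every palette entry, unlike k-means assignment.
def lookup(node, color):
    while len(node) == 4:
        axis, threshold, left, right = node
        node = left if color[axis] < threshold else right
    return node[1]

# Tiny hand-built tree: split on R at 128, then on G at 128 in the right half
tree = (0, 128, ('leaf', 0), (1, 128, ('leaf', 1), ('leaf', 2)))
print(lookup(tree, (200, 50, 0)))   # -> 1
```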

[Edit two weeks later] I have been thinking about a possible implementation, and came up with one. This is the algorithm:

  • The full color histogram is considered a partition. This will be the root of a k-d tree, which for now is also its only leaf node, because there are no other nodes yet.
  • A priority queue is created. It contains all the leaf nodes of the k-d tree. The priority is given by the variance of the partition along one axis, minus the variances of the two halves if we were to split the partition along that axis. The split location is picked such that the variances of the two halves are minimal (using Otsu's algorithm). That is, the larger the priority, the more total variance we reduce by making the split. For each leaf node, we compute this value for each axis, and use the largest result.
  • We process partitions on the queue until we have the desired number of partitions:
    • We split the partition with highest priority along the axis and at the location computed when determining the priority.
    • We compute the priority for each of the two halves, and put them on the queue.

This is a relatively simple algorithm when described this way; the code is somewhat more complex, because I tried to make it efficient yet generic.
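The steps above can be sketched as follows. This is my own simplified reading, not the author's code: it works on weighted color points rather than a dense histogram, and finds each split by exhaustively minimizing the two halves' variances (the same objective Otsu's algorithm optimizes):

```python
import heapq
import numpy as np

def best_split(colors, weights):
    """Find (gain, axis, threshold): the split of this partition that most
    reduces total weighted variance, trying every axis and cut position."""
    best = (0.0, None, None)
    mu = np.average(colors, axis=0, weights=weights)
    var_total = (weights[:, None] * (colors - mu) ** 2).sum()
    for axis in range(3):
        order = np.argsort(colors[:, axis])
        c, w = colors[order], weights[order]
        for cut in range(1, len(c)):
            if c[cut, axis] == c[cut - 1, axis]:
                continue  # not a real split: equal coordinate values
            halves_var = 0.0
            for h in (slice(0, cut), slice(cut, None)):
                m = np.average(c[h], axis=0, weights=w[h])
                halves_var += (w[h][:, None] * (c[h] - m) ** 2).sum()
            gain = var_total - halves_var
            if gain > best[0]:
                best = (gain, axis, c[cut, axis])
    return best

def minvar_quantize(colors, weights, k):
    """Keep splitting the partition whose split reduces variance the most,
    until k partitions exist; return their weighted mean colors."""
    heap, counter = [], 0
    def push(c, w):
        nonlocal counter
        gain, axis, threshold = best_split(c, w)
        heapq.heappush(heap, (-gain, counter, axis, threshold, c, w))
        counter += 1
    push(np.asarray(colors, float), np.asarray(weights, float))
    done = []
    while heap and len(done) + len(heap) < k:
        _, _, axis, threshold, c, w = heapq.heappop(heap)
        if axis is None:  # partition has a single distinct color
            done.append((c, w))
            continue
        mask = c[:, axis] < threshold
        push(c[mask], w[mask])
        push(c[~mask], w[~mask])
    done += [(e[4], e[5]) for e in heap]
    return [np.average(c, axis=0, weights=w) for c, w in done]
```

The priority queue holds the negated gain so that Python's min-heap pops the largest variance reduction first, exactly as the bullet list describes.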

Comparison

On a 256x256x256 RGB histogram I got these timings comparing k-means clustering and this new algorithm:

# clusters | k-means (s) | minvar (s)
-----------|-------------|-----------
5          | 3.98        | 0.34
20         | 17.9        | 0.48
50         | 220.8       | 0.59

How to reduce the number of colors in an image with OpenCV?

There are many ways to do it. The methods suggested by jeff7 are OK, but they have some drawbacks:

  • method 1 has parameters N and M that you must choose, and you must also convert the image to another color space.
  • method 2 can be very slow, since you have to compute a histogram with 16.7 million bins and sort it by frequency (to obtain the 64 most frequent values).

I like to use an algorithm based on the most significant bits of each RGB channel to convert the image to a 64-color one. If you're using C/OpenCV, you can use something like the function below.

If you're working with gray-level images I recommend using the LUT() function of OpenCV 2.3, since it is faster. There is a tutorial on how to use LUT to reduce the number of colors; see: Tutorial: How to scan images, lookup tables... However, I find this more complicated if you're working with RGB images.

void reduceTo64Colors(IplImage *img, IplImage *img_quant) {
    int i, j;
    int height = img->height;
    int width = img->width;
    int step = img->widthStep;

    uchar *data = (uchar *)img->imageData;
    int step2 = img_quant->widthStep;
    uchar *data2 = (uchar *)img_quant->imageData;

    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            // Keep the 2 most significant bits of each channel:
            // x & 192 masks bits 7-6; the shifts pack the three
            // 2-bit fields into one byte without overlapping.
            uchar C1 = (data[i*step + j*3 + 0] & 192) >> 2; // -> bits 5-4
            uchar C2 = (data[i*step + j*3 + 1] & 192) >> 4; // -> bits 3-2
            uchar C3 = (data[i*step + j*3 + 2] & 192) >> 6; // -> bits 1-0

            data2[i*step2 + j] = C1 | C2 | C3; // 6-bit index: 64 colors
        }
    }
}

How to speed up color clustering in OpenCV?

Downsample the image first, then run k-means.

If you resize the image to 1/2 in both x and y, it shouldn't affect colors much, but k-means should take about 1/4 of the time. If you resample to 1/10 of the width and height, k-means should run roughly 100 times faster.

https://en.wikipedia.org/wiki/Color_quantization

By downsampling the image, you have fewer pixels to process during clustering. But in the end, it should produce roughly the same color scheme.
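The effect on the input size is easy to see. In OpenCV you would use cv2.resize (and cv2.kmeans for the clustering); this dependency-free sketch just strides over the pixels, which is the crudest form of downsampling:

```python
import numpy as np

# A synthetic 1000x1000 RGB "image"
img = np.random.default_rng(0).integers(0, 256, size=(1000, 1000, 3), dtype=np.uint8)

s = 10  # downsample factor; with OpenCV you would use cv2.resize instead
small = img[::s, ::s]                  # every s-th pixel in both directions
samples = small.reshape(-1, 3).astype(np.float32)

print(img.shape[0] * img.shape[1])     # pixels in the original
print(samples.shape[0])                # s*s = 100 times fewer samples for k-means
```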

Small summary of k-means:

  1. It maps each object (=pixel) to the nearest cluster center (= palette entry)
  2. It recomputes each palette entry to best represent the assigned points (= pixels)
  3. Repeat until nothing changes anymore.

So the real output is not an image or image regions. It's the palette.

You can then map an arbitrary image (including the full resolution version) to this color palette by simply replacing each pixel with the closest color!
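Mapping an image to a fixed palette is just a nearest-neighbor search per pixel; a minimal NumPy sketch (function name is mine):

```python
import numpy as np

def apply_palette(img, palette):
    """Replace every pixel with the closest palette color (squared L2 distance)."""
    flat = img.reshape(-1, 3).astype(np.float32)
    pal = palette.astype(np.float32)
    # distance from every pixel to every palette entry
    d = ((flat[:, None, :] - pal[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)
    return pal[idx].reshape(img.shape).astype(np.uint8)

palette = np.array([[0, 0, 0], [255, 0, 0], [255, 255, 255]], dtype=np.uint8)
img = np.array([[[10, 5, 5], [250, 20, 10], [240, 250, 245]]], dtype=np.uint8)
print(apply_palette(img, palette)[0])  # each pixel snaps to its nearest entry
```

Note that the intermediate distance matrix is pixels-by-palette-size, so for very large images you would process the pixels in chunks.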

Complexity and performance:

The complexity of k-means is O(n*k*i), where n is the number of pixels you have, k the desired number of output colors and i the number of iterations needed until convergence.

n: by downsampling, you can easily reduce n, the largest factor. In many situations, you can reduce it quite significantly before you see a degradation in the quality of the result.

k: this is your desired number of output colors. Whether you can reduce this or not depends on your actual use case.

i: various factors can have an effect on convergence (including both other factors!), but the strongest probably is having good starting values. So if you have a very fast but low quality method to choose the palette, run it first, then use k-means to refine this palette. Maybe OpenCV already includes an appropriate heuristic for this though!

As you can see, the easiest approach is to reduce n. You can reduce n significantly, produce an optimized palette for the thumbnail, then rerun k-means on the full image refining this palette. As this will hopefully reduce the number of iterations significantly, it can sometimes perform very well.

Is color quantization the right name for color quantization in OpenCV?

Color quantization is the conversion of the infinite range of natural colors into a finite digital color space. Anyway, to create a full-color histogram you can use OpenCV's sparse matrix implementation and write your own function to compute it. Of course you have to access the pixels one by one if you have no other structural or continuity information about the image.
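The sparse-histogram idea can be sketched in Python with a plain counter standing in for cv::SparseMat (the packing scheme and function name are my own):

```python
import numpy as np
from collections import Counter

def sparse_color_histogram(img):
    """Full 256^3 color histogram stored sparsely: only colors that actually
    occur get an entry (in C++ you would use cv::SparseMat for the same idea)."""
    flat = img.reshape(-1, 3)
    # pack R,G,B into a single 24-bit key, one key per distinct color
    keys = (flat[:, 0].astype(np.uint32) << 16) | \
           (flat[:, 1].astype(np.uint32) << 8) | flat[:, 2]
    return Counter(keys.tolist())

img = np.zeros((2, 2, 3), dtype=np.uint8)   # 3 black pixels...
img[0, 0] = [255, 0, 0]                     # ...and 1 red one
hist = sparse_color_histogram(img)
print(len(hist))   # -> 2 distinct colors, not 16.7 million bins
```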

OpenCV: Grayscale color reduction

This is color quantization: a simple technique to reduce the number of colors in an image that relies on integer division.

Since the starting range has 256 values, if you want to end up with N colors, you need to integer-divide by K = 256 / N and then multiply by K.

So in your case, for N=8 you need a K = 256 / 8 = 32, and for N = 4 you need K = 256 / 4 = 64.
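In NumPy the same integer-division trick is a one-liner (function name is mine):

```python
import numpy as np

def reduce_gray_levels(img, n):
    """Reduce a grayscale image to n levels via integer division."""
    k = 256 // n
    return (img // k) * k

img = np.arange(256, dtype=np.uint8)  # all 256 gray values
print(np.unique(reduce_gray_levels(img, 8)))  # -> 8 levels: 0, 32, ..., 224
```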


