How to Reduce the Image File Size Using PIL

How do I resize an image using PIL and maintain its aspect ratio?

Define a maximum size.
Then, compute a resize ratio by taking min(maxwidth/width, maxheight/height).

The proper size is oldsize*ratio.
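
For illustration, a minimal sketch of that calculation with Pillow (the file names and the 1024×768 maximum are placeholder assumptions; Image.Resampling.LANCZOS requires Pillow 9 or newer):

from PIL import Image

im = Image.open("photo.jpg")
maxwidth, maxheight = 1024, 768

# Scale both dimensions by the same factor so the aspect ratio is preserved.
ratio = min(maxwidth / im.width, maxheight / im.height)
newsize = (int(im.width * ratio), int(im.height * ratio))

resized = im.resize(newsize, Image.Resampling.LANCZOS)
resized.save("photo_resized.jpg")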

There is of course also a library method to do this: the method Image.thumbnail.

Below is an (edited) example from the PIL documentation.

import os, sys

from PIL import Image

size = 128, 128

for infile in sys.argv[1:]:
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im = Image.open(infile)
            im.thumbnail(size, Image.Resampling.LANCZOS)
            im.save(outfile, "JPEG")
        except IOError:
            print("cannot create thumbnail for '%s'" % infile)

Why does PIL (pillow) Image.save() reduce file size?

The answer, as Charanjit pointed out, is in the amount of compression (which is controlled with the quality kwarg of the .save() method). More compression means a smaller file and less "quality".

Regarding "quality", this means that, although the image size and image resolution may be the same, the crispness of object edges, and color differentiation, within the image, will be reduced (possibly not apparent to the eye) by compression.

There is a good discussion of jpeg compression here: Understanding JPEG Quality.

There is also another Stack Overflow answer that addresses saving a JPEG file with the same quality as the original: https://stackoverflow.com/a/4355281/1639359 suggests setting the kwarg quality='keep' (instead of quality=N, where N is an integer between 1 and 100).
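
For reference, a small sketch of both save options (file names are placeholders; quality='keep' only works when the source image is itself a JPEG):

from PIL import Image

im = Image.open("input.jpg")

# Lower quality -> stronger compression -> smaller file.
im.save("output_q60.jpg", "JPEG", quality=60)

# Reuse the original JPEG's quantization tables ("same quality as the original").
im.save("output_keep.jpg", "JPEG", quality="keep")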

PIL Image compression

You could try Google's Guetzli encoder via pyguetzli; it usually generates a smaller JPEG file but takes a substantial amount of time. Compare:

  • original: 9.4M
  • pil_1024*768_q95.jpeg: 638K
  • pil_1024*768_q85.jpeg: 404K
  • guetzli_1024*768_q95.jpg: 376K

The original JPEG file is from Wikimedia Commons, by Diego Delso, CC BY-SA 4.0.
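
A minimal sketch of that pipeline, assuming pyguetzli's process_pil_image helper (check the pyguetzli docs for exact names; file names are placeholders):

import pyguetzli
from PIL import Image

# Downscale with PIL first, then let Guetzli re-encode the result.
im = Image.open("original.jpg")
im.thumbnail((1024, 768), Image.Resampling.LANCZOS)

# process_pil_image re-encodes the PIL image with Guetzli and returns JPEG bytes.
optimized_jpeg = pyguetzli.process_pil_image(im, quality=95)

with open("guetzli_1024x768_q95.jpg", "wb") as output:
    output.write(optimized_jpeg)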

How to calculate the resulting filesize of Image.resize() in PIL

Long story short, you do not know how well the image will be compressed, because it depends a lot on what kind of image it is. That said, we can optimize your code.

Some optimizations:

  • Approximate the number of bytes per pixel using the file size and the image dimensions.
  • Perform a ratio update based on the new and the old memory consumption.

My coding solution applies both of the above methods, because applying them separately didn't seem to result in very stable convergence. The following sections will explain both parts in more depth and show the test cases that I considered.

Reducing image memory

The following code approximates the new image dimensions based on the difference between the original file size (in bytes) and the preferred file size (in bytes). It approximates the number of bytes per pixel and then applies the ratio between the original and the preferred bytes per pixel to the image width and height (hence the square root, since the file size scales with both dimensions).

I then use opencv-python (cv2) for the image rescaling, but you can substitute your own resizing code.

import os

import cv2
import numpy as np


def reduce_image_memory(path, max_file_size: int = 2 ** 20):
    """
    Reduce the image memory by downscaling the image.

    :param path: (str) Path to the image
    :param max_file_size: (int) Maximum size of the file in bytes
    :return: (np.ndarray) downscaled version of the image
    """
    image = cv2.imread(path)
    height, width = image.shape[:2]

    original_memory = os.stat(path).st_size
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])

    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (max_file_size / original_memory)
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)

    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image
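
A quick usage sketch of the function above (the file name is a placeholder). Note that a single call usually lands only near the target size, which is why the ratio correction in the next section iterates:

small = reduce_image_memory("test img.jpg", max_file_size=2 ** 20)
cv2.imwrite("resize test img.jpg", small)
print(os.stat("resize test img.jpg").st_size)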

Applying ratio

Most of the magic happens in ratio *= max_file_size / new_memory, where we calculate our error with respect to the preferred size and correct our ratio with that value. For example, if the target is 1.0 MB and the re-encoded file comes out at 1.3 MB, the ratio is multiplied by 1.0/1.3 ≈ 0.77, so the next iteration aims for a proportionally smaller image.

The program keeps adjusting the ratio as long as the following condition holds:

  • abs(1 - max_file_size / new_memory) > max_deviation_percentage

This means the loop stops once the new file size is relatively close to the preferred file size. You control this closeness with delta. The higher delta is, the smaller your file may end up (further below max_file_size). The smaller delta is, the closer the new file size will be to max_file_size, but it will never be larger.

The trade-off is in time: the smaller delta is, the longer it takes to find a ratio satisfying the condition. Empirical testing shows that values between 0.01 and 0.05 work well.

if __name__ == '__main__':
    image_location = "test img.jpg"

    # delta denotes the maximum variation allowed around the max_file_size.
    # The lower the delta, the more time it takes, but the closer the result will be to `max_file_size`.
    delta = 0.01
    max_file_size = 2 ** 20 * (1 - delta)
    max_deviation_percentage = delta

    current_memory = new_memory = os.stat(image_location).st_size
    ratio = 1
    steps = 0

    # make sure that the comparison is within a certain deviation.
    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = reduce_image_memory(image_location, max_file_size=max_file_size * ratio)
        cv2.imwrite(f"resize {image_location}", new_image)

        new_memory = os.stat(f"resize {image_location}").st_size
        ratio *= max_file_size / new_memory
        steps += 1

    print(f"Memory resize: {current_memory / 2 ** 20:5.2f}, {new_memory / 2 ** 20:6.4f} MB, number of steps {steps}")

Test cases

For testing I used two different approaches: randomly generated images and an example image from Google.

For the random images I used the following code:

from typing import Tuple


def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
    Generate a test image with a fixed width/height ratio and an approximate size.

    :param ratio: (Tuple[int, int]) screen ratio for the image
    :param file_size: (int) Approximate size of the image; note that this may be off due to image compression.
    """
    height, width = ratio  # Numpy reverses the values
    scale = int(np.sqrt(file_size // (width * height)))
    img = np.random.randint(0, 255, (width * scale, height * scale, 3), dtype=np.uint8)
    return img

Results

  • Using a randomly generated image

    image_location = "test image random.jpg"
    # Generate a large image with fixed ratio and a file size of ~1.7MB
    image = generate_test_image(ratio=(16, 9), file_size=1531494)
    cv2.imwrite(image_location, image)

Memory resize: 1.71, 0.99 MB, number of steps 2

In 2 steps it reduces the original size from 1.7 MB to 0.99 MB.

(before)
original randomly generated image of 1.7 MB

(after)
resized randomly generated image of 0.99 MB

  • Using a Google image

Memory resize: 1.51, 0.996 MB, number of steps 4

In 4 steps it reduces the original size from 1.51 MB to 0.996 MB.

(before)
original google image of a lake with waterfalls

(after)
resized google image of a lake with waterfalls

Bonus

  • It also works for .png, .jpeg, .tiff, etc...
  • Besides downscaling, it can also be used to upscale images to a certain memory consumption (see the sketch after this list).
  • The image aspect ratio is maintained as closely as possible.
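
A minimal sketch of the upscaling case, reusing reduce_image_memory from above (the file name and the 4 MB target are assumptions):

# A target size larger than the original makes the computed ratio greater
# than 1, so cv2.resize enlarges the image instead of shrinking it.
upscaled = reduce_image_memory("test img.jpg", max_file_size=4 * 2 ** 20)
cv2.imwrite("upscaled test img.jpg", upscaled)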

Edit

I made the code a bit more user friendly and added Mark Setchell's suggestion to use an io.BytesIO buffer, which roughly speeds up the code by a factor of 2. There is also a step_limit that prevents endless looping if the delta is very small.

import io
import os
import time
from typing import Tuple

import cv2
import numpy as np
from PIL import Image

def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
    Generate a test image with a fixed width/height ratio and an approximate size.

    :param ratio: (Tuple[int, int]) screen ratio for the image
    :param file_size: (int) Approximate size of the image; note that this may be off due to image compression.
    """
    height, width = ratio  # Numpy reverses the values
    scale = int(np.sqrt(file_size // (width * height)))
    img = np.random.randint(0, 255, (width * scale, height * scale, 3), dtype=np.uint8)
    return img

def _change_image_memory(path, file_size: int = 2 ** 20):
    """
    Tries to match the image memory to a specific file size.

    :param path: (str) Path to the image
    :param file_size: (int) Size of the file in bytes
    :return: (np.ndarray) rescaled version of the image
    """
    image = cv2.imread(path)
    height, width = image.shape[:2]

    original_memory = os.stat(path).st_size
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])

    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (file_size / original_memory)
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)

    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image

def _get_size_of_image(image):
    # Encode into memory and get size
    buffer = io.BytesIO()
    image = Image.fromarray(image)
    image.save(buffer, format="JPEG")
    size = buffer.getbuffer().nbytes
    return size

def limit_image_memory(path, max_file_size: int, delta: float = 0.05, step_limit=10):
    """
    Reduces an image to the required max file size.

    :param path: (str) Path to the original (unchanged) image.
    :param max_file_size: (int) maximum size of the image
    :param delta: (float) maximum allowed variation from the max file size.
        This is a value between 0 and 1, relative to the max file size.
    :param step_limit: (int) maximum number of iterations before giving up.
    :return: an image path to the limited image.
    """
    start_time = time.perf_counter()
    max_file_size = max_file_size * (1 - delta)
    max_deviation_percentage = delta
    new_image = None

    current_memory = new_memory = os.stat(path).st_size
    ratio = 1
    steps = 0

    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = _change_image_memory(path, file_size=max_file_size * ratio)
        new_memory = _get_size_of_image(new_image)
        ratio *= max_file_size / new_memory
        steps += 1

        # prevent endless looping
        if steps > step_limit:
            break

    print(f"Stats:"
          f"\n\t- Original memory size: {current_memory / 2 ** 20:9.2f} MB"
          f"\n\t- New memory size     : {new_memory / 2 ** 20:9.2f} MB"
          f"\n\t- Number of steps {steps}"
          f"\n\t- Time taken: {time.perf_counter() - start_time:5.3f} seconds")

    if new_image is not None:
        cv2.imwrite(f"resize {path}", new_image)
        return f"resize {path}"
    return path

if __name__ == '__main__':
    image_location = "your nice image.jpg"

    # Uncomment to generate random test images
    # test_image = generate_test_image(ratio=(16, 9), file_size=1567289)
    # cv2.imwrite(image_location, test_image)

    path = limit_image_memory(image_location, max_file_size=2 ** 20, delta=0.01)

