How to Convert PDF to PNG Efficiently

How to convert PDF to PNG efficiently?

After struggling with this for a whole day, I ended up answering my own question.

The solution is to drop down to the Core Graphics and Image I/O frameworks and render each PDF page into a bitmap context. This problem lends itself very well to parallelization, since each page can be converted into a bitmap on its own thread.

import Foundation
import CoreGraphics
import CoreServices  // kUTType* constants (MobileCoreServices on iOS)
import ImageIO

struct ImageFileType {
    var uti: CFString
    var fileExtension: String

    // This list can include anything returned by CGImageDestinationCopyTypeIdentifiers().
    // Only the popular formats are included here.
    static let bmp = ImageFileType(uti: kUTTypeBMP, fileExtension: "bmp")
    static let gif = ImageFileType(uti: kUTTypeGIF, fileExtension: "gif")
    static let jpg = ImageFileType(uti: kUTTypeJPEG, fileExtension: "jpg")
    static let png = ImageFileType(uti: kUTTypePNG, fileExtension: "png")
    static let tiff = ImageFileType(uti: kUTTypeTIFF, fileExtension: "tiff")
}

enum PDFConversionError: Error {
    case cannotOpenDocument(URL)
}

// Renders every page of the PDF at `sourceURL` into the existing directory
// `destinationURL` and returns the URLs of the image files.
func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: ImageFileType, dpi: CGFloat = 200) throws -> [URL] {
    guard let pdfDocument = CGPDFDocument(sourceURL as CFURL) else {
        throw PDFConversionError.cannotOpenDocument(sourceURL)
    }
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let bitmapInfo = CGImageAlphaInfo.noneSkipLast.rawValue

    // The output URLs depend only on the page index, so build them up front.
    // The concurrent loop below then only reads shared state and never mutates it.
    let imageName = sourceURL.deletingPathExtension().lastPathComponent
    let urls: [URL] = (0..<pdfDocument.numberOfPages).map { i in
        destinationURL.appendingPathComponent("\(imageName)-Page\(i + 1).\(fileType.fileExtension)")
    }

    DispatchQueue.concurrentPerform(iterations: urls.count) { i in
        // Page numbers start at 1, not 0
        let pdfPage = pdfDocument.page(at: i + 1)!

        // PDF units are 1/72 inch, so dpi / 72 converts points to pixels
        let mediaBoxRect = pdfPage.getBoxRect(.mediaBox)
        let scale = dpi / 72.0
        let width = Int(mediaBoxRect.width * scale)
        let height = Int(mediaBoxRect.height * scale)

        // Opaque RGB bitmap with a white background
        let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: 8, bytesPerRow: 0, space: colorSpace, bitmapInfo: bitmapInfo)!
        context.interpolationQuality = .high
        context.setFillColor(.white)
        context.fill(CGRect(x: 0, y: 0, width: width, height: height))
        context.scaleBy(x: scale, y: scale)
        context.drawPDFPage(pdfPage)

        // Encode the bitmap in the requested format and write it to disk
        let image = context.makeImage()!
        let imageDestination = CGImageDestinationCreateWithURL(urls[i] as CFURL, fileType.uti, 1, nil)!
        CGImageDestinationAddImage(imageDestination, image, nil)
        CGImageDestinationFinalize(imageDestination)
    }
    return urls
}

Usage:

let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
let urls = try convertPDF(at: sourceURL, to: destinationURL, fileType: .png, dpi: 200)

Conversion is now blisteringly fast, and memory usage is a lot lower than before. Naturally, the higher the DPI, the more CPU time and memory each page needs. I can't say whether GPU acceleration plays a role, as I only have a weak Intel integrated GPU.
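To get a feel for how DPI drives memory use, here is a rough back-of-the-envelope sketch in Python. It mirrors the arithmetic in the Swift function above: PDF units are 1/72 inch, and the bitmap context uses 8 bits per component with a skipped fourth byte, so roughly 4 bytes per pixel. The name bitmap_size and the US Letter page size are illustrative assumptions, and the estimate ignores any row padding Core Graphics may add.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def bitmap_size(width_pt, height_pt, dpi, bytes_per_pixel=4):
    """Return (width_px, height_px, approx_bytes) for one rendered page.

    Rounds to the nearest pixel; the Swift code above truncates instead,
    so results may differ by one pixel in edge cases.
    """
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches); assumed for illustration.
    for dpi in (72, 200, 400):
        w, h, size = bitmap_size(612, 792, dpi)
        print(f"{dpi:>3} dpi: {w} x {h} px, about {size / 1e6:.1f} MB per page")
```

Because both dimensions scale with the DPI, doubling the DPI roughly quadruples the memory per page. With concurrent rendering, several such bitmaps are alive at the same time.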

Time efficient way to convert PDF to image

I found an answer to that problem using another module called fitz, which is the Python binding for MuPDF (distributed as PyMuPDF).

First of all, install PyMuPDF:

The installation steps are described in the PyMuPDF documentation, but for Windows users it's rather simple:

pip install PyMuPDF

Then import the fitz module:

import fitz
print(fitz.__doc__)

>>>PyMuPDF 1.18.13: Python bindings for the MuPDF 1.18.0 library.
>>>Version date: 2021-05-05 06:32:22.
>>>Built for Python 3.7 on win32 (64-bit).

Open your file and save every page as images:

The get_pixmap() method accepts different parameters that allow you to control the image (variation, resolution, color...), so I suggest that you read its entry in the PyMuPDF documentation.

def convert_pdf_to_image(fic):
    # Open the PDF file
    doc = fitz.open(fic)
    # Iterate through the pages and render each one as an RGB image
    for page in doc:
        pix = page.get_pixmap()
        # page.number is zero-based, so the first file is page-0.png
        pix.save("page-%i.png" % page.number)
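As one example of those parameters, the sketch below renders at a chosen resolution by passing a scaling matrix to get_pixmap(). It assumes the default rendering scale corresponds to 72 DPI. The helper names zoom_for_dpi and convert_pdf_to_images, the 200 DPI default, and the out_dir argument are my own choices for illustration, not part of PyMuPDF.

```python
import os


def zoom_for_dpi(dpi):
    """Scale factor relative to the PDF's native 72 units per inch."""
    return dpi / 72.0


def convert_pdf_to_images(pdf_path, out_dir=".", dpi=200):
    """Render each page of pdf_path to a PNG in out_dir; return the file paths."""
    import fitz  # PyMuPDF; imported lazily so zoom_for_dpi works without it

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # same zoom horizontally and vertically
    paths = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            # alpha=False gives an opaque RGB image
            pix = page.get_pixmap(matrix=matrix, alpha=False)
            path = os.path.join(out_dir, "page-%i.png" % page.number)
            pix.save(path)
            paths.append(path)
    finally:
        doc.close()
    return paths
```

A higher zoom gives sharper output at the cost of larger files and more rendering time, so it is worth trying a couple of values on a representative page first.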

Hope this helps anyone else.

Converting PDF to Image - Swift iOS

Make sure that your pptURL is a file URL.

URL(string: "path/to/pdf") and URL(fileURLWithPath: "path/to/pdf") are different things, and you must use the latter when initializing your URL.

When printed, the URL should start with the "file:///" prefix, for example:

file:///Users/dev/Library/Developer/CoreSimulator/Devices/4FF18699-D82F-4308-88D6-44E3C11C955A/data/Containers/Bundle/Application/8F230041-AC15-45D9-863F-5778B565B12F/myApp.app/example.pdf
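The same distinction can be illustrated outside Swift. This small Python sketch uses only the standard library, and the file name example.pdf is made up. It shows that a bare path string parsed as a URL carries no scheme, while a URL built from a filesystem path gets the file:/// prefix.

```python
from pathlib import Path
from urllib.parse import urlparse

# Parsing a bare path as a URL yields no scheme, much like URL(string:) in Swift.
bare = urlparse("path/to/pdf")

# Building a URL from a filesystem path yields the file scheme, comparable to
# URL(fileURLWithPath:). The file does not need to exist for this to work.
file_url = Path("example.pdf").resolve().as_uri()

if __name__ == "__main__":
    print(repr(bare.scheme))  # empty string: no scheme at all
    print(file_url)           # begins with file:///
```

Printing the URL before handing it to the PDF API is a quick way to check which of the two kinds you actually have.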

Convert each page of a multi-paged pdf into separate png files in R

We can create a png of each page using the image_read_pdf function from the magick package:

# install the magick package (image_read_pdf() also needs the pdftools package)
install.packages("magick")
library("magick")

# example_pdf is the path to your PDF file
# create a magick-image object holding one image per page of the PDF
pages <- magick::image_read_pdf(example_pdf)
pages

# save each page of the PDF as a PNG, however many pages there are
for (i in seq_len(length(pages))) {
  image_write(pages[i], path = paste0("image", i, ".png"), format = "png")
}

This saves each page as "image(page number).png" in your current working directory.

ImageMagick convert adds whitespace when converting PDF to PNG

The hint from KenS was exactly what I was looking for - the PDF defines a CropBox that ImageMagick 7.1.0 was not using by default. The solution therefore is to modify the command to include the following -define information:

convert -define pdf:use-cropbox=true file.pdf /tmp/file.png

Thank you all for your help!
