How to convert PDF to PNG efficiently?
After struggling with this for a whole day, I ended up answering my own question.
The solution is to drop down to the Core Graphics and Image I/O frameworks and render each PDF page into a bitmap context. This problem lends itself very well to parallelization, since each page can be converted into a bitmap on its own thread.
import Foundation
import CoreGraphics
import CoreServices  // kUTType* constants
import ImageIO

struct ImageFileType {
    var uti: CFString
    var fileExtension: String

    // This list can include anything returned by CGImageDestinationCopyTypeIdentifiers()
    // I'm including only the popular formats here
    static let bmp = ImageFileType(uti: kUTTypeBMP, fileExtension: "bmp")
    static let gif = ImageFileType(uti: kUTTypeGIF, fileExtension: "gif")
    static let jpg = ImageFileType(uti: kUTTypeJPEG, fileExtension: "jpg")
    static let png = ImageFileType(uti: kUTTypePNG, fileExtension: "png")
    static let tiff = ImageFileType(uti: kUTTypeTIFF, fileExtension: "tiff")
}

func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: ImageFileType, dpi: CGFloat = 200) throws -> [URL] {
    let pdfDocument = CGPDFDocument(sourceURL as CFURL)!
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let bitmapInfo = CGImageAlphaInfo.noneSkipLast.rawValue
    let pageCount = pdfDocument.numberOfPages
    var urls = [URL](repeating: URL(fileURLWithPath: "/"), count: pageCount)

    // Each iteration writes to its own slot through the buffer pointer,
    // so the concurrent writes do not race on the array itself
    urls.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: pageCount) { i in
            // Page number starts at 1, not 0
            let pdfPage = pdfDocument.page(at: i + 1)!
            let mediaBoxRect = pdfPage.getBoxRect(.mediaBox)
            // PDF user space is 72 points per inch
            let scale = dpi / 72.0
            let width = Int(mediaBoxRect.width * scale)
            let height = Int(mediaBoxRect.height * scale)

            // Render the page onto a white background
            let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: 8, bytesPerRow: 0, space: colorSpace, bitmapInfo: bitmapInfo)!
            context.interpolationQuality = .high
            context.setFillColor(.white)
            context.fill(CGRect(x: 0, y: 0, width: width, height: height))
            context.scaleBy(x: scale, y: scale)
            context.drawPDFPage(pdfPage)
            let image = context.makeImage()!

            // Write the bitmap to "<name>-Page<n>.<ext>" in the destination folder
            let imageName = sourceURL.deletingPathExtension().lastPathComponent
            let imageURL = destinationURL.appendingPathComponent("\(imageName)-Page\(i+1).\(fileType.fileExtension)")
            let imageDestination = CGImageDestinationCreateWithURL(imageURL as CFURL, fileType.uti, 1, nil)!
            CGImageDestinationAddImage(imageDestination, image, nil)
            CGImageDestinationFinalize(imageDestination)
            buffer[i] = imageURL
        }
    }
    return urls
}
Usage:
let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
let urls = try convertPDF(at: sourceURL, to: destinationURL, fileType: .png, dpi: 200)
Conversion is now blisteringly fast, and memory usage is a lot lower. Obviously, the higher the DPI, the more CPU and memory it needs, since the number of pixels grows with the square of the DPI. I'm not sure about GPU acceleration, as I only have a weak Intel integrated GPU.
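To put rough numbers on the DPI trade-off, here is a small arithmetic sketch in Python (not part of the Swift solution). It mirrors the dpi / 72 scaling used in the code above and assumes 4 bytes per pixel, which is what an 8-bit RGB context with a skipped alpha byte uses. The function name and the US Letter page size (612 x 792 points) are my own illustration, and the real context may pad each row, so treat the byte count as approximate.

```python
def bitmap_size(width_pt, height_pt, dpi=200, bytes_per_pixel=4):
    """Return (width_px, height_px, approx_bytes) for one rendered page.

    PDF page boxes are measured in points, with 72 points per inch.
    """
    width_px = int(width_pt * dpi / 72)
    height_px = int(height_pt * dpi / 72)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# US Letter page: 612 x 792 points (assumed here for illustration)
for dpi in (72, 200, 300):
    w, h, size = bitmap_size(612, 792, dpi)
    print(f"{dpi} dpi: {w} x {h} px, about {size / 1_000_000:.1f} MB per page")
```

Since every page being rendered concurrently holds its own bitmap, peak memory is roughly this figure multiplied by the number of pages in flight.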
Time efficient way to convert PDF to image
I found an answer to that problem using another module called fitz, which is a Python binding to MuPDF.
First of all, install PyMuPDF:
The PyMuPDF documentation covers installation in detail, but for Windows users it's rather simple:
pip install PyMuPDF
Then import the fitz module:
import fitz
print(fitz.__doc__)
>>>PyMuPDF 1.18.13: Python bindings for the MuPDF 1.18.0 library.
>>>Version date: 2021-05-05 06:32:22.
>>>Built for Python 3.7 on win32 (64-bit).
Open your file and save every page as an image:
The get_pixmap() method accepts several parameters that allow you to control the image (variation, resolution, color...), so I suggest that you read its documentation.
def convert_pdf_to_image(fic):
    # open your file
    doc = fitz.open(fic)
    # iterate through the pages of the document and create an RGB image of each page
    for page in doc:
        pix = page.get_pixmap()
        # page.number is zero-based, so the first file is page-0.png
        pix.save("page-%i.png" % page.number)
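As one example of those parameters, here is a sketch of how the resolution could be raised by passing a zoom matrix to get_pixmap(). By default a page is rendered at 72 dpi, so the zoom factor is dpi / 72. The function names, the prefix argument and the default of 200 dpi are my own choices, not something from the PyMuPDF docs, so adapt them as needed.

```python
def zoom_for_dpi(dpi):
    """PDF pages are measured at 72 units per inch, so zoom = dpi / 72."""
    return dpi / 72.0


def convert_pdf_to_images(path, dpi=200, prefix="page"):
    """Render every page of the PDF at the given dpi and return the file names."""
    import fitz  # PyMuPDF; imported here so zoom_for_dpi works without it

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # same zoom horizontally and vertically
    outputs = []
    doc = fitz.open(path)
    try:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)
            name = "%s-%i.png" % (prefix, page.number)
            pix.save(name)
            outputs.append(name)
    finally:
        doc.close()
    return outputs
```

Calling convert_pdf_to_images("example.pdf", dpi=300) would write page-0.png, page-1.png and so on to the current directory, where "example.pdf" stands in for your own file.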
Hope this helps anyone else.
Converting PDF to Image - Swift iOS
Make sure that your pptURL is a file URL. URL(string: "path/to/pdf") and URL(fileURLWithPath: "path/to/pdf") are different things, and you must use the latter when initializing your URL.
When printed, the URL should start with the "file:///" prefix, e.g.:
file:///Users/dev/Library/Developer/CoreSimulator/Devices/4FF18699-D82F-4308-88D6-44E3C11C955A/data/Containers/Bundle/Application/8F230041-AC15-45D9-863F-5778B565B12F/myApp.app/example.pdf
Convert each page of a multi-paged pdf into separate png files in R
We can create a png of each page using the image_read_pdf function from the magick package:
# install the magick package (image_read_pdf also requires the pdftools package)
install.packages(c("magick", "pdftools"))
library("magick")
# creating magick-image class with a png for each page of the pdf
# (example_pdf is the path to your PDF file)
pages <- magick::image_read_pdf(example_pdf)
pages
# saving each page of the pdf as a png
for (i in seq_len(length(pages))) {
  pages[i] %>% image_write(., path = paste0("image", i, ".png"), format = "png")
}
This saves each page as "image(page number).png" in your working directory.
ImageMagick convert adds whitespace when converting PDF to PNG
The hint from KenS was exactly what I was looking for - the PDF defines a CropBox that ImageMagick 7.1.0 was not using by default. The solution therefore is to modify the command to include the following -define information:
convert -define pdf:use-cropbox=true file.pdf /tmp/file.png
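If you need to run this from a script rather than by hand, a minimal Python sketch could look like the following. It only wraps the exact command above; the function names are made up for illustration, and it assumes the convert executable is on your PATH.

```python
import subprocess


def cropbox_command(pdf_path, png_path, executable="convert"):
    """Build the argument list for the CropBox-aware conversion."""
    return [executable, "-define", "pdf:use-cropbox=true", pdf_path, png_path]


def convert_with_cropbox(pdf_path, png_path):
    """Run ImageMagick; raises CalledProcessError if the command fails."""
    subprocess.run(cropbox_command(pdf_path, png_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.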
Thank you all for your help!