Export PDF Pages to a Series of Images in Java

Export PDF pages to a series of images in Java

Here is one way to do it, combining some code fragments from around the web.

How do I draw a PDF into an Image?

https://pdf-renderer.dev.java.net/examples.html

Creating a Buffered Image from an Image

ORIGINAL: http://www.exampledepot.com/egs/java.awt.image/Image2Buf.html

UPDATED: How to convert buffered image to image and vice-versa?

Saving a Generated Graphic to a PNG or JPEG File

ORIGINAL: http://www.exampledepot.com/egs/javax.imageio/Graphic2File.html

UPDATED: http://docs.oracle.com/javase/tutorial/2d/images/saveimage.html

Combined together into something that works like this to turn all the pages into images:

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

import java.awt.Graphics;
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.HeadlessException;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.Transparency;
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.swing.*;
import javax.imageio.*;
import java.awt.image.*;

public class ImageMain {
public static void setup() throws IOException {
// load a pdf from a byte buffer
File file = new File("test.pdf");
RandomAccessFile raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
int numPgs = pdffile.getNumPages();
for (int i = 0; i < numPgs; i++) {
// draw the first page to an image
PDFPage page = pdffile.getPage(i);
// get the width and height for the doc at the default zoom
Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
// generate the image
Image img = page.getImage(rect.width, rect.height, // width & height
rect, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true // block until drawing is done
);
// save it as a file
BufferedImage bImg = toBufferedImage(img);
File yourImageFile = new File("page_" + i + ".png");
ImageIO.write(bImg, "png", yourImageFile);
}
}

// This method returns a buffered image with the contents of an image
public static BufferedImage toBufferedImage(Image image) {
if (image instanceof BufferedImage) {
return (BufferedImage) image;
}
// This code ensures that all the pixels in the image are loaded
image = new ImageIcon(image).getImage();
// Determine if the image has transparent pixels; for this method's
// implementation, see e661 Determining If an Image Has Transparent
// Pixels
boolean hasAlpha = hasAlpha(image);
// Create a buffered image with a format that's compatible with the
// screen
BufferedImage bimage = null;
GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
try {
// Determine the type of transparency of the new buffered image
int transparency = Transparency.OPAQUE;
if (hasAlpha) {
transparency = Transparency.BITMASK;
}
// Create the buffered image
GraphicsDevice gs = ge.getDefaultScreenDevice();
GraphicsConfiguration gc = gs.getDefaultConfiguration();
bimage = gc.createCompatibleImage(image.getWidth(null), image.getHeight(null), transparency);
} catch (HeadlessException e) {
// The system does not have a screen
}
if (bimage == null) {
// Create a buffered image using the default color model
int type = BufferedImage.TYPE_INT_RGB;
if (hasAlpha) {
type = BufferedImage.TYPE_INT_ARGB;
}
bimage = new BufferedImage(image.getWidth(null), image.getHeight(null), type);
}
// Copy image to buffered image
Graphics g = bimage.createGraphics();
// Paint the image onto the buffered image
g.drawImage(image, 0, 0, null);
g.dispose();
return bimage;
}

public static boolean hasAlpha(Image image) {
// If buffered image, the color model is readily available
if (image instanceof BufferedImage) {
BufferedImage bimage = (BufferedImage) image;
return bimage.getColorModel().hasAlpha();
}
// Use a pixel grabber to retrieve the image's color model;
// grabbing a single pixel is usually sufficient
PixelGrabber pg = new PixelGrabber(image, 0, 0, 1, 1, false);
try {
pg.grabPixels();
} catch (InterruptedException e) {
}
// Get the image's color model
ColorModel cm = pg.getColorModel();
return cm.hasAlpha();
}

public static void main(final String[] args) {
SwingUtilities.invokeLater(new Runnable() {
public void run() {
try {
ImageMain.setup();
} catch (IOException ex) {
ex.printStackTrace();
}
}
});
}
}

Convert a PDF file to image

You can easily convert 04-Request-Headers.pdf file pages into image format.

Convert all pdf pages into image format in Java using PDF Box.

Solution for Apache PDFBox 1.8.* version:

Jar required pdfbox-1.8.3.jar

or the maven dependency

<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>1.8.3</version>
</dependency>

Here is the solution:

package com.pdf.pdfbox.examples;

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

@SuppressWarnings("unchecked")
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:/Documents/04-Request-Headers.pdf"; // Pdf files are read from this folder
String destinationDir = "C:/Documents/Converted_PdfFiles_to_Image/"; // converted images from pdf document are saved here

File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder: "+ destinationFile.getName());
PDDocument document = PDDocument.load(sourceDir);
List<PDPage> list = document.getDocumentCatalog().getAllPages();
System.out.println("Total files to be converted -> "+ list.size());

String fileName = sourceFile.getName().replace(".pdf", "");
int pageNumber = 1;
for (PDPage page : list) {
BufferedImage image = page.convertToImage();
File outputfile = new File(destinationDir + fileName +"_"+ pageNumber +".png");
System.out.println("Image Created -> "+ outputfile.getName());
ImageIO.write(image, "png", outputfile);
pageNumber++;
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}

} catch (Exception e) {
e.printStackTrace();
}
}
}

Possible conversions of image into jpg, jpeg, png, bmp, gif format.

Note: I mentioned the mainly used image formats.

ImageIO.write(image , "jpg", new File( destinationDir +fileName+"_"+pageNumber+".jpg" ));
ImageIO.write(image , "jpeg", new File( destinationDir +fileName+"_"+pageNumber+".jpeg" ));
ImageIO.write(image , "png", new File( destinationDir +fileName+"_"+pageNumber+".png" ));
ImageIO.write(image , "bmp", new File( destinationDir +fileName+"_"+pageNumber+".bmp" ));
ImageIO.write(image , "gif", new File( destinationDir +fileName+"_"+pageNumber+".gif" ));

Console Output:

Images copied to Folder: Converted_PdfFiles_to_Image
Total files to be converted -> 13
Aug 06, 2014 1:35:49 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_1.png
Aug 06, 2014 1:35:50 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_2.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_3.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_4.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_5.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_6.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_7.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_8.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_9.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_10.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_11.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_12.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_13.png
Converted Images are saved at -> C:\Documents\Converted_PdfFiles_to_Image

Solution for Apache PDFBox 2.0.* version:

Required Jars pdfbox-2.0.16.jar, fontbox-2.0.16.jar, commons-logging-1.2.jar

or from the pom.xml dependencies

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>fontbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>

Solution for 2.0.16 version:

package com.pdf.pdfbox.examples;

import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

/**
*
* @author venkataudaykiranp
*
* @version 2.0.16(Apache PDFBox version support)
*
*/
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:\\Users\\venkataudaykiranp\\Downloads\\04-Request-Headers.pdf"; // Pdf files are read from this folder
String destinationDir = "C:\\Users\\venkataudaykiranp\\Downloads\\Converted_PdfFiles_to_Image/"; // converted images from pdf document are saved here

File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder Location: "+ destinationFile.getAbsolutePath());
PDDocument document = PDDocument.load(sourceFile);
PDFRenderer pdfRenderer = new PDFRenderer(document);

int numberOfPages = document.getNumberOfPages();
System.out.println("Total files to be converting -> "+ numberOfPages);

String fileName = sourceFile.getName().replace(".pdf", "");
String fileExtension= "png";
/*
* 600 dpi give good image clarity but size of each image is 2x times of 300 dpi.
* Ex: 1. For 300dpi 04-Request-Headers_2.png expected size is 797 KB
* 2. For 600dpi 04-Request-Headers_2.png expected size is 2.42 MB
*/
int dpi = 300;// use less dpi for to save more space in harddisk. For professional usage you can use more than 300dpi

for (int i = 0; i < numberOfPages; ++i) {
File outPutFile = new File(destinationDir + fileName +"_"+ (i+1) +"."+ fileExtension);
BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
ImageIO.write(bImage, fileExtension, outPutFile);
}

document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

PDF to image using Java

You will need a PDF renderer. There are a few more or less good ones on the market (ICEPdf, pdfrenderer), but without, you will have to rely on external tools. The free PDF renderers also cannot render embedded fonts, and so will only be good for creating thumbnails (what you eventually want).

My favorite external tool is Ghostscript, which can convert PDFs to images with a single command line invocation.

This converts Postscript (and PDF?) files to bmp for us, just as a guide to modify for your needs (Know you need the env vars for gs to work!):

pushd 
setlocal

Set BIN_DIR=C:\Program Files\IKOffice_ACME\bin
Set GS=C:\Program Files\IKOffice_ACME\gs
Set GS_DLL=%GS%\gs8.54\bin\gsdll32.dll
Set GS_LIB=%GS%\gs8.54\lib;%GS%\gs8.54\Resource;%GS%\fonts
Set Path=%Path%;%GS%\gs8.54\bin
Set Path=%Path%;%GS%\gs8.54\lib

call "%GS%\gs8.54\bin\gswin32c.exe" -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE#bmpmono -r600x600 -sOutputFile#%2 -f %1

endlocal
popd

UPDATE: pdfbox is now able to embed fonts, so no need for Ghostscript anymore.

Pdf to image using java

here is the solution: thanks mkl for the idea

    private void PdfToImage(String PDFFILE){
try{

PDDocument document = PDDocument.load(new File(PDFFILE));
PDPage pd;

PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page)
{

pd = document.getPage(page);
pd.setCropBox(new PDRectangle(100, 100,100,100));
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
ImageIOUtil.writeImage(bim, outputpath + (page+1) + ".png", 300);

}
document.close();
}catch (Exception ex){
JOptionPane.showMessageDialog(null, ex.getStackTrace());
}
}

Get a page from pdf and save it to an image file with itext

Appearently (according to 1T3XT BVBA), you can only save an iText Image from a PDF page, not a raster image.
You can store it everywhere, if you will use later to put it in another PDF page... otherwise, you'll have to use a tool like JPedal:

http://www.idrsolutions.com/convert-pdf-to-images/

===================================

EDIT: maybe PDFBox can do it for you too!:

http://pdfbox.apache.org/commandlineutilities/PDFToImage.html

http://gal-levinsky.blogspot.it/2011/11/convert-pdf-to-image-via-pdfbox.html

Insert images in different pdf pages

The OP clarified his question in a comment

i have already another pages, i just want how to jump to them 

You seem to be creating a new document using a PdfWriter. That class is designed to create a pdf one page after the other. As soon as you start a new page, all former ones are written to file.

Thus, in this process you cannot jump to arbitrary pages. You have to add all information for a page while it is the current one.

If, after creating a multi page document, you need to manipulate the content of its pages, first close the document (finishing it), read it into a PdfReader, and apply a PdfStamper which allows you to manipulate arbitrary pages of an existing PDF.

Alternatively, especially if your images constitute something like a water mark or header/footer Logos, consider using page events in your pdf creation process with the PdfWriter.



Related Topics



Leave a reply



Submit