Count the Number of Pages in a PDF in Only PHP

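You can use the ImageMagick extension for PHP. ImageMagick understands PDFs, and you can use the identify command to extract the number of pages. The corresponding PHP method is Imagick::identifyImage(); for the page count itself, Imagick::getNumberImages() returns the number of pages of a loaded PDF.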

Get the number of pages in a PDF document

There is a simple command-line executable for this: pdfinfo.

It is downloadable for Linux and Windows. You download a compressed file containing several little PDF-related programs. Extract it somewhere.

One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of data returned by running it on a PDF document:

Title:          test1.pdf
Author:         John Smith
Creator:        PScript5.dll Version 5.2.2
Producer:       Acrobat Distiller 9.2.0 (Windows)
CreationDate:   01/09/13 19:46:57
ModDate:        01/09/13 19:46:57
Tagged:         yes
Form:           none
Pages:          13 <-- This is what we need
Encrypted:      no
Page size:      2384 x 3370 pts (A0)
File size:      17569259 bytes
Optimized:      yes
PDF version:    1.6

I haven't seen a PDF document for which it returned a wrong page count (yet). It is also really fast: even with big documents of 200+ MB, the response time is a few seconds or less.

There is an easy way of extracting the pagecount from the output, here in PHP:

// Make a function for convenience
function getPDFPages($document)
{
    // Set the path for your platform and remove the other line
    $cmd = "/path/to/pdfinfo";              // Linux
    // $cmd = "C:\\path\\to\\pdfinfo.exe";  // Windows

    // Run pdfinfo and collect every output line.
    // escapeshellarg() quotes the file name, so spaces and shell
    // metacharacters in it are passed through safely
    $output = [];
    exec($cmd . " " . escapeshellarg($document), $output);

    // Iterate through lines
    $pagecount = 0;
    foreach ($output as $op)
    {
        // Extract the number from the "Pages:" line
        if (preg_match('/^Pages:\s*(\d+)/i', $op, $matches) === 1)
        {
            $pagecount = intval($matches[1]);
            break;
        }
    }

    // 0 means pdfinfo did not report a page count
    return $pagecount;
}

// Use the function
echo getPDFPages("test 1.pdf"); // Output: 13
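
If you need more than the page count, the same output can be split into key/value pairs. This is only a minimal sketch: parsePdfinfoOutput is a hypothetical helper name, and it assumes every line has the "Key: value" shape shown in the sample output above.

```php
<?php
// Hypothetical helper: turn pdfinfo output lines into an associative array.
// Assumes each line looks like "Key:   value", as in the sample output.
function parsePdfinfoOutput(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Split on the first colon only, so values such as "19:46:57" stay intact
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $info[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $info;
}

// Example with lines taken from the sample output
$info = parsePdfinfoOutput([
    'Title:          test1.pdf',
    'CreationDate:   01/09/13 19:46:57',
    'Pages:          13',
]);
echo (int) $info['Pages'], "\n"; // 13
```

Lines without a colon are skipped, so stray warnings in the output should not break the parse.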

Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP.

I know it's not pure PHP, but external programs are way better at PDF handling (as seen in the question).

I hope this can help people, because I have spent a whole lot of time trying to find the solution to this and I have seen a lot of questions about PDF pagecount in which I didn't find the answer I was looking for. That's why I made this question and answered it myself.

Security notice: use escapeshellarg() on $document if the document name comes from user input or file uploads.
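
To illustrate that notice, here is a short sketch of what escapeshellarg() does to a hostile file name. buildPdfinfoCommand is a hypothetical helper, /path/to/pdfinfo is the same placeholder path used above, and the file name is made up for the demonstration.

```php
<?php
// Hypothetical helper: build the pdfinfo command line with a safely quoted file name.
function buildPdfinfoCommand(string $cmd, string $document): string
{
    // escapeshellarg() wraps the argument in quotes and escapes embedded quotes,
    // so the shell treats it as one literal argument
    return $cmd . ' ' . escapeshellarg($document);
}

// A made-up upload name that would run a second command if it were only
// wrapped in double quotes
$hostile = 'x.pdf"; rm -rf /tmp/demo; echo "';
echo buildPdfinfoCommand('/path/to/pdfinfo', $hostile), "\n";
```

The command is only printed here, not executed, so you can inspect the quoting on your own platform before passing it to exec().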

PHP Number of pages in a PDF file via ImageMagick

I tried out your code but it did not work with the PDFs I have.
I use Free PDF to create PDFs. It could be that the resulting PDFs are not linearized.

I found some code under question 1098156 and it seems to work ok with the PDFs I have:

function count_pages($pdfname) {
    // Read the raw PDF and count "/Page" tokens that are not part of "/Pages"
    $pdftext = file_get_contents($pdfname);
    $num = preg_match_all("/\/Page\W/", $pdftext, $dummy);
    return $num;
}
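
To see why that pattern counts page objects but skips the /Pages tree node, here is a sketch on a hand-written fragment. The fragment only imitates uncompressed PDF dictionaries and is not a valid PDF; countPageObjects is a hypothetical name. As far as I know, the approach can miss pages when a PDF stores its page objects in compressed object streams, so treat it as a heuristic.

```php
<?php
// Hypothetical helper: count "/Page" tokens followed by a non-word character.
function countPageObjects(string $pdftext): int
{
    // \W rejects "/Pages" because the "s" after "/Page" is a word character
    return (int) preg_match_all('/\/Page\W/', $pdftext, $dummy);
}

// Hand-written fragment imitating uncompressed PDF object dictionaries
$fragment = "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n"
          . "2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
          . "3 0 obj << /Type /Page/Parent 1 0 R >> endobj\n";

$count = countPageObjects($fragment);
echo $count, "\n"; // 2
```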

Count pages in PDF file using Imagemagick - PHP

I solved it using:

exec("identify -format %n " . escapeshellarg($file));
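
A slightly more defensive variant is sketched below. As far as I know, identify evaluates the format string once per page, so %n alone can print the count repeatedly on a multi-page file; adding \n to the format and reading only the first line avoids that. Both function names are hypothetical, and the sketch assumes a Unix-like shell with identify on the PATH (on Windows, escapeshellarg() replaces percent signs, so the format string would need different handling).

```php
<?php
// Hypothetical helper: build an identify command that prints the count on its own line.
// The \n stays a literal backslash-n here; ImageMagick interprets it as a newline.
function buildIdentifyCommand(string $file): string
{
    return 'identify -format ' . escapeshellarg('%n\n') . ' ' . escapeshellarg($file);
}

// Hypothetical helper: take the page count from the first output line,
// or null if that line is not a plain number (for example an error message).
function pageCountFromIdentify(array $outputLines): ?int
{
    $first = trim($outputLines[0] ?? '');
    return preg_match('/^\d+$/', $first) === 1 ? (int) $first : null;
}

// Usage (not run here, since it needs ImageMagick installed):
// exec(buildIdentifyCommand('test 1.pdf'), $output);
// $pages = pageCountFromIdentify($output);
```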

Can a PDF file have 0 pages defined or otherwise result in 0 as page size?

Sure, a PDF file is a container format that can contain pretty much anything, including (only) metadata with 0 pages. But even so, with this code it's quite possible to request a thumbnail for page 21 on a document that only contains 5 pages.

If that happens, the problem will occur on this line:

$img = new Imagick($src."[$page]");

This will throw an exception if the provided page does not exist. You can catch that exception and handle it however you want:

try {
    $img = new Imagick($src."[$page]");
} catch (ImagickException $error) {
    return false;
}

If you want to read the number of pages beforehand, you can try to let Imagick parse the document first:

$pdf = new Imagick($src);
$pages = $pdf->getNumberImages();

The function name is a bit misleading, see this comment in the PHP manual:

"For PDFs this function indicates the number of pages on the PDF, NOT images that might be embedded within the PDF."

Here as well, if the PDF document is invalid in some way, this can throw an exception so you might want to catch that and handle it:

try {
    $pdf = new Imagick($src);
    $pages = $pdf->getNumberImages();
} catch (ImagickException $error) {
    return false;
}

// "[$page]" is a zero-based index, so valid values are 0 to $pages - 1
if ($page < 0 || $page >= $pages) {
    return false;
}
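
Because the "[$page]" suffix is a zero-based index, the bounds check is easy to get wrong by one. A small sketch of the check on its own, with isValidPageIndex as a hypothetical helper name:

```php
<?php
// Hypothetical helper: is $page a valid zero-based index
// for a document that has $pages pages?
function isValidPageIndex(int $page, int $pages): bool
{
    return $page >= 0 && $page < $pages;
}

var_dump(isValidPageIndex(4, 5)); // bool(true): fifth and last page
var_dump(isValidPageIndex(5, 5)); // bool(false): one past the end
var_dump(isValidPageIndex(0, 0)); // bool(false): document without pages
```

The last case covers the 0-page document from the question: no index is valid, so the thumbnail request is rejected before Imagick is asked to load a page.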

