Get the Number of Pages in a PDF Document

Get the number of pages in a PDF document

There is a simple command-line executable for this called pdfinfo (distributed with the Xpdf command-line tools, and also with Poppler).

It is downloadable for Linux and Windows. You download a compressed file containing several little PDF-related programs. Extract it somewhere.

One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of data returned by running it on a PDF document:

Title:          test1.pdf
Author: John Smith
Creator: PScript5.dll Version 5.2.2
Producer: Acrobat Distiller 9.2.0 (Windows)
CreationDate: 01/09/13 19:46:57
ModDate: 01/09/13 19:46:57
Tagged: yes
Form: none
Pages: 13 <-- This is what we need
Encrypted: no
Page size: 2384 x 3370 pts (A0)
File size: 17569259 bytes
Optimized: yes
PDF version: 1.6

I haven't seen a PDF document for which it returned a wrong page count (yet). It is also really fast: even with big documents of 200+ MB, the response time is just a few seconds or less.

The page count is easy to extract from the output. Here is how to do it in PHP:

// Make a function for convenience
function getPDFPages($document)
{
    // Pick the binary for the current platform (adjust the paths to your install)
    $cmd = (PHP_OS_FAMILY === 'Windows')
        ? "C:\\path\\to\\pdfinfo.exe" // Windows
        : "/path/to/pdfinfo";         // Linux

    // Run pdfinfo and capture its output, one array element per line.
    // escapeshellarg() quotes the file name, so spaces are handled too
    exec($cmd . " " . escapeshellarg($document), $output);

    // Iterate through lines
    $pagecount = 0;
    foreach ($output as $op) {
        // Extract the number from the line that starts with "Pages:"
        if (preg_match("/^Pages:\s*(\d+)/i", $op, $matches) === 1) {
            $pagecount = intval($matches[1]);
            break;
        }
    }

    return $pagecount;
}

// Use the function
echo getPDFPages("test 1.pdf"); // Output: 13

Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP.
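
As an illustration of calling the tool from another language, here is a minimal Python sketch of the same idea. It assumes pdfinfo is on the PATH; the function names parse_page_count and get_pdf_pages are made up for this example and are not part of any library.

```python
import re
import subprocess


def parse_page_count(pdfinfo_output: str) -> int:
    """Return the number on the 'Pages:' line of pdfinfo output, or 0 if absent."""
    match = re.search(r"^Pages:\s*(\d+)", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else 0


def get_pdf_pages(document: str, pdfinfo: str = "pdfinfo") -> int:
    """Run pdfinfo on `document` and return its page count.

    `pdfinfo` is assumed to be on PATH; pass a full path otherwise.
    """
    # Passing the arguments as a list bypasses the shell, so file names with
    # spaces or shell metacharacters need no quoting or escaping.
    result = subprocess.run(
        [pdfinfo, document],
        capture_output=True,
        text=True,
        errors="replace",  # metadata fields may not be valid UTF-8
        check=True,        # raise CalledProcessError if pdfinfo fails
    )
    return parse_page_count(result.stdout)


if __name__ == "__main__":
    # Parsing demonstrated on a fragment of the sample output shown earlier.
    sample = "Tagged: yes\nForm: none\nPages: 13\nEncrypted: no\n"
    print(parse_page_count(sample))
```

Keeping the parsing separate from the subprocess call makes it possible to check the parsing logic without having pdfinfo installed.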

I know it's not pure PHP, but external programs are far better at handling PDFs (as seen in the question).

I hope this can help people, because I have spent a whole lot of time trying to find the solution to this and I have seen a lot of questions about PDF pagecount in which I didn't find the answer I was looking for. That's why I made this question and answered it myself.

Security Notice: Use escapeshellarg on $document if document name is being fed from user input or file uploads.

Determine the number of pages in a PDF file

You can use Apache PDFBox to load a PDF document and then call the getNumberOfPages method to return the page count.

// PDFBox 2.x API; in PDFBox 3.x use Loader.loadPDF(new File("file.pdf")) instead
try (PDDocument doc = PDDocument.load(new File("file.pdf"))) {
    int count = doc.getNumberOfPages();
}

How to get the number of pages of a .PDF uploaded by user?

If you use pdf.js, you can refer to an example on GitHub ('.../examples/node/getinfo.js'). It contains the following code, which prints the number of pages in a PDF file.

const pdfjsLib = require('pdfjs-dist');
...
// Recent pdfjs-dist versions return a loading task; the promise is on .promise
pdfjsLib.getDocument(pdfPath).promise.then(function (doc) {
  var numPages = doc.numPages;
  console.log('# Document Loaded');
  console.log('Number of Pages: ' + numPages);
});

Determine number of pages in a PDF file

You'll need a PDF API for C#. iTextSharp is one possible API, though better ones might exist.

iTextSharp Example

You must add iTextSharp.dll as a reference. Download iTextSharp from SourceForge.net. This is a complete working program using a console application.

using System;
using iTextSharp.text.pdf;

namespace GetPages_PDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Right side of the assignment is the location of YOUR pdf file
            string ppath = "C:\\aworking\\Hawkins.pdf";
            PdfReader pdfReader = new PdfReader(ppath);
            int numberOfPages = pdfReader.NumberOfPages;
            Console.WriteLine(numberOfPages);
            pdfReader.Close();  // release the file
            Console.ReadLine(); // keep the console window open
        }
    }
}

How to get the number of pages in a pdf document using VBA?

This solution works when Excel 2013 Professional and Adobe Acrobat 9.0 Pro are installed.

You will need to enable the Adobe object model: Tools -> References -> Acrobat checkbox selected.

Adobe's SDK has limited documentation on the GetNumPages method.

'with Adobe Acrobat 9 Professional installed
'with Tools -> References -> Acrobat checkbox selected

Sub AcrobatGetNumPages()
    Dim AcroDoc As Object
    Dim PageNum As Long

    Set AcroDoc = New AcroPDDoc

    'Open returns True on success; update file location
    If AcroDoc.Open("C:\Users\Public\Lorem ipsum.pdf") Then
        PageNum = AcroDoc.GetNumPages
        MsgBox PageNum
        AcroDoc.Close
    Else
        MsgBox "Could not open the PDF file"
    End If
End Sub

Get number of pages in a pdf using a cmd batch file

Using pdftk:

pdftk my.pdf dump_data | grep NumberOfPages

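does the trick. In a Windows cmd batch file, where grep is usually not available, the built-in findstr can take its place (pdftk my.pdf dump_data | findstr NumberOfPages).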
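
If the number itself is needed rather than the whole matching line, the filtering step can also be done in a small script. Below is a minimal Python sketch, assuming pdftk is on the PATH; parse_number_of_pages and pdftk_page_count are hypothetical helper names chosen for this example.

```python
import re
import subprocess
from typing import Optional


def parse_number_of_pages(dump_data: str) -> Optional[int]:
    """Return the NumberOfPages value from `pdftk <file> dump_data` output."""
    match = re.search(r"^NumberOfPages:\s*(\d+)", dump_data, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def pdftk_page_count(pdf_path: str) -> Optional[int]:
    """Run `pdftk <pdf_path> dump_data` and extract the page count."""
    # Assumption: the pdftk executable can be found on PATH.
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data"],
        capture_output=True,
        text=True,
        errors="replace",
        check=True,
    )
    return parse_number_of_pages(result.stdout)


if __name__ == "__main__":
    # Parsing demonstrated on a hand-written fragment of dump_data output.
    sample = "InfoBegin\nInfoKey: Producer\nNumberOfPages: 13\n"
    print(parse_number_of_pages(sample))
```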

Count total number of pages in pdf file

function menuItem() {
  var folder = DriveApp.getFoldersByName('Test').next();
  var contents = folder.searchFiles('title contains ".PDF"');
  var sheet = SpreadsheetApp.getActiveSheet();
  var file;
  var name;
  var count;

  sheet.clear();
  sheet.appendRow(["Name", "Number of pages"]);

  while (contents.hasNext()) {
    file = contents.next();
    name = file.getName();
    // Heuristic: count the "/Contents" entries in the raw PDF data
    count = file.getBlob().getDataAsString().split("/Contents").length - 1;

    sheet.appendRow([name, count]);
  }
}

function onOpen() {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu('PDF Page Calculator')
    .addItem("PDF Page Calculator", 'menuItem')
    .addToUi();
}

How to get the number of pages in PDF file?

You could use a pure JavaScript solution (shown here in TypeScript syntax):

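const fileInfo = event.target.files[0];
if (fileInfo) {
  const reader = new FileReader();
  // Register the handler before starting the read
  reader.onloadend = () => {
    // reader.result is null if the read failed
    const text = typeof reader.result === 'string' ? reader.result : '';
    // match() returns null when nothing matches, so fall back to an empty array
    const matches = text.match(/\/Type[\s]*\/Page[^s]/g) || [];
    console.log('Number of Pages:', matches.length);
  };
  reader.readAsBinaryString(fileInfo);
}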

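I tested it on many PDF documents and it worked. Keep in mind that it scans the raw file contents, so it can miscount PDFs whose page objects are stored in compressed object streams.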
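
To make the pattern easier to experiment with, here is a minimal Python sketch of the same heuristic applied to raw bytes. The function name count_pages_heuristic and the sample data are invented for this example; the sample is a hand-written fragment, not a complete PDF.

```python
import re

# "/Type /Page" followed by any byte other than "s", so that "/Type /Pages"
# (the page-tree node) is not counted as a page.
PAGE_OBJECT = re.compile(rb"/Type\s*/Page[^s]")


def count_pages_heuristic(pdf_bytes: bytes) -> int:
    """Count '/Type /Page' markers in uncompressed PDF data."""
    return len(PAGE_OBJECT.findall(pdf_bytes))


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n"
        b"2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
        b"3 0 obj << /Type/Page/Parent 1 0 R >> endobj\n"
    )
    print(count_pages_heuristic(sample))
```

For a real file, the bytes would come from something like open(path, "rb").read().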

Count the number of pages in a PDF in only PHP

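You can use the ImageMagick extension for PHP (Imagick). ImageMagick understands PDFs (it relies on Ghostscript to read them), and you can use the identify command to extract the number of pages. The PHP function is Imagick::identifyImage(); once the PDF is loaded, Imagick::getNumberImages() returns the number of pages.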
