Get the number of pages in a PDF document
Use a simple command-line executable called pdfinfo.
It is available for Linux and Windows as part of the Xpdf command-line tools. You download a compressed file containing several small PDF-related programs and extract it somewhere. (Linux distributions also ship a Poppler-based pdfinfo, usually in the poppler-utils package.)
One of those files is pdfinfo (pdfinfo.exe on Windows). Here is example output from running it on a PDF document:
Title: test1.pdf
Author: John Smith
Creator: PScript5.dll Version 5.2.2
Producer: Acrobat Distiller 9.2.0 (Windows)
CreationDate: 01/09/13 19:46:57
ModDate: 01/09/13 19:46:57
Tagged: yes
Form: none
Pages: 13 <-- This is what we need
Encrypted: no
Page size: 2384 x 3370 pts (A0)
File size: 17569259 bytes
Optimized: yes
PDF version: 1.6
I haven't yet seen a PDF document for which it returned a wrong page count. It is also fast: even for large documents of 200+ MB, it responds within a few seconds or less.
Extracting the page count from the output is straightforward. Here it is in PHP:
// Make a function for convenience
function getPDFPages($document)
{
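$cmd = "/path/to/pdfinfo"; // Linux
// $cmd = "C:\\path\\to\\pdfinfo.exe"; // Windows: use this line instead
// Run pdfinfo and collect its output, one line per array element.
// escapeshellarg() quotes the file name, so names with spaces work
// and a user-supplied name cannot inject extra shell commands.
exec("$cmd " . escapeshellarg($document), $output);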
// Iterate through lines
$pagecount = 0;
foreach($output as $op)
{
// Extract the number
if(preg_match("/Pages:\s*(\d+)/i", $op, $matches) === 1)
{
$pagecount = intval($matches[1]);
break;
}
}
return $pagecount;
}
// Use the function
echo getPDFPages("test 1.pdf"); // Output: 13
Of course, this command-line tool can be used from any language that can run an external program and parse its output; I use it from PHP.
I know it's not pure PHP, but external programs are much better at handling PDFs (as the question shows).
I hope this helps others. I spent a lot of time looking for a solution, and the many questions about PDF page counts I found didn't have the answer I was looking for. That's why I asked this question and answered it myself.
Security notice: use escapeshellarg() on $document if the document name comes from user input or file uploads.
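Because pdfinfo is an ordinary command-line program, the same parsing ports to other languages. Below is a minimal TypeScript sketch, assuming the "Pages:" line format shown in the sample output above; `parsePdfinfoPages` is a hypothetical helper name. Unlike the PHP function, it returns `null` rather than `0` when no "Pages:" line is found, so a failed run cannot be mistaken for an empty document.

```typescript
// Extract the page count from pdfinfo's text output.
// Assumes the "Pages: <n>" line format shown in the sample output.
function parsePdfinfoPages(output: string): number | null {
  // The "m" flag makes ^ and $ match at line boundaries, so "Pages:"
  // appearing mid-line (for example inside a Title value) is ignored.
  // \s* before $ also absorbs a trailing "\r" from Windows line endings.
  const match = /^Pages:\s*(\d+)\s*$/m.exec(output);
  return match ? Number(match[1]) : null;
}
```

On Node.js, the output could come from `execFileSync("pdfinfo", [file], { encoding: "utf8" })`. Since `execFileSync` passes the file name as a separate argument and does not spawn a shell by default, no escaping step is needed there.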
Determine the number of pages in a PDF file
You can use Apache PDFBox to load a PDF document and then call its getNumberOfPages() method to get the page count.
// PDFBox 2.x; PDFBox 3.x replaces PDDocument.load with Loader.loadPDF
try (PDDocument doc = PDDocument.load(new File("file.pdf"))) {
    int count = doc.getNumberOfPages();
}
How to get the number of pages of a .PDF uploaded by user?
If you use pdf.js, you can refer to the example on GitHub ('.../examples/node/getinfo.js'), which prints the number of pages in a PDF file with code like the following:
const pdfjsLib = require('pdfjs-dist');
...
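// In current pdf.js versions getDocument() returns a loading task;
// its .promise resolves to the loaded document
pdfjsLib.getDocument(pdfPath).promise.then(function (doc) {
  var numPages = doc.numPages;
  console.log('# Document Loaded');
  console.log('Number of Pages: ' + numPages);
});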
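The same call can be written with async/await. In current pdf.js versions, getDocument() returns a loading task whose `promise` property resolves to the document, and the document exposes `numPages` and `destroy()`. This TypeScript sketch types only those parts through minimal structural interfaces (`PdfJsLike` and `PdfDocumentLike` are hypothetical names, not the pdfjs-dist typings), so it can be exercised without the library installed.

```typescript
// Minimal structural types for the pdf.js features used below.
// These are assumptions modeling only what this sketch touches.
interface PdfDocumentLike {
  numPages: number;
  destroy?: () => Promise<void>;
}

interface PdfJsLike {
  getDocument(src: string): { promise: Promise<PdfDocumentLike> };
}

// Load a PDF, read its page count, and release the document afterwards.
async function getPageCount(pdfjs: PdfJsLike, pdfPath: string): Promise<number> {
  const doc = await pdfjs.getDocument(pdfPath).promise;
  try {
    return doc.numPages;
  } finally {
    // Free the document's resources if destroy() is available.
    await doc.destroy?.();
  }
}
```

With pdfjs-dist installed, the loaded module object should be passable as `pdfjs`, e.g. `getPageCount(pdfjsLib, "file.pdf")`; check the pdfjs-dist documentation for the import path your version expects under Node.js.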
Determine number of pages in a PDF file
You'll need a PDF library for C#. iTextSharp is one option; others exist.
iTextSharp Example
You must add iTextSharp.dll as a reference. Download iTextSharp from SourceForge.net (it is also available as the iTextSharp NuGet package). This is a complete working console application:
using System;
using iTextSharp.text.pdf;

namespace GetPages_PDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Path to YOUR PDF file
            string ppath = "C:\\aworking\\Hawkins.pdf";
            PdfReader pdfReader = new PdfReader(ppath);
            int numberOfPages = pdfReader.NumberOfPages;
            // Release the file once the page count has been read
            pdfReader.Close();
            Console.WriteLine(numberOfPages);
            Console.ReadLine(); // keep the console window open
        }
    }
}
How to get the number of pages in a pdf document using VBA?
This solution works when Excel 2013 Professional and Adobe Acrobat 9.0 Pro are installed.
You will need to enable the Adobe object model: in the VBA editor, go to Tools -> References and select the Acrobat checkbox.
Adobe's SDK has limited documentation on the GetNumPages method.
'with Adobe Acrobat 9 Professional installed
'with Tools -> References -> Acrobat checkbox selected
Sub AcrobatGetNumPages()
Dim AcroDoc As Object
Dim PageNum As Long
Set AcroDoc = New AcroPDDoc
AcroDoc.Open ("C:\Users\Public\Lorem ipsum.pdf") 'update file location
PageNum = AcroDoc.GetNumPages
MsgBox PageNum
AcroDoc.Close
End Sub
Get number of pages in a pdf using a cmd batch file
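Using pdftk:
pdftk my.pdf dump_data | grep NumberOfPages
does the trick. grep is not available in a plain Windows cmd batch file; use findstr there instead:
pdftk my.pdf dump_data | findstr NumberOfPages
Both print a line of the form "NumberOfPages: 13".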
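Count total number of pages in pdf file
This Google Apps Script adds a "PDF Page Calculator" menu to a spreadsheet. Its menu item lists every PDF in the Drive folder named Test, together with an estimated page count obtained by counting "/Contents" entries in each file.
function menuItem() {
  var folder = DriveApp.getFoldersByName('Test').next();
  var contents = folder.searchFiles('title contains ".PDF"');
  var file;
  var name;
  var sheet = SpreadsheetApp.getActiveSheet();
  var count;
  var data;
  sheet.clear();
  sheet.appendRow(["Name", "Number of pages"]);
  while (contents.hasNext()) {
    file = contents.next();
    name = file.getName();
    // Heuristic: assumes one "/Contents" entry per page in the raw file text
    count = file.getBlob().getDataAsString().split("/Contents").length - 1;
    data = [name, count];
    sheet.appendRow(data);
  }
};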
function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu('PDF Page Calculator')
.addItem("PDF Page Calculator",
'menuItem')
.addToUi();
};
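Counting "/Contents" is a rough heuristic: the same key also appears in annotation dictionaries (for an annotation's text), and a blank page may have no /Contents entry at all. A different heuristic, swapped in here and not part of the original answer, reads the page tree instead: in the PDF format, the /Count entry of a /Type /Pages node is the number of pages beneath it, so the root node carries the largest value. This TypeScript sketch assumes the objects are stored uncompressed (PDF 1.5+ files may pack them into compressed object streams that no text search can see) and that no incremental update left an older, larger page tree in the file; `countPagesFromPageTree` is a hypothetical name.

```typescript
// Heuristic page count from the raw text of an uncompressed PDF:
// the largest /Count found in any "/Type /Pages" object.
function countPagesFromPageTree(pdfText: string): number | null {
  // Each indirect object sits between "<num> <gen> obj" and "endobj".
  const objectRe = /\d+\s+\d+\s+obj\b([\s\S]*?)endobj/g;
  // "/Type /Pages" not followed by another name character.
  const pagesTypeRe = /\/Type\s*\/Pages(?![A-Za-z0-9])/;
  const countRe = /\/Count\s+(\d+)/;

  let best: number | null = null;
  let m: RegExpExecArray | null;
  while ((m = objectRe.exec(pdfText)) !== null) {
    const body = m[1];
    if (!pagesTypeRe.test(body)) continue; // not a page tree node
    const count = countRe.exec(body);
    if (count === null) continue;
    const n = Number(count[1]);
    // The root node counts every page, so keep the largest value seen.
    if (best === null || n > best) best = n;
  }
  return best; // null when no page tree node is visible
}
```

Key order inside a dictionary varies between producers, which is why the sketch tests /Type and /Count separately within each object rather than matching them as one fixed sequence.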
How to get the number of pages in PDF file?
You could use a pure JavaScript solution (written in TypeScript syntax) that counts page objects in the raw file:
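const reader = new FileReader();
const fileInfo = (event.target as HTMLInputElement).files?.[0];
if (fileInfo) {
  reader.onloadend = () => {
    // readAsBinaryString() produces a string, but TypeScript types
    // reader.result as string | ArrayBuffer | null
    const text = reader.result as string;
    // match() returns null when nothing matches, so fall back to 0
    const matches = text.match(/\/Type[\s]*\/Page[^s]/g);
    const count = matches ? matches.length : 0;
    console.log('Number of Pages:', count);
  };
  reader.readAsBinaryString(fileInfo);
}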
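I tested it on many PDF documents and it worked. It is a heuristic, though: page objects stored in compressed object streams (a PDF 1.5 feature) are invisible to a text search, so such files can be under-counted.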
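The pattern `/\/Type[\s]*\/Page[^s]/g` relies on `[^s]` to skip `/Pages` (the page tree nodes), but `[^s]` must consume one more character, so a `/Type /Page` at the very end of the input is not counted. A negative lookahead checks the next character without consuming it. Here is a minimal TypeScript sketch of the same heuristic; `countPageObjects` is a hypothetical helper name, and like the original it only sees page objects stored as plain text.

```typescript
// Count "/Type /Page" entries in raw PDF text, skipping "/Type /Pages"
// (page tree nodes). Heuristic: only uncompressed objects are visible.
function countPageObjects(pdfText: string): number {
  // The lookahead rejects a following name character (so "/Pages" is
  // skipped) without consuming input, so a match may end the string.
  const matches = pdfText.match(/\/Type\s*\/Page(?![A-Za-z0-9])/g);
  // match() returns null, not an empty array, when nothing matches.
  return matches === null ? 0 : matches.length;
}
```

In the browser snippet, it would take the place of the `match(...)` counting, e.g. `countPageObjects(reader.result as string)`.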
Count the number of pages in a PDF in only PHP
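You can use the ImageMagick extension for PHP (Imagick). ImageMagick understands PDFs (it relies on Ghostscript to read them), and its identify command reports one entry per page. The PHP counterpart is Imagick::identifyImage(); for the page count itself, Imagick::getNumberImages() returns the number of images (one per page) once the PDF has been read.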