How to Automate HTML-to-PDF Conversions

How can I automate HTML-to-PDF conversions?

NOTE: This answer is from 2008 and is probably now incorrect; please check the other answers

PrinceXML is the best one I've seen (it parses regular HTML as well as XML/XHTML). Why the best? It passes the Acid2 test, which I thought was pretty darn impressive.

It is, however, quite expensive.

Converting HTML to PDF using convert utility in Mac OS X

Have a look at wkhtmltopdf, a command-line utility that uses the WebKit rendering engine to produce PDFs from HTML. I've found that it produces a nicer result. You shouldn't have any trouble integrating it with your current script.
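As a rough sketch of what that integration could look like from a Python script: the basic invocation is just `wkhtmltopdf <input> <output>`, so a thin wrapper around `subprocess` is enough. The function names, the 60-second timeout, and the assumption that the `wkhtmltopdf` binary is on your PATH are mine, not part of the tool.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_wkhtmltopdf_command(source: str, output_pdf: str,
                              binary: str = "wkhtmltopdf") -> List[str]:
    """Return the argument list for a basic wkhtmltopdf run.

    `source` may be a local HTML file path or a URL.
    """
    return [binary, source, output_pdf]


def html_to_pdf(source: str, output_pdf: str,
                binary: str = "wkhtmltopdf", timeout: float = 60.0) -> Path:
    """Convert `source` to a PDF by shelling out to wkhtmltopdf.

    Assumes the binary is installed and on PATH (hypothetical setup);
    raises FileNotFoundError if it is not, and CalledProcessError if
    the conversion exits with a non-zero status.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} was not found on PATH")
    subprocess.run(
        build_wkhtmltopdf_command(source, output_pdf, binary),
        check=True,       # fail loudly if the conversion fails
        timeout=timeout,  # don't let a stuck render hang the script
    )
    return Path(output_pdf)


# Example call (needs wkhtmltopdf installed):
# html_to_pdf("report.html", "report.pdf")
```

Passing the arguments as a list rather than a shell string avoids quoting problems when file names contain spaces.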

Dynamic HTML to PDF

There is no way it can be done. The interfaces available for scripts in PDF are extremely limited compared to the full DOM and BOM access you enjoy in a web browser. Such interaction as you can achieve in PDF is not readily translatable from how it works in a browser and would almost certainly need hand authoring.

Your example page has many effects that PDF, as an essentially static document layout format, simply cannot reproduce at all.

Edit:

I just want the final rendering of the screen to be captured in the PDF

Ah, OK, that's a far easier and more common problem then.

In that case you'll have to use and automate a real web browser (like Firefox), or a toolkit that provides all the logic of a web browser (like WebKit), then either:

  • export to PDF, either using built-in tools like ‘Print to file’ in Firefox (with background images/colours turned on) or one of the PDF export add-ons, or

  • take an image snapshot of the browser (and include the image in a PDF if you have to)

See these questions for some discussion of browser snapshotting.
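One way to sketch both options is with a headless browser driven from a script. This example swaps in headless Chromium rather than Firefox, since it exposes `--print-to-pdf` and `--screenshot` on the command line; the binary name `chromium` and the function names are assumptions, so adjust them for your system.

```python
import shutil
import subprocess
from typing import List


def build_headless_command(url: str, output: str, mode: str = "pdf",
                           binary: str = "chromium") -> List[str]:
    """Build a headless-Chromium command line.

    mode="pdf" prints the rendered page to a PDF file;
    mode="screenshot" saves an image snapshot instead.
    The binary name is an assumption; it may be "chromium-browser",
    "google-chrome", or a full path on your machine.
    """
    if mode == "pdf":
        flag = f"--print-to-pdf={output}"
    elif mode == "screenshot":
        flag = f"--screenshot={output}"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [binary, "--headless", flag, url]


def capture(url: str, output: str, mode: str = "pdf",
            binary: str = "chromium", timeout: float = 60.0) -> None:
    """Render `url` in a real browser engine and save the result."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} was not found on PATH")
    subprocess.run(build_headless_command(url, output, mode, binary),
                   check=True, timeout=timeout)


# Example calls (need a Chromium-based browser installed):
# capture("https://example.com", "page.pdf")
# capture("https://example.com", "page.png", mode="screenshot")
```

Because the page is loaded in a real browser engine, scripts get a chance to run before the output is written, but effects that depend on user interaction or long-running animation will still only be captured in whatever state they are in at that moment.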


