Convert PDF to HTML in PHP

Convert PDF to HTML in PHP?

1) download and unpack the .exe file to a folder: http://sourceforge.net/projects/pdftohtml/

2) create a .php file, and put this code (assuming, that the pdftohtml.exe is inside that folder, and the source sample.pdf too):

<?php
$source_pdf="sample.pdf";
$output_folder="MyFolder";

if (!file_exists($output_folder)) { mkdir($output_folder, 0777, true);}
$a= passthru("pdftohtml $source_pdf $output_folder/new_file_name",$b);
var_dump($a);
?>

3) enter MyFolder, and you will see the converted files (depends on the number of pages..)

p.s. i dont know, but there exists many commercial or trial apis too.

Convert PDF to HTML in PHP similar to DocuSign

What you are trying to do is not as straight forward it may seem. I have worked with Adobe Sign, formerly known as EchoSign, for years and I have a pretty good idea on how these services work. With that been said I strongly suggest looking into one of these eSign services instead of trying to roll out your own. It will save you a lot of time.

This is how it all works

  1. The PDF must have a form itself with named fields. In other words, if you open such PDF in Adobe Reader or Chrome you should be able to fill in the fields. If your PDF does not have a PDF form you will need additional software like Acrobat PRO to create the form.
  2. You must convert the PDF into a flat image that can be rendered in the browser.
  3. You will need a tool to extract the PDF Form information, such as the field names, types, dimensions, and coordinates.
  4. With all this information you can then render the PDF image(s) in the browser. Place absolute positioned HTML form elements over the image using the field type, dimensions, and coordinates from the previous step. Each HTML element needs to reference a PDF form field by name.
  5. Once you have collected the information and a data map like field_name => field_value from your HTML widget, you will need to use additional software to programmatically fill in the PDF form in the original PDF. A PDF form information is often stored in FDF or XFDF file.

I don't know of a single tool that will help you with the things outlined above, at least not in PHP. However, I can provide you with a suggestion can be helpful:

  • PDFtk Server - Can help you to both, extract the PDF form fields information and fill in the same an XFDF file. Unforutently, the form field information that you can extract with such tool does not include dimensions and coordinates.
  • iText - A library available in .Net and Java that can be used to extract detailed information about the PDF form including the dimension and coordinates of the fields. You can create microservice using this toolkit that can communicate with PHP.

There are definitely a lot more tools out there for the job. Hopefully, this information will guide you in the right direction or help you make a decision on how to move forward with your project.

How do I convert a PDF file to HTML in PHP?

Google pdf2html, pdftohtml looks to be the only viable one. and it's based on a command line program, not PHP. so it may not be useful to you. Google is capable of converting, so there may be a way to do it with GDocs as well. though I'm not sure of that. At any rate, I hope this gets you on the proper path at least.

Convert PDF to HTML

pdftohtml works fine : fast, stable but the html result is ugly at best. I have used it for quite some time for a web site that has many job resumes.

It is a good solution for extracting textual content however.

I would give the scribd API a try

or the google apps document API. GOogle does a great job a displaying and converting pdf files



Related Topics



Leave a reply



Submit