Converting HTML to Pdf Using Itext

Converting HTML to PDF using iText

Why your code doesn't work

As explained in the introduction of the HTML to PDF tutorial, HTMLWorker has been deprecated many years ago. It wasn't intended to convert complete HTML pages. It doesn't know that an HTML page has a <head> and a <body> section; it just parses all the content. It was meant to parse small HTML snippets, and you could define styles using the StyleSheet class; real CSS wasn't supported.

Then came XML Worker. XML Worker was meant as a generic framework to parse XML. As a proof of concept, we decided to write some XHTML to PDF functionality, but we didn't support all of the HTML tags. For instance: forms weren't supported at all, and it was very hard to support CSS that is used to position content. Forms in HTML are very different from forms in PDF. There was also a mismatch between the iText architecture and the architecture of HTML + CSS. Gradually, we extended XML Worker, mostly based on requests from customers, but XML Worker became a monster with many tentacles.

Eventually, we decided to rewrite iText from scratch, with the requirements for HTML + CSS conversion in mind. This resulted in iText 7. On top of iText 7, we created several add-ons, the most important one in this context being pdfHTML.

How to solve the problem

Using the latest version of iText (iText 7.1.0 + pdfHTML 2.0.0) the code to convert the HTML from the question to PDF is reduced to this snippet:

public static final String SRC = "src/main/resources/html/sample.html";
public static final String DEST = "target/results/sample.pdf";
public void createPdf(String src, String dest) throws IOException {
    HtmlConverter.convertToPdf(new File(src), new File(dest));
}

The result looks like this:

Sample Image

As you can see, this is pretty much the result you'd expect. Since iText 7.1.0 / pdfHTML 2.0.0, the default font is Times-Roman. The CSS is being respected: the image is now floating on the right.

Some additional thoughts.

Developers often feel opposed to upgrade to a newer iText version when I give the advice to upgrade to iText 7 / pdfHTML 2. Allow me to answer to the top 3 of arguments I hear:

I need to use the free iText, and iText 7 isn't free / the pdfHTML add-on is closed source.

iText 7 is released using the AGPL, just like iText 5 and XML Worker. The AGPL allows free use in the sense of free of charge in the context of open source projects. If you are distributing a closed source / proprietary product (e.g. you use iText in a SaaS context), you can't use iText for free; in that case, you have to purchase a commercial license. This was already true for iText 5; this is still true for iText 7. As for versions prior to iText 5: you shouldn't use these at all. Regarding pdfHTML: the first versions were indeed only available as closed source software. We have had heavy discussion within iText Group: on the one hand, there were the people who wanted to avoid the massive abuse by companies who don't listen to their developers when those developers tell the powers that be that open source isn't the same as free. Developers were telling us that their boss forced them to do the wrong thing, and that they couldn't convince their boss to purchase a commercial license. On the other hand, there were the people who argued that we shouldn't punish developers for the wrong behavior of their bosses. Eventually, the people in favor of open sourcing pdfHTML, that is: the developers at iText, won the argument. Please prove that they weren't wrong, and use iText correctly: respect the AGPL if you're using iText for free; make sure that your boss purchases a commercial license if you're using iText in a closed source context.

I need to maintain a legacy system, and I have to use an old iText version.

Seriously? Maintenance also involves applying upgrades and migrating to new versions of the software you're using. As you can see, the code needed when using iText 7 and pdfHTML is very simple, and less error-prone than the code needed before. A migration project shouldn't take too long.

I've only just started and I didn't know about iText 7; I only found out after I finished my project.

That's why I'm posting this question and answer. Think of yourself as an eXtreme Programmer. Throw away all of your code, and start anew. You'll notice that it's not as much work as you imagined, and you'll sleep better knowing that you've made your project future-proof because iText 5 is being phased out. We still offer support to paying customers, but eventually, we'll stop supporting iText 5 altogether.

Convert HTML Template to PDF using iText 7: how to move a table that prints across pages

The CSS property page-break-inside is processed by iText 7's HTML to PDF conversion to avoid page breaks within an element. On the table that you don't want to split over two pages:

<table style="page-break-inside: avoid;">
<!-- table content -->
</table>

Of course, if the table grows to a height that is larger than a page, it's inevitable to split it.

Alternatively, when converting to Elements first with ConvertToElements, you can use SetKeepTogether(true) on the resulting Table instance.

How to convert HTML to PDF using iText

You can do it with the HTMLWorker class (deprecated) like this:

import com.itextpdf.text.html.simpleparser.HTMLWorker;
//...
try {
    String k = "<html><body> This is my Project </body></html>";
    OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
    Document document = new Document();
    PdfWriter.getInstance(document, file);
    document.open();
    HTMLWorker htmlWorker = new HTMLWorker(document);
    htmlWorker.parse(new StringReader(k));
    document.close();
    file.close();
} catch (Exception e) {
    e.printStackTrace();
}

or using the XMLWorker, (download from this jar) using this code:

import com.itextpdf.tool.xml.XMLWorkerHelper;
//...
try {
    String k = "<html><body> This is my Project </body></html>";
    OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, file);
    document.open();
    InputStream is = new ByteArrayInputStream(k.getBytes());
    XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
    document.close();
    file.close();
} catch (Exception e) {
    e.printStackTrace();
}

Convert html to pdf in landscape mode using iText

The best way to achieve landscape page size when converting HTML to PDF is to provide the corresponding CSS instruction for the page to become landscape.

This is done with the following CSS:

@page {
    size: landscape;
}

Now, if you have your input HTML document in htmlAsStringToConvert variable then you can process it as an HTML using Jsoup library which iText embeds. Basically we are just adding the necessary CSS instruction into our <head>:

Document htmlDoc = Jsoup.parse(htmlAsStringToConvert);
htmlDoc.head().append("<style>" +
        "@page { size: landscape; } "
        + "</style>");

HtmlConverter.convertToPdf(htmlDoc.outerHtml(), new FileOutputStream(outPdf));

Beware that if you already have @page declarations in your HTML then the one you append might be in conflict with the ones you already have - in this case, you need to make sure you insert your declaration as the latest one (this should be the case with the code above as long as all of your declaration are in <head> element).

Convert HTML with images to PDF using iText

The following is based on iText5 5.5.12 version

Suppose you have this directory structure:

Sample Image

With this code and using latest iText5:

package converthtmltopdf;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.net.FileRetrieve;
import com.itextpdf.tool.xml.net.FileRetrieveImpl;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import com.itextpdf.tool.xml.pipeline.html.LinkProvider;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 *
 * @author george.mavrommatis
 */
public class ConvertHtmlToPdf {
    public static final String HTML = "C:\\Users\\zzz\\Desktop\\itext\\index.html";
    public static final String DEST = "C:\\Users\\zzz\\Desktop\\itext\\index.pdf";
    public static final String IMG_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String RELATIVE_PATH = "C:\\Users\\zzz\\Desktop\\itext\\";
    public static final String CSS_DIR = "C:\\Users\\zzz\\Desktop\\itext\\";

    /**
     * Creates a PDF with the words "Hello World"
     * @param file
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4

        // CSS
        CSSResolver cssResolver =
                XMLWorkerHelper.getInstance().getDefaultCssResolver(false);
        FileRetrieve retrieve = new FileRetrieveImpl(CSS_DIR);
        cssResolver.setFileRetrieve(retrieve);

        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        htmlContext.setImageProvider(new AbstractImageProvider() {
            public String getImageRootPath() {
                return IMG_PATH;
            }
        });
        htmlContext.setLinkProvider(new LinkProvider() {
            public String getLinkRoot() {
                return RELATIVE_PATH;
            }
        });

        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        p.parse(new FileInputStream(HTML));

        // step 5
        document.close();
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, DocumentException {
        // TODO code application logic here
        new ConvertHtmlToPdf().createPdf(DEST);
    }

}

And here is the result:

Sample Image

This example uses code from: https://developers.itextpdf.com/examples/xml-worker-itext5/xml-worker-examples

Hope this helps

Converting HTML to Pdf Using Itext