Generate PDF Based on HTML Code (iTextSharp, PDFsharp)

Generate PDF with PDFSharp from HTML template and send to browser

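I figured out the problem. Apparently it was because I was making the request via AJAX through the UpdatePanels. A partial-page update expects a specially formatted response, so writing PDF bytes into it fails. It works fine without the AJAX, as a full postback.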

Can PDFsharp create a PDF file from an HTML string in .NET Core?

I recently ran into the exact same issue myself. There are currently very few options for dealing with PDFs in .NET Core. To go through them...

  • PDFsharp - Does not support .NET Core/Standard.
  • SelectPDF - Has a "free" community edition, linked only from their website footer. Should be usable in most cases. https://selectpdf.com/community-edition/
  • IronPDF - "Enterprise" pricing. Starts at $1.5k.
  • wkhtmltopdf - This is just an executable that someone has written a C# wrapper around in order to run the exe. Not a great solution.
  • iTextSharp - Has "hidden" pricing, but apparently it is the only one that specifically runs on Linux under .NET Core (if that's important to you).

IMO the only free one that will do what you need is SelectPDF. And that's saying something because I don't rate the library or the API. But it's free and it works.

More info: https://dotnetcoretutorials.com/2019/07/02/creating-a-pdf-in-net-core/

How to convert HTML to PDF using iTextSharp

First, HTML and PDF are not related although they were created around the same time. HTML is intended to convey higher level information such as paragraphs and tables. Although there are methods to control it, it is ultimately up to the browser to draw these higher level concepts. PDF is intended to convey documents and the documents must "look" the same wherever they are rendered.

In an HTML document you might have a paragraph that's 100% wide and depending on the width of your monitor it might take 2 lines or 10 lines and when you print it it might be 7 lines and when you look at it on your phone it might take 20 lines. A PDF file, however, must be independent of the rendering device, so regardless of your screen size it must always render exactly the same.

Because of the musts above, PDF doesn't support abstract things like "tables" or "paragraphs". There are three basic things that PDF supports: text, lines/shapes and images. (There are other things like annotations and movies but I'm trying to keep it simple here.) In a PDF you don't say "here's a paragraph, browser do your thing!". Instead you say, "draw this text at this exact X,Y location using this exact font and don't worry, I've previously calculated the width of the text so I know it will all fit on this line". You also don't say "here's a table" but instead you say "draw this text at this exact location and then draw a rectangle at this other exact location that I've previously calculated so I know it will appear to be around the text".
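To make the contrast concrete, here is a minimal sketch in C# of the layout work a PDF producer has to do before it can draw anything. It is not iTextSharp code and uses no PDF library. The fixed character width, the PlacedLine type and the LayoutParagraph helper are all assumptions made up for illustration; real fonts need per-glyph metrics.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one line of text pinned to an exact position on the page.
public sealed class PlacedLine
{
    public double X { get; }
    public double Y { get; }
    public string Text { get; }

    public PlacedLine(double x, double y, string text)
    {
        X = x;
        Y = y;
        Text = text;
    }
}

public static class FixedLayout
{
    // Breaks a paragraph into lines that fit maxWidth and gives each line an
    // exact X,Y position. Assumes every character is charWidth units wide,
    // which only holds for monospaced fonts. A single word wider than
    // maxWidth is placed on its own line and allowed to overflow.
    public static List<PlacedLine> LayoutParagraph(
        string text, double left, double top,
        double maxWidth, double charWidth, double lineHeight)
    {
        var lines = new List<PlacedLine>();
        var current = "";
        double y = top;

        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var candidate = current.Length == 0 ? word : current + " " + word;
            if (candidate.Length * charWidth > maxWidth && current.Length > 0)
            {
                // The current line is full: fix its position and start a new one.
                lines.Add(new PlacedLine(left, y, current));
                y -= lineHeight; // PDF's Y axis grows upward, so the next line sits lower
                current = word;
            }
            else
            {
                current = candidate;
            }
        }

        if (current.Length > 0)
        {
            lines.Add(new PlacedLine(left, y, current));
        }
        return lines;
    }
}
```

Calling LayoutParagraph with a larger maxWidth yields fewer lines. A browser makes that decision at render time for each device, whereas a PDF producer has to make it once, in advance, and write the resulting coordinates into the file.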

Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.Net, MVC, Razor, Struts, Spring, etc., are all HTML frameworks but iText/iTextSharp is 100% unaware of them. The same goes for DataGridViews, Repeaters, Templates, Views, etc., which are all framework-specific abstractions. It is your responsibility to get the HTML from your choice of framework; iText won't help you. If you get an exception saying "The document has no pages" or you think that "iText isn't parsing my HTML", it is almost certain that you don't actually have HTML, you only think you do.
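One cheap way to catch the "you only think you have HTML" case is to inspect the string before handing it to the parser. The sketch below is not part of iText. HtmlGuard.Describe is a hypothetical helper, and the checks it makes (empty input, a URL passed in place of markup, server-side tags that were never rendered) are assumptions about the usual mistakes, not an exhaustive list.

```csharp
using System;

public static class HtmlGuard
{
    // Returns null when the input plausibly contains rendered markup,
    // otherwise a short description of what looks wrong.
    // This is a heuristic for debugging, not an HTML validator.
    public static string Describe(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            return "input is null or empty";

        var trimmed = input.Trim();

        // No angle brackets at all means there are no tags to parse.
        if (trimmed.IndexOf('<') < 0 || trimmed.IndexOf('>') < 0)
        {
            if (trimmed.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                trimmed.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                return "input looks like a URL, not markup";
            return "input contains no tags";
        }

        // Server-side syntax means the framework has not rendered the page yet.
        if (trimmed.Contains("<%") || trimmed.Contains("<asp:") || trimmed.Contains("@model"))
            return "input looks like an unrendered server-side template";

        return null;
    }
}
```

If Describe returns anything other than null, the fix belongs in the code that produces the string, for example rendering the view to a string first, rather than in the PDF code.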

Third, the built-in class that's been around for years is the HTMLWorker; however, this has been replaced with XMLWorker (Java / .Net). Zero work is being done on HTMLWorker, which doesn't support CSS files, has only limited support for the most basic CSS properties and actually breaks on certain tags. If you do not see the HTML attribute or CSS property and value in this file then it probably isn't supported by HTMLWorker. XMLWorker can be more complicated sometimes but those complications also make it more extensible.

Below is C# code that shows how to parse HTML tags into iText abstractions that get automatically added to the document that you are working on. C# and Java are very similar so it should be relatively easy to convert this. Example #1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported the class="headline" gets ignored but everything else should actually work. Example #2 is the same as the first except it uses XMLWorker instead. Example #3 also parses the simple CSS example.

//Requires: using System; using System.IO; using iTextSharp.text; using iTextSharp.text.pdf;

//Create a byte array that will eventually hold our final PDF
Byte[] bytes;

//Boilerplate iTextSharp setup here
//Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream()) {

    //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document()) {

        //Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms)) {

            //Open the document for writing
            doc.Open();

            //Our sample HTML and CSS
            var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
            var example_css = @".headline{font-size:200%}";

            /**************************************************
             * Example #1                                     *
             *                                                *
             * Use the built-in HTMLWorker to parse the HTML. *
             * Only inline CSS is supported.                  *
             * ************************************************/

            //Create a new HTMLWorker bound to our document
            using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) {

                //HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
                using (var sr = new StringReader(example_html)) {

                    //Parse the HTML
                    htmlWorker.Parse(sr);
                }
            }

            /**************************************************
             * Example #2                                     *
             *                                                *
             * Use the XMLWorker to parse the HTML.           *
             * Only inline CSS and absolutely linked          *
             * CSS is supported                               *
             * ************************************************/

            //XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(example_html)) {

                //Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            /**************************************************
             * Example #3                                     *
             *                                                *
             * Use the XMLWorker to parse HTML and CSS        *
             * ************************************************/

            //In order to read CSS as a string we need to switch to a different constructor
            //that takes Streams instead of TextReaders.
            //Below we convert the strings into UTF8 byte array and wrap those in MemoryStreams
            using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) {
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) {

                    //Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }
            }

            doc.Close();
        }
    }

    //After all of the PDF "stuff" above is done and closed but **before** we
    //close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

//Now we just need to do something with those bytes.
//Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them.
//You could also write the bytes to a database in a varbinary() column (but please don't) or you
//could pass them to another function for further PDF processing.
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);

2017 update

There is good news for HTML-to-PDF needs. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation, with a plan to become a definitive Recommendation this year, after tests.

There are also less standardized solutions, with plugins for C#, as shown by print-css.rocks.

ITextSharp HTML to PDF?

After doing some digging I found a good way to accomplish what I need with iTextSharp.

Here is some sample code if it will help anyone else in the future:

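protected void Page_Load(object sender, EventArgs e)
{
    //Download the HTML first so that a failed request does not leave an empty PDF behind
    string htmlText;
    using (WebClient wc = new WebClient())
    {
        htmlText = wc.DownloadString("http://localhost:59500/my.html");
    }
    Response.Write(htmlText);

    //The using block closes the file even if parsing throws.
    //Exceptions are no longer swallowed by an empty catch, so failures are visible.
    using (FileStream fs = new FileStream("c:\\my.pdf", FileMode.Create))
    {
        Document document = new Document();
        PdfWriter.GetInstance(document, fs);
        document.Open();

        //Parse the HTML into iTextSharp elements and add each one to the document
        List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
        for (int k = 0; k < htmlarraylist.Count; k++)
        {
            document.Add(htmlarraylist[k]);
        }

        document.Close();
    }
}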

Unable to create PDF from HTML in Xamarin Android

Use the NuGet package Xam.iTextSharpLGPL.

Below is the sample code:

using System.IO;
using System.Text; //needed for StringBuilder
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;
using Android.Graphics;

string path = Android.OS.Environment.ExternalStorageDirectory.Path;
string pdfPath = System.IO.Path.Combine(path, "samplee.pdf");
System.IO.FileStream fs = new FileStream(pdfPath, FileMode.Create);
Document document = new Document(PageSize.A4);
PdfWriter writer = PdfWriter.GetInstance(document, fs);
HTMLWorker worker = new HTMLWorker(document);
document.Open();
StringBuilder html = new StringBuilder();
html.Append("<?xml version='1.0' encoding='utf-8' ?><html><head><title></title></head><body>");
html.Append("<CENTER><H1>Simple Sample html</H1>");
html.Append("<H4>By User1</H4>");
html.Append("<H2>Demonstrating a few HTML features</H2>");
html.Append("</CENTER>");
html.Append("<p>HTML doesn't normally use line breaks for ordinary text. A white space of any size is treated as a single space. This is because the author of the page has no way of knowing the size of the reader's screen, or what size type they will have their browser set for.");
html.Append("</p></body></html>");
TextReader reader = new StringReader(html.ToString());
worker.StartDocument();
worker.Parse(reader);
worker.EndDocument();
worker.Close();
document.Close();
writer.Close();
fs.Close();

HTML Renderer/PDFsharp Combine Two HTML-Generated PDF Documents

Here is code that works:

static void Main(string[] args)
{
    PdfDocument pdf1 = PdfGenerator.GeneratePdf("<p><h1>Hello World</h1>This is html rendered text #1</p>", PageSize.A4);
    PdfDocument pdf2 = PdfGenerator.GeneratePdf("<p><h1>Hello World</h1>This is html rendered text #2</p>", PageSize.A4);

    PdfDocument pdf1ForImport = ImportPdfDocument(pdf1);
    PdfDocument pdf2ForImport = ImportPdfDocument(pdf2);

    var combinedPdf = new PdfDocument();

    combinedPdf.Pages.Add(pdf1ForImport.Pages[0]);
    combinedPdf.Pages.Add(pdf2ForImport.Pages[0]);

    combinedPdf.Save("document.pdf");
}

private static PdfDocument ImportPdfDocument(PdfDocument pdf1)
{
    using (var stream = new MemoryStream())
    {
        pdf1.Save(stream, false);
        stream.Position = 0;
        var result = PdfReader.Open(stream, PdfDocumentOpenMode.Import);
        return result;
    }
}

I save each PDF document to a MemoryStream and reopen it in import mode. This makes it possible to add its pages to a new PdfDocument. Only the first page of each document is used for simplicity; add loops as needed.


