Display Unicode characters in converting Html to Pdf
When dealing with Unicode characters and iTextSharp, there are a couple of things you need to take care of. The first one you did already, and that's getting a font that supports your characters. The second is that you need to register the font with iTextSharp so that it's aware of it.
//Path to our font
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
//Register the font with iTextSharp
iTextSharp.text.FontFactory.Register(arialuniTff);
Now that we have a font, we need to create a StyleSheet object that tells iTextSharp when and how to use it.
//Create a new stylesheet
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
//Set the default body font to our registered font's internal name
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
The one non-HTML part that you also need to do is set a special encoding parameter. This encoding is specific to iTextSharp, and in your case you want it to be Identity-H. If you don't set this, it defaults to Cp1252 (WINANSI).
//Set the default encoding to support Unicode characters
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);
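To see why the default encoding is a problem, it can help to check which characters Cp1252 can represent at all. The sketch below is plain Java using only the standard library (it is not iTextSharp code), and the class and method names are made up for illustration. Identity-H itself is a PDF font encoding rather than a Java charset, so UTF-8 is used here only as a stand-in for "an encoding that covers Unicode".

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: checks whether a charset can represent a piece of text.
class Cp1252Check {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean encodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "\u03B1,\u03B2" is the Greek sample text from the HTML above (alpha, beta).
        String greek = "\u03B1,\u03B2";

        System.out.println(encodable("This is a test", "windows-1252")); // true
        System.out.println(encodable(greek, "windows-1252"));            // false
        System.out.println(encodable(greek, "UTF-8"));                   // true
    }
}
```

The Greek letters have no slot in Cp1252, which is consistent with them going missing when the encoding is left at its default.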
Lastly, we need to pass our stylesheet to the ParseToList method:
//Parse our HTML using the stylesheet created above
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);
Putting that all together, from open to close you'd have:
doc.Open();
//Sample HTML
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append(@"<p>This is a test: <strong>α,β</strong></p>");
//Path to our font
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
//Register the font with iTextSharp
iTextSharp.text.FontFactory.Register(arialuniTff);
//Create a new stylesheet
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
//Set the default body font to our registered font's internal name
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
//Set the default encoding to support Unicode characters
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);
//Parse our HTML using the stylesheet created above
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);
//Loop through each element, don't bother wrapping in P tags
foreach (var element in list) {
    doc.Add(element);
}
doc.Close();
EDIT
In your comment you show HTML that specifies an override font. iTextSharp does not spider the system for fonts and its HTML parser doesn't use font fallback techniques. Any fonts specified in HTML/CSS must be manually registered.
string lucidaTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "l_10646.ttf");
iTextSharp.text.FontFactory.Register(lucidaTff);
HTML to PDF with cyrillic characters
This worked for me!
public static void main(String[] args) throws DocumentException, IOException, SAXException, ParserConfigurationException {
    String htmlString = "<!DOCTYPE html>\n" + "<html lang=\"ru\">\n" + "<head>\n"
        + " <meta charset=\"UTF-8\"/>\n" + " <meta http-equiv=\"Content-Type\" content=\"text/html\"/>\n"
        + " <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/>\n"
        + " <style type='text/css'> "
        + " * { font-family: Verdana; }\n"
        + " </style>\n"
        + "</head>\n"
        + "<body>\n" + " <h3>ПРЕДСТАВЛЕНИЕ</h3>\n" + "</body>\n" + "</html>";
    String path = FileSystemView.getFileSystemView().getDefaultDirectory().getPath() + "/A.pdf";
    OutputStream os = new FileOutputStream(path);
    ITextRenderer renderer = new ITextRenderer();
    // Register the same font that the CSS refers to, with a Unicode-capable encoding
    renderer.getFontResolver().addFont("c:/windows/fonts/verdana.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
    renderer.setDocumentFromString(htmlString);
    renderer.layout();
    renderer.createPDF(os);
    os.close();
}
I think the trick is to set the font in the HTML's CSS and to make sure that font-family matches the font you register with the renderer.
html to pdf convert, cyrillic characters not displayed properly
First this: it is very hard to believe that your font directory is C:\\. You are assuming that you have a file with path C:\\arialuni.ttf, whereas I assume that the path to MS Arial Unicode is C:\\windows\\fonts\\arialuni.ttf.
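A quick way to rule out a wrong path before registering a font is to check that the file actually exists. This is a minimal sketch in plain Java, standard library only. The class name, method name, and the idea of trying a list of candidates are assumptions for illustration, and the two paths are simply the ones discussed above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds the first candidate path that points to a real file.
class FontPathCheck {

    /** Returns the first candidate that exists as a regular file, if any. */
    static Optional<Path> firstExisting(List<String> candidates) {
        return candidates.stream()
                .map(Paths::get)
                .filter(Files::isRegularFile)
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "c:/arialuni.ttf",                // the path assumed in the question
                "c:/windows/fonts/arialuni.ttf"); // the usual Windows font directory

        Optional<Path> font = firstExisting(candidates);
        if (font.isPresent()) {
            System.out.println("Font file found: " + font.get());
        } else {
            System.out.println("No font file found; registering it would not help.");
        }
    }
}
```

The result depends on the machine, so no output is shown here.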
Secondly: I don't think arialuni is the correct name. I'm pretty sure it's arial unicode ms. You can check this by running this code:
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("c:/windows/fonts/arialuni.ttf");
for (String s : fontProvider.getRegisteredFamilies()) {
    System.out.println(s);
}
The output should be:
courier
arial unicode ms
zapfdingbats
symbol
helvetica
times
times-roman
These are the values you can use; arialuni isn't one of them.
Also: aren't you defining the character set in the wrong place?
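The character set matters because the same bytes decode to different text depending on which charset the reader assumes. The following is a minimal sketch in plain Java, standard library only and independent of XML Worker. The class and method names are hypothetical, and ISO-8859-1 is just one example of a wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode text as UTF-8, then decode it with a given charset.
class CharsetDecodeDemo {

    /** Encodes text as UTF-8 bytes and decodes those bytes with decodeWith. */
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes for the Cyrillic greeting followed by " Kristijan!"
        String original = "\u0417\u0434\u0440\u0430\u0432\u043E Kristijan!";

        String right = roundTrip(original, StandardCharsets.UTF_8);
        String wrong = roundTrip(original, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        // Each Cyrillic letter takes two bytes in UTF-8, so the wrong decode is longer.
        System.out.println(original.length() + " vs " + wrong.length()); // 17 vs 23
    }
}
```

Decoding with the wrong charset turns each Cyrillic letter into two unrelated characters, which is the kind of garbling a parser produces when it is not told that the input is UTF-8.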
I have slightly adapted your source code in the sense that I stored the HTML in an HTML file cyrillic.html:
<html>
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
</head>
<body>
<h4 style="font-family: Arial Unicode MS, FreeSans; font-size:16px; font-weight: normal; " >Здраво Kristijan!</h4>
</body>
</html>
Note that I replaced arialuni with Arial Unicode MS and that I used FreeSans as an alternative font. In my code, I used FreeSans.ttf instead of arialuni.ttf.
See ParseHtml11:
public static final String DEST = "results/xmlworker/cyrillic.pdf";
public static final String HTML = "resources/xml/cyrillic.html";
public static final String FONT = "resources/fonts/FreeSans.ttf";
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    XMLWorkerFontProvider fontImp = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontImp.register(FONT);
    FontFactory.setFontImp(fontImp);
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
        new FileInputStream(HTML), null, Charset.forName("UTF-8"), fontImp);
    // step 5
    document.close();
}
As you can see, I use the Charset when parsing the HTML. With that in place, the Cyrillic text shows up correctly in the resulting PDF.
If you insist on using Arial Unicode, just replace this line:
public static final String FONT = "resources/fonts/FreeSans.ttf";
With this one:
public static final String FONT = "c:/windows/fonts/arialuni.ttf";
I have tested this on a Windows machine and it works too.
trouble in converting unicode template to pdf using xhtml2pdf
It's partially solved by providing the absolute path to the font, i.e.:
<style>
@font-face {
    font-family: Preeti;
    src: url("c:/static/fonts/preeti.ttf");
}
body {
    font-family: Preeti;
}
</style>
Now another problem has arisen. I have mixed text, partly in Unicode and partly in a normal font (I think I should call them normal fonts :D). Since the font has been overridden, the normal text now comes out as rectangular boxes, in this case empty boxes.