Pdfbox Encode Symbol Currency Euro

How to draw text containing characters like € in PDF using PDFBox

By referring to this, I managed to implement what I needed. Please refer to the following code snippet.

import java.nio.charset.StandardCharsets;

// Assumes a WinAnsi-encoded font was already selected with contentStream.setFont(...)
contentStream.beginText();

/* 'x' acts as a placeholder for the euro sign */
byte[] commands = "(500 x/year**) Tj ".getBytes(StandardCharsets.US_ASCII);

/* commands[index_of_x] = (byte) 128, where 128 is the decimal value of octal
 * 200 (the character code for '€' in WinAnsiEncoding).
 * You may want to refer to Annex D.2, Latin Character Set and Encodings, of
 * the PDF specification ISO 32000-1.
 */
commands[5] = (byte) 128; // index 5 is the 'x' in "(500 x"

contentStream.appendRawCommands(commands);
contentStream.endText();
contentStream.close();
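Instead of patching a placeholder byte by its index, the whole string can be converted to WinAnsi codes up front. The following is a minimal stdlib-only sketch, assuming that Java's windows-1252 charset agrees with WinAnsiEncoding for the characters involved (it does for € at 0x80, but the two tables are not identical in general). It builds the same raw "(...) Tj " operator bytes, escaping the literal-string delimiters and writing non-ASCII codes as octal escapes such as \200. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WinAnsiTextCommand {
    // Assumption: windows-1252 matches WinAnsiEncoding for the characters used here (e.g. € -> 0x80).
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Builds the raw bytes of a "(...) Tj " operator for a WinAnsi-encoded font. */
    public static byte[] showTextCommand(String text) {
        if (!CP1252.newEncoder().canEncode(text)) {
            throw new IllegalArgumentException("Not representable in WinAnsi/cp1252: " + text);
        }
        byte[] encoded = text.getBytes(CP1252);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('(');
        for (byte b : encoded) {
            int c = b & 0xFF;
            if (c == '(' || c == ')' || c == '\\') {
                // Escape the PDF literal-string delimiters and the escape character itself.
                out.write('\\');
                out.write(c);
            } else if (c < 0x20 || c > 0x7E) {
                // Write non-printable / non-ASCII codes as a 3-digit octal escape, e.g. € -> \200.
                byte[] esc = String.format("\\%03o", c).getBytes(StandardCharsets.US_ASCII);
                out.write(esc, 0, esc.length);
            } else {
                out.write(c);
            }
        }
        byte[] tail = ") Tj ".getBytes(StandardCharsets.US_ASCII);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] cmd = showTextCommand("500 \u20AC/year**");
        // prints: (500 \200/year**) Tj
        System.out.println(new String(cmd, StandardCharsets.US_ASCII));
    }
}
```

The resulting bytes could be passed to appendRawCommands() in place of the hand-patched array. With PDFBox 2.0 and later, showText() on a WinAnsi-encoded font is expected to do this encoding itself, so the raw-command workaround mainly matters for 1.8.x.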


PDFBox U+00A0 is not available in this font's encoding

You'll have to embed a font and not use WinAnsiEncoding:

PDFont formFont = PDType0Font.load(doc, new FileInputStream("c:/windows/fonts/somefont.ttf"), false); // check that the font has what you need; ARIALUNI.TTF is good but huge
PDResources res = acroForm.getDefaultResources(); // could be null, if so, then create it with the setter
String fontName = res.add(formFont).getName();
String defaultAppearanceString = "/" + fontName + " 0 Tf 0 g"; // adjust to replace existing font name
textField.setDefaultAppearance(defaultAppearanceString);

Note that this code must be run before calling setValue().

More about this can be found in the CreateSimpleFormWithEmbeddedFont.java example in the source code download.
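The comment in the snippet above says to check that the font has what you need. One way to do a rough pre-flight check without PDFBox is java.awt.Font.canDisplayUpTo(), sketched below; this is a stand-in assumption, not how PDFBox itself decides, and it only approximates the font's cmap coverage. The helper also builds the default appearance string the same way the snippet does ("0" as the size means auto-size, "0 g" selects black). The class name, the "F1" resource name, and the font path argument are hypothetical; in the real code the name comes from res.add(formFont).getName().

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;

public class FontCoverageCheck {
    /** Index of the first character the font has no glyph for, or -1 if all are covered. */
    public static int firstUnsupportedIndex(Font font, String text) {
        return font.canDisplayUpTo(text);
    }

    /** Builds a default appearance string such as "/F1 0 Tf 0 g" (size 0 = auto-size). */
    public static String defaultAppearance(String resourceFontName, int size) {
        return "/" + resourceFontName + " " + size + " Tf 0 g";
    }

    public static void main(String[] args) throws IOException, FontFormatException {
        if (args.length == 0) {
            System.out.println("usage: java FontCoverageCheck <path-to-font.ttf>");
            return;
        }
        // Hypothetical path: point this at the TTF you intend to embed with PDType0Font.load(...)
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));
        String sample = "500\u00A0\u20AC"; // a no-break space and a euro sign
        int idx = firstUnsupportedIndex(font, sample);
        if (idx < 0) {
            System.out.println("All characters covered");
        } else {
            System.out.printf("Missing glyph for U+%04X at index %d%n", (int) sample.charAt(idx), idx);
        }
        System.out.println(defaultAppearance("F1", 0));
    }
}
```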

U+FFFD is not available in this font's encoding: WinAnsiEncoding

U+FFFD is used to replace an incoming character whose value is unknown or unrepresentable in Unicode; compare the use of U+001A as a control character to indicate the substitute function (source).

That said, it is likely that the character gets corrupted somewhere before it reaches PDFBox. Perhaps the encoding of the source file is not UTF-8, which would explain why the character is mangled.

As a general rule, you should only write ASCII characters in source code. You can still represent the whole Unicode range using the escaped form \uXXXX; in this case, ä becomes \u00E4.
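A small sketch of both points above, using only the JDK: converting non-ASCII characters to Java escapes (similar in spirit to what the old native2ascii tool did), and showing how reading bytes with the wrong charset produces U+FFFD before the text ever reaches PDFBox. The class and method names are hypothetical, and the ISO-8859-1/UTF-8 mismatch is just one assumed example of how the corruption can happen.

```java
import java.nio.charset.StandardCharsets;

public class AsciiEscapes {
    /** Rewrites every non-ASCII UTF-16 code unit as a Java-style backslash-u escape. */
    public static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    /** True if the text contains U+FFFD, i.e. something upstream could not decode it. */
    public static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00E4"; // a-umlaut, written with ASCII-only source
        System.out.println(toUnicodeEscapes(aUmlaut));

        // Simulated mismatch: bytes written as ISO-8859-1 but read back as UTF-8.
        byte[] latin1 = aUmlaut.getBytes(StandardCharsets.ISO_8859_1); // the single byte 0xE4
        String misread = new String(latin1, StandardCharsets.UTF_8);    // 0xE4 alone is malformed UTF-8
        System.out.println(containsReplacementChar(misread));
    }
}
```

A string that already contains U+FFFD will fail in PDFBox no matter which font is used, so checking for it early helps locate where the corruption happens.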

-- UPDATE --

Apparently the problem is in how the user input gets encoded/decoded between client and server using the JS function btoa. A solution to this problem can be found at this link:

Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings
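On the server side, the matching half of that fix is to treat the decoded base64 payload as UTF-8 bytes. The sketch below assumes the client UTF-8-encodes the text before calling btoa (as the linked answer recommends) and shows the round trip with java.util.Base64; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Utf8 {
    /** Encodes text as UTF-8 bytes, then base64 (what the fixed client is assumed to send). */
    public static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes base64, then interprets the resulting bytes as UTF-8. */
    public static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        String wire = encode(euro);
        System.out.println(wire);                      // 4oKs
        System.out.println(decode(wire).equals(euro)); // true

        // Decoding the same bytes with the wrong charset yields mojibake, not the euro sign.
        String wrong = new String(Base64.getDecoder().decode(wire), StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(euro));        // false
    }
}
```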

Using PDFBox to write unicode strings to a PDF

Essentially all the answers you linked to are correct. You have to keep in mind which PDFBox version they respectively refer to.

Concerning this answer:

In the pre-2.0.0 versions (up to the current 1.8.8), the text drawing operations were very limited and didn't even support the full WinAnsi encoding, which is the encoding that font objects generated by these versions used.

Concerning this answer:

The current 2.0.0-SNAPSHOT development state is much improved: the limitations of the text drawing operations have been removed, text is properly encoded, and the fonts used are properly encoded and embedded. Bugs in the early implementations of these improvements have meanwhile mostly been fixed.

Concerning this answer:

This answer points to something one needs to keep in mind no matter which PDFBox version one uses: specific fonts do not necessarily support the whole Unicode range of code points. If the font you use does not contain a glyph definition for a character, then no matter how you encode it, that character won't be drawn properly. This especially concerns the standard 14 fonts which every PDF viewer has to support: they only need to support characters from a few Latin-style encodings, far from the full Unicode set.
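Before drawing with a standard 14 font, it can help to list which characters fall outside its encoding. The sketch below approximates WinAnsiEncoding with Java's windows-1252 charset (an assumption: the two are close but not identical) and reports the offending code points; the class and method names are hypothetical. Characters it flags need an embedded font that actually contains the glyphs.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class WinAnsiCoverage {
    // Assumption: windows-1252 approximates WinAnsiEncoding; the two tables differ in places.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Lists the code points in text that a WinAnsi-encoded (e.g. standard 14) font cannot show. */
    public static List<String> unsupported(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            // canEncode resets the encoder afterwards, so reusing it per code point is fine.
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // Euro and a-umlaut are in WinAnsi; Arabic BEH (U+0628) is not.
        System.out.println(unsupported("500 \u20AC \u00E4 \u0628"));
    }
}
```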

Write arabic characters with PDFBOX

Arabic can be written by applying the patches from both PDFBOX-922 and PDFBOX-1287 (the diff files are attached to the issues).
I hope that the patches will be applied in version 2.0.

PDFBox “special” characters in Helvetica

Looking at the PDFBox code, it really seems like a bug. If you look at the PDType1Font.encode() method, it automatically throws if the code point is > 0xFF. However, if the logic instead proceeded in this case, the GlyphList would convert the "\u2019" character to "quoteright", which would then be a valid character in the font.
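The lookup the answer describes (code point to glyph name, then glyph name to a code in the font's encoding) can be sketched as follows. This is an illustration of the idea, not PDFBox's actual implementation: both tables are tiny hypothetical excerpts, whereas a real encoder would use the full Adobe Glyph List and the complete WinAnsiEncoding table from ISO 32000-1 Annex D. Passing ASCII through unchanged is also a simplification.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;

public class GlyphNameEncoder {
    // Hypothetical excerpts: Unicode -> glyph name (Adobe Glyph List style).
    private static final Map<Integer, String> UNICODE_TO_GLYPH = Map.of(
            0x2019, "quoteright",
            0x20AC, "Euro",
            0x00E4, "adieresis");
    // Hypothetical excerpts: glyph name -> WinAnsiEncoding code.
    private static final Map<String, Integer> WINANSI_CODES = Map.of(
            "quoteright", 0x92, // octal 222
            "Euro", 0x80,       // octal 200
            "adieresis", 0xE4); // octal 344

    /** Encodes via code point -> glyph name -> WinAnsi code instead of rejecting code points above 0xFF. */
    public static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.write(cp); // simplification: ASCII passes through unchanged
                return;
            }
            String glyph = UNICODE_TO_GLYPH.get(cp);
            Integer code = glyph == null ? null : WINANSI_CODES.get(glyph);
            if (code == null) {
                throw new IllegalArgumentException(String.format("U+%04X has no WinAnsi code", cp));
            }
            out.write(code);
        });
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode("It\u2019s 5\u20AC")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

With this approach "\u2019" is accepted because its glyph name, quoteright, exists in WinAnsiEncoding, which is the behavior the answer suggests the code should have.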


