PDFKitten Is Highlighting at the Wrong Position

This might be a bug in PDFKitten when calculating the width of characters whose character identifier (CID) does not coincide with their Unicode code point.

appendPDFString in StringDetector works with two strings when processing some string data:

// Use CID string for font-related computations.
NSString *cidString = [font stringWithPDFString:string];

// Use Unicode string to compare with user input.
NSString *unicodeString = [[font stringWithPDFString:string] lowercaseString];

stringWithPDFString in Font transforms the sequence of character identifiers in its argument into a Unicode string.

Thus, in spite of its name, cidString does not hold a sequence of character identifiers but a sequence of Unicode characters. Nonetheless, its entries are passed to didScanCharacter, which Scanner implements to advance the position by the character width: it passes the value to widthOfCharacter in Font to determine that width, and that method (according to its comment, "Width of the given character (CID) scaled to fontsize") expects its argument to be a character identifier.

So, if the CID and the Unicode code point don't coincide, the wrong character width is determined and the position of any following character cannot be trusted. In the case at hand, the /fi ligature has a CID of 12, which is very different from its Unicode code point U+FB01 (0xfb01).
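To make the drift concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not PDFKitten code: ToyFont, width_for_index, advance_by_cid and advance_by_unicode are hypothetical names, and the font is assumed to be a simple one whose width table is indexed by CID. It only illustrates how looking up a width with the Unicode value instead of the CID shifts the cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy font (not a PDFKitten type): widths are indexed by CID,
   and a second table maps each CID to its Unicode code point. */
typedef struct {
    const uint16_t *widths;      /* glyph widths in 1/1000 units, indexed by CID */
    const uint32_t *to_unicode;  /* Unicode code point for each CID */
    size_t glyph_count;          /* number of entries in both tables */
    uint16_t default_width;      /* used when the lookup index is out of range */
} ToyFont;

/* Looks up a width by table index, falling back to the default width. */
static uint16_t width_for_index(const ToyFont *font, uint32_t index) {
    assert(font != NULL);
    return index < font->glyph_count ? font->widths[index] : font->default_width;
}

/* Intended behaviour: advance the cursor by the width stored under the CID. */
uint32_t advance_by_cid(const ToyFont *font, const uint8_t *cids, size_t n) {
    uint32_t x = 0;
    for (size_t i = 0; i < n; i++) {
        x += width_for_index(font, cids[i]);
    }
    return x;
}

/* Suspected behaviour: the CID is first converted to Unicode, and the
   Unicode value is then used as the index into the CID-based width table. */
uint32_t advance_by_unicode(const ToyFont *font, const uint8_t *cids, size_t n) {
    uint32_t x = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t code_point =
            cids[i] < font->glyph_count ? font->to_unicode[cids[i]] : 0;
        x += width_for_index(font, code_point);
    }
    return x;
}
```

With a small table in which CID 12 maps to U+FB01, advance_by_unicode cannot find an entry for the ligature and falls back to the default width, so the two totals differ and every later glyph is offset by that difference.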

I would propose enhancing PDFKitten to also define a didScanCID method in StringDetector, which appendPDFString should call alongside didScanCharacter for each processed character, forwarding its CID. Scanner should then use this new method, instead of didScanCharacter, to calculate the width by which it advances its cursor.
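The shape of that proposal could look roughly like the following plain C sketch. It is only an illustration under assumptions: DetectorDelegate, append_cids, ToyScanner and the two callbacks are hypothetical stand-ins for the Objective-C delegate methods, not PDFKitten's actual API, and unmapped CIDs are reported as U+FFFD purely for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical delegate: one callback receives the Unicode value (for
   keyword matching), the other the raw CID (for width lookup). */
typedef struct {
    void (*did_scan_character)(void *ctx, uint32_t unicode);
    void (*did_scan_cid)(void *ctx, uint16_t cid);
    void *ctx;
} DetectorDelegate;

/* Walks a string of CIDs and reports both values for each character. */
void append_cids(const DetectorDelegate *delegate,
                 const uint16_t *cids, size_t n,
                 const uint32_t *cid_to_unicode, size_t map_len) {
    assert(delegate != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t cid = cids[i];
        /* Assumption: unmapped CIDs are reported as U+FFFD. */
        uint32_t unicode = cid < map_len ? cid_to_unicode[cid] : 0xFFFD;
        if (delegate->did_scan_character) {
            delegate->did_scan_character(delegate->ctx, unicode);
        }
        if (delegate->did_scan_cid) {
            delegate->did_scan_cid(delegate->ctx, cid);
        }
    }
}

/* Hypothetical scanner state: the cursor moves only in the CID callback. */
typedef struct {
    const uint16_t *widths_by_cid;  /* widths in 1/1000 units, indexed by CID */
    size_t width_count;
    uint32_t cursor;                /* accumulated advance */
    uint32_t last_unicode;          /* last value seen for matching */
} ToyScanner;

/* Unicode callback: used for matching only, never for advancing. */
void scanner_did_scan_character(void *ctx, uint32_t unicode) {
    ToyScanner *scanner = (ToyScanner *)ctx;
    scanner->last_unicode = unicode;
}

/* CID callback: advances the cursor by the width stored under the CID. */
void scanner_did_scan_cid(void *ctx, uint16_t cid) {
    ToyScanner *scanner = (ToyScanner *)ctx;
    if (cid < scanner->width_count) {
        scanner->cursor += scanner->widths_by_cid[cid];
    }
}
```

Keeping the two callbacks separate means the keyword comparison can keep working on Unicode values while the cursor arithmetic never sees them.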

This should be triple-checked first, though. Maybe some widthOfCharacter implementations (there are different ones for different font types) expect the argument to be a Unicode code point after all, in spite of the comment...

(Sorry if I used the wrong vocabulary here or there, I'm a Java guy... :))

How to select lines of text in a PDF and then highlight them? (iOS)

Your potential solution is the way to go. The bounding rectangle of a Tj string is made up of the bounding rectangles of its individual glyphs, so you can select any part of the string. The PDFKitten library might help you with text processing: https://github.com/KurtCode/PDFKitten
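As a rough illustration of that idea, the plain C sketch below computes the rectangle covering a range of glyphs in a single horizontal Tj string. The names Rect, TextRun and substring_rect are hypothetical, and the sketch assumes an identity text matrix, no character or word spacing, and glyph metrics given in 1/1000 text units; a real PDF page would need those factors applied as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle and text-run types, for illustration only. */
typedef struct { double x, y, width, height; } Rect;

typedef struct {
    const double *glyph_widths;  /* advance of each glyph, in 1/1000 units */
    size_t glyph_count;
    double font_size;            /* scales the 1/1000 units */
    double origin_x;             /* x position of the first glyph */
    double baseline_y;           /* y position of the baseline */
    double ascent;               /* font ascent, in 1/1000 units */
    double descent;              /* font descent, in 1/1000 units (negative) */
} TextRun;

/* Returns the rectangle covering glyphs [start, end) of the run.
   Assumes horizontal text with no extra spacing or transformation. */
Rect substring_rect(const TextRun *run, size_t start, size_t end) {
    assert(run != NULL);
    assert(start <= end && end <= run->glyph_count);

    double before = 0.0;    /* summed advances of glyphs before the range */
    double selected = 0.0;  /* summed advances of glyphs inside the range */
    for (size_t i = 0; i < end; i++) {
        if (i < start) {
            before += run->glyph_widths[i];
        } else {
            selected += run->glyph_widths[i];
        }
    }

    Rect rect;
    rect.x = run->origin_x + before * run->font_size / 1000.0;
    rect.y = run->baseline_y + run->descent * run->font_size / 1000.0;
    rect.width = selected * run->font_size / 1000.0;
    rect.height = (run->ascent - run->descent) * run->font_size / 1000.0;
    return rect;
}
```

The glyph widths fed into such a calculation have to be looked up with the right key for the font in question, otherwise the rectangle starts at the wrong x position.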

Possible to show PDF over a PageViewController in an iOS app?

First, create a new instance of the scanner.

CGPDFPageRef page = CGPDFDocumentGetPage(document, 1);
Scanner *scanner = [Scanner scannerWithPage:page];

Set a keyword (case-insensitive) and scan a page.

NSArray *selections = [scanner select:@"happiness"];

Finally, iterate over the returned selections and draw them.

for (Selection *selection in selections)
{
    // draw selection
}

Then highlight each selection using the Core Graphics framework.


