Itext Get Field Coordinates from Existing PDF


To solve the problem completely, I wrote this Java class:

// GetSigPos.java
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;
import java.util.List;

class GetSigPos {
    public static void main(String[] args) throws IOException {
        String pdfFile = args[0];
        PdfReader reader = new PdfReader(pdfFile);
        AcroFields fields = reader.getAcroFields();

        for (String signame : fields.getBlankSignatureNames()) {
            List<AcroFields.FieldPosition> positions = fields.getFieldPositions(signame);
            if (positions == null || positions.isEmpty()) {
                continue; // the field has no widget annotation, so it has no position
            }
            // Use the first widget; all values are in points.
            Rectangle rect = positions.get(0).position;
            float left = rect.getLeft();
            float bTop = rect.getTop(); // top edge, measured from the bottom of the page
            float width = rect.getWidth();
            float height = rect.getHeight();

            int page = positions.get(0).page;
            Rectangle pageSize = reader.getPageSize(page);
            float pageHeight = pageSize.getTop();
            float top = pageHeight - bTop; // top edge, measured from the top of the page

            System.out.println(signame + "::" + page + "::" + left + "::" + top + "::" + width + "::" + height);
        }
        reader.close();
    }
}

Then I can compile and run it from the command line:

javac GetSigPos.java
java GetSigPos "MyForm.pdf"

Or, from my PHP program, I can run it with exec():

exec('java -cp .:/usr/local/bin/pdfbox/itextpdf-5.4.4.jar:/usr/local/bin/pdfbox GetSigPos "'.$pdfName.'" 2>&1', $output);

echo '<pre>';
print_r($output);
echo '</pre>';

P.S. Don't forget to set the CLASSPATH for Java! I'm using CentOS 6:

vi /root/.bash_profile

And add these lines:

export JAVA_HOME=/usr/lib/jvm/jre-1.5.0-gcj
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:/usr/local/bin/pdfbox/itextpdf-5.4.4.jar:/usr/local/bin/pdfbox
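The y-axis flip that GetSigPos performs (top = pageHeight - bTop) can be looked at in isolation. The following is a minimal sketch in plain Java with no iText dependency; the class and method names (TopLeftConverter, toTopLeftY) are hypothetical, and it assumes an unrotated page whose lower-left corner sits at the origin.

```java
// TopLeftConverter.java (hypothetical helper, not part of iText)
class TopLeftConverter {
    /**
     * Converts the top edge of a field, measured from the bottom of the page
     * (the PDF convention), into a distance measured from the top of the page.
     * Assumes the page's lower-left corner is at (0, 0) and the page is not rotated.
     */
    static float toTopLeftY(float pageHeight, float fieldTop) {
        return pageHeight - fieldTop;
    }

    public static void main(String[] args) {
        float pageHeight = 842f; // A4 height in points
        float fieldTop = 806f;   // top edge of the field, measured from the page bottom
        System.out.println(toTopLeftY(pageHeight, fieldTop)); // prints 36.0
    }
}
```

A consumer that draws with a top-left origin (for instance an HTML overlay) would use this value directly, together with the unchanged left, width and height.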

Possible to position fillable field in PDF to a coordinate using iText or some other API?

Your question isn't entirely clear, and the answer is different if you make different assumptions.

Assumption 1: Suppose that you have a PDF that consists of an image that fills the complete page. You now want to add text fields at positions that you know in advance.

In this case, you'd use PdfStamper and the addAnnotation() method, as is done in the answer to the Stack Overflow question How can I add a new AcroForm field to a PDF?:

PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// create a field for which you define the coordinates using a Rectangle
stamper.addAnnotation(field, 1);
stamper.close();

Here we add the field to page 1 using the addAnnotation() method.

Now for the question: how to create that field object. That's easy. See for instance the ReadOnlyField example:

Rectangle rect = new Rectangle(36, 720, 144, 806);
TextField tf = new TextField(stamper.getWriter(), rect, "text");
tf.setOptions(TextField.MULTILINE);
PdfFormField field = tf.getTextField();

Note that I use the coordinates of the lower-left corner (36, 720) and the upper-right corner (144, 806) to create a Rectangle object. I create a TextField using the stamper's PdfWriter instance and that rect, and I give the field the name text. Assuming that you want the entered text to be wrapped, I made the text field a MULTILINE field. I then obtain a PdfFormField instance from the TextField object.

Assumption 2: You are creating a PDF document from scratch in which you create a page to which you add an image of the same size as the page. Now you just want to add form fields for entering text. There are many examples of how to define and add a text field on the official iText web site: MultiLineField, TextFields, GenericFields, CreateFormInTable, and many more.

You'll also find a good example in the question How to add a hidden text field?. The example in the question shows how to add a visible text field; the answer shows how to hide it.

In this example, x and y are the coordinates of the upper-left corner (the rectangle runs from y - h at the bottom to y at the top), whereas w and h are the width and the height of the field:

TextField field = new TextField(writer, new Rectangle(x, y - h, x + w, y), name);
field.BackgroundColor = new BaseColor(bgcolor[0], bgcolor[1], bgcolor[2]);
field.BorderColor = new BaseColor(
bordercolor[0], bordercolor[1], bordercolor[2]);
field.BorderWidth = border;
field.BorderStyle = PdfBorderDictionary.STYLE_SOLID;
field.Text = text;
writer.AddAnnotation(field.GetTextField());

This is an iTextSharp example (written in C#), but it's very easy to port it to Java.
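Reading the constructor call above, Rectangle(x, y - h, x + w, y), the point (x, y) ends up as the upper-left corner of the field. The arithmetic can be checked without any PDF library; this is a minimal sketch in plain Java, and the FieldBox class and its corners() method are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;

// FieldBox.java (hypothetical helper, not part of iText or iTextSharp)
class FieldBox {
    /**
     * Returns {llx, lly, urx, ury} for a box whose upper-left corner is (x, y)
     * and whose size is w by h, mirroring new Rectangle(x, y - h, x + w, y).
     */
    static float[] corners(float x, float y, float w, float h) {
        return new float[] { x, y - h, x + w, y };
    }

    public static void main(String[] args) {
        float[] c = corners(36f, 806f, 108f, 86f);
        System.out.println(Arrays.toString(c)); // prints [36.0, 720.0, 144.0, 806.0]
    }
}
```

The sample values reproduce the rectangle (36, 720, 144, 806) used for the text field under Assumption 1.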

Finally: maybe you already knew all of this. Maybe you were just wondering what all these coordinates are about. The answer to this question can also be found on the official iText web site:

  • Where is the origin (x,y) of a PDF page?
  • How should I interpret the coordinates of a rectangle in PDF?

Almost all of the links in my answer refer to examples and answers that were written in answer to previous questions on StackOverflow. Please refrain from saying things like I have searched for days on how to use iText to accomplish setting up fillable field and positioning text of any kind to an absolute position because it is hard to believe for people who know that all the answers can be found on the official iText web site. Your boss might wonder which sites you were searching for all those days.

Pdf form fields position retrieval with Itext

I was about to close this question as a duplicate of Find field absolute position and dimension by acrokey but that's a Java answer, and although most developers have no problem converting the Java to C#, it may be helpful for some developers to get the C# answer.

Fields in a PDF are visualized using widget annotations. One field can correspond to several of those annotations. For instance, you could have a field named name that is visualized on every page. In this case, the value of this field would be shown on every page.

There's a GetFieldPositions() method that returns a list of positions, one for every widget annotation.

This is some code I copied from the answer to the question iTextSharp GetFieldPositions to SetSimpleColumn:

IList<AcroFields.FieldPosition> fieldPositions = fields.GetFieldPositions("fieldNameInThePDF");
if (fieldPositions == null || fieldPositions.Count <= 0) throw new ApplicationException("Error locating field");
AcroFields.FieldPosition fieldPosition = fieldPositions[0];
left = fieldPosition.position.Left;
right = fieldPosition.position.Right;
top = fieldPosition.position.Top;
bottom = fieldPosition.position.Bottom;

If one field corresponds with one widget annotation, then left, right, top, and bottom will give you the left, right, top and bottom coordinate of the field. The width of the field can be calculated like this: right - left; the height like this: top - bottom. These values are expressed in user units. By default there are 72 user units in one inch.
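The width, height and unit arithmetic described above can be sketched as follows. This is plain Java rather than C#, it does not call iText or iTextSharp, and the FieldSize class is a hypothetical helper; it assumes the default of 72 user units per inch.

```java
// FieldSize.java (hypothetical helper, not part of iText or iTextSharp)
class FieldSize {
    // Default user space: 72 user units per inch (an assumption; a page can override it).
    static final float UNITS_PER_INCH = 72f;

    static float width(float left, float right) {
        return right - left;
    }

    static float height(float bottom, float top) {
        return top - bottom;
    }

    static float toInches(float userUnits) {
        return userUnits / UNITS_PER_INCH;
    }

    public static void main(String[] args) {
        float w = width(36f, 144f);
        float h = height(720f, 792f);
        System.out.println(w + " x " + h + " user units");                  // prints 108.0 x 72.0 user units
        System.out.println(toInches(w) + " x " + toInches(h) + " inches"); // prints 1.5 x 1.0 inches
    }
}
```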

If your document contains more than one page, then fieldPosition.page will give you the page number where you'll find the field.

All of this is documented on http://developers.itextpdf.com/

How can I get the page number and position for a specific pdf form field and insert a table in its place in iText7?

A field by itself does not have a position or a page number. However, its widget annotations do.

To access those, you can use field.getWidgets(). You can then use annotation.getPage() and annotation.getRectangle() to get information about the annotation's position.

To lay out a single element at a specific position, one of the best choices is the Canvas layout object. The annotation can then be removed with page.removeAnnotation(annotation).

Overall, this comes together in the following solution:

String fieldName = "Text1";
PdfFormField field = form.getField(fieldName);

for (PdfAnnotation annotation : field.getWidgets()) {
    PdfPage page = annotation.getPage();
    Rectangle rectangle = annotation.getRectangle().toRectangle();

    Table table = new Table(UnitValue.createPointArray(new float[] {-1, -1}));
    table.setHorizontalAlignment(HorizontalAlignment.CENTER);
    Cell cell = new Cell();
    cell.add(new Paragraph("Name"));
    table.addCell(cell);
    cell = new Cell();
    cell.add(new Paragraph("Address"));
    table.addCell(cell);

    Canvas canvas = new Canvas(new PdfCanvas(page), pdfDocument, rectangle);
    canvas.add(table);
    canvas.close();

    page.removeAnnotation(annotation);
}

Of course, you will have to take full responsibility for creating a table of proper size so that it fits into the area you want. You can use smaller font sizes etc.

How to get the text position from the pdf page in iText 7

First, SimpleTextExtractionStrategy is not exactly the 'smartest' strategy (as the name would suggest).

Second, if you want the position you're going to have to do a lot more work. TextExtractionStrategy assumes you are only interested in the text.

Possible implementation:

  • implement IEventListener
  • get notified for all events that render text, and store the corresponding TextRenderInfo object
  • once you're finished with the document, sort these objects based on their position in the page
  • loop over this list of TextRenderInfo objects, they offer both the text being rendered and the coordinates

How to:

  1. implement ITextExtractionStrategy (or extend an existing implementation)
  2. use PdfTextExtractor.getTextFromPage(doc.getPage(pageNr), strategy), where strategy denotes the strategy you created in step 1
  3. your strategy should be set up to keep track of locations for the text it processed

ITextExtractionStrategy has the following method in its interface:

@Override
public void eventOccurred(IEventData data, EventType type) {

    // you can first check the type of the event
    if (!type.equals(EventType.RENDER_TEXT))
        return;

    // now it is safe to cast
    TextRenderInfo renderInfo = (TextRenderInfo) data;
}

Keep in mind that the rendering instructions in a PDF do not need to appear in reading order. The text "Lorem Ipsum Dolor Sit Amet" could be rendered with instructions similar to:

render "Ipsum Do"
render "Lorem "
render "lor Sit Amet"

You will have to do some clever merging (depending on how far apart two TextRenderInfo objects are) and sorting (to get all the TextRenderInfo objects in the proper reading order).

Once that's done, it should be easy.
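The sorting and merging step can be sketched without iText. In the plain-Java example below, TextChunk is a hypothetical stand-in for the text and baseline start point you would copy out of each TextRenderInfo, and ChunkSorter and its merge() method are likewise invented for illustration. It assumes horizontal, left-to-right text and treats chunks whose y values differ by no more than a tolerance as one line; real documents may need smarter handling of gaps, rotation and columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// ChunkSorter.java (hypothetical; TextChunk stands in for data taken from TextRenderInfo)
class ChunkSorter {
    static class TextChunk {
        final String text;
        final float x; // start of the baseline, in user units
        final float y;

        TextChunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Orders chunks top to bottom, then left to right, and joins them line by line. */
    static String merge(List<TextChunk> chunks, float lineTolerance) {
        List<TextChunk> sorted = new ArrayList<>(chunks);
        // A higher y is closer to the top of the page, so sort y in descending order.
        sorted.sort((a, b) -> Float.compare(b.y, a.y));

        StringBuilder out = new StringBuilder();
        List<TextChunk> line = new ArrayList<>();
        for (TextChunk chunk : sorted) {
            // A drop in y larger than the tolerance starts a new line.
            if (!line.isEmpty() && line.get(0).y - chunk.y > lineTolerance) {
                appendLine(out, line);
                line.clear();
            }
            line.add(chunk);
        }
        appendLine(out, line);
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<TextChunk> line) {
        if (line.isEmpty()) {
            return;
        }
        line.sort((a, b) -> Float.compare(a.x, b.x)); // left to right within the line
        if (out.length() > 0) {
            out.append('\n');
        }
        for (TextChunk chunk : line) {
            out.append(chunk.text);
        }
    }

    public static void main(String[] args) {
        // Same out-of-order rendering as in the example above; coordinates are made up.
        List<TextChunk> chunks = Arrays.asList(
                new TextChunk("Ipsum Do", 110f, 700f),
                new TextChunk("Lorem ", 72f, 700f),
                new TextChunk("lor Sit Amet", 160f, 700f));
        System.out.println(merge(chunks, 2f)); // prints Lorem Ipsum Dolor Sit Amet
    }
}
```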


