How to Access OpenXML Content by Page Number

How to access OpenXML content by page number?

This is how I ended up doing it.

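    using System.Collections.Generic;
    using System.Text;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    public class DocumentReader // wrapper class so the snippet compiles on its own
    {
        public Dictionary<int, string> OpenWordprocessingDocumentReadonly()
        {
            string filepath = @"C:\...\test.docx";
            // Page number -> text of that page, split at explicit page breaks only.
            Dictionary<int, string> pagewiseContent = new Dictionary<int, string>();
            int pageCount = 0;
            // Open a WordprocessingDocument read-only based on a file path.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
                // Assign a reference to the existing document body.
                Body body = wordDocument.MainDocumentPart.Document.Body;
                // Page count cached in docProps/app.xml by the last application that saved
                // the file; the part or its <Pages> element may be missing (pageCount stays 0).
                int.TryParse(wordDocument.ExtendedFilePropertiesPart?.Properties?.Pages?.Text, out pageCount);
                int i = 1;
                StringBuilder pageContentBuilder = new StringBuilder();
                // Visit text and breaks in document order so that text on either
                // side of a page break is assigned to the correct page.
                foreach (OpenXmlElement node in body.Descendants())
                {
                    if (node is Text text)
                    {
                        pageContentBuilder.Append(text.Text);
                    }
                    else if (node is Break br && br.Type != null && br.Type.InnerText == "page")
                    {
                        pagewiseContent.Add(i, pageContentBuilder.ToString());
                        i++;
                        pageContentBuilder.Clear();
                    }
                }
                // Store whatever follows the last page break.
                if (pageContentBuilder.Length > 0)
                {
                    pagewiseContent.Add(i, pageContentBuilder.ToString());
                }
            }
            return pagewiseContent;
        }
    }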

Downside: this won't work in all scenarios. It works only when the document contains explicit page breaks; if text simply flows from page 1 onto page 2, there is no marker in the markup that tells you where page 2 begins.

How to get page numbers based on OpenXmlElement

After a lot of groundwork, I found that the page number cannot be retrieved from an OpenXML element.
We can approximate it, but we cannot be sure, because page numbers are produced by the word processor's layout engine, and that happens only after all the OpenXML elements have been handed to the word processor.
We can estimate it with LastRenderedPageBreak, but we cannot be sure that the location of that element is correct: it records where the page breaks fell the last time the document was rendered, so it may be stale or missing.
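A minimal sketch of that approximation, assuming the DocumentFormat.OpenXml package; the class and method names are hypothetical. It walks the body in document order and starts a new "page" at every w:lastRenderedPageBreak marker. Explicit page breaks are not treated specially here, and the result is only as accurate as the last rendering that wrote the markers.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RenderedPageSplitter
{
    // Hypothetical helper: approximates per-page text using the
    // <w:lastRenderedPageBreak/> markers written when the file was last saved
    // after rendering. The markers describe that *last* layout, so the result
    // can be stale, and files never rendered may contain no markers at all.
    public static List<string> SplitByLastRenderedPageBreak(string filepath)
    {
        var pages = new List<string>();
        var current = new StringBuilder();

        using (WordprocessingDocument doc = WordprocessingDocument.Open(filepath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Descendants() is a depth-first walk in document order, so text
            // and break markers are visited in the order they appear.
            foreach (OpenXmlElement node in body.Descendants())
            {
                if (node is LastRenderedPageBreak)
                {
                    // A new rendered page starts here: close the current one.
                    pages.Add(current.ToString());
                    current.Clear();
                }
                else if (node is Text text)
                {
                    current.Append(text.Text);
                }
            }
        }

        pages.Add(current.ToString()); // text after the last marker
        return pages;
    }
}
```

The list index is zero-based, so `pages[0]` holds the approximate text of page 1. Like the code above, paragraphs are concatenated without separators.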

So I would suggest going with UpdateFieldsOnOpen or a macro for an easier solution.
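UpdateFieldsOnOpen is the SDK class for the w:updateFields element in word/settings.xml. The sketch below, assuming the DocumentFormat.OpenXml package and using hypothetical class and method names, only sets that flag; the SDK computes nothing, and it is Word that recalculates fields such as page numbers and TOCs on the next open (Word typically asks the user first).

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class FieldUpdateFlag
{
    // Hypothetical helper: writes <w:updateFields w:val="true"/> into
    // word/settings.xml so fields are refreshed when the file is next opened.
    public static void RequestUpdateFieldsOnOpen(string filepath)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filepath, true))
        {
            MainDocumentPart main = doc.MainDocumentPart;
            // Create the settings part if the document does not have one yet.
            DocumentSettingsPart settingsPart =
                main.DocumentSettingsPart ?? main.AddNewPart<DocumentSettingsPart>();

            if (settingsPart.Settings == null)
            {
                settingsPart.Settings = new Settings();
            }

            // Reuse an existing element if present, otherwise prepend a new one.
            UpdateFieldsOnOpen flag = settingsPart.Settings.GetFirstChild<UpdateFieldsOnOpen>();
            if (flag == null)
            {
                flag = new UpdateFieldsOnOpen();
                settingsPart.Settings.PrependChild(flag);
            }
            flag.Val = new OnOffValue(true);

            settingsPart.Settings.Save();
        }
    }
}
```

The element is simply prepended to w:settings; the schema defines a specific child order there, so a strict validator may flag this position even if the flag itself is honoured.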

Can't get the page count from a Word document with OpenXML

Since

  1. pagination is a dynamic property dependent upon rendering,
  2. any given DOCX file may or may not have ever been rendered, and
  3. the OpenXML SDK does not render or perform the calculations needed for rendering,

you cannot necessarily obtain a page count from an arbitrary DOCX file.

For further details and some limited work-arounds, see How to access OpenXML content by page number?

Automatically Update Page Numbers for Table of Contents in DOCX generated using OpenXML SDK

Since the page numbers can differ depending on how the opening application renders the document, I don't think there is a way to do that with the OpenXML SDK.

You can update fields/TOCs using a macro or automation:
How to automatically update tables of contents