How to access OpenXML content by page number?
This is how I ended up doing it.
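using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public Dictionary<int, string> OpenWordprocessingDocumentReadonly()
{
    string filepath = @"C:\...\test.docx";
    // Maps a 1-based page number to the text collected for that page.
    Dictionary<int, string> pageviseContent = new Dictionary<int, string>();
    int pageCount = 0;
    // Open a WordprocessingDocument read-only based on a filepath.
    using (WordprocessingDocument wordDocument =
        WordprocessingDocument.Open(filepath, false))
    {
        // Assign a reference to the existing document body.
        Body body = wordDocument.MainDocumentPart.Document.Body;
        // Page count cached by the application that last saved the file; it may be missing or stale.
        string pagesText = wordDocument.ExtendedFilePropertiesPart?.Properties?.Pages?.Text;
        if (pagesText != null)
        {
            int.TryParse(pagesText, out pageCount);
        }
        int i = 1;
        StringBuilder pageContentBuilder = new StringBuilder();
        foreach (var element in body.ChildElements)
        {
            // Look for an explicit page break (<w:br w:type="page"/>) anywhere inside this element.
            bool hasPageBreak = element.Descendants<Break>()
                .Any(b => b.Type != null && b.Type.Value == BreakValues.Page);
            // Text that shares an element with the break is counted toward the page being closed (an approximation).
            pageContentBuilder.Append(element.InnerText);
            if (hasPageBreak)
            {
                // Close the current page and start collecting the next one.
                pageviseContent.Add(i, pageContentBuilder.ToString());
                i++;
                pageContentBuilder = new StringBuilder();
            }
        }
        // Store whatever follows the last page break.
        if (pageContentBuilder.Length > 0)
        {
            pageviseContent.Add(i, pageContentBuilder.ToString());
        }
    }
    return pageviseContent;
}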
Downside: this won't work in all scenarios. It works only when the document contains explicit page breaks; if text simply flows from page 1 onto page 2, nothing in the markup tells you that you are now on page 2.
How to get page numbers based on OpenXmlElement
After a lot of groundwork, I found that a page number cannot be retrieved from an OpenXML element.
We can approximate it, but we cannot be sure, because page numbers are produced by the word processor's layout engine, which runs only after all the OpenXML elements have been passed to the word processor.
We can calculate it with LastRenderedPageBreak, but we cannot be sure that the location of the element is correct.
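A minimal sketch of the LastRenderedPageBreak approximation, assuming the file was last saved by an application that writes these markers (Word does; many generators do not). The class and method names are hypothetical. It walks the body in document order, bumps a page counter at every marker, and groups run text by that counter. The result reflects the layout at the time of the last save, so it can be stale or empty.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class RenderedPageEstimator
{
    // Groups run text by the page implied by <w:lastRenderedPageBreak/> markers.
    public static Dictionary<int, string> SplitByLastRenderedPageBreak(Body body)
    {
        var builders = new Dictionary<int, StringBuilder>();
        int page = 1;

        // Descendants() yields elements in document order.
        foreach (OpenXmlElement element in body.Descendants())
        {
            if (element is LastRenderedPageBreak)
            {
                // The layout engine started a new page here when the file was last saved.
                page++;
            }
            else if (element is Text text)
            {
                if (!builders.TryGetValue(page, out StringBuilder builder))
                {
                    builder = new StringBuilder();
                    builders[page] = builder;
                }
                builder.Append(text.Text);
            }
        }

        var result = new Dictionary<int, string>();
        foreach (KeyValuePair<int, StringBuilder> pair in builders)
        {
            result[pair.Key] = pair.Value.ToString();
        }
        return result;
    }
}
```

Only the markers are counted here, not explicit page breaks, on the assumption that the saving application also writes a marker at the start of the page that follows an explicit break; counting both would likely double-count.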
So I would suggest going with UpdateFieldsOnOpen or a macro for an easier solution.
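A minimal sketch of the UpdateFieldsOnOpen route, assuming the document will later be opened in Word, which then recalculates page-dependent fields itself (it may ask the user for confirmation first). The class and method names are hypothetical; the SDK types used are DocumentSettingsPart, Settings, and UpdateFieldsOnOpen.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class FieldRefresh
{
    // Sets <w:updateFields w:val="true"/> in the document settings.
    public static void RequestFieldUpdateOnOpen(string filepath)
    {
        // Open for editing; changes are saved when the document is disposed.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filepath, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;

            // Create the settings part and its root element if the file has none.
            DocumentSettingsPart settingsPart = mainPart.DocumentSettingsPart
                ?? mainPart.AddNewPart<DocumentSettingsPart>();
            if (settingsPart.Settings == null)
            {
                settingsPart.Settings = new Settings();
            }

            // Reuse an existing flag rather than adding a duplicate element.
            UpdateFieldsOnOpen flag = settingsPart.Settings.GetFirstChild<UpdateFieldsOnOpen>();
            if (flag == null)
            {
                flag = new UpdateFieldsOnOpen();
                settingsPart.Settings.PrependChild(flag);
            }
            flag.Val = new OnOffValue(true);
        }
    }
}
```

This does not compute any page numbers; it only defers the work to the application that renders the document.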
Can't get the pages count from a word document with OpenXml
Since
- pagination is a dynamic property dependent upon rendering,
- any given DOCX file may or may not have ever been rendered, and
- OpenXML SDK does not render or perform calculations needed for rendering,
you cannot necessarily obtain a page count from an arbitrary DOCX file.
For further details and some limited work-arounds, see How to access OpenXML content by page number?
Automatically Update Page Numbers for Table of Contents in DOCX generated using OpenXML SDK
Since the page numbers can differ depending on how the opening application renders the document, I don't think there is a way to do that with the OpenXML SDK alone.
You can update fields and TOCs using a macro or automation:
How to automatically update tables of contents