Merge Multiple Word Documents into One Open Xml

Merge multiple word documents into one Open Xml

Using only the Open XML SDK, you can merge multiple documents into one with the AltChunk element.

The articles "The Easy Way to Assemble Multiple Word Documents" and "How to Use altChunk for Document Assembly" provide some samples.

EDIT 1

Based on your code that uses altchunk in the updated question (update#1), here is the VB.Net code I have tested and that works like a charm for me:

' Open the main document for editing (VB strings need no escaped backslashes).
Using myDoc = DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open("D:\Test.docx", True)
    Dim altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2)
    Dim mainPart = myDoc.MainDocumentPart
    ' Add the import part that holds the content of the document to merge.
    Dim chunk = mainPart.AddAlternativeFormatImportPart(
        DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML, altChunkId)
    Using fileStream As IO.FileStream = IO.File.Open("D:\Test1.docx", IO.FileMode.Open)
        chunk.FeedData(fileStream)
    End Using
    ' Reference the import part from an AltChunk placed after the last paragraph.
    Dim altChunk = New DocumentFormat.OpenXml.Wordprocessing.AltChunk()
    altChunk.Id = altChunkId
    mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of DocumentFormat.OpenXml.Wordprocessing.Paragraph).Last())
    mainPart.Document.Save()
End Using

EDIT 2

The second issue (update#2)

This code is appending the Test2 data twice, in place of Test1 data as
well.

is related to altchunkid.

For each document you want to merge in the main document, you need to:

  1. add an AlternativeFormatImportPart to the MainDocumentPart with an Id that must be unique. This part contains the inserted data.
  2. add an AltChunk element to the body and set its Id to reference that AlternativeFormatImportPart.

In your code, you are using the same Id for all the AltChunks. That is why you see the same text several times.

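The altChunkId will not be unique with your code, because Substring(0, 2) keeps only the two leading digits of the tick count, and those stay the same from one call to the next: string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2);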

If you don't need to set a specific value, I recommend not setting the AltChunkId explicitly when you add the AlternativeFormatImportPart. Instead, retrieve the one generated by the SDK like this:

VB.Net

Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML)
Dim altchunkid As String = mainPart.GetIdOfPart(chunk)

C#

AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML);
string altchunkid = mainPart.GetIdOfPart(chunk);
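Putting the two steps together for several files, a minimal C# sketch could look like the following. The class and method names (AltChunkMerger, AppendDocuments) are hypothetical, and placing each AltChunk before the body's final section properties is a choice made here, not something the question requires.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkMerger
{
    // Appends each source .docx to the main document as an altChunk.
    // Returns the relationship ids generated by the SDK, one per source file.
    public static List<string> AppendDocuments(string mainPath, IEnumerable<string> sourcePaths)
    {
        var ids = new List<string>();
        using (WordprocessingDocument mainDoc = WordprocessingDocument.Open(mainPath, true))
        {
            MainDocumentPart mainPart = mainDoc.MainDocumentPart;
            Body body = mainPart.Document.Body;

            foreach (string sourcePath in sourcePaths)
            {
                // Step 1: no explicit id, so the SDK picks one that is unique in the package.
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.WordprocessingML);
                using (FileStream stream = File.OpenRead(sourcePath))
                {
                    chunk.FeedData(stream);
                }

                // Step 2: an AltChunk element in the body that references the part.
                string altChunkId = mainPart.GetIdOfPart(chunk);
                var altChunk = new AltChunk { Id = altChunkId };

                // Assumption of this sketch: keep the body-level sectPr (if any) last.
                SectionProperties finalSectPr = body.Elements<SectionProperties>().LastOrDefault();
                if (finalSectPr != null)
                    body.InsertBefore(altChunk, finalSectPr);
                else
                    body.AppendChild(altChunk);

                ids.Add(altChunkId);
            }
            mainPart.Document.Save();
        }
        return ids;
    }
}
```

Because every pass through the loop asks the SDK for a fresh part, each AltChunk points at its own data and the "same text twice" symptom should not occur.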

Merge multiple Word Document styles into one Open Xml

What Word does with formatting ("styling") depends on the origin of the formatting and algorithms inherent in Word on how to handle formatting conflicts. Based on the information you provide it's difficult to know exactly what the situation is with these documents, but here are some rules of thumb:

  1. Word will retain direct formatting (such as clicking Bold or Italics)

  2. When in-coming documents have styles of the same name as styles already present in the target document, the in-coming styles will take on the definitions of the styles in the target document. This is by design as Word was conceived as a document production tool, not an archiving tool.

I'm guessing that (2) is the situation with which you're confronted. The only way to retain the style definitions is to first give the styles different names / define a different set of styles and apply those to the text in place of the existing styles. For example, if the Normal style of two documents is defined differently you'd need to copy the style definition to a new style (Normal1, for example) then replace the id used for normal in the various Parts that make up the document with the id used for Normal1.

Something I've never tried would be to rename the id and name for Normal so that you wouldn't need the last step. But you might have to create a Normal style using the "old" id and name as Word expects that to be in the document. (But you could try without, since Word might create it automatically without thinking the document is invalid).
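As a rough illustration of the renaming idea, the following C# sketch gives one paragraph style in an incoming document a new id and name and repoints the references to it. The names StyleRenamer and RenameParagraphStyle are hypothetical, and the sketch only touches the main document body and the style definitions; headers, footers, footnotes and paragraphs that rely on the default style without an explicit reference are not handled.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class StyleRenamer
{
    // Renames one paragraph style and updates the explicit references to it.
    // Returns the number of body paragraphs that were repointed.
    public static int RenameParagraphStyle(MainDocumentPart mainPart, string oldId, string newId, string newName)
    {
        Styles styles = mainPart.StyleDefinitionsPart?.Styles;
        Style style = styles?.Elements<Style>().FirstOrDefault(s => s.StyleId?.Value == oldId);
        if (style == null)
            return 0;

        // Give the style definition its new id and display name.
        style.StyleId = newId;
        style.StyleName = new StyleName { Val = newName };

        // Repoint paragraphs in the body that name the old id.
        int updated = 0;
        foreach (ParagraphStyleId ps in mainPart.Document.Body.Descendants<ParagraphStyleId>())
        {
            if (ps.Val?.Value == oldId)
            {
                ps.Val = newId;
                updated++;
            }
        }

        // Other styles may inherit from, or chain to, the renamed style.
        foreach (BasedOn basedOn in styles.Descendants<BasedOn>())
            if (basedOn.Val?.Value == oldId) basedOn.Val = newId;
        foreach (NextParagraphStyle next in styles.Descendants<NextParagraphStyle>())
            if (next.Val?.Value == oldId) next.Val = newId;

        return updated;
    }
}
```

Calling it on each source document before the merge, with a different newId per document (for example "Normal1"), is one way to keep the definitions apart; whether Word then accepts the result in every case is untested here.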

How to merge word documents with different headers using openxml?

I encountered this question a few years ago and spent quite some time on it; I eventually wrote a blog article that links to a sample file. Integrating files with headers and footers using AltChunk is not straightforward. I'll try to cover the essentials here. Depending on what kinds of content the headers and footers contain (and assuming Microsoft has not addressed any of the problems I originally ran into), it may not be possible to rely solely on AltChunk.

(Note also that there may be Tools/APIs that can handle this - I don't know and asking that on this site would be off-topic.)

Background

Before attacking the problem, it helps to understand how Word handles different headers and footers. To get a feel for it, start Word...

Section Breaks / Unlinking headers/footers

  • Type some text on the page and insert a header
  • Move the focus to the end of the page and go to the Page Layout tab in the Ribbon
  • Page Setup/Breaks/Next Page section break
  • Go into the Header area for this page and note the information in the blue "tags": you'll see a section identifier on the left and "Same as Previous" on the right. "Same as Previous" is the default; to create a different Header, click the "Link to Previous" button in the Header to turn the link off

So, the rule is:

a section break is required, with unlinked headers (and/or footers),
in order to have different header/footer content within a document.
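In the file format this rule shows up in the section properties: a section that carries no header reference of its own inherits the header of the previous section, which is the "linked" state. A small C# sketch for inspecting this, with the hypothetical names SectionInspector and SectionsWithOwnHeader:

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SectionInspector
{
    // Returns one flag per section, in document order: true when the section
    // has its own header reference (unlinked), false when it has none.
    public static List<bool> SectionsWithOwnHeader(Body body)
    {
        // Section breaks live in paragraph properties; the final section's
        // properties are the last child of the body. Descendants finds both.
        return body.Descendants<SectionProperties>()
                   .Select(sp => sp.Elements<HeaderReference>().Any())
                   .ToList();
    }
}
```

The same check with FooterReference in place of HeaderReference covers footers.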

Master/Sub-documents

Word has an (in)famous functionality called "Master Document" that enables linking outside ("sub") documents into a "master" document. Doing so automatically adds the necessary section breaks and unlinks the headers/footers so that the originals are retained.

  • Go to Word's Outline view
  • Click "Show Document"
  • Use "Insert" to insert other files

Notice that two section breaks are inserted, one of the type "Next page" and the other "Continuous". The first is inserted in the file coming in; the second in the "master" file.

Two section breaks are necessary when inserting a file because the last paragraph mark (which contains the section break for the end of the document) is not carried over to the target document. The section break in the target document carries the information to unlink the in-coming header from those already in the target document.

When the master is saved, closed and re-opened, the sub-documents are in a "collapsed" state (file names as hyperlinks instead of the content). They can be expanded by going back to the Outline view and clicking the "Expand" button. To fully incorporate a sub-document into the document, click the icon at the top left next to the sub-document, then click "Unlink".

Merging Word Open XML files

This, then, is the type of environment the Open XML SDK needs to create when merging files whose headers and footers need to be retained. Theoretically, either approach should work. Practically, I had problems with using only section breaks; I've never tested using the Master Document feature in Word Open XML.

Inserting section breaks

Here's the basic code for inserting a section break and unlinking headers before bringing in a file using AltChunk. Looking at my old posts and articles, as long as there's no complex page numbering involved, it works:

// Namespaces used: System, System.Collections.Generic, System.Linq,
// DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Wordprocessing
private void btnMergeWordDocs_Click(object sender, EventArgs e)
{
    string sourceFolder = @"C:\Test\MergeDocs\";
    string targetFolder = @"C:\Test\";

    string altChunkIdBase = "acID";
    int altChunkCounter = 1;
    string altChunkId = altChunkIdBase + altChunkCounter.ToString();

    using (WordprocessingDocument wdPkgTarget = WordprocessingDocument.Create(targetFolder + "mergedDoc.docx", DocumentFormat.OpenXml.WordprocessingDocumentType.Document, true))
    {
        //Will create document in 2007 Compatibility Mode.
        //In order to make it 2010 a Settings part must be created and a CompatMode element for the Office version set.
        MainDocumentPart wdDocTargetMainPart = wdPkgTarget.MainDocumentPart;
        if (wdDocTargetMainPart == null)
        {
            wdDocTargetMainPart = wdPkgTarget.AddMainDocumentPart();
            Document wdDoc = new Document(
                new Body(
                    new Paragraph(
                        new Run(new Text() { Text = "First Para" })),
                    new Paragraph(new Run(new Text() { Text = "Second para" })),
                    new SectionProperties(
                        new SectionType() { Val = SectionMarkValues.NextPage },
                        new PageSize() { Code = 9 },
                        new PageMargin() { Gutter = 0, Bottom = 1134, Top = 1134, Left = 1318, Right = 1318, Footer = 709, Header = 709 },
                        new Columns() { Space = "708" },
                        new TitlePage())));
            wdDocTargetMainPart.Document = wdDoc;
        }
        Document docTarget = wdDocTargetMainPart.Document;
        SectionProperties secPropLast = docTarget.Body.Descendants<SectionProperties>().Last();
        SectionProperties secPropNew = (SectionProperties)secPropLast.CloneNode(true);
        //A section break must be in a ParagraphProperty
        Paragraph lastParaTarget = docTarget.Body.Elements<Paragraph>().Last();
        ParagraphProperties paraPropTarget = lastParaTarget.ParagraphProperties;
        if (paraPropTarget == null)
        {
            //New ParagraphProperties must be the first child of the paragraph.
            //(Existing ones are already attached and must not be inserted again.)
            paraPropTarget = new ParagraphProperties();
            lastParaTarget.InsertAt(paraPropTarget, 0);
        }
        paraPropTarget.Append(secPropNew);

        //Each chunk is inserted after the previous one so the file order is kept.
        DocumentFormat.OpenXml.OpenXmlElement insertAfter = lastParaTarget;

        //Process the individual files in the source folder.
        //Note that this process will permanently change the files by adding a section break.
        System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(sourceFolder);
        IEnumerable<System.IO.FileInfo> docFiles = di.EnumerateFiles();
        foreach (System.IO.FileInfo fi in docFiles)
        {
            using (WordprocessingDocument pkgSourceDoc = WordprocessingDocument.Open(fi.FullName, true))
            {
                IEnumerable<HeaderPart> partsHeader = pkgSourceDoc.MainDocumentPart.GetPartsOfType<HeaderPart>();
                IEnumerable<FooterPart> partsFooter = pkgSourceDoc.MainDocumentPart.GetPartsOfType<FooterPart>();
                //If the source document has headers or footers we want to retain them.
                //This requires inserting a section break at the end of the document.
                if (partsHeader.Any() || partsFooter.Any())
                {
                    Body sourceBody = pkgSourceDoc.MainDocumentPart.Document.Body;
                    SectionProperties docSectionBreak = sourceBody.Descendants<SectionProperties>().Last();
                    //Make a copy of the document section break as this won't be imported into the target document.
                    //It needs to be appended to the last paragraph of the document
                    SectionProperties copySectionBreak = (SectionProperties)docSectionBreak.CloneNode(true);
                    Paragraph lastpara = sourceBody.Descendants<Paragraph>().Last();
                    ParagraphProperties paraProps = lastpara.ParagraphProperties;
                    if (paraProps == null)
                    {
                        //ParagraphProperties must come before the runs of the paragraph.
                        paraProps = new ParagraphProperties();
                        lastpara.InsertAt(paraProps, 0);
                    }
                    paraProps.Append(copySectionBreak);
                }
                pkgSourceDoc.MainDocumentPart.Document.Save();
            }
            //Insert the source file into the target file using AltChunk
            AlternativeFormatImportPart chunk = wdDocTargetMainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (System.IO.FileStream fsSourceDocument = new System.IO.FileStream(fi.FullName, System.IO.FileMode.Open))
            {
                chunk.FeedData(fsSourceDocument);
            }
            //Create the chunk and link it to the part
            AltChunk ac = new AltChunk();
            ac.Id = altChunkId;
            docTarget.Body.InsertAfter(ac, insertAfter);
            insertAfter = ac;
            docTarget.Save();
            altChunkCounter += 1;
            altChunkId = altChunkIdBase + altChunkCounter.ToString();
        }
    }
}

If there's complex page numbering (quoted from my blog article):

Unfortunately, there’s a bug in the Word application when integrating
Word document “chunks” into the main document. The process has the
nasty habit of not retaining a number of SectionProperties, among them
the one that sets whether a section has a Different First Page
(<w:titlePg/>) and the one to restart Page Numbering
(<w:pgNumType w:start="1"/>) in a section. As long as your documents don’t need to
manage these kinds of headers and footers you can probably use the
“altChunk” approach.

But if you do need to handle complex headers and footers the only
method currently available to you is to copy in each document in
its entirety, part-by-part. This is a non-trivial undertaking, as
there are numerous possible types of Parts that can be associated not
only with the main document body, but also with each header and footer
part.

...or try the Master/Sub Document approach.
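To decide up front whether a source file is likely to run into this, one option is to look for the two section properties named in the quote before choosing AltChunk. This is only a sketch; AltChunkRisk and UsesFirstPageOrRestartedNumbering are hypothetical names, and other affected properties are not covered.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkRisk
{
    // True when any section turns on "Different First Page" (w:titlePg)
    // or restarts page numbering (w:pgNumType with a w:start attribute).
    public static bool UsesFirstPageOrRestartedNumbering(Body body)
    {
        return body.Descendants<SectionProperties>().Any(sp =>
            // titlePg is an on/off element: absent w:val means "on".
            sp.Elements<TitlePage>().Any(tp => tp.Val == null || tp.Val.Value) ||
            sp.Elements<PageNumberType>().Any(pn => pn.Start != null));
    }
}
```

Files for which this returns true are candidates for the part-by-part copy or the Master/Sub Document approach instead.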

Master/Sub Document

This approach will certainly maintain all information. However, the result will open as a Master document, and the Word API (either the user or automation code) is required to "unlink" the sub-documents to turn it into a single, integrated document.

Opening a Master Document file in the Open XML SDK Productivity Tool shows that inserting sub documents into the master document is a fairly straightforward procedure.

The underlying Word Open XML for the document with one sub-document:

<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:pPr>
      <w:pStyle w:val="Heading1" />
    </w:pPr>
    <w:subDoc r:id="rId6" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
  </w:p>
  <w:sectPr>
    <w:headerReference w:type="default" r:id="rId7" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
    <w:type w:val="continuous" />
    <w:pgSz w:w="11906" w:h="16838" />
    <w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
    <w:cols w:space="708" />
    <w:docGrid w:linePitch="360" />
  </w:sectPr>
</w:body>

and the code:

public class GeneratedClass
{
    // Creates a Body instance and adds its children.
    // "rId6" and "rId7" are the relationship ids from the sample document;
    // the matching relationships must exist in the target main document part.
    public Body GenerateBody()
    {
        Body body1 = new Body();

        Paragraph paragraph1 = new Paragraph();

        ParagraphProperties paragraphProperties1 = new ParagraphProperties();
        ParagraphStyleId paragraphStyleId1 = new ParagraphStyleId(){ Val = "Heading1" };

        paragraphProperties1.Append(paragraphStyleId1);
        SubDocumentReference subDocumentReference1 = new SubDocumentReference(){ Id = "rId6" };

        paragraph1.Append(paragraphProperties1);
        paragraph1.Append(subDocumentReference1);

        SectionProperties sectionProperties1 = new SectionProperties();
        HeaderReference headerReference1 = new HeaderReference(){ Type = HeaderFooterValues.Default, Id = "rId7" };
        SectionType sectionType1 = new SectionType(){ Val = SectionMarkValues.Continuous };
        PageSize pageSize1 = new PageSize(){ Width = (UInt32Value)11906U, Height = (UInt32Value)16838U };
        PageMargin pageMargin1 = new PageMargin(){ Top = 1417, Right = (UInt32Value)1417U, Bottom = 1134, Left = (UInt32Value)1417U, Header = (UInt32Value)708U, Footer = (UInt32Value)708U, Gutter = (UInt32Value)0U };
        Columns columns1 = new Columns(){ Space = "708" };
        DocGrid docGrid1 = new DocGrid(){ LinePitch = 360 };

        sectionProperties1.Append(headerReference1);
        sectionProperties1.Append(sectionType1);
        sectionProperties1.Append(pageSize1);
        sectionProperties1.Append(pageMargin1);
        sectionProperties1.Append(columns1);
        sectionProperties1.Append(docGrid1);

        body1.Append(paragraph1);
        body1.Append(sectionProperties1);
        return body1;
    }
}
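The generated code hard-codes "rId6", so the relationship it points to has to be created separately. A possible sketch of that missing step follows; SubDocumentLinker and AppendSubDocument are hypothetical names, and the relationship type URI is my assumption of the standard sub-document relationship type, which should be verified against a Master Document saved by Word.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SubDocumentLinker
{
    // Assumed relationship type for sub-documents; check it against a real file.
    private const string SubDocRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/subDocument";

    // Adds an external relationship to the sub-document and a paragraph
    // that references it. Returns the generated relationship id.
    public static string AppendSubDocument(MainDocumentPart mainPart, Uri subDocUri)
    {
        // The sub-document stays outside the package, hence an external relationship.
        ExternalRelationship rel = mainPart.AddExternalRelationship(SubDocRelType, subDocUri);

        var paragraph = new Paragraph(new SubDocumentReference { Id = rel.Id });

        // Keep the body-level section properties as the last child of the body.
        Body body = mainPart.Document.Body;
        SectionProperties finalSectPr = body.Elements<SectionProperties>().LastOrDefault();
        if (finalSectPr != null)
            body.InsertBefore(paragraph, finalSectPr);
        else
            body.AppendChild(paragraph);

        return rel.Id;
    }
}
```

Using the id returned by the SDK instead of a fixed "rId6" avoids clashes with relationships that already exist in the target document.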

c# / openxml merge of word documents to one, fails --closed--

Fixed it. It seems Word cannot merge documents with different formats into one document. So if you have 2 documents with a footer and 3 others without, it just won't work. Obviously some customers can run into this kind of issue; at least the code is fine.


