How to Convert Word Files to Pdf Programmatically

How do I convert Word files to PDF programmatically?

Use a foreach loop instead of a for loop - it solved my problem.

int j = 0;
foreach (Microsoft.Office.Interop.Word.Page p in pane.Pages)
{
var bits = p.EnhMetaFileBits;
var target = path1 +j.ToString()+ "_image.doc";
try
{
using (var ms = new MemoryStream((byte[])(bits)))
{
var image = System.Drawing.Image.FromStream(ms);
var pngTarget = Path.ChangeExtension(target, "png");
image.Save(pngTarget, System.Drawing.Imaging.ImageFormat.Png);
}
}
catch (System.Exception ex)
{
MessageBox.Show(ex.Message);
}
j++;
}

Here is a modification of a program that worked for me. It uses Word 2007 with the Save As PDF add-in installed. It searches a directory for .doc files, opens them in Word and then saves them as a PDF. Note that you'll need to add a reference to Microsoft.Office.Interop.Word to the solution.

using Microsoft.Office.Interop.Word;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

...

// Create a new Microsoft Word application object
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();

// C# doesn't have optional arguments so we'll need a dummy value
object oMissing = System.Reflection.Missing.Value;

// Get list of Word files in specified directory
DirectoryInfo dirInfo = new DirectoryInfo(@"\\server\folder");
FileInfo[] wordFiles = dirInfo.GetFiles("*.doc");

word.Visible = false;
word.ScreenUpdating = false;

foreach (FileInfo wordFile in wordFiles)
{
// Cast as Object for word Open method
Object filename = (Object)wordFile.FullName;

// Use the dummy value as a placeholder for optional arguments
Document doc = word.Documents.Open(ref filename, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing);
doc.Activate();

object outputFileName = wordFile.FullName.Replace(".doc", ".pdf");
object fileFormat = WdSaveFormat.wdFormatPDF;

// Save document into PDF Format
doc.SaveAs(ref outputFileName,
ref fileFormat, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing);

// Close the Word document, but leave the Word application open.
// doc has to be cast to type _Document so that it will find the
// correct Close method.
object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
((_Document)doc).Close(ref saveChanges, ref oMissing, ref oMissing);
doc = null;
}

// word has to be cast to type _Application so that it will find
// the correct Quit method.
((_Application)word).Quit(ref oMissing, ref oMissing, ref oMissing);
word = null;

How convert word document to pdf programmatically

Since you are building HTML to convert, you can use iTextSharp API to go from HTML to PDF: ITextSharp HTML to PDF?. iTextSharp is free.

Convert Word doc, docx and Excel xls, xlsx to PDF with PHP

I found a solution to my issue and after a request, will post it here to help others. Apologies if I missed any details, it's been a while since I worked on this solution.

The first thing that is required is to install Openoffice.org on the server. I requested my hosting provider to install the open office RPM on my VPS. This can be done through WHM directly.

Now that the server has the capability to handle MS Office files you are able to convert the files by executing command line instructions via PHP. To handle this, I found PyODConverter: https://github.com/mirkonasato/pyodconverter

I created a directory on the server and placed the PyODConverter python file within it. I also created a plain text file above the web root (I named it "adocpdf"), with the following command line instructions in it:

directory=$1
filename=$2
extension=$3
SERVICE='soffice'
if [ "`ps ax|grep -v grep|grep -c $SERVICE`" -lt 1 ]; then
unset DISPLAY
/usr/bin/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
sleep 5s
fi
python /home/website/python/DocumentConverter.py /home/website/$directory$filename$extension /home/website/$directory$filename.pdf

This checks that the openoffice.org libraries are running and then calls the PyODConverter script to process the file and output it as a PDF. The 3 variables on the first three lines are provided when the script is executed from with a PHP file. The delay ("sleep 5s") is used to ensure that openoffice.org has enough to time to initiate if required. I have used this for months now and the 5s gap seems to give enough breathing room.

The script will create a PDF version of the document in the same directory as the original.

Finally, initiating the conversion of a Word / Excel file from within PHP (I have it within a function that checks if the file we are dealing with is a word / excel document)...

//use openoffice.org
$output = array();
$return_var = 0;
exec("/opt/adocpdf {$directory} {$filename} {$extension}", $output, $return_var);

This PHP function is called once the Word / Excel file has been uploaded to the server. The 3 variables in the exec() call relate directly to the 3 at the start of the plain text script above. Note that the $directory variable requires no leading forward slash if the file for conversion is within the web root.

OK, that's it! Hopefully this will be useful to someone and save them the difficulties and learning curve I faced.

.doc to pdf using python

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:

import sys
import os
import comtypes.client

wdFormatPDF = 17

in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])

word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()

You could also use pywin32, which would be the same except for:

import win32com.client

and then:

word = win32com.client.Dispatch('Word.Application')

How can I convert a Word document to PDF?

This is quite a hard task, ever harder if you want perfect results (impossible without using Word) as such the number of APIs that just do it all for you in pure Java and are open source is zero I believe (Update: I am wrong, see below).

Your basic options are as follows:

  1. Using JNI/a C# web service/etc script MS Office (only option for 100% perfect results)
  2. Using the available APIs script Open Office (90+% perfect)
  3. Use Apache POI & iText (very large job, will never be perfect).

Update - 2016-02-11
Here is a cut down copy of my blog post on this subject which outlines existing products that support Word-to-PDF in Java.

Converting Microsoft Office (Word, Excel) documents to PDFs in Java

Three products that I know of can render Office documents:

yeokm1/docs-to-pdf-converter
Irregularly maintained, Pure Java, Open Source
Ties together a number of libraries to perform the conversion.

xdocreport
Actively developed, Pure Java, Open Source
It's Java API to merge XML document created with MS Office (docx) or OpenOffice (odt), LibreOffice (odt) with a Java model to generate report and convert it if you need to another format (PDF, XHTML...).

Snowbound Imaging SDK
Closed Source, Pure Java
Snowbound appears to be a 100% Java solution and costs over $2,500. It contains samples describing how to convert documents in the evaluation download.

OpenOffice API
Open Source, Not Pure Java - Requires Open Office installed
OpenOffice is a native Office suite which supports a Java API. This supports reading Office documents and writing PDF documents. The SDK contains an example in document conversion (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs you need to pass the "writer_pdf_Export" writer rather than the "MS Word 97" one.
Or you can use the wrapper API JODConverter.

JDocToPdf - Dead as of 2016-02-11
Uses Apache POI to read the Word document and iText to write the PDF. Completely free, 100% Java but has some limitations.

How to convert Word and Excel documents to PDF programmatically?

Office 2007 allows for this. I have found PDFCreator to be good, the VBA is included in sample files, and have heard that CutePDF is also good. PDFCreator and CutePDF are free.

To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN

http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en



Related Topics



Leave a reply



Submit