.Doc to Pdf Using Python

.doc to pdf using python

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:

import sys
import os
import comtypes.client

wdFormatPDF = 17

in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])

word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()

You could also use pywin32, which would be the same except for:

import win32com.client

and then:

word = win32com.client.Dispatch('Word.Application')

Convert Microsoft Word document to PDF using Python

i solved this problem and fixed the code has following

import os
import win32com.client
import re
path = (r'D:\programing\test')
word_file_names = []
word = win32com.client.Dispatch('Word.Application')
for dirpath, dirnames, filenames in os.walk(path):
for f in filenames:
if f.lower().endswith(".docx") :
new_name = f.replace(".docx", ".pdf")
in_file =(dirpath + '/'+ f)
new_file =(dirpath + '/' + new_name)
doc = word.Documents.Open(in_file)
doc.SaveAs(new_file, FileFormat = 17)
doc.Close()
if f.lower().endswith(".doc"):
new_name = f.replace(".doc", ".pdf")
in_file =(dirpath +'/' + f)
new_file =(dirpath +'/' + new_name)
doc = word.Documents.Open(in_file)
doc.SaveAs(new_file, FileFormat = 17)
doc.Close()
word.Quit()

Error when trying to convert .doc to .pdf with Python

Example:

# Open Microsoft DOC 

app = client.Dispatch("Word.Application")

# Read Doc File
doc = app.Documents.Open('C:/Users/<User>/Downloads/document.docx')

# Convert into PDF File
doc.ExportAsFixedFormat('C:/Users/<User>/Downloads/document.pdf', 17, Item=7, CreateBookmarks=0)
app.Quit()

If it still doesn't work, try deleting the cache files inside the "gen" folder in the path:

C:\Users\\AppData\Local\Programs\Python\Python39\Lib\site-packages\comtypes\gen

Try using the msoffice2pdf library using Microsoft Office or LibreOffice installed in the environment.

https://pypi.org/project/msoffice2pdf/

I need to convert .doc and .docx files to .pdf using python

I have been in the similar problem earlier,

My suggestion:

sorry there is no such direct python library to handle Microsoft office formats specially (.doc)

So try to use LibreOffice as a service in Ubuntu its "libreoffice"
if windows its "soffice.exe" use this in command line to convert the document to .PDF without opening LibreOffice

and its easy and fast too and more over gives almost perfect conversion of the file.

A sample:

For Windows:

    C:\Program Files (x86)\LibreOffice 4\program\soffice.exe" --headless --convert-to pdf "input_file_path" --outdir "output_dir_path"

This will convert the input file into pdf in the given output directory without opening the LibreOffice ans just using it as a service.

To run this command from python you can use "subprocess" like libraries.

Docx to pdf using pandoc in python

The second argument to convert_file is output format, or, in this case, the format through which pandoc generates the pdf. Pandoc doesn't know how to produce a PDF through docx, hence the error.

Use pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") or pypandoc.convert_file('thisisdoc.docx', 'pdf', outputfile="thisisdoc.pdf") instead.

Converting .docx to .pdf in Python (File locked for editing)

It works fine when I run your code no matter the path. Maybe you need to exit ms Word before you try and run it?



Related Topics



Leave a reply



Submit