How to Read .Docx File

How can I read a .docx file?

The easiest way is probably to use the Open XML SDK 2.0.

For some examples, get the Code Snippets for Visual Studio 2008.

I would also highly recommend downloading the Open XML SDK Productivity Tool. It helps you understand how Open XML files are structured, and it can even generate source code for use with the SDK based on the structure of your documents. You can download the tool from the same page as the SDK. It's 100MB, but it's worth the download.
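If you just want a quick look at that structure, it may help to know that a .docx file is a ZIP package of XML parts, with the main body normally stored in word/document.xml. Below is a minimal Python sketch of that idea. It does not use the Open XML SDK, the function names list_parts and read_paragraphs are made up for illustration, and it assumes a simple document where paragraph text sits in w:t elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(path_or_file):
    """Return the names of the parts (files) inside the .docx ZIP package."""
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.namelist()


def read_paragraphs(path_or_file):
    """Return one string per <w:p> paragraph in the main document part."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

Calling list_parts on a real document typically shows entries such as [Content_Types].xml and word/document.xml, which are the same parts the productivity tool displays as a tree.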

Python: find numbers in a docx file and replace them

You can use the python-docx library (imported as docx) to read the content of .docx files.

pip install python-docx

Adapting some code from here and combining with the code you posted I got:

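import docx

def getText(filename):
    # Open the document and collect the text of each body paragraph
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

text = getText('Doc1.docx')

# Keep only whitespace-separated tokens that consist entirely of digits
a = [int(s) for s in text.split() if s.isdigit()]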

This worked for me with a simple test file, although you may need to adjust some parts depending on how you want the search for numbers to work.
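Since the question also asks about replacing, here is a rough sketch of one way that could look. It assumes you want standalone whole numbers, matched with a regular expression instead of split() and isdigit(), because isdigit() skips tokens such as "42," that carry punctuation. The helper names are hypothetical. The replacement is done run by run so that formatting is kept, which means a number that Word has split across two runs will not be matched. Text inside tables, headers and footers is not reached through doc.paragraphs either.

```python
import re

# Standalone runs of digits, e.g. "42" in "42," or "(42)".
# Digits glued to letters ("Doc1") are skipped; "3.14" counts as 3 and 14.
NUMBER_RE = re.compile(r"\b\d+\b")


def find_numbers(text):
    """Return every standalone integer found in text, in order of appearance."""
    return [int(m) for m in NUMBER_RE.findall(text)]


def replace_numbers(text, transform):
    """Return text with each standalone integer n replaced by str(transform(n))."""
    return NUMBER_RE.sub(lambda m: str(transform(int(m.group()))), text)


def replace_numbers_in_docx(src, dst, transform):
    """Apply replace_numbers to every run of every body paragraph, then save to dst."""
    import docx  # python-docx; imported here so the helpers above work without it

    doc = docx.Document(src)
    for para in doc.paragraphs:
        for run in para.runs:
            if NUMBER_RE.search(run.text):
                run.text = replace_numbers(run.text, transform)
    doc.save(dst)
```

For example, replace_numbers_in_docx('Doc1.docx', 'Doc1_updated.docx', lambda n: n * 2) would write a copy of the test file with every standalone number doubled (the output file name is just an example). Writing to a new file rather than overwriting the original makes it easy to compare the two.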


