How to read the contents of a Table in an MS-Word file Using Python?
Here is what works for me in Python 2.7:
import win32com.client as win32
word = win32.Dispatch("Word.Application")
word.Visible = 0
word.Documents.Open("MyDocument")
doc = word.ActiveDocument
To see how many tables your document has:
doc.Tables.Count
Then you can select the table you want by its index. Note that, unlike Python, COM indexing starts at 1:
table = doc.Tables(1)
To select a cell:
table.Cell(Row=1, Column=1)
To get its content:
table.Cell(Row=1, Column=1).Range.Text
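One detail worth knowing: the text Word returns for a table cell through COM ends with an end-of-cell marker, a carriage return followed by the BEL character ("\r\x07"). Below is a minimal sketch of a helper that strips it. The name clean_cell_text is hypothetical and not part of win32com.

```python
def clean_cell_text(raw_text):
    """Remove the trailing end-of-cell marker from a cell's Range.Text.

    Note: rstrip also removes any other trailing carriage returns,
    such as empty paragraphs at the end of the cell.
    """
    return raw_text.rstrip("\r\x07")


# Example with a string shaped like what Range.Text returns for a cell
print(repr(clean_cell_text("Total\r\x07")))  # 'Total'
```

With a real table you would call it as clean_cell_text(table.Cell(Row=1, Column=1).Range.Text).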
Hope that this helps.
EDIT:
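An example of a function that returns a column's index based on its heading:
def Column_index(header_text):
    # Range.Text ends with Word's end-of-cell marker ("\r\x07"), so strip it before comparing
    for i in range(1, table.Columns.Count + 1):
        if table.Cell(Row=1, Column=i).Range.Text.rstrip("\r\x07") == header_text:
            return i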
Then you can access the cell you want like this, for example:
table.Cell(Row=1, Column=Column_index("The Column Header")).Range.Text
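To pull the whole table into Python at once, the same Cell calls can be wrapped in two loops. This is only a sketch: table_to_rows is a hypothetical helper name, and it assumes the table has no merged cells, since Cell fails for a row/column position that does not exist.

```python
def table_to_rows(table):
    """Return the table's contents as a list of rows, each a list of strings.

    `table` is expected to be a Word COM table, e.g. doc.Tables(1).
    Assumes a regular grid with no merged cells.
    """
    rows = []
    # COM collections are 1-based, hence range(1, Count + 1)
    for row_index in range(1, table.Rows.Count + 1):
        row = []
        for column_index in range(1, table.Columns.Count + 1):
            raw = table.Cell(Row=row_index, Column=column_index).Range.Text
            # Drop the end-of-cell marker Word appends to each cell's text
            row.append(raw.rstrip("\r\x07"))
        rows.append(row)
    return rows
```

The first inner list is then the header row, and rows[1:] holds the data.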
Python: How to read a table from word when it is in a text box?
I wrote a solution using another Python package, docx2python.
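import pandas as pd
from docx2python import docx2python
doc = docx2python(word_document_path)  # word_document_path: path to your .docx file
doc_body = doc.body  # nested lists: tables, then rows, then cells, then paragraphs
table = doc_body[table_number]  # table_number: 0-based index of the table you want
table = pd.DataFrame(table)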
How do you read a table from a certain part in a word document using python-docx?
The code here may be of interest: https://github.com/python-openxml/python-docx/issues/276#issuecomment-199502885.
What you're looking for, I believe, is a way to iterate the block level items in a document, in the order they appear. A Word document has two types of block-level items, paragraphs and tables. The function at the link above allows you to iterate those in document order.
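To illustrate the idea of walking block-level items in document order, here is a sketch that uses only the standard library instead of python-docx. It is not the function from the linked issue: it reads word/document.xml directly and yields paragraphs and tables as they appear in the body. The names iter_block_items and read_document_xml are hypothetical, and the sketch ignores content inside wrappers such as content controls.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(element):
    """Concatenate the text of every w:t run below `element`."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def iter_block_items(document_xml):
    """Yield ("paragraph", text) or ("table", rows) in document order."""
    body = ET.fromstring(document_xml).find(W + "body")
    for child in body:
        if child.tag == W + "p":
            yield ("paragraph", _text(child))
        elif child.tag == W + "tbl":
            rows = [
                [_text(cell) for cell in row.findall(W + "tc")]
                for row in child.findall(W + "tr")
            ]
            yield ("table", rows)
        # Other children (for example section properties) are skipped


def read_document_xml(docx_path):
    """Return the raw XML of the main document part of a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")
```

To find the table that follows a particular heading, iterate over iter_block_items(read_document_xml(path)), remember when the paragraph text you care about has been seen, and take the next item whose kind is "table".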
Reading Table Content In Header And Footer In MS-Word File Using Python
Accessing Headers and Footers is a bit tricky. Here is how to do it:
HeaderTable = doc.Sections(1).Headers(1).Range.Tables(1)
FooterTable = doc.Sections(1).Footers(1).Range.Tables(1)
You can get the table count this way:
HeaderTablesCount = doc.Sections(1).Headers(1).Range.Tables.Count
FooterTablesCount = doc.Sections(1).Footers(1).Range.Tables.Count
And get the text from cells this way:
HeaderTable.Cell(1,1).Range.Text
FooterTable.Cell(1,1).Range.Text
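The index 1 passed to Headers and Footers above is Word's wdHeaderFooterPrimary constant; 2 and 3 select the first-page and even-page variants. If a document has several sections, a loop like the following sketch collects every header table. The helper name iter_header_tables is hypothetical, and doc is assumed to be the COM document object used in the snippets above.

```python
# Values of Word's WdHeaderFooterIndex enumeration
WD_HEADER_FOOTER_PRIMARY = 1
WD_HEADER_FOOTER_FIRST_PAGE = 2
WD_HEADER_FOOTER_EVEN_PAGES = 3


def iter_header_tables(doc, kinds=(WD_HEADER_FOOTER_PRIMARY,)):
    """Yield (section_index, kind, table) for each table in the headers."""
    # COM collections are 1-based, hence range(1, Count + 1)
    for section_index in range(1, doc.Sections.Count + 1):
        section = doc.Sections(section_index)
        for kind in kinds:
            tables = section.Headers(kind).Range.Tables
            for table_index in range(1, tables.Count + 1):
                yield section_index, kind, tables(table_index)
```

Replacing Headers with Footers in the loop gives the equivalent for footer tables.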
How to extract a Word table from multiple files using python docx
You are reinitializing the data list to [] (empty) for every document. So you carefully collect the row-data from a document and then in the next step throw it away.
If you move data = [] outside the loop then after iterating through the documents it will contain all the extracted rows.
data = []
for name in filenames:
    ...
    data.append(row_data)
print(data)
python-docx to extract table from Word docx
Your code works fine for me. How about inserting it into a dataframe?
import pandas as pd
from docx.api import Document
document = Document('test_word.docx')
table = document.tables[0]
data = []
keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)
    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)
print(data)
df = pd.DataFrame(data)
How can I display a particular row and column in that table?
We can extract rows and columns by index with iloc:
# iloc[row,columns]
df.iloc[0,:].tolist() # [5,6,7,8] - row index 0
df.iloc[:,0].tolist() # [5,9,13,17] - column index 0
df.iloc[0,0] # 5 - cell(0,0)
df.iloc[1:,2].tolist() # [11,15,19] - column index 2, but skip first row
and so on...
However, if your columns have names (in this case they are numbers), you can do it like this:
#df["name"].tolist()
df[1].tolist() # [5,9,13,17] - column with name 1
print(df)
prints the following, which is how the table looks in my sample doc:
    1   2   3   4
0   5   6   7   8
1   9  10  11  12
2  13  14  15  16
3  17  18  19  20
How to extract text data in a table created in a docx document
Try using the python-docx module instead:
pip install python-docx
import docx
doc = docx.Document("document.docx")
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)