How to read PDF files one by one from a folder in Python
First, list all the files in that directory:
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
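Then run your code for each file in that list:
import PyPDF2
from os import listdir
from os.path import isfile, join

mypath = '.'  # placeholder: folder that holds the PDF files
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for file in onlyfiles:
    # listdir() returns bare names, so join each one with the folder path
    with open(join(mypath, file), 'rb') as fh:
        # PyPDF2 1.x/2.x names; PyPDF2 3.x removed them in favour of
        # PdfReader, reader.pages and page.extract_text()
        fileReader = PyPDF2.PdfFileReader(fh)
        count = 0
        # read the first three pages, or fewer if the file is shorter
        while count < min(3, fileReader.getNumPages()):
            pageObj = fileReader.getPage(count)
            text = pageObj.extractText()  # use the page text here
            count += 1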
os.listdir() returns everything in a directory, both files and subdirectories. The isfile() check above drops the subdirectories but not the non-PDF files, so either keep only PDF files in that folder or filter the list by extension.
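A minimal sketch of that filtering step, assuming a case-insensitive ".pdf" extension check is good enough for your folder. The helper name list_pdf_files is made up for illustration and is not part of any library.

```python
import os

def list_pdf_files(folder):
    """Return the full paths of the PDF files directly inside `folder`, sorted by name."""
    pdf_paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # skip sub-directories and anything that does not end in .pdf
        if os.path.isfile(full_path) and name.lower().endswith(".pdf"):
            pdf_paths.append(full_path)
    return pdf_paths
```

The returned paths already include the folder, so they can be passed straight to open(path, 'rb').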
Edit 1
You can also use the glob module, which does pattern matching:
>>> import glob
>>> print(glob.glob('/home/rszamszur/*.sh'))
['/home/rszamszur/work-monitors.sh', '/home/rszamszur/default-monitor.sh', '/home/rszamszur/home-monitors.sh']
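Both os.listdir() and glob work on all major platforms, Windows included. The key difference is that glob matches names against Unix shell-style wildcard patterns such as *.pdf, whereas os.listdir() returns every entry and leaves the filtering to you.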
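Applied to the PDF case, the glob approach might look like the sketch below. The function name find_pdfs is hypothetical, and the pattern only looks at the top level of the folder.

```python
import glob
import os

def find_pdfs(folder):
    """Return the paths in `folder` whose names match *.pdf, sorted by name."""
    pattern = os.path.join(folder, "*.pdf")
    return sorted(glob.glob(pattern))
```

Whether "*.pdf" also matches an upper-case ".PDF" extension depends on the platform, so add a second pattern if your files use mixed case.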
Read and extract multiple PDFs from multiple folders using Python
Maybe you could try something like this:
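# your code (the pdfminer setup that defines PDFPage, page_interpreter, fake_file_handle)
import os

source = 'F:/technophile/Proj/SOURCE/'
folder = ['A','B','C','D','E','F','G','H']
allyourpdf = []
for fold in folder:
    allyourfiles = os.listdir(os.path.join(source, fold))
    firstpdf = ""
    # pick the first PDF found in this folder
    for i in allyourfiles:
        if i.lower().endswith('.pdf'):
            firstpdf = i
            break
    if not firstpdf:
        continue  # no PDF in this folder
    # join the parts so there is a separator between folder and file name
    with open(os.path.join(source, fold, firstpdf), 'rb') as fh:
        for page in PDFPage.get_pages(fh, caching=True, check_extractable=True):
            page_interpreter.process_page(page)
        # note: if fake_file_handle is shared across files, this also
        # returns the text of the PDFs processed earlier
        text = fake_file_handle.getvalue()
    allyourpdf.append(text)
# your code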
I think it should work
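The folder-scanning part can also be pulled out and checked on its own before wiring in the PDF extraction. Here is a rough sketch with pathlib. The function name first_pdf_per_folder and its arguments are invented for illustration, and "first" here means first by file name, which is an assumption, since os.listdir() returns entries in arbitrary order.

```python
from pathlib import Path

def first_pdf_per_folder(source_dir, folder_names):
    """Map each folder name to the path of its first PDF (by name).

    Folders that are missing or contain no PDF are left out of the result.
    """
    result = {}
    for name in folder_names:
        folder = Path(source_dir) / name
        if not folder.is_dir():
            continue  # folder is missing, nothing to collect
        pdfs = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
        if pdfs:
            result[name] = pdfs[0]
    return result
```

Each value in the result is a full path, so it can be opened directly with open(path, 'rb') and handed to your extraction code.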