Python Progress Bar
There are dedicated libraries for this, but maybe something very simple would do:
import time
import sys
toolbar_width = 40
# setup toolbar
sys.stdout.write("[%s]" % (" " * toolbar_width))
sys.stdout.flush()
sys.stdout.write("\b" * (toolbar_width+1)) # return to start of line, after '['
for i in range(toolbar_width): # use xrange on Python 2
    time.sleep(0.1) # do real work here
    # update the bar
    sys.stdout.write("-")
    sys.stdout.flush()
sys.stdout.write("]\n") # this ends the progress bar
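The same idea can be wrapped in a small helper that redraws the whole line with a carriage return instead of backspaces. This is only a sketch: the names render_bar and show_progress and the stream parameter are assumptions introduced for illustration, and total is assumed to be greater than zero.

```python
import sys
import time


def render_bar(done, total, width=40):
    """Return a bar such as '[--  ]' for `done` out of `total` steps (total > 0)."""
    filled = width * done // total  # number of '-' cells to draw
    return "[" + "-" * filled + " " * (width - filled) + "]"


def show_progress(steps, width=40, delay=0.0, stream=sys.stdout):
    """Redraw the bar in place after each step (hypothetical helper)."""
    for done in range(steps + 1):
        # '\r' moves the cursor back to the start of the current line
        stream.write("\r" + render_bar(done, steps, width))
        stream.flush()
        if done < steps:
            time.sleep(delay)  # do real work here
    stream.write("\n")  # this ends the progress bar


if __name__ == "__main__":
    show_progress(20, delay=0.01)
```

Keeping the string building in render_bar separate from the writing makes the bar easy to check without a terminal.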
Note: progressbar2 is a fork of progressbar which hasn't been maintained in years.
Text progress bar in terminal with block characters
Python 3
A Simple, Customizable Progress Bar
Here's an aggregate of many of the answers below that I use regularly (no imports required).
Note: All code in this answer was created for Python 3; see end of answer to use this code with Python 2.
# Print iterations progress
def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
        printEnd    - Optional  : end character (e.g. "\r", "\r\n") (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Print New Line on Complete
    if iteration == total:
        print()
Sample Usage
import time
# A List of Items
items = list(range(0, 57))
l = len(items)
# Initial call to print 0% progress
printProgressBar(0, l, prefix = 'Progress:', suffix = 'Complete', length = 50)
for i, item in enumerate(items):
    # Do stuff...
    time.sleep(0.1)
    # Update Progress Bar
    printProgressBar(i + 1, l, prefix = 'Progress:', suffix = 'Complete', length = 50)
Sample Output
Progress: |█████████████████████████████████████████████-----| 90.0% Complete
Update
There was discussion in the comments regarding an option that allows the progress bar to adjust dynamically to the terminal window width. While I don't recommend this, here's a gist that implements this feature (and notes the caveats).
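For readers who do want a width-aware bar, one possible approach is to derive the length argument from the terminal size. The sketch below is not the linked gist; fit_bar_length is a hypothetical helper that assumes the line layout used by printProgressBar above and relies only on the standard library's shutil.get_terminal_size.

```python
import shutil


def fit_bar_length(prefix="", suffix="", decimals=1, columns=None):
    """Return a bar length that keeps the whole line within the terminal width."""
    if columns is None:
        # Falls back to 80 columns when the size cannot be determined
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    # Widest percentage text is '100' plus '.' and the decimals, e.g. '100.0'
    percent_width = 3 + (decimals + 1 if decimals > 0 else 0)
    # Fixed characters in '{prefix} |{bar}| {percent}% {suffix}'
    fixed = len(prefix) + len(" |") + len("| ") + percent_width + len("% ") + len(suffix)
    # Leave one spare column so the cursor does not wrap the line
    return max(1, columns - fixed - 1)


if __name__ == "__main__":
    print(fit_bar_length(prefix="Progress:", suffix="Complete"))
```

The result can be passed as length to printProgressBar. The size is read once per call, so resizing the window while the bar is running can still leave stray characters behind.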
Single-Call Version of The Above
A comment below referenced a nice answer posted in response to a similar question. I liked the ease of use it demonstrated and wrote a similar one, but opted to leave out the import of the sys module while adding in some of the features of the original printProgressBar function above.
Some benefits of this approach over the original function above include the elimination of an initial call to the function to print the progress bar at 0% and the use of enumerate becoming optional (i.e. it is no longer explicitly required to make the function work).
def progressBar(iterable, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    """
    Call in a loop to create terminal progress bar
    @params:
        iterable    - Required  : iterable object (Iterable)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
        printEnd    - Optional  : end character (e.g. "\r", "\r\n") (Str)
    """
    total = len(iterable)
    # Progress Bar Printing Function
    def printProgressBar (iteration):
        percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
        filledLength = int(length * iteration // total)
        bar = fill * filledLength + '-' * (length - filledLength)
        print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Initial Call
    printProgressBar(0)
    # Update Progress Bar
    for i, item in enumerate(iterable):
        yield item
        printProgressBar(i + 1)
    # Print New Line on Complete
    print()
Sample Usage
import time
# A List of Items
items = list(range(0, 57))
# A Nicer, Single-Call Usage
for item in progressBar(items, prefix = 'Progress:', suffix = 'Complete', length = 50):
    # Do stuff...
    time.sleep(0.1)
Sample Output
Progress: |█████████████████████████████████████████████-----| 90.0% Complete
Python 2
To use the above functions in Python 2, set the encoding to UTF-8 and import print_function at the top of your script (without that import, the end keyword of print() is a syntax error in Python 2):
# -*- coding: utf-8 -*-
from __future__ import print_function
And replace the Python 3 string formatting in this line:
print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
With Python 2 string formatting:
print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = printEnd)
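Another option that avoids the print differences between the two versions is to build the line with str.format and write it to the stream directly. This is a sketch rather than part of the original answer; format_progress_line and write_progress_line are hypothetical helper names.

```python
# -*- coding: utf-8 -*-
import sys


def format_progress_line(prefix, bar, percent, suffix):
    """Build the progress line without f-strings (hypothetical helper)."""
    return '\r{0} |{1}| {2}% {3}'.format(prefix, bar, percent, suffix)


def write_progress_line(line, end='\r', stream=sys.stdout):
    """Write the line and flush, mirroring print(..., end=end)."""
    stream.write(line + end)
    stream.flush()


if __name__ == '__main__':
    write_progress_line(
        format_progress_line('Progress:', '#' * 5 + '-' * 5, '50.0', 'Complete'),
        end='\n')
```

Because neither helper uses an f-string or the print function, the same source should be accepted by both interpreters.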
Python: Progress bar in parse function?
You pretty much just need to break up your list comprehension. I'll use Enlighten here but you can accomplish the same thing with tqdm.
import enlighten
import pandas as pd
records: list = ...  # placeholder for your data
manager = enlighten.get_manager()
pbar = manager.counter(total=len(records), desc='Parsing records', unit='records')
result = []
for item in records:
    result.append(parse_record(item))
    pbar.update()
df = pd.DataFrame(result)
If records is a generator rather than a sequence, you'll need to wrap it with list() or tuple() first so you can get the length.
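A small check like the following can decide whether records needs to be materialized before its length is taken. It is a sketch using only the standard library; ensure_sized is a hypothetical helper name.

```python
from collections.abc import Sized


def ensure_sized(records):
    """Return `records` unchanged if it supports len(), else a list copy."""
    if isinstance(records, Sized):
        return records
    # Generators have no len(); consuming them into a list provides one
    return list(records)


if __name__ == '__main__':
    lazy = (n * n for n in range(5))  # a generator: len(lazy) raises TypeError
    records = ensure_sized(lazy)
    print(len(records))
```

Note that materializing a generator loads every record into memory at once, which may matter for very large inputs.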
Can't get progress bar to work in python rich
The problem was that with for i in track(range(1), description='Scraping'): the bar would only go to 100% when the loop had finished. Changing the range() value would make the code loop and would update the bar. To fix this issue I used the Progress class from rich.progress.
After importing Progress and modifying the code from the Rich documentation I got:
from rich.progress import Progress
import time
with Progress() as progress:
    task1 = progress.add_task("[red]Scraping", total=100)
    while not progress.finished:
        progress.update(task1, advance=0.5)
        time.sleep(0.5)
Essentially:
- At task1 = progress.add_task("[red]Scraping", total=100) a bar is created with a maximum value of 100
- The code indented under while not progress.finished: will loop until the bar is at 100%
- At progress.update(task1, advance=0.5) the bar's completed amount is advanced by 0.5 (the total stays at 100).
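The arithmetic behind that loop can be checked with a plain-Python model. This is only a toy stand-in that assumes a task counts as finished once its completed amount reaches its total; TaskModel and count_updates are hypothetical names, and none of this is Rich's own code.

```python
class TaskModel:
    """Toy stand-in for a progress task; not Rich's implementation."""

    def __init__(self, total):
        self.total = total
        self.completed = 0

    def advance(self, amount):
        # Advancing changes the completed amount, never the total
        self.completed += amount

    @property
    def finished(self):
        return self.completed >= self.total


def count_updates(total, advance):
    """Count how many updates are needed before the task reports finished."""
    task = TaskModel(total)
    updates = 0
    while not task.finished:
        task.advance(advance)
        updates += 1
    return updates


if __name__ == '__main__':
    print(count_updates(100, 0.5))
    print(count_updates(100, 4))
```

Under this model a total of 100 needs 200 updates of 0.5 or 25 updates of 4, so the amount passed to advance should be chosen to match the number of update calls in the loop body.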
Therefore, for my specific example, my end result code was:
theme = Theme({'success': 'bold green',
               'error': 'bold red', 'enter': 'bold blue'})
console = Console(theme=(theme))
bartotal = 100
with Progress() as progress:
    task1 = progress.add_task("[magenta bold]Scraping...", total=bartotal)
    while not progress.finished:
        console.print("\nDeclaring global variables", style='success')
        global pfp
        progress.update(task1, advance=4)
        global target_id
        progress.update(task1, advance=4)
        console.print("\nSetting up Chrome driver", style='success')
        chrome_options = Options()
        progress.update(task1, advance=4)
        chrome_options.add_argument("--headless")
        progress.update(task1, advance=4)
        driver = webdriver.Chrome(options=chrome_options)
        progress.update(task1, advance=4)
        console.print("\nCreating url for lookup.guru",
                      style='success')
        begining_of_url = "https://lookup.guru/"
        progress.update(task1, advance=4)
        whole_url = begining_of_url + str(target_id)
        progress.update(task1, advance=4)
        driver.get(whole_url)
        progress.update(task1, advance=4)
        console.print(
            "\nWaiting up to 10 seconds for lookup.guru to load", style='success')
        wait = WebDriverWait(driver, 10)
        progress.update(task1, advance=4)
        wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//img")))
        progress.update(task1, advance=4)
        console.print("\nScraping images", style='success')
        images = driver.find_elements_by_tag_name('img')
        progress.update(task1, advance=4)
        for image in images:
            global pfp
            pfp = (image.get_attribute('src'))
            break
        progress.update(task1, advance=4)
        if pfp == "a":
            console.print("User not found \n", style='error')
            userInput()
        progress.update(task1, advance=4)
        console.print(
            "\nDownloading image to current directory", style='success')
        img_data = requests.get(pfp).content
        progress.update(task1, advance=4)
        with open('pfpimage.png', 'wb') as handler:
            handler.write(img_data)
        progress.update(task1, advance=4)
        filePath = "pfpimage.png"
        progress.update(task1, advance=4)
        console.print("\nUploading to yandex.com", style='success')
        searchUrl = 'https://yandex.com/images/search'
        progress.update(task1, advance=4)
        files = {'upfile': ('blob', open(
            filePath, 'rb'), 'image/jpeg')}
        progress.update(task1, advance=4)
        params = {'rpt': 'imageview', 'format': 'json',
                  'request': '{"blocks":[{"block":"b-page_type_search-by-image__link"}]}'}
        progress.update(task1, advance=4)
        response = requests.post(searchUrl, params=params, files=files)
        progress.update(task1, advance=4)
        query_string = json.loads(response.content)[
            'blocks'][0]['params']['url']
        progress.update(task1, advance=4)
        img_search_url = searchUrl + '?' + query_string
        progress.update(task1, advance=4)
        console.print("\nOpening lookup.guru", style='success')
        webbrowser.open(whole_url)
        progress.update(task1, advance=4)
        console.print("\nOpening yandex images", style='success')
        webbrowser.open(img_search_url)
        progress.update(task1, advance=4)
        console.print("\nDone!", style='success')
        progress.update(task1, advance=4)
Progress bar with multiprocessing
First, a few general comments concerning your code. In your main process you use a path to a file to open a zip archive just to retrieve the original file name. That really does not make much sense. Then in count_files_7z you iterate the return value from zf.namelist() to build a list of the files within the archive when zf.namelist() is already a list of those files. That does not make much sense either. You also use the context manager function closing to ensure that the archive is closed at the end of the block, but a ZipFile used in a with block is itself a context manager that serves the same purpose.
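Those two simplifications can be sketched as follows. The helper names count_files and make_demo_archive are hypothetical, and the in-memory archive exists only so that the example runs without any files on disk.

```python
import io
from zipfile import ZipFile


def count_files(archive):
    """Return the number of entries in a zip archive (path or file object)."""
    # ZipFile is itself a context manager, so no closing() wrapper is needed
    with ZipFile(archive) as zf:
        # namelist() already returns a list of member names
        return len(zf.namelist())


def make_demo_archive(names):
    """Build a small in-memory zip for demonstration purposes."""
    buffer = io.BytesIO()
    with ZipFile(buffer, 'w') as zf:
        for name in names:
            zf.writestr(name, 'demo')
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    print(count_files(make_demo_archive(['a.txt', 'b.txt', 'c.txt'])))
```

Keep in mind that namelist() also lists directory entries, so the count covers every member of the archive, not only regular files.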
I tried installing alive-progress and the progress bars were a mess. This is a task better suited to multithreading than to multiprocessing. Actually, it is probably better suited to serial processing, since doing concurrent I/O operations to your disk, unless it is a solid state drive, is probably going to hurt performance. You will gain performance if there is heavy CPU-intensive processing of the files you read. If that is the case, I have made a multiprocessing pool available to each thread, on which you can make calls to apply specifying functions in which you have placed CPU-intensive code. But the progress bars should work better when done under multithreading rather than multiprocessing. Even then I could not get any sort of decent display with alive-progress, which admittedly I did not spend too much time on. So I have switched to using the more common tqdm module available from the PyPI repository.
Even with tqdm there is a problem in that when a progress bar reaches 100%, tqdm must be writing something (a newline?) that relocates the other progress bars. Therefore, what I have done is specified leave=False, which causes the bar to disappear when it reaches 100%. But at least you can see all the progress bars without distortion as they are progressing.
from multiprocessing.pool import Pool, ThreadPool
from threading import Lock
import tqdm
from zipfile import ZipFile
import os
import heapq
import time
def get_filepaths(directory):
    file_paths = [] # List which will store all of the full filepaths.
    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath) # Add it to the list.
    return file_paths # Self-explanatory.
def get_free_position():
    """ Return the minimum possible position """
    with lock:
        free_position = heapq.heappop(free_positions)
    return free_position
def return_free_position(position):
    with lock:
        heapq.heappush(free_positions, position)
def run_performance(zip_file):
    position = get_free_position()
    with ZipFile(zip_file) as zf:
        file_list = zf.namelist()
        with tqdm.tqdm(total=len(file_list), position=position, leave=False) as bar:
            for f in file_list:
                with zf.open(f) as myfile:
                    ... # do things with myfile (perhaps myfile.read())
                    # for CPU-intensive tasks: result = pool.apply(some_function, args=(arg1, arg2, ... argn))
                    time.sleep(.005) # simulate doing something
                bar.update()
    return_free_position(position)
def generate_zip_files():
    list_dir = ['path1', 'path2']
    for folder in list_dir:
        get_all_zips = get_filepaths(folder)
        for zip_file in get_all_zips:
            yield zip_file
# Required for Windows:
if __name__ == '__main__':
    N_THREADS = 5
    free_positions = list(range(N_THREADS)) # already a heap
    lock = Lock()
    pool = Pool()
    thread_pool = ThreadPool(N_THREADS)
    for result in thread_pool.imap_unordered(run_performance, generate_zip_files()):
        pass
    pool.close()
    pool.join()
    thread_pool.close()
    thread_pool.join()
The code above uses a multiprocessing thread pool arbitrarily limited in size to 5 just as a demo. You can increase or decrease N_THREADS to whatever value you want, but as I said, it may or may not help performance. If you want one thread per zip file then:
if __name__ == '__main__':
    zip_files = list(generate_zip_files())
    N_THREADS = len(zip_files)
    free_positions = list(range(N_THREADS)) # already a heap
    lock = Lock()
    pool = Pool()
    thread_pool = ThreadPool(N_THREADS)
    for result in thread_pool.imap_unordered(run_performance, zip_files):
        pass
    pool.close()
    pool.join()
    thread_pool.close()
    thread_pool.join()
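The position bookkeeping done by get_free_position and return_free_position can also be exercised on its own, without tqdm or any zip files. The class below is a sketch of the same heap-plus-lock idea; PositionPool and its method names are hypothetical and do not come from tqdm.

```python
import heapq
from threading import Lock


class PositionPool:
    """Hand out the lowest free display row and take it back afterwards."""

    def __init__(self, size):
        self._free = list(range(size))  # a sorted list is already a valid heap
        self._lock = Lock()

    def acquire(self):
        """Return the smallest free position (IndexError if none is left)."""
        with self._lock:
            return heapq.heappop(self._free)

    def release(self, position):
        """Make a position available again."""
        with self._lock:
            heapq.heappush(self._free, position)


if __name__ == '__main__':
    rows = PositionPool(3)
    first, second = rows.acquire(), rows.acquire()
    rows.release(first)
    print(first, second, rows.acquire())
```

Sizing the pool to the number of worker threads, as the code above does with N_THREADS, means acquire never runs out of positions. Releasing in a finally block would keep a row from being lost if a worker raises.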