Use Tqdm Progress Bar With Pandas

Progress indicator during pandas operations

Due to popular demand, I've added pandas support in tqdm (pip install "tqdm>=4.9.0"). Unlike the other answers, this will not noticeably slow pandas down -- here's an example for DataFrameGroupBy.progress_apply:

import pandas as pd
import numpy as np
from tqdm import tqdm
# from tqdm.auto import tqdm # for notebooks

# Create new `pandas` methods which use `tqdm` progress
# (can use tqdm_gui, optional kwargs, etc.)
tqdm.pandas()

df = pd.DataFrame(np.random.randint(0, int(1e8), (10000, 1000)))
# Now you can use `progress_apply` instead of `apply`
df.groupby(0).progress_apply(lambda x: x**2)

In case you're interested in how this works (and how to modify it for your own callbacks), see the examples on GitHub, the full documentation on PyPI, or import the module and run help(tqdm). Other supported functions include map, applymap, aggregate, and transform.
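
As an illustration of one of those other methods, here is a minimal sketch using progress_map on a Series. The series contents and the desc label are made up for the example; only tqdm.pandas() and progress_map come from tqdm itself.

```python
import pandas as pd
from tqdm import tqdm

# Register the progress_* methods on pandas objects (label is illustrative).
tqdm.pandas(desc="squaring")

# Hypothetical data: the integers 0..999.
s = pd.Series(range(1000))

# Same result as s.map(...), but with a progress bar that ticks per element.
squared = s.progress_map(lambda v: v * v)
```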

EDIT


To directly answer the original question, replace:

df_users.groupby(['userID', 'requestDate']).apply(feature_rollup)

with:

from tqdm import tqdm
tqdm.pandas()
df_users.groupby(['userID', 'requestDate']).progress_apply(feature_rollup)

Note: tqdm <= v4.8:
For tqdm 4.8 and earlier, instead of tqdm.pandas() you had to do:

from tqdm import tqdm, tqdm_pandas
tqdm_pandas(tqdm())

How do I use tqdm to show progress bars when using read_csv in a Jupyter notebook using JupyterLab

Yes. You could abuse any of the arguments that accept a callable, so that it gets called for each row:

import pandas as pd
from tqdm.auto import tqdm

with tqdm() as bar:
    # do not skip any of the rows, but update the progress bar instead
    pd.read_csv('data.csv', skiprows=lambda x: bar.update(1) and False)

If you use Linux, you can get the total number of lines to get a more meaningful progress bar:

import pandas as pd
from tqdm.auto import tqdm

# `!` is IPython/Jupyter shell syntax; it returns the command's output lines
lines_number = !cat 'data.csv' | wc -l

with tqdm(total=int(lines_number[0])) as bar:
    pd.read_csv('data.csv', skiprows=lambda x: bar.update(1) and False)
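
Outside IPython, or on a system without wc, the total can be counted in plain Python instead. This is only a sketch: the helper name read_csv_with_progress is hypothetical, and it assumes one CSV row per physical line (no quoted fields containing newlines), otherwise the total will be slightly too large.

```python
import pandas as pd
from tqdm.auto import tqdm

def read_csv_with_progress(path):
    """Read a CSV while showing a progress bar with a known total."""
    # One extra pass over the file to count physical lines (header included).
    with open(path, "rb") as f:
        total = sum(1 for _ in f)

    with tqdm(total=total) as bar:
        def keep_row(row_index):
            # Called by pandas for each row index; tick the bar, skip nothing.
            bar.update(1)
            return False

        return pd.read_csv(path, skiprows=keep_row)
```

The extra counting pass costs one more read of the file, which is usually small compared with the parsing itself.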

But if you do not like for-loops, you may also dislike context managers. You could get away with:

def none_but_please_show_progress_bar(*args, **kwargs):
    bar = tqdm(*args, **kwargs)

    def checker(x):
        bar.update(1)
        return False

    return checker

pd.read_csv('data.csv', skiprows=none_but_please_show_progress_bar())

But I find it less stable, so I recommend the context-manager approach.

How to use tqdm with pandas in a jupyter notebook?

You can use:

tqdm_notebook().pandas(*args, **kwargs)

This is because tqdm_notebook has a delayed adapter, so it's necessary to instantiate it before accessing its methods (including class methods).

In the future (>v5.1), you should be able to use a more uniform API:

tqdm_pandas(tqdm_notebook, *args, **kwargs)
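
For comparison, here is a minimal sketch that relies on tqdm.auto (the import mentioned in the first answer above) rather than tqdm_notebook. It assumes a tqdm release that ships tqdm.auto; the column name and data are made up for the example.

```python
import pandas as pd
from tqdm.auto import tqdm  # notebook widget in Jupyter, text bar elsewhere

# Register progress_* methods using whichever bar tqdm.auto selected.
tqdm.pandas(desc="rows")

# Hypothetical data frame with a single column.
df = pd.DataFrame({"x": range(100)})

# Behaves like Series.apply, with a progress bar.
doubled = df["x"].progress_apply(lambda v: v * 2)
```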

Is it possible to use tqdm for pandas merge operation?

tqdm supports pandas and various operations within it. For merging two large dataframes and showing the progress, you could do it this way:

import numpy as np
import pandas as pd
from tqdm import tqdm

df1 = pd.DataFrame({'lkey': 1000*['a', 'b', 'c', 'd'], 'lvalue': np.random.randint(0, int(1e8), 4000)})
df2 = pd.DataFrame({'rkey': 1000*['a', 'b', 'c', 'd'], 'rvalue': np.random.randint(0, int(1e8), 4000)})

# this is how you activate the pandas features in tqdm
tqdm.pandas()
# call progress_apply with a dummy lambda; note that the merge itself has
# already finished by then, so the bar tracks only this pass over the result
df1.merge(df2, left_on='lkey', right_on='rkey').progress_apply(lambda x: x)

More details are available on this thread:
Progress indicator during pandas operations (python)
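
Since merge() returns only once the whole join is done, a bar attached afterwards cannot show the merge's own progress. One possible workaround is to merge the left frame in chunks and wrap the chunk loop in tqdm. This is a sketch rather than a tqdm feature: the chunk size and the small right-hand frame are assumptions for illustration, and it applies to inner or left joins, where each left row can be matched independently.

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

rng = np.random.default_rng(0)
df1 = pd.DataFrame({"lkey": 1000 * ["a", "b", "c", "d"],
                    "lvalue": rng.integers(0, int(1e8), 4000)})
# Hypothetical lookup table with one row per key, to keep the result small.
df2 = pd.DataFrame({"rkey": ["a", "b", "c", "d"],
                    "rvalue": rng.integers(0, int(1e8), 4)})

chunk_size = 500  # illustrative value; tune for your data
parts = []
# The bar advances once per chunk of left-hand rows actually merged.
for start in tqdm(range(0, len(df1), chunk_size), desc="merging"):
    chunk = df1.iloc[start:start + chunk_size]
    parts.append(chunk.merge(df2, left_on="lkey", right_on="rkey"))

merged = pd.concat(parts, ignore_index=True)
```

The rows match a single df1.merge(df2, ...) call, though their order may differ.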

how to use tqdm progress bar in dask_cudf and cudf

Until progress_apply is available, you would have to implement an equivalent yourself (e.g. using apply_chunks). Just a sketch of the code:

from tqdm import tqdm

full_size = 100
t = tqdm(total=full_size)

def chunks_generator():
    chunk_size = 5
    for s in range(0, full_size, chunk_size):
        yield s
        # advance by the chunk size, not by the cumulative offset `s`
        t.update(chunk_size)

df.apply_chunks(..., chunks=chunks_generator())

TQDM on pandas df.describe()

You can use it like this:

tqdm.pandas(desc="my bar!")
df.progress_apply(lambda x: x.describe())

Although it doesn't seem to be useful.


