Add Leading Zeros to Strings in Pandas Dataframe

Try:

df['ID'] = df['ID'].apply(lambda x: '{0:0>15}'.format(x))

or even

df['ID'] = df['ID'].apply(lambda x: x.zfill(15))
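
The two approaches differ when the column is not purely strings. A minimal sketch, assuming a hypothetical ID column that holds one integer and one string: the format call accepts either value, while zfill exists only on strings, so the vectorized variant casts first.

```python
import pandas as pd

# Hypothetical sample data: one integer and one string ID.
df = pd.DataFrame({"ID": [42, "1234567"]})

# str.format converts each value to text before padding it.
padded_format = df["ID"].apply(lambda x: "{0:0>15}".format(x))

# zfill is a string method, so cast before using the .str accessor.
padded_zfill = df["ID"].astype(str).str.zfill(15)

print(padded_format.tolist())
print(padded_zfill.tolist())
```

Both lists should be identical for this data; the cast is what lets the zfill version cope with the integer.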

Pandas check column and add leading zeros

You can use .str.replace:

df["user"] = df["user"].str.replace(
r"^\d{1,5}$", lambda g: "{:0>6}".format(g.group(0)), regex=True
)
print(df)

This will add leading zeros to cells that contain only 1 to 5 digits:

  @timestamp.    message.       name.    user
0 time. something. something. 123456
1 time. something something 001234
2 time. something. something. hello1
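
An alternative that might read more easily is to build a boolean mask first and pad only the matching cells. This is a sketch under assumptions, not the answer's method: the sample frame is made up to mirror the user column above, and str.match plus Series.mask stand in for the callable replacement.

```python
import pandas as pd

# Hypothetical sample data mirroring the user column above.
df = pd.DataFrame({"user": ["123456", "1234", "hello1"]})

# True only for cells made up of 1 to 5 digits.
is_short_number = df["user"].str.match(r"^\d{1,5}$")

# Replace the matching cells with their zero-padded form; keep the rest.
df["user"] = df["user"].mask(is_short_number, df["user"].str.zfill(6))
print(df["user"].tolist())
```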

Python add a leading zero to column with str and int

You can use str.zfill:

#numeric as string
df = pd.DataFrame({'Section':['1', '2', '3', '4', 'SS', '15', 'S1', 'A1']})

df['Section'] = df['Section'].str.zfill(2)
print(df)
Section
0 01
1 02
2 03
3 04
4 SS
5 15
6 S1
7 A1

If the column mixes numbers and strings, cast it to string first:

df = pd.DataFrame({'Section':[1, 2, 3, 4, 'SS', 15, 'S1', 'A1']})

df['Section'] = df['Section'].astype(str).str.zfill(2)
print(df)
Section
0 01
1 02
2 03
3 04
4 SS
5 15
6 S1
7 A1

How to add leading zeros to a series of numbers in a dataframe, and then add a suffix?

You can build the formatted string column by applying a lambda (or using map) to the numeric column:

import pandas as pd

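df = pd.DataFrame({"nums": range(100, 201)})

# format the string in one go: zero-pad to 6 digits, then append the suffix
# ("d" is the plain integer format; "n" would apply locale-specific separators)
df["modded"] = df["nums"].apply(lambda x: f"{x:06d}xxx")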
print(df)

Output:

     nums     modded
0 100 000100xxx
1 101 000101xxx
2 102 000102xxx
.. ... ...
98 198 000198xxx
99 199 000199xxx
100 200 000200xxx
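
The same column can also be built without a Python-level lambda. A minimal sketch, assuming the same nums column as above, that uses the string accessor instead:

```python
import pandas as pd

df = pd.DataFrame({"nums": range(100, 201)})

# Cast to string, pad to six characters, then append the suffix.
df["modded"] = df["nums"].astype(str).str.zfill(6) + "xxx"
print(df.head(3))
```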

How to keep leading zeroes from a pandas column post operation?

The leading zeros are dropped because of how slice notation works in Python: the end index is exclusive, so [6:7] returns only the single character at index 6 and skips the character at index 5.

Try changing your code to:

df['period'] = df['Date'].str[:4] + df['Date'].str[5:7]

Note the change from [6:7] to [5:7].
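
To see the difference on concrete values, here is a small sketch. The "YYYY-MM-DD" string layout is an assumption, since the original data is not shown.

```python
import pandas as pd

# Assumed layout: dates stored as "YYYY-MM-DD" strings.
df = pd.DataFrame({"Date": ["2021-03-15", "2020-11-02"]})

# [6:7] stops before index 7, so it keeps only one character of the month.
wrong = df["Date"].str[:4] + df["Date"].str[6:7]

# [5:7] covers indices 5 and 6, i.e. both month digits.
df["period"] = df["Date"].str[:4] + df["Date"].str[5:7]

print(wrong.tolist())
print(df["period"].tolist())
```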

Python Adding leading zero to Time field

Use the built-in Series.str.zfill method:

df.ColumnName.astype(str).str.zfill(4)

#0 100012
#1 225434
#2 0030
#3 0045
#4 0036
#5 80200
#Name: ColumnName, dtype: object
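
The output above shows that zfill sets a minimum width rather than a fixed one. A self-contained sketch, with hypothetical integer values chosen to match that output:

```python
import pandas as pd

# Hypothetical integer column chosen to reproduce the output above.
df = pd.DataFrame({"ColumnName": [100012, 225434, 30, 45, 36, 80200]})

# Shorter values are padded to 4 characters; longer ones are left alone.
padded = df["ColumnName"].astype(str).str.zfill(4)
print(padded.tolist())
```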

Time efficient way for add leading zeros in pandas series

s = pd.Series(map(lambda x: '%010d' %x, s))

where s is your Series of integers.
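
One thing to watch with this one-liner is that rebuilding the Series from a plain map discards the original index. A sketch on hypothetical data, with Series.map shown as one possible way to keep the labels:

```python
import pandas as pd

# Hypothetical integers with a non-default index.
s = pd.Series([7, 42, 12345], index=["a", "b", "c"])

# Rebuilding from a map object yields a fresh 0..n-1 index.
rebuilt = pd.Series(map(lambda x: "%010d" % x, s))

# Series.map applies the formatter per element and keeps the labels.
mapped = s.map("{:010d}".format)

print(rebuilt.index.tolist())
print(mapped.index.tolist())
```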

Add leading zeros based on condition in python

You need to vectorize this: select the rows with a boolean index and use .str.zfill() on the resulting subsets:

# select the right rows to avoid wasting time operating on longer strings
shorter = df.Random.str.len() < 9
longer = ~shorter
# assign through .loc; chained assignment (df.Random[mask] = ...) may write to a copy
df.loc[shorter, "Random"] = df.loc[shorter, "Random"].str.zfill(9)
df.loc[longer, "Random"] = df.loc[longer, "Random"].str.zfill(20)

Note: I did not use np.where() because we wouldn't want to double the work. A vectorized df.Random.str.zfill() is faster than looping over the rows, but doing it twice still takes more time than doing it just once for each set of rows.

Speed comparison on 1 million rows of strings with values of random lengths (from 5 characters all the way up to 30):

In [1]: import numpy as np, pandas as pd

In [2]: import platform; print(platform.python_version_tuple(), platform.platform(), pd.__version__, np.__version__, sep="\n")
('3', '7', '3')
Darwin-17.7.0-x86_64-i386-64bit
0.24.2
1.16.4

In [3]: !sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

In [4]: from random import choices, randrange

In [5]: def randvalue(chars="0123456789", _c=choices, _r=randrange):
...:     return "".join(_c(chars, k=randrange(5, 30))).lstrip("0")
...:

In [6]: df = pd.DataFrame(data={"Random": [randvalue() for _ in range(10**6)]})

In [7]: %%timeit
...: target = df.copy()
...: shorter = target.Random.str.len() < 9
...: longer = ~shorter
...: target.Random[shorter] = target.Random[shorter].str.zfill(9)
...: target.Random[longer] = target.Random[longer].str.zfill(20)
...:
...:
825 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %%timeit
...: target = df.copy()
...: target.Random = np.where(target.Random.str.len()<9,target.Random.str.zfill(9),target.Random.str.zfill(20))
...:
...:
929 ms ± 69.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

(The target = df.copy() line is needed to make sure that each repeated test run is isolated from the one before.)

Conclusion: on 1 million rows, using np.where() is about 10% slower.

However, using df.Random.apply(), as proposed by jackbicknell14, beats either method by a huge margin:

In [9]: def fill_zeros(x, _len=len, _zfill=str.zfill):
...:     # len() and str.zfill() are cached as parameters for performance
...:     return _zfill(x, 9 if _len(x) < 9 else 20)

In [10]: %%timeit
...: target = df.copy()
...: target.Random = target.Random.apply(fill_zeros)
...:
...:
299 ms ± 2.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That's about 3 times faster!
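
Before choosing on speed alone, it may help to confirm that the three approaches agree. The sketch below runs them on three hypothetical values; it is a correctness check only and does not attempt to reproduce the timings.

```python
import numpy as np
import pandas as pd

# Hypothetical values: shorter than 9, exactly 9, and longer than 9 characters.
df = pd.DataFrame({"Random": ["12345", "123456789", "1234567890123"]})
shorter = df["Random"].str.len() < 9

# 1. Boolean masks, assigning through .loc
masked = df["Random"].copy()
masked.loc[shorter] = masked.loc[shorter].str.zfill(9)
masked.loc[~shorter] = masked.loc[~shorter].str.zfill(20)

# 2. np.where, which pads every row twice and then picks one result
where = pd.Series(
    np.where(shorter, df["Random"].str.zfill(9), df["Random"].str.zfill(20)),
    index=df.index,
)

# 3. apply with a per-value width
applied = df["Random"].apply(lambda x: x.zfill(9 if len(x) < 9 else 20))

print(applied.tolist())
```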

Format a pandas dataframe with floats and leading zeros

You can set the display float_format option as '{:09.2f}'.format:

pd.options.display.float_format = '{:09.2f}'.format
df
amount
row1 001000.50
row2 100000.78
row3 -90000.00
row4 -00900.40

But this only changes how floats are displayed. If you need to create a new column, you can use an f-string:

df['newamount'] = df.amount.apply(lambda x: f'{x:09.2f}')
df
amount newamount
row1 1000.50 001000.50
row2 100000.78 100000.78
row3 -90000.00 -90000.00
row4 -900.40 -00900.40
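
A self-contained sketch of the new-column approach follows. The amounts are inferred from the output above, and passing the bound format method to map is a stand-in for the lambda. The width of 9 counts every character, including the sign and the decimal point.

```python
import pandas as pd

# Amounts inferred from the output shown above.
df = pd.DataFrame(
    {"amount": [1000.5, 100000.78, -90000.0, -900.4]},
    index=["row1", "row2", "row3", "row4"],
)

# Width 9 covers sign, digits, decimal point and the two decimals.
df["newamount"] = df["amount"].map("{:09.2f}".format)
print(df)
```

The new column holds strings, while amount stays numeric and can still be used in calculations.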

Add leading zeroes only if it begin with digit in pandas dataframe

You can get what you want with:

df_merge['output'] = df_merge['input'].apply(lambda x : f'{x:0>3}' if x[0] in '0123456789' else x)

Verification

Assume the following example to simulate the case you gave:

import pandas as pd
df_merge = pd.DataFrame({'input':['1','$500','333','2','(8','?8']})

print(df_merge['input'])

0       1
1 $500
2 333
3 2
4 (8
5 ?8
Name: input, dtype: object

Now apply:

df_merge['output'] = df_merge['input'].apply(lambda x : f'{x:0>3}' if x[0] in '0123456789' else x)

print(df_merge['output'])

0     001
1 $500
2 333
3 002
4 (8
5 ?8
Name: output, dtype: object
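
The lambda above raises an IndexError on an empty string, because x[0] does not exist. A slightly more defensive sketch, using a hypothetical helper named pad_if_starts_with_digit: it slices instead of indexing and uses str.isdigit, which also accepts some non-ASCII digit characters.

```python
import pandas as pd


def pad_if_starts_with_digit(value: str, width: int = 3) -> str:
    """Zero-pad value to width only when its first character is a digit."""
    # value[:1] is "" for an empty string, so no IndexError is raised.
    return f"{value:0>{width}}" if value[:1].isdigit() else value


# Same sample as above, plus an empty string (hypothetical edge case).
df_merge = pd.DataFrame({"input": ["1", "$500", "333", "2", "(8", "?8", ""]})
df_merge["output"] = df_merge["input"].apply(pad_if_starts_with_digit)
print(df_merge["output"].tolist())
```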

Good Luck


