Add Leading Zeros to Strings in Pandas Dataframe
Try:
df['ID'] = df['ID'].apply(lambda x: '{0:0>15}'.format(x))
or, if the column already holds strings:
df['ID'] = df['ID'].apply(lambda x: x.zfill(15))
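A minimal runnable sketch of the two options above. The sample ID values are made up for illustration. str.format pads any value, whereas zfill is a string method, so the sketch converts each value with str() first.

```python
import pandas as pd

# Hypothetical sample data: the ID column mixes ints and strings.
df = pd.DataFrame({"ID": [42, "1234", 987654321]})

# str.format accepts any value and left-pads it with zeros to width 15.
padded_format = df["ID"].apply(lambda x: "{0:0>15}".format(x))

# zfill only exists on strings, so convert each value before padding.
padded_zfill = df["ID"].apply(lambda x: str(x).zfill(15))

print(padded_format.tolist())
print(padded_zfill.tolist())
```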
Pandas check column and add leading zeros
You can use .str.replace:
df["user"] = df["user"].str.replace(
r"^\d{1,5}$", lambda g: "{:0>6}".format(g.group(0)), regex=True
)
print(df)
This adds leading zeros only to cells that contain 1 to 5 digits and nothing else:
   @timestamp    message       name    user
0        time  something  something  123456
1        time  something  something  001234
2        time  something  something  hello1
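For a version that can be run end to end, here is a small sketch with made-up values in the user column. It assumes the same pattern as above and uses zfill inside the callable, which should be equivalent to the format call for digit-only matches.

```python
import pandas as pd

# Hypothetical sample data standing in for the "user" column.
df = pd.DataFrame({"user": ["123456", "1234", "hello1", "7"]})

# Only cells made up entirely of 1 to 5 digits match the pattern;
# the callable receives the match object and returns the padded text.
df["user"] = df["user"].str.replace(
    r"^\d{1,5}$", lambda m: m.group(0).zfill(6), regex=True
)
print(df["user"].tolist())
```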
Python add a leading zero to column with str and int
You can use str.zfill:
#numeric as string
df = pd.DataFrame({'Section':['1', '2', '3', '4', 'SS', '15', 'S1', 'A1']})
df['Section'] = df['Section'].str.zfill(2)
print (df)
Section
0 01
1 02
2 03
3 04
4 SS
5 15
6 S1
7 A1
If the column mixes numbers with strings, cast to string first:
df = pd.DataFrame({'Section':[1, 2, 3, 4, 'SS', 15, 'S1', 'A1']})
df['Section'] = df['Section'].astype(str).str.zfill(2)
print (df)
Section
0 01
1 02
2 03
3 04
4 SS
5 15
6 S1
7 A1
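One thing to watch for: zfill pads every short string, not just the numeric ones, so a single-letter entry would also gain a zero. The sketch below, with a hypothetical "S" entry added to the data, shows one way to restrict the padding to digit-only values using a boolean mask. It is an illustration under that assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical data including a one-character, non-numeric entry.
df = pd.DataFrame({"Section": [1, 2, "S", 15, "A1"]})
as_text = df["Section"].astype(str)

# Plain zfill pads every short string, including the lone letter "S".
padded_all = as_text.str.zfill(2)
print(padded_all.tolist())

# Restrict padding to digit-only entries with a boolean mask.
is_number = as_text.str.isdigit()
df["Section"] = as_text.where(~is_number, as_text.str.zfill(2))
print(df["Section"].tolist())
```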
How to add leading zeros to a series of numbers in a dataframe, and then add a suffix?
You can build the formatted string column by applying a lambda (or by using map, as bigbounty's answer does):
import pandas as pd
df = pd.DataFrame(({ "nums": range(100,201)}))
# format the string in one go
df["modded"] = df["nums"].apply(lambda x: f"{x:06d}xxx")
print(df)
Output:
nums modded
0 100 000100xxx
1 101 000101xxx
2 102 000102xxx
.. ... ...
98 198 000198xxx
99 199 000199xxx
100 200 000200xxx
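As an alternative sketch, the same column can probably be built without a per-row lambda by converting to text, padding with the string accessor, and concatenating the suffix. The "xxx" suffix is the placeholder used above.

```python
import pandas as pd

df = pd.DataFrame({"nums": range(100, 201)})

# Convert to text, pad to six characters, then append the suffix.
df["modded"] = df["nums"].astype(str).str.zfill(6) + "xxx"
print(df.head(3))
```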
How to keep leading zeroes from a panda column post operation?
The leading zeros are being dropped because a Python slice excludes its end index, so [6:7] returns only the single character at position 6.
Try changing your code to:
df['period'] = df['Date'].str[:4] + df['Date'].str[5:7]
Note the change from [6:7] to [5:7].
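To see the difference, here is a small sketch. The original data is not shown, so it assumes the Date column holds strings in a "YYYY-MM-DD" layout.

```python
import pandas as pd

# Assumed format: "YYYY-MM-DD" strings (the original data is not shown).
df = pd.DataFrame({"Date": ["2021-03-15", "2021-11-02"]})

# A slice stops before its end index, so [6:7] yields one character only.
wrong = df["Date"].str[:4] + df["Date"].str[6:7]

# [5:7] covers both month digits, keeping the leading zero.
right = df["Date"].str[:4] + df["Date"].str[5:7]

print(wrong.tolist())
print(right.tolist())
```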
Python Adding leading zero to Time field
Use the built-in Series.str.zfill method:
df.ColumnName.astype(str).str.zfill(4)
#0 100012
#1 225434
#2 0030
#3 0045
#4 0036
#5 80200
#Name: ColumnName, dtype: object
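Note that str.zfill returns a new Series, so the result has to be assigned back to keep it. A minimal sketch, assuming a hypothetical column of HHMM times that were stored as integers:

```python
import pandas as pd

# Hypothetical times stored as integers, so the leading zeros were lost.
df = pd.DataFrame({"ColumnName": [30, 45, 1205, 936]})

# Cast to str, pad to four characters, and assign the result back.
df["ColumnName"] = df["ColumnName"].astype(str).str.zfill(4)
print(df["ColumnName"].tolist())
```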
Time efficient way for add leading zeros in pandas series
s = pd.Series(map(lambda x: '%010d' %x, s))
where s is your Series.
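One caveat worth checking: building a new Series from a plain map() discards the original index. The sketch below uses a made-up Series with a labelled index and compares this with Series.map, which keeps the index. No claim is made here about which of the two is faster.

```python
import pandas as pd

# Hypothetical series with a non-default index.
s = pd.Series([7, 42, 12345], index=["a", "b", "c"])

# Rebuilding from a plain map() resets the index to 0, 1, 2.
rebuilt = pd.Series(map(lambda x: "%010d" % x, s))

# Series.map applies the same formatting but keeps the original index.
mapped = s.map(lambda x: f"{int(x):010d}")

print(rebuilt.index.tolist())
print(mapped.index.tolist())
```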
Add leading zeros based on condition in python
You need to vectorize this: select the rows with a boolean mask and use .str.zfill() on the resulting subsets:
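# select the right rows to avoid wasting time operating on longer strings
shorter = df.Random.str.len() < 9
longer = ~shorter
# assign through .loc; chained indexing such as df.Random[shorter] = ... may not update df in recent pandas versions
df.loc[shorter, 'Random'] = df.loc[shorter, 'Random'].str.zfill(9)
df.loc[longer, 'Random'] = df.loc[longer, 'Random'].str.zfill(20)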
Note: I did not use np.where() because we wouldn't want to double the work. A vectorized df.Random.str.zfill() is faster than looping over the rows, but doing it twice still takes more time than doing it just once for each set of rows.
Speed comparison on 1 million rows of strings with values of random lengths (from 5 characters all the way up to 30):
In [1]: import numpy as np, pandas as pd
In [2]: import platform; print(platform.python_version_tuple(), platform.platform(), pd.__version__, np.__version__, sep="\n")
('3', '7', '3')
Darwin-17.7.0-x86_64-i386-64bit
0.24.2
1.16.4
In [3]: !sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
In [4]: from random import choices, randrange
In [5]: def randvalue(chars="0123456789", _c=choices, _r=randrange):
...:     return "".join(_c(chars, k=randrange(5, 30))).lstrip("0")
...:
In [6]: df = pd.DataFrame(data={"Random": [randvalue() for _ in range(10**6)]})
In [7]: %%timeit
...: target = df.copy()
...: shorter = target.Random.str.len() < 9
...: longer = ~shorter
...: target.Random[shorter] = target.Random[shorter].str.zfill(9)
...: target.Random[longer] = target.Random[longer].str.zfill(20)
...:
...:
825 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %%timeit
...: target = df.copy()
...: target.Random = np.where(target.Random.str.len()<9,target.Random.str.zfill(9),target.Random.str.zfill(20))
...:
...:
929 ms ± 69.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(The target = df.copy() line is needed to make sure that each repeated test run is isolated from the one before.)
Conclusion: on 1 million rows, using np.where() is about 13% slower (929 ms versus 825 ms).
However, using df.Random.apply(), as proposed by jackbicknell14, beats either method by a huge margin:
In [9]: def fill_zeros(x, _len=len, _zfill=str.zfill):
...:     # len() and str.zfill() are cached as parameters for performance
...:     return _zfill(x, 9 if _len(x) < 9 else 20)
In [10]: %%timeit
...: target = df.copy()
...: target.Random = target.Random.apply(fill_zeros)
...:
...:
299 ms ± 2.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That's about 3 times faster!
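The timings above come from one machine and old library versions, so they may not carry over. Before comparing speed on your own data, it can help to confirm that the three approaches agree. The following is a small sketch on made-up values, not a benchmark:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values of different lengths.
df = pd.DataFrame({"Random": ["12345", "123456789012", "7", "123456789"]})
lengths = df["Random"].str.len()

# 1. Boolean masks with .loc: each row is padded exactly once.
masked = df["Random"].copy()
masked.loc[lengths < 9] = masked.loc[lengths < 9].str.zfill(9)
masked.loc[lengths >= 9] = masked.loc[lengths >= 9].str.zfill(20)

# 2. np.where pads every row twice and then picks one result per row.
where = pd.Series(
    np.where(lengths < 9, df["Random"].str.zfill(9), df["Random"].str.zfill(20)),
    index=df.index,
)

# 3. A plain Python function applied to each value.
applied = df["Random"].apply(lambda x: x.zfill(9 if len(x) < 9 else 20))

print(masked.tolist() == where.tolist() == applied.tolist())
```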
Format a pandas dataframe with floats and leading zeros
You can set the display float_format option to '{:09.2f}'.format:
pd.options.display.float_format = '{:09.2f}'.format
df
amount
row1 001000.50
row2 100000.78
row3 -90000.00
row4 -00900.40
But this only changes how the values are displayed. If you need to create a new column, you can use an f-string:
df['newamount'] = df.amount.apply(lambda x: f'{x:09.2f}')
df
amount newamount
row1 1000.50 001000.50
row2 100000.78 100000.78
row3 -90000.00 -90000.00
row4 -900.40 -00900.40
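A self-contained sketch of the new-column approach, using the same four amounts. It passes the bound '{:09.2f}'.format method to map instead of an f-string lambda, which should behave the same way. Note that the width of 9 counts the sign and the decimal point, and that the resulting column holds strings rather than floats.

```python
import pandas as pd

df = pd.DataFrame(
    {"amount": [1000.5, 100000.78, -90000.0, -900.4]},
    index=["row1", "row2", "row3", "row4"],
)

# In '09.2f' the width 9 counts every character: sign, digits, and the dot.
df["newamount"] = df["amount"].map("{:09.2f}".format)
print(df["newamount"].tolist())

# Every formatted value is a string of exactly nine characters.
print(df["newamount"].str.len().tolist())
```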
Add leading zeroes only if it begin with digit in pandas dataframe
You can get what you want with:
df_merge['output'] = df_merge['input'].apply(lambda x : f'{x:0>3}' if x[0] in '0123456789' else x)
Verification
Assume the following example to simulate the case you gave:
import pandas as pd
df_merge = pd.DataFrame({'input':['1','$500','333','2','(8','?8']})
print(df_merge['input'])
0 1
1 $500
2 333
3 2
4 (8
5 ?8
Name: input, dtype: object
Now apply:
df_merge['output'] = df_merge['input'].apply(lambda x : f'{x:0>3}' if x[0] in '0123456789' else x)
print(df_merge['output'])
0 001
1 $500
2 333
3 002
4 (8
5 ?8
Name: output, dtype: object
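The lambda above indexes x[0], which would raise an IndexError on an empty string. As a hedged alternative, the sketch below builds a boolean mask with str.match and pads only the matching rows. The empty-string entry is a hypothetical addition to the sample data.

```python
import pandas as pd

# Same sample values as above, plus a hypothetical empty string.
df_merge = pd.DataFrame({"input": ["1", "$500", "333", "2", "(8", "?8", ""]})

# str.match anchors at the start, so this flags values beginning with a digit;
# an empty string simply does not match instead of raising IndexError.
starts_with_digit = df_merge["input"].str.match(r"[0-9]")

# Keep non-matching values as they are and pad the rest to width 3.
df_merge["output"] = df_merge["input"].where(
    ~starts_with_digit, df_merge["input"].str.zfill(3)
)
print(df_merge["output"].tolist())
```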
Good Luck