Find records with leading zero in Python Pandas
This should work:
df1[df1['Acct_no'].str[0] == '0']
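The comparison above only makes sense if Acct_no holds strings. As a minimal sketch with made-up account numbers (the data below is an assumption, not from the question), str.startswith gives the same filter and makes the handling of missing values explicit:

```python
import pandas as pd

# Hypothetical account numbers, stored as strings so the zeros survive.
df1 = pd.DataFrame({"Acct_no": ["0123", "4567", "0089", None]})

# str.startswith yields a missing value for missing input;
# na=False treats those rows as non-matches.
mask = df1["Acct_no"].str.startswith("0", na=False)
leading_zero_rows = df1[mask]
print(leading_zero_rows)
```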
Add leading zeros based on condition in python
You need to vectorize this: select the rows using a boolean mask and use .str.zfill()
on the resulting subsets:
# select the right rows to avoid wasting time operating on longer strings
shorter = df.Random.str.len() < 9
longer = ~shorter
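# assign through .loc; chained assignment (df.Random[mask] = ...) may not update df
df.loc[shorter, 'Random'] = df.loc[shorter, 'Random'].str.zfill(9)
df.loc[longer, 'Random'] = df.loc[longer, 'Random'].str.zfill(20)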
Note: I did not use np.where() because we wouldn't want to double the work. A vectorized df.Random.str.zfill() is faster than looping over the rows, but doing it twice still takes more time than doing it just once for each set of rows.
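To make the two approaches concrete, here is a small sketch on three made-up values (the sample data is an assumption). It pads each subset once through a boolean mask, then does the same with np.where(), which pads every row twice and picks one result per row. The .loc assignment is my substitution to avoid chained assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two short values and one long value.
df = pd.DataFrame({"Random": ["123", "12345678", "1234567890"]})

# Approach 1: pad each subset once, selected with a boolean mask.
masked = df.copy()
shorter = masked["Random"].str.len() < 9
masked.loc[shorter, "Random"] = masked.loc[shorter, "Random"].str.zfill(9)
masked.loc[~shorter, "Random"] = masked.loc[~shorter, "Random"].str.zfill(20)

# Approach 2: np.where pads every row twice, then keeps one result per row.
where = df.copy()
where["Random"] = np.where(
    where["Random"].str.len() < 9,
    where["Random"].str.zfill(9),
    where["Random"].str.zfill(20),
)
```

Both produce the same column; they differ only in how much padding work is done.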
Speed comparison on 1 million rows of strings with random lengths (up to 29 characters, fewer once leading zeros are stripped):
In [1]: import numpy as np, pandas as pd
In [2]: import platform; print(platform.python_version_tuple(), platform.platform(), pd.__version__, np.__version__, sep="\n")
('3', '7', '3')
Darwin-17.7.0-x86_64-i386-64bit
0.24.2
1.16.4
In [3]: !sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
In [4]: from random import choices, randrange
In [5]: def randvalue(chars="0123456789", _c=choices, _r=randrange):
   ...:     return "".join(_c(chars, k=randrange(5, 30))).lstrip("0")
   ...:
In [6]: df = pd.DataFrame(data={"Random": [randvalue() for _ in range(10**6)]})
In [7]: %%timeit
...: target = df.copy()
...: shorter = target.Random.str.len() < 9
...: longer = ~shorter
...: target.Random[shorter] = target.Random[shorter].str.zfill(9)
...: target.Random[longer] = target.Random[longer].str.zfill(20)
...:
...:
825 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %%timeit
...: target = df.copy()
...: target.Random = np.where(target.Random.str.len()<9,target.Random.str.zfill(9),target.Random.str.zfill(20))
...:
...:
929 ms ± 69.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(The target = df.copy() line is needed to make sure that each repeated test run is isolated from the one before.)
Conclusion: on 1 million rows, using np.where() is about 10% slower.
However, using df.Random.apply(), as proposed by jackbicknell14, beats either method by a huge margin:
In [9]: def fill_zeros(x, _len=len, _zfill=str.zfill):
   ...:     # len() and str.zfill() are cached as parameters for performance
   ...:     return _zfill(x, 9 if _len(x) < 9 else 20)
In [10]: %%timeit
...: target = df.copy()
...: target.Random = target.Random.apply(fill_zeros)
...:
...:
299 ms ± 2.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That's about 3 times faster!
Remove leading zeroes pandas
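You can try str.replace. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default:
df['amount'].str.replace(r'^(0+)', '', regex=True).fillna('0')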
0     324
1    S123
2      10
3       0
4      30
5    SA40
6    SA24
Name: amount, dtype: object
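A self-contained sketch of the same idea, on made-up values chosen to resemble the output above (the input data is an assumption). Note that fillna('0') only covers missing values; a string made entirely of zeros becomes an empty string instead, so this sketch maps empty strings back to "0":

```python
import pandas as pd

# Hypothetical values; the original input is not shown in the answer.
df = pd.DataFrame({"amount": ["000324", "S123", "0010", "000", "SA40"]})

# Strip one or more zeros anchored at the start of each string.
stripped = df["amount"].str.replace(r"^0+", "", regex=True)

# An all-zero string is now empty; restore it as a single "0".
stripped = stripped.replace("", "0")
print(stripped)
```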
Time efficient way for add leading zeros in pandas series
s = pd.Series(map(lambda x: '%010d' % x, s))
where s is your series.
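One thing to be aware of: building a new Series from map() gives it a fresh default index. As an alternative sketch (not the answer's method, and not benchmarked here), astype(str) followed by str.zfill pads to the same width and keeps the original index. The sample series is made up:

```python
import pandas as pd

# Hypothetical integer series with a non-default index.
s = pd.Series([7, 42, 12345], index=["a", "b", "c"])

# Convert to string, then left-pad with zeros to a width of 10.
padded = s.astype(str).str.zfill(10)
print(padded)
```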
Why does pandas remove leading zero when writing to a csv?
Pandas doesn't strip padded zeros. You're likely seeing this when opening the file in Excel. Open the CSV in a text editor like Notepad++ and you'll see the values are still zero padded.
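You can also check this without leaving Python. A small sketch, assuming a made-up string column: calling to_csv with no path returns the CSV text, so you can inspect exactly what would be written.

```python
import pandas as pd

# Hypothetical string column with zero-padded values.
df = pd.DataFrame({"code": ["007", "042"]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```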
How to save a CSV from dataframe, to keep zeros left in column with numbers?
Specify dtype as string while reading the CSV file:
# if you are reading data with leading zeros
candidatos_2014 = pd.read_csv('candidatos_2014.csv', dtype=str)
Or convert the column to string before writing:
# if the data is generated in Python, convert the column to string first
candidatos_2014['cpf'] = candidatos_2014['cpf'].astype(str)
candidatos_2014.to_csv('candidatos_2014.csv')
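To see the difference dtype makes, here is a round-trip sketch. The CSV content is made up, and an in-memory buffer stands in for the file on disk (both are assumptions for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a real file.
raw = "cpf,name\n00123456789,Ana\n09876543210,Rui\n"

# Default parsing infers integers, so the leading zeros are lost.
as_numbers = pd.read_csv(io.StringIO(raw))

# dtype=str keeps every column as text, zeros included.
as_strings = pd.read_csv(io.StringIO(raw), dtype=str)

# Writing the string version back out preserves the zeros.
round_trip = as_strings.to_csv(index=False)
print(round_trip)
```

Converting with astype(str) after the zeros are already gone will not bring them back, so the dtype has to be set at read time.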
How can I keep leading zeros in a column, when I export to CSV?
This is an Excel problem, as @EdChum suggested. You'll want to wrap your column in ="" with apply('="{}"'.format). This will tell Excel to treat the entry as a formula that returns the text within quotes. That text will be your values with leading zeros.
Consider the following example.
df = pd.DataFrame(dict(A=['001', '002']))
df.A = df.A.apply('="{}"'.format)
df.to_excel('test_leading_zeros.xlsx')
Using MultiIndex resample in pandas with zeros results in NaN
Don't resample, but use the date in the groupby:
df['datetime'] = pd.to_datetime(df['datetime'])
df.groupby(['name', df['datetime'].dt.date]).sum()
Or, using pandas.Grouper for flexibility:
df.groupby(['name', pd.Grouper(key='datetime', freq='D')]).sum()
Output:
                       value
name       datetime
Excalibur1 2013-12-25      3
           2014-12-25    914
Janus      2014-01-11   8129
Michael    2012-01-11   3999
rectangular shape and missing dates:
For a rectangular shape use:
df2 = df.groupby(['name', pd.Grouper(key='datetime', freq='D')])['value'].sum().unstack(level='name', fill_value=0)
Output:
name        Excalibur1  Janus  Michael
datetime
2013-12-25           3      0        0
2014-12-25         914      0        0
2014-01-11           0   8129        0
2012-01-11           0      0     3999
And to add missing dates, reindex:
df2 = df.groupby(['name', pd.Grouper(key='datetime', freq='D')])['value'].sum().unstack(level='name', fill_value=0)
df2 = df2.reindex(pd.date_range(df['datetime'].dt.date.min(), df['datetime'].max()), fill_value=0)
Output:
name        Excalibur1  Janus  Michael
2012-01-11           0      0     3999
2012-01-12           0      0        0
2012-01-13           0      0        0
2012-01-14           0      0        0
2012-01-15           0      0        0
...
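The whole pipeline can be tried end to end on a tiny frame. This is a sketch with made-up rows (the names, timestamps, and values are assumptions), and it takes the date range from the grouped index rather than from the raw column:

```python
import pandas as pd

# Hypothetical data: one name has two readings on the same day.
df = pd.DataFrame({
    "name": ["Janus", "Janus", "Michael"],
    "datetime": ["2014-01-11 08:00", "2014-01-11 17:30", "2014-01-13 09:15"],
    "value": [1, 2, 5],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# Daily totals per name, pivoted so each name becomes a column.
daily = (
    df.groupby(["name", pd.Grouper(key="datetime", freq="D")])["value"]
    .sum()
    .unstack(level="name", fill_value=0)
)

# Add the calendar days that have no rows at all, filled with zeros.
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily = daily.reindex(all_days, fill_value=0)
print(daily)
```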