How to determine the length of lists in a pandas dataframe column
You can use the str accessor for some list operations as well. In this example, df['CreationDate'].str.len() returns the length of each list. See the docs for str.len.
df['Length'] = df['CreationDate'].str.len()
df
Out:
                                                    CreationDate  Length
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4
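For these operations, plain Python is generally faster, although the pandas str accessor also handles NaN values. Here are timings on a Series of one million lists:
import random
import string
import pandas as pd

ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20))
                 for _ in range(10**6)])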
%timeit ser.apply(lambda x: len(x))
1 loop, best of 3: 425 ms per loop
%timeit ser.str.len()
1 loop, best of 3: 248 ms per loop
%timeit [len(x) for x in ser]
10 loops, best of 3: 84 ms per loop
%timeit pd.Series([len(x) for x in ser], index=ser.index)
1 loop, best of 3: 236 ms per loop
How to determine the length of lists of list in a pandas dataframe column?
You can create two functions, one for each new column, and apply them with the apply() method:
def length_of_paths(l):
    # Length of every path in the row
    k = [len(i) for i in l]
    minlen = min(k)
    # Drop the shortest paths and subtract 1 from the remaining lengths
    return [i - 1 for i in k if i != minlen]

def total_value(l):
    # Sum of the reciprocals of the values in the list
    return sum(1 / i for i in l)

df['length of paths'] = df['list of paths'].apply(length_of_paths)
df['total value'] = df['length of paths'].apply(total_value)
Output:
>>> print(df)
                                list of paths  length of paths  total value
0  [[a, c, b], [a, c, d, b], [a, e, f, g, b]]           [3, 4]     0.583333
1                         [[g, z], [g, l, z]]              [2]     0.500000
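The answer does not show how df was built. The sketch below is a self-contained version you can run. The input frame is an assumption reconstructed from the printed output, and the parameter names are mine.

```python
import pandas as pd

# Hypothetical input, reconstructed from the printed output above.
df = pd.DataFrame({
    "list of paths": [
        [["a", "c", "b"], ["a", "c", "d", "b"], ["a", "e", "f", "g", "b"]],
        [["g", "z"], ["g", "l", "z"]],
    ]
})

def length_of_paths(paths):
    # Length of each inner list.
    lengths = [len(p) for p in paths]
    shortest = min(lengths)
    # Keep only the paths longer than the shortest one, minus 1 each.
    return [n - 1 for n in lengths if n != shortest]

def total_value(lengths):
    # Sum of reciprocals; an empty list gives 0.
    return sum(1 / n for n in lengths)

df["length of paths"] = df["list of paths"].apply(length_of_paths)
df["total value"] = df["length of paths"].apply(total_value)
print(df)
```

Note that length_of_paths returns an empty list when all paths in a row have the same length. total_value then returns 0 for that row.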
Python: Efficient way to get the length of lists for a Pandas Series
Use str.len():
a.str.len()
And for a column of a DataFrame:
df['col'].str.len()
But if there are no NaN values, apply(len) is more efficient:
a.apply(len)
df['col'].apply(len)
List comprehension solutions:
pd.Series([len(x) for x in a], index=a.index)
pd.Series([len(x) for x in df['col']], index=df.index)
Count list length in a column of a DataFrame
See if this works:
df["InvoiceCount"] = df['InvoiceNo'].str.len()
Calculating Length of List of List in Pandas
I think your column B is actually stored as strings, since you are getting the number of characters rather than the number of items. Try ast.literal_eval first to convert the column from strings to lists, and then use df['B'].apply(len) or df['B'].str.len():
import ast
df['B']=df['B'].apply(ast.literal_eval)
df['C']=df['B'].apply(len)
Or, if you only need the length and want to leave column B as strings, convert and measure in one expression:
df['C']=df['B'].apply(lambda x:ast.literal_eval(x)).str.len()
Output:
df
A B C
0 1 [["Thing_1"]] 1
1 2 [["Thing_1"], ["Thing_2"]] 2
2 3 [["Thing_1", "Thing_2"], ["Thing_2"]] 2
3 4 [["Thing_1"], ["Thing_1", "Thing_2"]] 2
4 5 [["Thing_1", "Thing_2"], ["Thing_1", "Thing_2"]] 2
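To see why the conversion matters, here is a small sketch on a made-up two-row frame (the values mimic the output above). On the raw strings, str.len() counts characters. After ast.literal_eval, it counts the items in the outer list.

```python
import ast
import pandas as pd

# Made-up frame: column B holds string representations of lists.
df = pd.DataFrame({
    "A": [1, 2],
    "B": ['[["Thing_1"]]', '[["Thing_1"], ["Thing_2"]]'],
})

# On strings, str.len() returns the number of characters.
char_counts = df["B"].str.len()

# After parsing, str.len() returns the number of inner lists.
list_lengths = df["B"].apply(ast.literal_eval).str.len()

print(char_counts.tolist())
print(list_lengths.tolist())
```

ast.literal_eval only parses Python literals, so it is safe to run on untrusted strings in a way eval is not. It raises an error on values that are not valid literals.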
Calculate Product of length of lists in dataframe and store in a new column
Try using DataFrame.applymap and DataFrame.product (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df['product of len(lists)'] = df[['a', 'b', 'c']].applymap(len).product(axis=1)
[out]
                               a               b           c  product of len(lists)
0  [Protocol, SCADA, SHM System]  [CM, Finances]  [RBA, PBA]                     12
1                 [CM, Finances]  [CM, Finances]  [RBA, PBA]                      8
2                     [RBA, PBA]  [CM, Finances]  [RBA, PBA]                      8
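If you would rather not depend on applymap, one alternative is to run str.len() on each column and multiply across the row. This is a sketch of a swapped-in technique, not the answer's own code. The frame is rebuilt from the output shown above.

```python
import pandas as pd

# Frame rebuilt from the printed output above.
df = pd.DataFrame({
    "a": [["Protocol", "SCADA", "SHM System"], ["CM", "Finances"], ["RBA", "PBA"]],
    "b": [["CM", "Finances"]] * 3,
    "c": [["RBA", "PBA"]] * 3,
})

# Column-wise list lengths: apply() passes each column as a Series.
lengths = df[["a", "b", "c"]].apply(lambda col: col.str.len())

# Row-wise product of the three lengths.
df["product of len(lists)"] = lengths.prod(axis=1)
print(df["product of len(lists)"].tolist())
```

Because this goes through the str accessor, a missing value in any column gives NaN as that cell's length. prod skips NaN by default, so check for missing values first if that matters.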
How to find the longest list in a pandas series?
Let's say you have the following DataFrame:
import pandas as pd

values = [['a','a'], ['a','b','b','d','e'],
          ['a','b','b','a'], ['a','b','c','a'],
          ['a','b','b'], ['a','b','b']]
df = pd.DataFrame({'listoflists': values})
For the longest list, you can try:
max(df.listoflists, key=len)
And for the top n lists (n = 3 in this example), you can try:
df['count'] = df.listoflists.map(len)
df.nlargest(3, ['count'])
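A variant that stays in pandas and does not add a helper column is sketched below. It uses str.len() with idxmax and Series.nlargest on the same example data. The variable names are mine.

```python
import pandas as pd

values = [['a', 'a'], ['a', 'b', 'b', 'd', 'e'],
          ['a', 'b', 'b', 'a'], ['a', 'b', 'c', 'a'],
          ['a', 'b', 'b'], ['a', 'b', 'b']]
df = pd.DataFrame({'listoflists': values})

# Length of each list, kept as a separate Series.
lengths = df['listoflists'].str.len()

# Row label of the first longest list, then the list itself.
longest = df.loc[lengths.idxmax(), 'listoflists']

# Rows holding the three longest lists; ties keep the earlier row.
top3 = df.loc[lengths.nlargest(3).index]

print(longest)
print(top3)
```

idxmax returns only the first label when several lists share the maximum length. To get all of them, filter with lengths == lengths.max() instead.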
Pandas column of lists: How to get the average, max length, and standard deviation of the list lengths of that column
Yes, df['lists'].str.len() gives you the lengths of the lists in your series. To get the stats:
df['lists'].str.len().agg(['mean','max','std'])
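As a quick check, here is the same call on a small made-up column whose list lengths are 2, 4 and 3.

```python
import pandas as pd

# Made-up data: list lengths are 2, 4 and 3.
df = pd.DataFrame({'lists': [[1, 2], [1, 2, 3, 4], [1, 2, 3]]})

# agg with a list of names returns a Series indexed by those names.
stats = df['lists'].str.len().agg(['mean', 'max', 'std'])
print(stats)
```

The std here is the sample standard deviation (ddof=1), which is the pandas default. Call .std(ddof=0) on the lengths directly if you need the population value.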