How to Determine the Length of Lists in a Pandas Dataframe Column

How to determine the length of lists in a pandas dataframe column

You can use the str accessor for some list operations as well. In this example,

df['CreationDate'].str.len()

returns the length of each list. See the docs for str.len.

df['Length'] = df['CreationDate'].str.len()
df
Out:
                                                    CreationDate  Length
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4

For these operations, plain Python is generally faster, but pandas handles NaNs. Here are timings:

import random
import string
import pandas as pd

# One million lists of 1 to 20 random letters each
ser = pd.Series([random.sample(string.ascii_letters,
                               random.randint(1, 20)) for _ in range(10**6)])

%timeit ser.apply(lambda x: len(x))
1 loop, best of 3: 425 ms per loop

%timeit ser.str.len()
1 loop, best of 3: 248 ms per loop

%timeit [len(x) for x in ser]
10 loops, best of 3: 84 ms per loop

%timeit pd.Series([len(x) for x in ser], index=ser.index)
1 loop, best of 3: 236 ms per loop
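
The NaN remark can be checked on a tiny series. This is only a sketch on a hypothetical three-element series (not the benchmark data), assuming missing entries are stored as np.nan: str.len() passes the missing value through, while a plain len() call fails on it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two lists and one missing value
ser = pd.Series([["a", "b"], np.nan, ["c"]])

# The str accessor returns NaN for the missing entry instead of failing
lengths = ser.str.len()

# Plain Python has no such handling: len() of a float NaN raises TypeError
try:
    plain_lengths = [len(x) for x in ser]
except TypeError:
    plain_lengths = None

print(lengths.tolist())
print(plain_lengths)
```

Because of the NaN, the pandas result comes back as floats rather than integers.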

How to determine the length of lists of list in a pandas dataframe column?

You can write two functions, one for each new column, and apply them with the apply() method:

def length_of_paths(l):
    # Length of every inner list
    k = [len(i) for i in l]
    minlen = min(k)
    # Drop the shortest paths and report each remaining length minus one
    return [i - 1 for i in k if i != minlen]

def total_value(l):
    # Sum of the reciprocals of the values
    return sum(1 / i for i in l)

df['length of paths'] = df['list of paths'].apply(length_of_paths)

df['total value'] = df['length of paths'].apply(total_value)

Output:

>>> print(df)

                                list of paths  length of paths  total value
0  [[a, c, b], [a, c, d, b], [a, e, f, g, b]]           [3, 4]     0.583333
1                         [[g, z], [g, l, z]]              [2]     0.500000
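
If all you need is the length of every inner list, a shorter sketch may be enough. The frame below is rebuilt from the output shown above; the names inner_lengths and n_paths are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "list of paths": [
        [["a", "c", "b"], ["a", "c", "d", "b"], ["a", "e", "f", "g", "b"]],
        [["g", "z"], ["g", "l", "z"]],
    ]
})

# Length of each inner list, giving one list of lengths per row
inner_lengths = df["list of paths"].apply(lambda paths: [len(p) for p in paths])

# Number of inner lists per row
n_paths = df["list of paths"].str.len()

print(inner_lengths.tolist())
print(n_paths.tolist())
```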

Python: Efficient way to get the length of lists for a Pandas Series

Just use str.len():

a.str.len()

And for a DataFrame column:

df['col'].str.len()

But if there are no NaN values, apply(len) can be more efficient:

a.apply(len)

df['col'].apply(len)

List comprehension solutions:

pd.Series([len(x) for x in a], index=a.index)
pd.Series([len(x) for x in df['col']], index=df.index)
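
As a quick sanity check, the three approaches can be compared on a small series. This is a sketch on a hypothetical series `a` with a non-default index and no missing values; it shows why the list comprehension needs index=a.index to line up with the other two results.

```python
import pandas as pd

# Hypothetical series of lists with string labels as the index
a = pd.Series([[1, 2, 3], [], [4, 5]], index=["x", "y", "z"])

via_str = a.str.len()
via_apply = a.apply(len)
# Without index=a.index the result would get a fresh 0..n-1 index
via_comprehension = pd.Series([len(x) for x in a], index=a.index)

print(via_str.tolist(), via_apply.tolist(), via_comprehension.tolist())
```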

Count list length in a column of a DataFrame

See if this works:

df["InvoiceCount"] = df['InvoiceNo'].str.len()

Calculating Length of List of List in Pandas

Your column B is probably of type string, since you are getting the number of characters rather than the number of list elements. First convert the column from strings to lists with ast.literal_eval, then use df['B'].apply(len) or df['B'].str.len():

import ast
df['B']=df['B'].apply(ast.literal_eval)
df['C']=df['B'].apply(len)

Or, if you only need the length without changing the type of column B, parse and measure in one chained expression:

df['C'] = df['B'].apply(ast.literal_eval).str.len()

Output:

df
   A                                                 B  C
0  1                                     [["Thing_1"]]  1
1  2                        [["Thing_1"], ["Thing_2"]]  2
2  3             [["Thing_1", "Thing_2"], ["Thing_2"]]  2
3  4             [["Thing_1"], ["Thing_1", "Thing_2"]]  2
4  5  [["Thing_1", "Thing_2"], ["Thing_1", "Thing_2"]]  2
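
The difference between the two cases is easy to see side by side. A minimal sketch, assuming a hypothetical two-row frame whose column B stores the text of a list rather than a list:

```python
import ast
import pandas as pd

# Hypothetical frame: B holds strings that merely look like lists
df = pd.DataFrame({"B": ['[["Thing_1"]]', '[["Thing_1"], ["Thing_2"]]']})

# On strings, str.len() counts characters
char_counts = df["B"].str.len()

# After parsing with ast.literal_eval, the same call counts list elements
parsed = df["B"].apply(ast.literal_eval)
list_lengths = parsed.str.len()

print(char_counts.tolist())
print(list_lengths.tolist())
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for this kind of conversion.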

Calculate Product of length of lists in dataframe and store in a new column

Try using DataFrame.applymap and DataFrame.product:

df['product of len(lists)'] = df[['a', 'b', 'c']].applymap(len).product(axis=1)

[out]

                               a               b           c  product of len(lists)
0  [Protocol, SCADA, SHM System]  [CM, Finances]  [RBA, PBA]                     12
1                 [CM, Finances]  [CM, Finances]  [RBA, PBA]                      8
2                     [RBA, PBA]  [CM, Finances]  [RBA, PBA]                      8
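
Recent pandas releases (2.1 and later, as far as I know) deprecate applymap in favor of DataFrame.map. One possible alternative that does not depend on either name is to take str.len() column by column. The frame below is rebuilt from the output above; treat this as a sketch, not the original answer's method.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [["Protocol", "SCADA", "SHM System"], ["CM", "Finances"], ["RBA", "PBA"]],
    "b": [["CM", "Finances"], ["CM", "Finances"], ["CM", "Finances"]],
    "c": [["RBA", "PBA"], ["RBA", "PBA"], ["RBA", "PBA"]],
})

# Apply str.len() to each column, then multiply the lengths across each row
lengths = df[["a", "b", "c"]].apply(lambda col: col.str.len())
df["product of len(lists)"] = lengths.prod(axis=1)

print(df["product of len(lists)"].tolist())
```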

How to find the longest list in a pandas series?

Suppose you have the following dataframe:

import pandas as pd

values = [['a','a'], ['a','b','b','d','e'],
          ['a','b','b','a'], ['a','b','c','a'],
          ['a','b','b'], ['a','b','b']]

df = pd.DataFrame({'listoflists': values})

For the longest list, you can use:

max(df.listoflists, key=len)

and for the top n lists, you can use (n = 3 in this example):

df['count'] = df.listoflists.map(len)
df.nlargest(3, ['count'])
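
If you also want to know where the longest list sits, a variant based on str.len() and idxmax() may help. A minimal sketch on the same sample data; the variable names are illustrative.

```python
import pandas as pd

values = [['a', 'a'], ['a', 'b', 'b', 'd', 'e'],
          ['a', 'b', 'b', 'a'], ['a', 'b', 'c', 'a'],
          ['a', 'b', 'b'], ['a', 'b', 'b']]
df = pd.DataFrame({'listoflists': values})

lengths = df['listoflists'].str.len()

# Index label and contents of the longest list (the first one wins on ties)
longest_label = lengths.idxmax()
longest_list = df.loc[longest_label, 'listoflists']

# Index labels of the three longest lists
top3_labels = lengths.nlargest(3).index.tolist()

print(longest_label, longest_list, top3_labels)
```

This avoids adding a helper column to the frame, at the cost of keeping the lengths in a separate series.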

Pandas column of lists: How to get the average, max length, and standard deviation of the list lengths of that column

Yes, df['lists'].str.len() gives you lengths of lists in your series. To get the stats:

df['lists'].str.len().agg(['mean','max','std'])
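
A small worked example, assuming a hypothetical three-row column named 'lists'. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data: list lengths are 2, 4 and 3
df = pd.DataFrame({'lists': [[1, 2], [1, 2, 3, 4], [1, 2, 3]]})

lengths = df['lists'].str.len()
stats = lengths.agg(['mean', 'max', 'std'])

print(stats)
```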

