How to Get the Row Count of a Pandas Dataframe

How do I get the row count of a Pandas DataFrame?

For a dataframe df, one can use any of the following:

  • len(df.index)
  • df.shape[0]
  • df[df.columns[0]].count() (== number of non-NaN values in first column)

Performance plot


Code to reproduce the plot:

import numpy as np
import pandas as pd
import perfplot

perfplot.save(
"out.png",
setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
n_range=[2**k for k in range(25)],
kernels=[
lambda df: len(df.index),
lambda df: df.shape[0],
lambda df: df[df.columns[0]].count(),
],
labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
xlabel="Number of rows",
)

pandas python how to count the number of records or rows in a dataframe

Regards to your question... counting one Field? I decided to make it a question, but I hope it helps...

Say I have the following DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])

You could count a single column by

df.A.count()
#or
df['A'].count()

both evaluate to 5.

The cool thing (or one of many w.r.t. pandas) is that if you have NA values, count takes that into consideration.

So if I did

df['A'][1::2] = np.NAN
df.count()

The result would be

 A    3
B 5

How to get row number in dataframe in Pandas?

Note that a dataframe's index could be out of order, or not even numerical at all. If you don't want to use the current index and instead renumber the rows sequentially, then you can use df.reset_index() together with the suggestions below

To get all indices that matches 'Smith'

>>> df[df['LastName'] == 'Smith'].index
Int64Index([1], dtype='int64')

or as a numpy array

>>> df[df['LastName'] == 'Smith'].index.to_numpy()  # .values on older versions
array([1])

or if there is only one and you want the integer, you can subset

>>> df[df['LastName'] == 'Smith'].index[0]
1

You could use the same boolean expressions with .loc, but it is not needed unless you also want to select a certain column, which is redundant when you only want the row number/index.

get dataframe row count based on conditions

You are asking for the condition where all the conditions are true,
so len of the frame is the answer, unless I misunderstand what you are asking

In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))

In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]:
A B C D
12 0.491683 0.137766 0.859753 -1.041487
13 0.376200 0.575667 1.534179 1.247358
14 0.428739 1.539973 1.057848 -1.254489

In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]:
A 3
B 3
C 3
D 3
dtype: int64

In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3

how do I get the count of rows in a appended/merged DataFrame in pandas?

fnames is a list of pandas DataFrames, if you are trying to know how many of them there are, it is enough to call len(fnames)
On the other hand, if you want to know the lenght of each dataframe, you can get the list of their lenght by [len(df) for df in fnames].

To create a single dataframe you can use pd.concat, whcih concatenate pandas objects along a particular axis (default being index):

df = pd.concat(fnames)

At this point you will have that len(df) will coincide with sum([len(df) for df in fnames])

How to get count of items in rows as well as check one item is present or not, and finally keep the first row in python?

I think there is no powerful and simple solution for your question, but use the following code.

First, define a function count(x, a) which returns nan if x includes nan, the number of occurence of a in x, otherwise.
The function will be used for the apply function.

Then, use groupby and apply list function.

temp = df.copy().groupby('Column A')['Column B'].apply(list)

After that, temp becomes

Column A
12 [Apple, Apple, Orange]
13 [Apple]
15 [nan]
16 [Orange]
141 [Apple, nan, Apple, Apple]
Name: Column B, dtype: object

So, based on the temp, we can count the number of apples and oranges.

Since df has duplicates, I removed them and add new columns (Column C, D, and E).

df.drop_duplicates(subset = ['Column A'], keep = "first", inplace = True)
df['Column C'] = temp.apply(count, a = "Orange").values
df['Column D'] = temp.apply(count, a = "Apple").values
df['Column E'] = df['Column D'].apply(lambda x:1 if x>=1 else 0)

Edit

I am sorry. I missed the function count..

def count(x, a):
if type(x[0]) == float:
return np.nan
else:
return x.count(a)

Count conditions within each row in Pandas Dataframe

You can iterate over [1,2,3] and use eq on axis to identify the cells with a number and sum across columns to get the total:

for num in [1,2,3]:
df[f"_{num}"] = df[['Char1','Char2','Char3']].eq(num, axis=1).sum(axis=1)

Output:

   Char1  Char2  Char3  _1  _2  _3
0 2 2 3 0 2 1
1 2 3 3 0 1 2
2 2 3 3 0 1 2
3 2 2 2 0 3 0


Related Topics



Leave a reply



Submit