How do I get the row count of a Pandas DataFrame?
For a dataframe df
, one can use any of the following:
len(df.index)
df.shape[0]
df[df.columns[0]].count()
(== number of non-NaN values in first column)
Code to reproduce the plot:
import numpy as np
import pandas as pd
import perfplot
perfplot.save(
"out.png",
setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
n_range=[2**k for k in range(25)],
kernels=[
lambda df: len(df.index),
lambda df: df.shape[0],
lambda df: df[df.columns[0]].count(),
],
labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
xlabel="Number of rows",
)
pandas python how to count the number of records or rows in a dataframe
Regards to your question... counting one Field? I decided to make it a question, but I hope it helps...
Say I have the following DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])
You could count a single column by
df.A.count()
#or
df['A'].count()
both evaluate to 5.
The cool thing (or one of many w.r.t. pandas
) is that if you have NA
values, count takes that into consideration.
So if I did
df['A'][1::2] = np.NAN
df.count()
The result would be
A 3
B 5
How to get row number in dataframe in Pandas?
Note that a dataframe's index could be out of order, or not even numerical at all. If you don't want to use the current index and instead renumber the rows sequentially, then you can use df.reset_index()
together with the suggestions below
To get all indices that matches 'Smith'
>>> df[df['LastName'] == 'Smith'].index
Int64Index([1], dtype='int64')
or as a numpy array
>>> df[df['LastName'] == 'Smith'].index.to_numpy() # .values on older versions
array([1])
or if there is only one and you want the integer, you can subset
>>> df[df['LastName'] == 'Smith'].index[0]
1
You could use the same boolean expressions with .loc
, but it is not needed unless you also want to select a certain column, which is redundant when you only want the row number/index.
get dataframe row count based on conditions
You are asking for the condition where all the conditions are true,
so len of the frame is the answer, unless I misunderstand what you are asking
In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))
In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]:
A B C D
12 0.491683 0.137766 0.859753 -1.041487
13 0.376200 0.575667 1.534179 1.247358
14 0.428739 1.539973 1.057848 -1.254489
In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]:
A 3
B 3
C 3
D 3
dtype: int64
In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3
how do I get the count of rows in a appended/merged DataFrame in pandas?
fnames
is a list of pandas DataFrames, if you are trying to know how many of them there are, it is enough to call len(fnames)
On the other hand, if you want to know the lenght of each dataframe, you can get the list of their lenght by [len(df) for df in fnames]
.
To create a single dataframe you can use pd.concat
, whcih concatenate pandas objects along a particular axis (default being index):
df = pd.concat(fnames)
At this point you will have that len(df)
will coincide with sum([len(df) for df in fnames])
How to get count of items in rows as well as check one item is present or not, and finally keep the first row in python?
I think there is no powerful and simple solution for your question, but use the following code.
First, define a function count(x, a)
which returns nan
if x includes nan
, the number of occurence of a in x, otherwise.
The function will be used for the apply function.
Then, use groupby and apply list function.
temp = df.copy().groupby('Column A')['Column B'].apply(list)
After that, temp becomes
Column A
12 [Apple, Apple, Orange]
13 [Apple]
15 [nan]
16 [Orange]
141 [Apple, nan, Apple, Apple]
Name: Column B, dtype: object
So, based on the temp, we can count the number of apples and oranges.
Since df has duplicates, I removed them and add new columns (Column C, D, and E).
df.drop_duplicates(subset = ['Column A'], keep = "first", inplace = True)
df['Column C'] = temp.apply(count, a = "Orange").values
df['Column D'] = temp.apply(count, a = "Apple").values
df['Column E'] = df['Column D'].apply(lambda x:1 if x>=1 else 0)
Edit
I am sorry. I missed the function count..
def count(x, a):
if type(x[0]) == float:
return np.nan
else:
return x.count(a)
Count conditions within each row in Pandas Dataframe
You can iterate over [1,2,3]
and use eq
on axis to identify the cells with a number and sum
across columns to get the total:
for num in [1,2,3]:
df[f"_{num}"] = df[['Char1','Char2','Char3']].eq(num, axis=1).sum(axis=1)
Output:
Char1 Char2 Char3 _1 _2 _3
0 2 2 3 0 2 1
1 2 3 3 0 1 2
2 2 3 3 0 1 2
3 2 2 2 0 3 0
Related Topics
Cannot Pass an Argument to Python with "#!/Usr/Bin/Env Python"
How to Check If a Value Is in the List in Selection from Pandas Data Frame
Python Lookup Hostname from Ip with 1 Second Timeout
How to Run Python Script Without Typing 'Python ...'
Record Speakers Output with Pyaudio
How to Modify Variable in Python That Is in Outer, But Not Global, Scope
Pandas Out of Bounds Nanosecond Timestamp After Offset Rollforward Plus Adding a Month Offset
Numpy: Formal Definition of "Array_Like" Objects
How to Suppress the Newline After a Print Statement
How to Run an Ipython Magic from a Script (Or Timing a Python Script)
Plotting Grouped Data in Same Plot Using Pandas
Function with Varying Number of for Loops (Python)
Update Tkinter Label from Variable
How Can Strings Be Concatenated
Typeerror: Worker() Takes 0 Positional Arguments But 1 Was Given
Use Python's String.Replace VS Re.Sub