Pandas, Groupby and Count

Get statistics for each group (such as count, mean, etc.) using pandas GroupBy?

On a groupby object, the agg function can take a list to apply several aggregation methods at once. This should give you the result you need:

df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
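For a self-contained illustration, here is a minimal sketch with made-up data (the column names col1 through col4 are just placeholders taken from the call above):

import pandas as pd

# purely hypothetical data, only to show the shape of the result
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "y", "x", "x"],
    "col3": [1.0, 2.0, 3.0, 4.0],
    "col4": [10, 20, 30, 40],
})

# one row per (col1, col2) group; col3 and col4 each get a 'mean' and a 'count' sub-column
print(df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count']))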

Pandas, groupby and count

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

This should give you what you want.
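If you want to try it out, here is a minimal sketch with invented data (column names taken from the call above):

import pandas as pd

# invented rows, just to make the snippet runnable
df = pd.DataFrame({
    "revenue": [10, 10, 20, 20],
    "session": [1, 1, 1, 2],
    "user_id": ["u1", "u1", "u2", "u2"],
})

# number of rows per (revenue, session, user_id) combination
print(df.groupby(['revenue', 'session', 'user_id'])['user_id'].count())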

Pandas create new column with count from groupby

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
              id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64
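Because transform returns a Series aligned with the original index, you can assign it straight back as a new column, e.g. (the column name "count" here is just an example):

df["count"] = df.groupby(["item", "color"])["id"].transform("count")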

I recommend reading the split-apply-combine section of the docs.

Pandas groupby agg - how to get counts?

You can use strings instead of the functions, like so:

df = pd.DataFrame(
    {"id": list("ccdef"), "pushid": list("aabbc"),
     "sess_length": [10, 20, 30, 40, 50]}
)

df.groupby(["id", "pushid"]).agg({"sess_length": ["sum", "mean", "count"]})

Which outputs:

          sess_length
                  sum mean count
id pushid
c  a               30   15     2
d  b               30   30     1
e  b               40   40     1
f  c               50   50     1
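If you prefer flat column names over the MultiIndex produced by the dict-of-lists form, pandas (0.25+) also supports named aggregation; a sketch of the equivalent call, with the output column names chosen arbitrarily:

df.groupby(["id", "pushid"]).agg(
    sess_sum=("sess_length", "sum"),
    sess_mean=("sess_length", "mean"),
    sess_count=("sess_length", "count"),
)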

Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way

Easy solution

Let's use crosstab to calculate frequency tables, then concat the tables along the columns axis:

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])

pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)


           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9
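To try the pattern end to end, a minimal sketch with invented rows (not the data behind the table above) could look like this:

import pandas as pd

# invented sample rows, only to make the snippet runnable
df = pd.DataFrame({
    "CONTINENT":       ["ASIA", "ASIA", "EUROPE", "AMERICA"],
    "AGE_GROUP":       ["26-30", "26-30", "41-45", "31-35"],
    "APPROVAL_STATUS": ["YES", "NO", "YES", "not_confirmed"],
})

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])
print(pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1))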

Pandas Groupby: Count and mean combined

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text':'size', 'sent':'mean'}) \
       .rename(columns={'text':'count','sent':'mean_sent'}) \
       .reset_index()
print(df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
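One detail worth noting: 'size' counts every row in the group, whereas 'count' only counts non-NaN values in the text column. If you need to ignore NaNs, swap it in:

df.groupby('source').agg({'text':'count', 'sent':'mean'})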

Pandas groupby two columns and count shared values in third

If I am understanding you correctly, I think you want to group by col3 instead of col2:

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]

df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

You can then remove rows where col2 is a subset of another row with the following:

import numpy as np

# arr[i, j] == 1 if row i's ID list contains ID j
# (on pandas >= 2.0 use .groupby(level=0).max() instead of .max(level=0))
arr = pd.get_dummies(df['col2'].explode()).max(level=0).to_numpy()
subsets = np.matmul(arr, arr.T)  # pairwise intersection sizes
np.fill_diagonal(subsets, 0)
# drop rows whose ID set is fully contained in another row's set
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)

df = df[mask]
  col1  col3             col2  count
0    A    12  [ID1, ID2, ID4]      3
3    A    18            [ID3]      1

Need pandas groupby.count() or groupby.size.unstack() to output a dataframe I can use

Try:

x = df.pivot_table(
    index=["Animal", "Year"], columns="Value", aggfunc="size", fill_value=0
).reset_index()
x.columns.name = None
print(x)

Prints:

   Animal  Year  A  B
0       1  2019  0  2
1       1  2020  2  0
2       2  2020  1  0
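The groupby().size().unstack() route mentioned in the title gives the same frame; a sketch of that equivalent, assuming the same column names:

x = (df.groupby(["Animal", "Year", "Value"])
       .size()
       .unstack("Value", fill_value=0)
       .reset_index())
x.columns.name = None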

Pandas groupby count values in aggregate function

get_dummies, groupby and sum

Encode the columns OKEY and COLOR to convert the categorical values into indicator variables, then group the encoded frame by ID and a 1-minute Grouper, and sum the values per group:

pd.get_dummies(df.set_index(['ID', 'Time']))\
  .groupby(['ID', pd.Grouper(freq='1min', level=1)]).sum()


                        OKEY_NOT_OK  OKEY_OK  COLOR_BLUE  COLOR_RED  COLOR_YELLOW
ID Time
0  2021-05-05 19:16:00            0        1           1          0             0
1  2021-05-05 19:16:00            1        0           1          0             0
   2021-05-05 19:17:00            0        1           0          1             0
2  2021-05-05 19:17:00            1        0           0          0             1
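For reference, a minimal input frame this snippet expects could be built like so (values invented to roughly match the table above):

import pandas as pd

# invented rows with the ID / Time / OKEY / COLOR layout the snippet assumes
df = pd.DataFrame({
    "ID":    [0, 1, 1, 2],
    "Time":  pd.to_datetime(["2021-05-05 19:16:10", "2021-05-05 19:16:30",
                             "2021-05-05 19:17:05", "2021-05-05 19:17:40"]),
    "OKEY":  ["OK", "NOT_OK", "OK", "NOT_OK"],
    "COLOR": ["BLUE", "BLUE", "RED", "YELLOW"],
})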

