Get statistics for each group (such as count, mean, etc.) using pandas GroupBy?
On a groupby object, the agg function can take a list to apply several aggregation methods at once. This should give you the result you need:
df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
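To see the shape of the result, here is a minimal sketch on a small hypothetical DataFrame (the values are invented; only the column names come from the answer). The result has MultiIndex columns such as ("col3", "mean"), and "count" ignores NaN values.

```python
import pandas as pd

# Hypothetical sample data; col4 contains one missing value.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "z"],
    "col3": [1, 3, 5, 7],
    "col4": [2.0, 4.0, None, 8.0],
})

# Each non-grouping column gets both a mean and a non-null count.
stats = df[["col1", "col2", "col3", "col4"]].groupby(["col1", "col2"]).agg(["mean", "count"])
print(stats)
```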
Pandas, groupby and count
You seem to want to group by several columns at once:
df.groupby(['revenue','session','user_id'])['user_id'].count()
This should give you what you want.
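Note that count() only counts non-null values of the selected column, while size() counts every row in the group. The sketch below uses hypothetical data and groups only by revenue and session (not user_id) so that the difference is visible when user_id is missing.

```python
import pandas as pd

# Hypothetical data: one user_id is missing (NaN).
df = pd.DataFrame({
    "revenue": [10, 10, 20, 20],
    "session": [1, 1, 2, 2],
    "user_id": [7, 7, 8, None],
})

# count() skips NaN values in the selected column...
counts = df.groupby(["revenue", "session"])["user_id"].count()
# ...while size() counts every row in the group.
sizes = df.groupby(["revenue", "session"]).size()
print(counts, sizes, sep="\n\n")
```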
Pandas create new column with count from groupby
That's not a new column, that's a new DataFrame:
In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2
To get the result you want, use reset_index:
In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2
To get a "new column" you could use transform:
In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0 2
1 2
2 2
3 1
4 2
dtype: int64
I recommend reading the split-apply-combine section of the docs.
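Because transform returns a Series aligned with the original index, it can be assigned straight back as a column. A minimal sketch, using a hypothetical DataFrame chosen so that it reproduces the counts shown above:

```python
import pandas as pd

# Hypothetical data consistent with the outputs above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "item": ["car", "car", "truck", "truck", "truck"],
    "color": ["black", "black", "red", "blue", "red"],
})

# transform("count") keeps one value per original row,
# so it can be stored as a new column of df.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```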
Pandas groupby agg - how to get counts?
You can use strings instead of the functions, like so:
import pandas as pd

df = pd.DataFrame(
    {"id": list("ccdef"), "pushid": list("aabbc"),
     "sess_length": [10, 20, 30, 40, 50]}
)
df.groupby(["id", "pushid"]).agg({"sess_length": ["sum", "mean", "count"]})
Which outputs:
           sess_length
             sum   mean  count
id pushid
c  a          30   15.0      2
d  b          30   30.0      1
e  b          40   40.0      1
f  c          50   50.0      1
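The result has two-level column labels such as ("sess_length", "sum"). If flat names are easier to work with downstream, one common approach (a sketch, not the only option) is to join each column tuple into a single string:

```python
import pandas as pd

df = pd.DataFrame(
    {"id": list("ccdef"), "pushid": list("aabbc"),
     "sess_length": [10, 20, 30, 40, 50]}
)

out = df.groupby(["id", "pushid"]).agg({"sess_length": ["sum", "mean", "count"]})
# Columns are a MultiIndex like ("sess_length", "sum");
# join each tuple into a flat name like "sess_length_sum".
out.columns = ["_".join(col) for col in out.columns]
out = out.reset_index()
print(out)
```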
Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way
Easy solution
Use crosstab to calculate the frequency tables, then concat the tables along the columns axis:
import pandas as pd

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])
# Row totals of s2 give the number of users per continent
pd.concat([s1, s2, s2.sum(axis=1).rename('USER_COUNT')], axis=1)
           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9
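The original data is not shown, so here is a sketch on a hypothetical toy DataFrame with the same column names. It assumes every user has exactly one non-null APPROVAL_STATUS; crosstab drops NaN values, so the row total of s2 equals the user count only under that assumption.

```python
import pandas as pd

# Hypothetical toy data with the column names used above.
df = pd.DataFrame({
    "CONTINENT": ["ASIA", "ASIA", "EUROPE", "AMERICA", "EUROPE"],
    "AGE_GROUP": ["26-30", "36-40", "41-45", "31-35", "41-45"],
    "APPROVAL_STATUS": ["YES", "NO", "NO", "YES", "not_confirmed"],
})

s1 = pd.crosstab(df["CONTINENT"], df["AGE_GROUP"])        # continents x age groups
s2 = pd.crosstab(df["CONTINENT"], df["APPROVAL_STATUS"])  # continents x statuses
# Assumes one non-null status per user, so row totals = users per continent.
user_count = s2.sum(axis=1).rename("USER_COUNT")
result = pd.concat([s1, s2, user_count], axis=1)
print(result)
```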
Pandas Groupby: Count and mean combined
You can use groupby with aggregate:
df = (
    df.groupby('source')
      .agg({'text': 'size', 'sent': 'mean'})
      .rename(columns={'text': 'count', 'sent': 'mean_sent'})
      .reset_index()
)
print(df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
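On pandas 0.25 or later, named aggregation can do the aggregation and the renaming in one step (output_name=(input_column, aggfunc)). A sketch on hypothetical data chosen to reproduce the output above:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df = pd.DataFrame({
    "source": ["foo", "foo", "foo", "bar", "bar"],
    "text": ["a", "b", "c", "d", "e"],
    "sent": [-0.5, -1.0, 0.0, 0.4, 0.43],
})

# Named aggregation: the keyword becomes the output column name.
out = (
    df.groupby("source")
      .agg(count=("text", "size"), mean_sent=("sent", "mean"))
      .reset_index()
)
print(out)
```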
Pandas groupby two columns and count shared values in third
If I am understanding you correctly, I think you want to group by col3 instead of col2:
import numpy as np
import pandas as pd

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]
df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)
You can then remove rows whose col2 list is a subset of another row's col2 list with the following:
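# One 0/1 indicator column per ID, one row per group (ints, so the matrix product counts overlaps)
arr = pd.get_dummies(df['col2'].explode()).groupby(level=0).max().to_numpy(dtype=int)
subsets = np.matmul(arr, arr.T)                    # number of IDs shared by each pair of rows
np.fill_diagonal(subsets, 0)                       # ignore each row's overlap with itself
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)   # drop rows fully contained in another row
df = df[mask]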
   col1  col3             col2  count
0     A    12  [ID1, ID2, ID4]      3
3     A    18            [ID3]      1
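To check the subset logic without fetching the page, here is a sketch on hypothetical in-memory rows. Group (A, 15) holds [ID1, ID2], which is contained in (A, 12), so it should be dropped. It uses groupby(level=0).max() because the max(level=0) form was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows standing in for the table read with read_html.
raw = pd.DataFrame({
    "col1": ["A", "A", "A", "A", "A", "A"],
    "col2": ["ID1", "ID2", "ID4", "ID1", "ID2", "ID3"],
    "col3": [12, 12, 12, 15, 15, 18],
})

df = raw.groupby(["col1", "col3"])["col2"].apply(list).reset_index()
df["count"] = df["col2"].apply(len)

# One row per group, one 0/1 column per ID.
arr = pd.get_dummies(df["col2"].explode()).groupby(level=0).max().to_numpy(dtype=int)
subsets = arr @ arr.T         # subsets[i, j] = number of IDs rows i and j share
np.fill_diagonal(subsets, 0)  # ignore each row's overlap with itself
# Row j is a subset of row i when they share all of row j's IDs.
is_subset = (subsets == arr.sum(axis=1)).any(axis=0)
df = df[~is_subset]
print(df)
```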
Need pandas groupby.count() or groupby.size.unstack() to output a dataframe I can use
Try:
x = df.pivot_table(
index=["Animal", "Year"], columns="Value", aggfunc="size", fill_value=0
).reset_index()
x.columns.name = None
print(x)
Prints:
   Animal  Year  A  B
0       1  2019  0  2
1       1  2020  2  0
2       2  2020  1  0
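The groupby.size().unstack() route mentioned in the question gives the same table. A sketch on hypothetical data chosen to reproduce the counts above:

```python
import pandas as pd

# Hypothetical data reproducing the counts printed above.
df = pd.DataFrame({
    "Animal": [1, 1, 1, 1, 2],
    "Year": [2019, 2019, 2020, 2020, 2020],
    "Value": ["B", "B", "A", "A", "A"],
})

x = (
    df.groupby(["Animal", "Year", "Value"])
      .size()                  # rows per (Animal, Year, Value)
      .unstack(fill_value=0)   # Value labels become columns; missing combos -> 0
      .reset_index()
)
x.columns.name = None          # drop the leftover "Value" label on the columns axis
print(x)
```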
Pandas groupby count values in aggregate function
get_dummies, groupby and sum
Encode the columns OKEY and COLOR to convert the categorical values into indicator variables, then group the encoded frame by ID and a 1-minute Grouper and sum the values per group:
pd.get_dummies(df.set_index(['ID', "Time"]))\
.groupby(['ID', pd.Grouper(freq='1min', level=1)]).sum()
                        OKEY_NOT_OK  OKEY_OK  COLOR_BLUE  COLOR_RED  COLOR_YELLOW
ID Time
0  2021-05-05 19:16:00            0        1           1          0             0
1  2021-05-05 19:16:00            1        0           1          0             0
   2021-05-05 19:17:00            0        1           0          1             0
2  2021-05-05 19:17:00            1        0           0          0             1
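A self-contained sketch on a hypothetical event log (timestamps and values invented to match the output above). It passes dtype=int to get_dummies so that the per-bin sums are integer counts regardless of the pandas version's default indicator dtype.

```python
import pandas as pd

# Hypothetical event log; ID, Time, OKEY and COLOR mirror the answer above.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2],
    "Time": pd.to_datetime([
        "2021-05-05 19:16:10", "2021-05-05 19:16:40",
        "2021-05-05 19:17:05", "2021-05-05 19:17:30",
    ]),
    "OKEY": ["OK", "NOT_OK", "OK", "NOT_OK"],
    "COLOR": ["BLUE", "BLUE", "RED", "YELLOW"],
})

# One 0/1 column per category, e.g. OKEY_OK, COLOR_RED.
dummies = pd.get_dummies(df.set_index(["ID", "Time"]), dtype=int)
# Group by the ID level and 1-minute bins of the Time level (level 1), then sum.
out = dummies.groupby(["ID", pd.Grouper(freq="1min", level=1)]).sum()
print(out)
```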