Python pandas: Add a column to my dataframe that counts a variable
Call transform; this returns a Series aligned with the original df, so it can be assigned directly as a new column:
In [223]:
df['count'] = df.groupby('group')['group'].transform('count')
df

Out[223]:
    org  group  count
0  org1      1      2
1  org2      1      2
2  org3      2      1
3  org4      3      3
4  org5      3      3
5  org6      3      3
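The df above can be reconstructed to try this; a minimal runnable sketch, with the column values taken from the output shown:

```python
import pandas as pd

# Data reconstructed from the output above
df = pd.DataFrame({
    "org": ["org1", "org2", "org3", "org4", "org5", "org6"],
    "group": [1, 1, 2, 3, 3, 3],
})

# transform('count') returns one value per row, aligned with df,
# so it can be assigned directly as a new column
df["count"] = df.groupby("group")["group"].transform("count")
print(df)
```

Unlike a plain groupby aggregation, which collapses to one row per group, transform broadcasts the group result back to every row.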
Adding a column to df that counts occurrences of a value in another column
You can use the following solution:
library(dplyr)
df %>%
group_by(id) %>%
add_count(name = "id_occurrence")
# A tibble: 10 x 3
# Groups:   id [5]
       id places   id_occurrence
    <dbl> <chr>            <int>
 1 204850 kitchen              3
 2 204850 kitchen              3
 3 204850 garden               3
 4 312512 salon                2
 5 312512 salon                2
 6 452452 salon                1
 7 285421 bathroom             1
 8 758412 garden               3
 9 758412 bathroom             3
10 758412 garden               3
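For pandas users, the same per-id count that dplyr's add_count produces can be sketched with transform; the data below is reconstructed from the tibble above:

```python
import pandas as pd

# Data reconstructed from the tibble shown above
df = pd.DataFrame({
    "id": [204850, 204850, 204850, 312512, 312512,
           452452, 285421, 758412, 758412, 758412],
    "places": ["kitchen", "kitchen", "garden", "salon", "salon",
               "salon", "bathroom", "garden", "bathroom", "garden"],
})

# 'size' counts all rows in the group (like dplyr's add_count)
df["id_occurrence"] = df.groupby("id")["id"].transform("size")
print(df)
```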
Add column with counts of another
You may try ave:
# first, convert 'gender' to class character
df$gender <- as.character(df$gender)
df$count <- as.numeric(ave(df$gender, df$gender, FUN = length))
df
#   gender age count
# 1      m  18     4
# 2      f  14     2
# 3      m  18     4
# 4      m  18     4
# 5      m  15     4
# 6      f  15     2
Update following @flodel's comment - thanks!
df <- transform(df, count = ave(age, gender, FUN = length))
How to add columns to my data frame, including the counts of another column, for two different columns, in R?
Here is a tidyverse approach. You can group_by your ID column and count the rows that are not NA.
library(tidyverse)
df %>%
group_by(ID, l) %>%
summarize(n.x = sum(!is.na(x)), n.y = sum(!is.na(y)), .groups = "drop")
# A tibble: 4 x 4
     ID l       n.x   n.y
  <int> <chr> <int> <int>
1     1 s         5     4
2     2 ss        3     2
3     3 m         7     3
4     4 mm        2     2
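A pandas analogue of this non-NA count relies on 'count' ignoring NaN, just as sum(!is.na(x)) does in R; the data below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up data: x and y contain some NaN values
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "l":  ["s", "s", "s", "ss", "ss"],
    "x":  [1.0, np.nan, 3.0, 4.0, np.nan],
    "y":  [np.nan, 2.0, 3.0, np.nan, np.nan],
})

# 'count' excludes NaN, mirroring sum(!is.na(x)) in the R answer
out = df.groupby(["ID", "l"], as_index=False).agg(
    n_x=("x", "count"), n_y=("y", "count"))
print(out)
```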
Adding a column to df that contains count of a value of a different column in the df?
Use transform() if you want to map the counts back onto each row:
df['New_Col'] = df.groupby('Day')['Day'].transform('count')
Or you can use map together with value_counts():
df['New_Col'] = df['Day'].map(df['Day'].value_counts())
Output:
       Day  New_Col
0  Morning        2
1      Day        4
2    Night        2
3    Night        2
4      Day        4
5  Morning        2
6      Day        4
7      Day        4
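Both approaches give the same result; a quick runnable check using the Day values implied by the output above:

```python
import pandas as pd

# Day values reconstructed from the output shown above
df = pd.DataFrame({"Day": ["Morning", "Day", "Night", "Night",
                           "Day", "Morning", "Day", "Day"]})

# Approach 1: aligned group counts
via_transform = df.groupby("Day")["Day"].transform("count")
# Approach 2: look up each value's total in its value_counts
via_map = df["Day"].map(df["Day"].value_counts())

assert via_transform.tolist() == via_map.tolist()
print(via_transform.tolist())
```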
Add column with numbers based on count of value in other column in Pandas
Use groupby with cumcount:
df['colB'] = df.groupby('colA').cumcount().add(1)
print(df)
# Output
colA colB
0 BJ02 1
1 BJ02 2
2 CJ02 1
3 CJ03 1
4 CJ02 2
5 DJ01 1
6 DJ02 1
7 DJ07 1
8 DJ07 2
9 DJ07 3
As suggested by @HenryEcker, use zfill to zero-pad the counter:
df['colB'] = df.groupby('colA').cumcount().add(1).astype(str).str.zfill(3)
print(df)
# Output:
colA colB
0 BJ02 001
1 BJ02 002
2 CJ02 001
3 CJ03 001
4 CJ02 002
5 DJ01 001
6 DJ02 001
7 DJ07 001
8 DJ07 002
9 DJ07 003
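A self-contained sketch of the cumcount-plus-zfill version, with the colA values copied from the example:

```python
import pandas as pd

# colA values copied from the example above
df = pd.DataFrame({"colA": ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
                            "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]})

# cumcount() is 0-based, so add(1); zfill(3) left-pads with zeros
df["colB"] = df.groupby("colA").cumcount().add(1).astype(str).str.zfill(3)
print(df)
```

Note that non-adjacent duplicates (the two CJ02 rows) still share one running counter, since cumcount numbers rows within each group regardless of position.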
SQL: create a column that counts occurrences of another column's values
You want count(*) as a window function:
select t.*, count(*) over (partition by name) as name_count
from t;
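This can be tried in SQLite (window functions need SQLite 3.25+), here via Python's sqlite3 module with a hypothetical single-column table t:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("a",), ("a",), ("b",), ("a",), ("c",)])

# count(*) over a partition repeats the group total on every row,
# without collapsing rows the way GROUP BY would
rows = conn.execute(
    "SELECT name, count(*) OVER (PARTITION BY name) AS name_count FROM t"
).fetchall()
print(rows)
```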
Add column with counts of another, depending on another column
Using data.table you could do something like the following:
library(data.table)
setDT(df)
df[, WeeklyAT := .N, by = .(Contact.ID, Week)]
df

    Contact.ID       Date     Time Week Attendance WeeklyAT
 1:          A 2012-10-06 18:54:48   44         30        2
 2:          A 2012-10-08 20:50:18   44         30        2
 3:          A 2013-05-24 20:18:44   21         30        1
 4:          B 2012-11-15 16:58:15   46         40        1
 5:          B 2013-01-09 10:57:02    2         40        3
 6:          B 2013-01-11 17:31:22    2         40        3
 7:          B 2013-01-14 18:37:00    2         40        3
 8:          C 2013-02-22 17:46:07    8          5        1
 9:          C 2013-02-27 11:21:00    9          5        1
10:          D 2012-10-28 14:48:33   43         12        1

Note that := adds the column to df by reference, so no merge back is needed.
EDIT:
Apparently dplyr can do something very similar:
library(dplyr)
merge(df,
      df %>% group_by(Contact.ID, Week) %>% summarise(WeeklyAT = n()))
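The same count-then-merge pattern can be sketched in pandas with a made-up frame; how="left" keeps the original row order:

```python
import pandas as pd

# Made-up attendance data for illustration
df = pd.DataFrame({
    "Contact.ID": ["A", "A", "A", "B", "B"],
    "Week": [44, 44, 21, 46, 2],
})

# Count rows per (Contact.ID, Week) and merge the counts back,
# mirroring the data.table/dplyr merge shown above
counts = (df.groupby(["Contact.ID", "Week"])
            .size().rename("WeeklyAT").reset_index())
out = df.merge(counts, on=["Contact.ID", "Week"], how="left")
print(out)
```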
How to create a new column that counts and resets based on a string value in another column
You can compare Trend with its shift() to find where each new run starts, take the cumsum() of those breaks to get a run id, and then cumcount() within each run:
runs = df.Trend.ne(df.Trend.shift()).cumsum()
df['Counter'] = df.groupby(runs).cumcount().add(1)
Output:
             A       B       C  B_shifted  C_shifted Trend  Counter
0   553.666667  533.50  574.00        NaN        NaN  Flat        1
1   590.818182  575.50  595.50     533.50     574.00    Up        1
2   531.333333  527.50  536.50     575.50     595.50  Down        1
3   562.000000  562.00  562.00     527.50     536.50    Up        1
4   551.857143  538.50  557.50     562.00     562.00  Down        1
5   592.000000  585.50  598.50     538.50     557.50    Up        1
6   511.000000  511.00  511.00     585.50     598.50  Down        1
7   564.333333  548.00  590.50     511.00     511.00    Up        1
8   574.333333  552.00  580.00     548.00     590.50  Flat        1
9   537.500000  513.25  574.50     552.00     580.00  Down        1
10  609.500000  582.25  636.75     513.25     574.50    Up        1
11  535.000000  531.00  565.00     582.25     636.75  Down        1
12  567.142857  539.50  588.50     531.00     565.00    Up        1
13  566.625000  546.25  594.25     539.50     588.50    Up        2
14  576.631579  556.00  598.00     546.25     594.25    Up        3
15  558.333333  538.00  584.00     556.00     598.00  Down        1
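A self-contained sketch of a run counter that resets whenever the value changes, using the compare-with-shift, cumsum, cumcount pattern on made-up Trend values:

```python
import pandas as pd

# Made-up Trend values for illustration
df = pd.DataFrame({"Trend": ["Flat", "Up", "Up", "Up", "Down", "Up", "Up"]})

# A new run starts wherever the value differs from the previous row;
# cumsum of those breaks gives each run a distinct id
runs = df["Trend"].ne(df["Trend"].shift()).cumsum()

# cumcount restarts at 0 within each run id; add 1 to count from 1
df["Counter"] = df.groupby(runs).cumcount().add(1)
print(df)
```

Because the counter is keyed on run ids rather than the Trend value itself, two separate runs of the same direction (the two Up runs here) each restart at 1.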