Aggregating all unique values of each column of data frame
Moved from comments:
library(data.table)
dt <- as.data.table(data)
dt[, lapply(.SD, function(x) toString(unique(x))), by = a]
giving:
a b c d
1: 1 apples, oranges 12, 22 Monday
2: 2 apples 45, 67 Tuesday, Wednesday
3: 3 grapefruit 28 Tuesday
Aggregate unique values from multiple columns with pandas GroupBy
Use groupby and agg, and aggregate only unique values by calling Series.unique:
df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))
prop2 prop3 prop4
prop1
K20 12,1,66 travis,leo 10.0,4.0
L30 3,54,11,10 bob,john 11.2,10.0
df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))
prop2 prop3 prop4
prop1
L30 3,54,11,10 bob,john 11.2,10.0
K20 12,1,66 travis,leo 10.0,4.0
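A self-contained sketch of the same pattern may help. The sample rows below are hypothetical (the original data frame is not shown); only the column names prop1, prop2 and prop3 follow the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "K20", "K20", "L30"],
    "prop2": [3, 54, 12, 12, 3],
    "prop3": ["bob", "john", "travis", "leo", "bob"],
})

# Cast everything to str so ','.join also works on numeric columns,
# then join the unique values of every column within each group.
# sort=False keeps the groups in order of first appearance.
out = df.astype(str).groupby("prop1", sort=False).agg(lambda x: ",".join(x.unique()))
print(out)
```

Series.unique keeps values in order of first appearance, so the joined strings follow the row order of the input.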
If handling NaNs is important, call fillna in advance:
import re
df.fillna('').astype(str).groupby('prop1').agg(
lambda x: re.sub(',+', ',', ','.join(x.unique()))
)
prop2 prop3 prop4
prop1
K20 12,1,66 travis,leo 10.0,4.0
L30 3,54,11,10 bob,john 11.2,10.0
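As an alternative sketch (not the method used above), missing values can be dropped inside the aggregation instead of being filled with empty strings and cleaned up with a regex. The helper name join_unique and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({
    "prop1": ["K20", "K20", "L30"],
    "prop3": ["travis", np.nan, "bob"],
})

def join_unique(s):
    # Drop missing values first, so no empty string ever enters the join.
    return ",".join(s.dropna().astype(str).unique())

out = df.groupby("prop1").agg(join_unique)
print(out)
```

Because NaN never reaches the join, this variant cannot produce leading, trailing or doubled commas.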
How to get unique values from multiple columns in a pandas groupby
You can do it with apply:
import numpy as np
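# Select several columns with a list (double brackets); the tuple form ['l1','l2'] is no longer accepted by recent pandas
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))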
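A runnable sketch of this approach, assuming a small hypothetical data frame with the columns c, l1 and l2 named above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "c": ["x", "x", "y"],
    "l1": ["a", "b", "d"],
    "l2": ["b", "c", "d"],
})

# np.unique flattens both columns of each group and returns the sorted unique values.
g = df.groupby("c")[["l1", "l2"]].apply(lambda x: list(np.unique(x)))
print(g)
```

Note that np.unique sorts its result, so the original row order is not preserved.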
Aggregate in R based on unique values in column
We need to get the length of the unique elements:
aggregate(store~item, data = df,FUN = function(x) length(unique(x)))
Or, if we are using dplyr:
library(dplyr)
df %>%
group_by(item) %>%
summarise(storen = n_distinct(store))
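For readers working in pandas rather than R, the same distinct count can be sketched as follows. The sample rows are hypothetical; the column names item and store follow the R code above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the R code above.
df = pd.DataFrame({
    "item": ["pen", "pen", "pen", "ink"],
    "store": ["A", "B", "A", "C"],
})

# Count the distinct stores per item, similar in spirit to n_distinct(store).
storen = df.groupby("item")["store"].nunique()
print(storen)
```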
Pandas, for each unique value in one column, get unique values in another column
Here are two strategies to do it. No doubt, there are other ways.
Assuming your dataframe looks something like this (obviously with more columns):
import pandas as pd
df = pd.DataFrame({'author':['a', 'a', 'b'], 'subreddit':['sr1', 'sr2', 'sr2']})
>>> df
author subreddit
0 a sr1
1 a sr2
2 b sr2
...
Solution 1: groupby
More straightforward than solution 2, and similar to your first attempt:
group = df.groupby('author')
df2 = group.apply(lambda x: x['subreddit'].unique())
# Alternatively, same thing as a one liner:
# df2 = df.groupby('author').apply(lambda x: x['subreddit'].unique())
Result:
>>> df2
author
a [sr1, sr2]
b [sr2]
The author is the index, and the single column is the list of all subreddits they are active in (this is how I interpreted how you wanted your output, according to your description).
If you wanted each subreddit in a separate column, which might be more usable depending on what you want to do with it, you could do this afterwards:
df2 = df2.apply(pd.Series)
Result:
>>> df2
0 1
author
a sr1 sr2
b sr2 NaN
Solution 2: Iterate through dataframe
You can make a new dataframe with all unique authors:
df2 = pd.DataFrame({'author':df.author.unique()})
And then just get the list of all unique subreddits they are active in, assigning it to a new column:
df2['subreddits'] = [list(set(df['subreddit'].loc[df['author'] == x['author']]))
for _, x in df2.iterrows()]
This gives you this:
>>> df2
author subreddits
0 a [sr2, sr1]
1 b [sr2]
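A third option, sketched here as an addition to the two solutions above, is to select the column on the grouped object and call unique directly, which avoids both the lambda and the row iteration:

```python
import pandas as pd

df = pd.DataFrame({'author': ['a', 'a', 'b'], 'subreddit': ['sr1', 'sr2', 'sr2']})

# One array of unique subreddits per author, in order of first appearance.
subs = df.groupby('author')['subreddit'].unique()

# Convert each array to a plain Python list if that is more convenient.
as_lists = subs.apply(list)
print(as_lists)
```

Unlike list(set(...)) in Solution 2, this keeps the subreddits in the order they first appear.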
Grouping unique column values to get average of each unique value in pandas dataframe column
Try converting Days to integers first, then take the mean per charge code:
df_Paid['Days'] = df_Paid['Days'].astype(int)
df_Paid.groupby(['Charge Code'])['Days'].mean()
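A self-contained sketch of those two steps, assuming hypothetical rows in which Days was read in as text:

```python
import pandas as pd

# Hypothetical sample data; Days is stored as strings, as it might be after a CSV import.
df_Paid = pd.DataFrame({
    "Charge Code": ["A1", "A1", "B2"],
    "Days": ["10", "20", "7"],
})

# Convert to integers so the mean is numeric, then average per unique charge code.
df_Paid["Days"] = df_Paid["Days"].astype(int)
avg_days = df_Paid.groupby("Charge Code")["Days"].mean()
print(avg_days)
```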
List unique values for each column in a data frame
Let dat be your data frame after reading in the csv file; you can do
ulst <- lapply(dat, unique)
If you further want to know the number of unique values for each column, do
k <- lengths(ulst)
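For comparison, a pandas sketch of the same two steps. The data frame dat below is hypothetical; the names ulst and k mirror the R code above.

```python
import pandas as pd

# Hypothetical sample data standing in for the csv file.
dat = pd.DataFrame({"v1": [1, 2, 3, 3], "v2": ["a", "b", "a", "b"]})

# Unique values of every column, as a dict of lists (analogous to lapply(dat, unique)).
ulst = {col: dat[col].drop_duplicates().tolist() for col in dat.columns}

# Number of unique values per column (analogous to lengths(ulst)).
k = dat.nunique()
print(ulst)
print(k)
```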
Show unique values for each column
The number of unique values differs between those two columns, so you need to reduce each column to a single element.
library(tidyverse)  # provides map, str_c, bind_rows and gather
df2 <- map(df, ~str_c(unique(.x),collapse = ",")) %>%
bind_rows() %>%
gather(key = col_name, value = col_unique)
> df2
# A tibble: 2 x 2
col_name col_unique
<chr> <chr>
1 v1 1,2,3
2 v2 a,b
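The same reduction can be sketched in pandas: collapse each column to one comma-separated string, then reshape to one row per column. The input rows are hypothetical and chosen to reproduce the values shown above.

```python
import pandas as pd

# Hypothetical sample data chosen to match the output above.
df = pd.DataFrame({"v1": [1, 2, 3, 3], "v2": ["a", "b", "a", "b"]})

df2 = (
    df.astype(str)
      # One string per column: its unique values joined with commas.
      .apply(lambda s: ",".join(s.unique()))
      # Turn the resulting Series into a two-column long table.
      .rename_axis("col_name")
      .reset_index(name="col_unique")
)
print(df2)
```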